Functions CDN too slow to update to new commit

vibrunazo · September 4, 2019, 4:43pm

I have updated my functions with a new commit. If I call the function endpoint from a US proxy, the function is fine and updated. But if I call it without a proxy from my home IP (in Brazil), I get an old version of the function.

I understand CDNs can sometimes take a while to propagate changes across the globe, but it has been over 4 hours since I committed those changes and my functions still haven’t updated!

Is Netlify CDN always this slow? Is there something I can do from my end to force the CDNs to update faster?

fool · September 5, 2019, 1:03am

Hmm, we intend to invalidate the cache immediately at deploy time. It takes about 2 seconds on average to purge the cache, so there’s likely either a bug or a misconfiguration.

Potential misconfiguration: functions are proxied internally. Does your function set a Cache-Control header? If so, we will respect it and cache the response for however long you specify. That cache is per CDN node, so you could see inconsistent results depending on what happened in the past on the node you’re talking to (which could change constantly).
Potential bug: we may be caching the wrong content. I don’t expect this to go away on its own without our help. What’s the function path, so we can do some testing? When you report a problem with stale content, it is also always useful to include the value of the x-nf-request-id HTTP response header, so we can zero in on your exact request in our logs; we may be able to tell what happened even after the fact.
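A minimal sketch for collecting the details requested above before filing a report, assuming Node 18+ (for the global fetch). The helper name inspectFunctionResponse is hypothetical, and the fetchImpl parameter exists only so the helper can be exercised without network access:

```javascript
// Hypothetical helper: fetch a function endpoint and return the details that
// matter when reporting stale content. Assumes Node 18+ for the global fetch.
async function inspectFunctionResponse(url, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(url)
  return {
    status: res.status,
    // Include this value in a support report so the request can be found in the logs
    requestId: res.headers.get("x-nf-request-id"),
    // If your function sets this, a CDN node may legitimately serve a cached copy
    cacheControl: res.headers.get("cache-control"),
    // The body shows which version of the function actually answered
    body: await res.text(),
  }
}
```

For example, inspectFunctionResponse("https://gengarbobo.netlify.com/.netlify/functions/testfunc3").then(console.log) prints the request ID alongside the body, which can be compared across networks (with and without a proxy).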

Thanks for your help in troubleshooting!

vibrunazo · September 5, 2019, 1:42am

It’s just the Hello World example. It has no additional headers or anything special. The most straightforward of the problematic ones is this:

https://gengarbobo.ca/.netlify/functions/testfunc3

When I call that URL from a US proxy, I get the correct response from the latest commit: “Hello, World 4”

But when I call it from my home IP in Brazil with no proxy, I get an old response from several hours and commits ago: “Hello, World”

Dennis · September 6, 2019, 3:39pm

Can you provide a link to the exact deploy that is showing this issue? I checked your site and I’m seeing the following message in the deploy log:
2:47:03 PM: Skipping functions preparation step: /opt/build/repo/lambda not found
This means our buildbot is not finding your functions, so it isn’t deploying the newer versions.

vibrunazo · September 6, 2019, 5:22pm

I’m sorry, I already gave up on Netlify and moved to Firebase because of this problem. So the link I provided before shouldn’t be working and the latest commit won’t work on Netlify because I already removed all the Netlify functions from the repo.

If you’re still trying to debug what went wrong, the last commit I used in Netlify before moving to Firebase was this one:

The log says the functions built fine and should be live.

The endpoint I was having problems with can be accessed here:
https://gengarbobo.netlify.com/.netlify/functions/testfunc3

But as of today (September 6th, 2 days after I made the post), it looks like the endpoint has finally updated. I just called it again and I’m finally getting the correct response from that last commit. It’s a shame that Netlify took longer to update my functions than it took me to look for alternatives, learn Firebase from scratch, move everything there, and get what I was trying to do on Netlify working perfectly on Firebase ^^

fool · September 6, 2019, 10:19pm

I think we found the source of the problem today. Our proxy is how we connect functions running in AWS to your namespace; this is the same functionality that is described in the “Redirects and rewrites” page of the Netlify docs.

Today we found a good way to reproduce the problem for another customer, and determined that it was a bug in that proxying configuration: it essentially cached the pointer to which AWS function to run (when you deploy a new version, the old one is not removed). We are working on a fix.
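The behaviour described above can be illustrated with a deliberately simplified toy model. This is not Netlify’s code, and every name in it is hypothetical: each CDN node caches, per path, a pointer to a deployed function, and because old versions are never removed, a stale pointer keeps working, just with the old code. The model also shows why renaming the function, as suggested later in this thread, sidesteps the problem: a new path has no cached pointer yet.

```javascript
// Toy model only, NOT Netlify's implementation: a proxy node that caches
// a pointer (function ID) per request path.
class ToyProxyNode {
  constructor(registry) {
    this.registry = registry // path -> ID of the latest deployed function
    this.pointerCache = new Map() // path -> ID cached on this node
  }

  resolve(path) {
    // The modelled bug: a cached pointer is reused without checking for newer deploys
    if (!this.pointerCache.has(path)) {
      this.pointerCache.set(path, this.registry.get(path))
    }
    return this.pointerCache.get(path)
  }
}

const registry = new Map()
const functions = new Map() // function ID -> response; old versions are never removed

function deploy(path, id, body) {
  functions.set(id, body)
  registry.set(path, id)
}

const node = new ToyProxyNode(registry)
deploy("/.netlify/functions/testfunc3", "v1", "Hello, World")
console.log(functions.get(node.resolve("/.netlify/functions/testfunc3"))) // Hello, World

deploy("/.netlify/functions/testfunc3", "v4", "Hello, World 4")
console.log(functions.get(node.resolve("/.netlify/functions/testfunc3"))) // still Hello, World

// A renamed function lives at a new path, so no stale pointer exists for it
deploy("/.netlify/functions/testfunc3-v4", "v4", "Hello, World 4")
console.log(functions.get(node.resolve("/.netlify/functions/testfunc3-v4"))) // Hello, World 4
```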

Sorry we didn’t figure that out soon enough for you to use our service, but thank you very much for the report and we’ll follow up here when we fix the bug.

abejith · April 11, 2020, 8:11pm

@fool Has the issue been fixed? I’m facing it now. I stumbled onto this post while searching for a solution. Please help.

perry · April 13, 2020, 5:33pm

Hey @abejith - thanks for checking in. We are really hoping for a fix for this, as we know it is really quite frustrating to deal with. I promise we’ll post here asap when there is some news.

martinratinaud · April 14, 2020, 11:27am

Same here

This leaves the website broken for several hours, which forces us to deploy the Lambda functions first, wait 6 to sometimes 12 hours, and only then release the UI.

This is very problematic.

Thanks

perry · April 14, 2020, 4:24pm

We will absolutely update you as soon as we have something to report on a fix for this!

martinratinaud · April 16, 2020, 5:55am

Hi @perry.
I in no way mean to rush you, but this is a very big problem and I hope it’s among your top priorities.

I use GraphQL in my Lambda functions, and every time I deploy a new feature on my website, I add a UI component that accesses a new GraphQL object. Since requests often end up on an old version of the API, the whole API crashes, and with it my whole app.

This is really hurting my business and I’m not confident anymore in deploying lambda functions.

Do you have a workaround I can use now?

Thanks for your quick answer and your great work

fool · April 17, 2020, 5:08pm

Hey Martin, and sorry about the impact of this. If you’re already in this boat, time might cure things, but that isn’t a sure thing or a great plan. Our operations team has to help us clear the stale cache for a more certain and confirmable resolution (confirmable by us: we have ways to check every CDN node programmatically to verify which function is being served).

To avoid the problem entirely, you could try this workflow, which I realize is not great, but it is available today: If you change your function code, change the name of it. Our caching is by path, so changing the name with any code change would work around it. Like I said, I know this is pretty suboptimal, but it is possible today as a self-serve workaround.
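Outside of any particular framework, the rename-on-change workaround could be automated with a small Node build step along these lines. This is a sketch under assumptions: each function is a single .js file in a hypothetical src/functions directory, and the renamed copies go into the directory configured as functions in netlify.toml. Helper files shared through relative require() calls would need extra handling, since they get renamed too.

```javascript
const crypto = require("crypto")
const fs = require("fs")
const path = require("path")

// Derive a name such as "testfunc3-1a2b3c4" from the file contents, so the
// name (and therefore the cached path) changes only when the code changes.
function hashedFunctionName(basename, source) {
  const digest = crypto.createHash("sha256").update(source).digest("hex").slice(0, 7)
  return `${basename}-${digest}`
}

// Copy every .js file from srcDir into outDir under its hashed name and
// return a { basename: hashedName } manifest for the frontend or redirects.
function copyWithHashedNames(srcDir, outDir) {
  fs.mkdirSync(outDir, { recursive: true })
  const manifest = {}
  for (const filename of fs.readdirSync(srcDir)) {
    if (path.extname(filename) !== ".js") continue
    const source = fs.readFileSync(path.join(srcDir, filename))
    const basename = path.basename(filename, ".js")
    const hashed = hashedFunctionName(basename, source)
    fs.copyFileSync(path.join(srcDir, filename), path.join(outDir, `${hashed}.js`))
    manifest[basename] = hashed
  }
  return manifest
}
```

Calling copyWithHashedNames("src/functions", "functions") during the build would return a manifest such as { testfunc3: "testfunc3-<hash>" }, which the frontend can use to locate each endpoint.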

If you’re having problems, reporting them along with a URL that is returning the wrong content gives us the information we need to initiate a fix from our side. We will try to get the team to fix it, but our turnaround will not always be great for non-enterprise customers, since we aren’t able to respond as quickly to our free and Pro customers.

Longer term, we hope to have a rearchitected proxying service launched in the next couple of weeks, which will be the fix for this problem. It is in live testing right now, so this isn’t “planning to”, but “almost there”.

martinratinaud · April 17, 2020, 5:28pm

Hi @fool
Thanks for the workaround, but as you mentioned, this is really not optimal.

I’m just not sure how people (like me) can keep using Netlify with this kind of problem.
I really don’t mean to be rude (and I’m not paying much every month for your service, so…), but for me, Netlify with Lambda functions is now a solution that breaks my site every time I deploy a new version of my API.

So I guess I could make changes to my API, deploy them, wait 2 days (or more), and then deploy my UI, but that is neither production-ready nor productive.

Do you have a clear roadmap for the resolution of this bug, so that users can know whether they should update their workflow?

I’m also very surprised that more people aren’t raising this problem, and I can’t stop thinking I’m doing something wrong.

Thanks again

fool · April 17, 2020, 5:53pm

Valid feedback, Martin, and thanks for making the time to share it. Most folks don’t change their functions very often, so most people are not affected by the bug, though of course that doesn’t help your use case!

I can assure you that our team looked very hard at this when it started occurring; we spent several person-days debugging and understanding the issue. The conclusion was that the weeks of work required for a fix were better spent on the new service: a fix and that launch were judged to be about equal in effort and time to ship, and fixing the bug would have delayed the launch of the service that was the better fix.

I’m no apologist - this is a bug and it clearly has terrible effects on your business - so as a Support Engineer I’d have to advise you not to use our functions product if it isn’t working for you. It is not my goal to keep you on Netlify at all costs; it is my goal for you to have a workflow you can use today.

I cannot promise a firm timeline on shipping the fix. This is what I can tell you:

we intend to deploy it in the near future; as I mentioned, it is already handling a very small percentage of production traffic today
but testing will likely uncover unanticipated problems, which means any promise I could make would not be a true one.

So - I can only promise, as we already did, to let you know here when the fix ships.

loongyh · April 21, 2020, 11:50pm

For those using Gatsby.js, here’s a “functions cache busting” implementation which generates and appends a different content hash to your function name each time the function changes, which you can then query in your pages with useStaticQuery:

// In your gatsby-node.js

const fs = require(`fs`)
const path = require(`path`)

exports.sourceNodes = async ({ actions, createNodeId, createContentDigest }) => {
  const { createNode } = actions
  const functions = {}
  // Output directory Netlify deploys functions from (see netlify.toml below)
  fs.existsSync('functions') || fs.mkdirSync('functions')
  fs.readdirSync('src/functions')
    .filter(filename => filename.endsWith('.js'))
    .forEach(filename => {
      const basename = path.parse(filename).name
      // Short content hash: changes only when this function's source changes
      const digest = createContentDigest(fs.readFileSync(`src/functions/${filename}`)).slice(0, 7)
      fs.copyFileSync(`src/functions/${filename}`, `functions/${basename}-${digest}.js`)
      // process.env.URL is set by Netlify during builds; fall back to a relative URL locally
      functions[basename] = `${process.env.URL || ''}/.netlify/functions/${basename}-${digest}`
    })
  // Expose { name: endpoint } as a single GraphQL node of type `Functions`
  createNode({
    ...functions,
    id: createNodeId(`functions`),
    parent: null,
    children: [],
    internal: {
      type: `Functions`,
      contentDigest: createContentDigest(functions)
    }
  })
}

At build time, a functions GraphQL node is created, with the name of each function as a key and its endpoint as the value, e.g. https://example.com/.netlify/functions/myfunction-7e23bfc. You can utilise it in your pages/components as follows:

// In any component/page.js

import React from "react"
import { graphql, useStaticQuery } from "gatsby"

const MyPage = () => {
  // Hash-suffixed endpoint URLs generated at build time in gatsby-node.js
  const { myfunction, myfunction2 } = useStaticQuery(
    graphql`
      query {
        functions {
          myfunction
          myfunction2
        }
      }
    `
  ).functions

  async function queryEndpoint() {
    const response = await fetch(myfunction)
    const response2 = await fetch(myfunction2)
    console.log(await response.text())
    console.log(await response2.text())
  }

  return <button onClick={queryEndpoint}>Call functions</button>
}

export default MyPage

Put your Netlify Functions .js files in src/functions, and at build time they will be copied into functions/ with a hash suffix appended to their names. Remember to update your netlify.toml to use the latter directory:

# In your netlify.toml
[build]
  functions = "functions/"

perry · April 27, 2020, 5:22pm

Hey @loongyh! Thanks for writing that up and sharing it. We are actively working on removing the underlying cause for this issue, and hope to have an update we can share soon. For now, your solution should definitely work for the bug as we understand it!

loongyh · April 28, 2020, 12:10pm

This workaround can probably only be applied when the functions are used by a single frontend application that is deployed along with them, i.e. with static site generators. If the functions are meant to be a public API used by separate applications, then the URL needs to be static.

Dennis · May 4, 2020, 9:28am

Correct. It’s a workaround until we can deploy a fix for the issue. You could probably combine this workaround with a generated _redirects file that points at the new function names, giving you consistent API endpoints. Your _redirects file could contain something like the following:

/api/* /.netlify/functions/my-function-HASH 200

You’ll want to have a script that will replace that hash with the correct hash. Maybe that would work for you in the meantime?
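A sketch of the script described above, assuming a manifest object that maps each function’s stable name to its current hash-suffixed name (for example, one produced by a rename step at build time). It emits one rewrite per function so that each keeps a fixed /api/<name> URL; the /api/ prefix and the function name renderRedirects are only examples.

```javascript
// Build the contents of a _redirects file from { name: hashedName }.
// Each line is a 200 rewrite, so clients keep calling the stable /api/<name>
// URL while the build decides which hash-suffixed function actually runs.
function renderRedirects(manifest) {
  return (
    Object.entries(manifest)
      .map(([name, hashed]) => `/api/${name} /.netlify/functions/${hashed} 200`)
      .join("\n") + "\n"
  )
}
```

For example, renderRedirects({ "my-function": "my-function-1a2b3c4" }) returns "/api/my-function /.netlify/functions/my-function-1a2b3c4 200\n". The result would be written to _redirects in the publish directory during the build; if that file already contains other rules, keep in mind that the first matching rule wins.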

loongyh · May 4, 2020, 4:35pm

Awesome idea to use _redirects to alias the dynamic endpoints. For everyone else, regardless of implementation, here is the general solution in summary: write a script that generates a hash, append it to the end of your function name, and, if you need a static endpoint URL, use _redirects. It may be worth including this in the functions documentation until the fix is deployed.

tibotiber · May 5, 2020, 3:26am

Hi, I’m facing the same issue. This impacts my API on every deploy as I have a function proxying to my API service and performing a commit ref comparison to force clients to upgrade and ensure data integrity. This is a really difficult situation to deal with.

Nonetheless, I want to point out that I appreciate the transparency provided here on this issue, on what should be expected fix wise, and on the decision process leading to why it takes a while. Thanks @fool for providing such details.

I will try disabling the commit-ref check in dev, and adding a hash in the function name for prod deploys. Subscribing to updates on this.