Inconsistent issues with Netlify Function in production

On some deploys our Netlify Function works in production (the /.netlify/functions/crowdin endpoint returns a 200), but on others it doesn’t (the endpoint returns a 404).

Our project uses VuePress. Here’s the function:
https://github.com/ethereum/ethereum-org-website/blob/dev/lambda/crowdin.js
Here’s the netlify.toml:

When I run netlify dev locally and open the /languages page, the function endpoint (http://localhost:8888/.netlify/functions/crowdin) works as expected. It was also working in production until one recent deploy, after which the endpoint (https://ethereum.org/.netlify/functions/crowdin) started returning 404s. As an experiment I tried “Trigger deploy” --> “Clear cache and deploy site”, and it started working again. Now every deploy seems to be a roll of the dice.
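Since the failure only shows up after some deploys, a post-deploy smoke check would catch it before a visitor does. A minimal sketch, assuming Node 18+ for the global fetch: it requests the endpoint a few times and reports the status codes. The name checkFunctionEndpoint and its options are hypothetical, and fetchImpl can be swapped out for testing.

```javascript
// Hypothetical post-deploy smoke check: request a function endpoint a few
// times and collect the HTTP status codes, so a 404 shows up right after a
// deploy instead of when a user hits it.
async function checkFunctionEndpoint(url, { attempts = 3, fetchImpl = globalThis.fetch } = {}) {
  const statuses = [];
  for (let i = 0; i < attempts; i++) {
    const res = await fetchImpl(url);
    statuses.push(res.status);
  }
  // Healthy only if every attempt returned 200.
  return { url, statuses, ok: statuses.every((s) => s === 200) };
}
```

For example, checkFunctionEndpoint("https://ethereum.org/.netlify/functions/crowdin").then(console.log) logs the collected statuses and whether all of them were 200.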

Here are logs from a recent deploy. The function seems to build consistently:

1:21:02 PM: Executing user command: yarn build
1:21:02 PM: yarn run v1.13.0
1:21:02 PM: $ run-p build:**
1:21:03 PM: $ vuepress build docs
1:21:03 PM: $ netlify-lambda build lambda
1:21:04 PM: netlify-lambda: Building functions
1:21:04 PM: wait Extracting site metadata...
1:21:05 PM: tip Apply local theme at /opt/build/repo/docs/.vuepress/theme...
1:21:05 PM: tip Apply theme local ...
1:21:05 PM: tip Apply plugin container (i.e. "vuepress-plugin-container") ...
1:21:05 PM: tip Apply plugin @vuepress/register-components (i.e. "@vuepress/plugin-register-components") ...
1:21:05 PM: tip Apply plugin vuepress-plugin-anonymous-67b88ead ...
1:21:05 PM: tip Apply plugin @vuepress/active-header-links (i.e. "@vuepress/plugin-active-header-links") ...
1:21:05 PM: tip Apply plugin @vuepress/last-updated (i.e. "@vuepress/plugin-last-updated") ...
1:21:05 PM: tip Apply plugin sitemap (i.e. "vuepress-plugin-sitemap") ...
1:21:05 PM: Hash: 6de00007fa8699e7255c
1:21:05 PM: Version: webpack 4.41.2
1:21:05 PM: Time: 811ms
1:21:05 PM: Built at: 11/08/2019 9:21:05 PM
1:21:05 PM:      Asset      Size  Chunks             Chunk Names
1:21:05 PM: crowdin.js  32.2 KiB       0  [emitted]  crowdin
1:21:05 PM: Entrypoint crowdin = crowdin.js
1:21:05 PM:  [0] ../node_modules/axios/lib/utils.js 8.28 KiB {0} [built]
1:21:05 PM:  [1] ../node_modules/axios/lib/helpers/buildURL.js 1.63 KiB {0} [built]
1:21:05 PM:  [3] ../node_modules/axios/lib/helpers/bind.js 256 bytes {0} [built]
1:21:05 PM:  [4] ../node_modules/axios/lib/cancel/isCancel.js 102 bytes {0} [built]
1:21:05 PM:  [5] ../node_modules/axios/lib/defaults.js 2.55 KiB {0} [built]
1:21:05 PM: [13] ../node_modules/axios/lib/core/mergeConfig.js 1.69 KiB {0} [built]
1:21:05 PM: [14] ../node_modules/axios/lib/cancel/Cancel.js 385 bytes {0} [built]
1:21:05 PM: [15] ./crowdin.js 770 bytes {0} [built]

At the end of the logs, though, the output differs. Here are the logs from a successful deploy:

8:59:32 AM: Function Dir: /opt/build/repo/docs/.vuepress/dist/lambda
8:59:32 AM: TempDir: /tmp/zisi-5dc44cc07c85c804ebc15d77
8:59:33 AM: Prepping functions with zip-it-and-ship-it 0.3.1
8:59:34 AM: [ { path: '/tmp/zisi-5dc44cc07c85c804ebc15d77/crowdin.zip',
8:59:34 AM:     runtime: 'js' } ]
8:59:34 AM: Prepping functions complete

And the equivalent logs from an unsuccessful deploy (another deploy triggered on the exact same branch and commit history):

1:22:06 PM: Function Dir: /opt/build/repo/docs/.vuepress/dist/lambda
1:22:06 PM: Skipping functions preparation step: /opt/build/repo/docs/.vuepress/dist/lambda not found
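In the unsuccessful deploy the functions step was skipped because the directory didn’t exist when the build finished, and the deploy still went out. A minimal sketch, assuming a hypothetical Node script run as the last part of the build, that fails loudly in that case instead of shipping a site without the function. The name assertFunctionsBuilt is illustrative.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical last build step: throw if the functions directory is missing
// or contains no .js bundles, so the problem surfaces as a failed build
// rather than a skipped "Prepping functions" step and a 404 in production.
function assertFunctionsBuilt(functionsDir) {
  if (!fs.existsSync(functionsDir)) {
    throw new Error(`Functions directory not found: ${functionsDir}`);
  }
  const bundles = fs.readdirSync(functionsDir).filter((f) => f.endsWith(".js"));
  if (bundles.length === 0) {
    throw new Error(`No function bundles in ${functionsDir}`);
  }
  // Return the bundle names so the build log shows what will be deployed.
  return bundles.sort();
}
```

Called at the end of yarn build with docs/.vuepress/dist/lambda (the Function Dir from the logs, relative to the repo root), an uncaught throw makes Node exit non-zero, which fails the build.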

Any idea what’s causing this issue? Thanks.

Hi @samajammin, and sorry for the slow reply. It seems we might have had a bug in serving that function. I can’t be sure from our internal logs, but I see some 404s from around when you wrote in and none since shortly after that.

Have you seen it happen again since? If you see this in the future, we’d likely need to resolve it on our side, but there is a quick workaround: rename the function and redeploy (after updating your code references to point to the new name). The new name would likely avoid the issue.


Still having occasional issues with this. I just tried renaming the function & the issue persists. Any idea @fool?

Here are the full logs from the production deploy after I changed the function name:

I just triggered a “Clear cache and deploy site” and now the function is working again. Full logs:

This seems to fail consistently on deploys now until I trigger a “Clear cache and deploy site”. @fool, have any other users noticed caching issues around Netlify Functions?

Not specifically, no. The failure mode we’ve seen involves our CDN caching software holding pointers to old functions, not anything to do with how you build. Clearing the cache during the build has no effect on the function at runtime, unless the function itself is built differently with and without the cache. Can you try outputting a checksum of the function file as the last step of your build, for one build with the cache and one without, to see if they differ?

If they do, we’ll suggest you debug on the build side; if they don’t, then it might be a bug on our side.
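A minimal sketch of that checksum step, assuming a hypothetical Node script run at the end of the build: it computes a SHA-256 digest for each JavaScript file in the functions directory, so the output of a cached build and a “Clear cache and deploy site” build can be compared in the deploy logs. The name functionChecksums is illustrative.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// Compute a SHA-256 checksum for each built .js function file. Logging the
// result at the end of the build lets two deploys be compared line by line.
function functionChecksums(functionsDir) {
  if (!fs.existsSync(functionsDir)) {
    return null; // nothing was built, which is worth flagging on its own
  }
  return fs
    .readdirSync(functionsDir)
    .filter((f) => f.endsWith(".js"))
    .sort()
    .map((file) => {
      const data = fs.readFileSync(path.join(functionsDir, file));
      const sha256 = crypto.createHash("sha256").update(data).digest("hex");
      return { file, sha256 };
    });
}
```

For example, console.log(functionChecksums("docs/.vuepress/dist/lambda")) prints one entry per bundle, or null if the directory was never created.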

Figured out the issue: it was a bone-headed setup on my end :sweat_smile:. Sorry for taking up your time.

My build steps were running in parallel (the run-p build:** step in the logs) and writing to the same /dist directory, so occasionally my app build was overwriting my lambda build output. The solution was to run them sequentially.
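A minimal sketch of running the steps sequentially, assuming a hypothetical Node build script in place of the parallel run-p call. The two command strings are the ones visible in the deploy logs; the VuePress build runs to completion first and the lambda build runs last, so nothing rewrites the shared output directory after the function bundle lands there. runSequentially and BUILD_STEPS are illustrative names.

```javascript
const { execSync } = require("child_process");

// Run build commands one after another. execSync blocks until each command
// exits, so steps can never overlap, and it throws on a non-zero exit code,
// which stops the chain. Order matters: the site build goes first because it
// writes to the shared output directory; the lambda build goes last.
function runSequentially(commands, run = (cmd) => execSync(cmd, { stdio: "inherit" })) {
  for (const cmd of commands) {
    run(cmd);
  }
}

const BUILD_STEPS = ["vuepress build docs", "netlify-lambda build lambda"];
```

Calling runSequentially(BUILD_STEPS) from the build script replaces the parallel run. npm-run-all, which provides the run-p command seen in the logs, also has a run-s command for running scripts sequentially; if you use it instead, check that the VuePress build runs before the lambda build.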


@futuregerald @fool I’m seeing inconsistent uptime from Netlify Functions. For context, I have a browser extension that calls my Netlify Functions endpoint. If I stop testing it for a while, it starts working again. Running the functions locally works well, so I’ve narrowed the problem down to uptime on Netlify. Please let me know what you need and how I can resolve this.
Thanks!

Hey @jsl15c, this might be related or it might not. I’d suggest creating a new post with more information for us, such as your Netlify site name, what you’ve already tried, and some code we can look at. You can include a link to this thread if you want.

It sounds like you may be hitting a timeout of some kind, but it’s hard to say with the information provided.
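To tell a timeout apart from other failures when a browser extension calls a function, a client-side timeout helps. A minimal sketch using the standard AbortController API; fetchWithTimeout and the timeout value are hypothetical choices, not anything Netlify prescribes.

```javascript
// Hypothetical client-side helper: abort the request after `ms` milliseconds
// so a hung or slow invocation surfaces as a distinct timeout error rather
// than a vague failure. Other errors (network, DNS) are rethrown unchanged.
async function fetchWithTimeout(url, ms, fetchImpl = globalThis.fetch) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await fetchImpl(url, { signal: controller.signal });
  } catch (err) {
    if (controller.signal.aborted) {
      throw new Error(`Timed out after ${ms} ms: ${url}`);
    }
    throw err;
  } finally {
    clearTimeout(timer); // don't leave the timer running after a response
  }
}
```

Logging which requests time out and which return an error status gives useful detail to include in a new post.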