Build system outage - 24 Apr 14:17- UTC "netlify-heartbeat"

#1

Hi folks,

Our build system is experiencing an outage as mentioned on our status page: https://www.netlifystatus.com/incidents/gwy8mg31x66z

We’ll update that status page (which also updates our status Twitter account, @netlifystatus) as things develop.

This has no impact on serving files, only on creating new builds.

We’ll also follow up here with a retrospective post after we’ve repaired the outage and had time to review the root causes.

Failed to upload heartbeat object: netlify-heartbeat
Deploy is currently broken (4-24-19 10:30 EST)
Netlify-heartbeat error
pinned #2
#3

Errors related to AWS:
Failed to upload heartbeat object: netlify-heartbeat: InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records.

I checked Gitter, and many others are seeing the same issue.

#4

I just tried to push my project from Git; it worked 5 minutes earlier. I made a text change, saved it, added it to Git, and re-pushed, and now I’m getting this error almost instantly when trying to build.

10:38:50 AM: Build ready to start

10:38:54 AM: Failing build: Failed to configure caching

10:38:54 AM: Failed to upload heartbeat object: netlify-heartbeat: InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records. status code: 403, request id: #######, host id: #########

#5

Briefly summarize the issues you have been experiencing.
I’m deploying a Nuxt website hosted on GitHub to Netlify. For some reason, all builds fail with this message:

4:34:13 PM: Build ready to start
4:34:26 PM: Failing build: Failed to configure caching
4:34:26 PM: Failed to upload heartbeat object: netlify-heartbeat: InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records. status code: 403, request id: E34E855…, host id: Y01hidBnS8m7jW5Vu06+…

I’m not using AWS and could not find anything on Google for netlify-heartbeat.

Please provide a link to your live site hosted on Netlify
The deployed website should be here: https://scenario-finder.netlify.com/

What have you tried as far as troubleshooting goes? Do you have an idea what is causing the problem?
I set up a new deployment and also reverted the commit that started the error.

Do you have any other information that is relevant, such as links to docs, libraries, or other resources?
The source code on GitHub is here: https://github.com/SensesProject/facet-navigation

#7

Hey folks (cc @biggs @f2iin @sbuys),

I moved your posts into this thread. Our team is working to address the problem, and the top post links to our status page and Twitter account, where you’ll get the latest developments.

#8

I got the same issue:

https://app.netlify.com/sites/huchen/deploys/5cc07a72e0bdd0b36a6ee5c0

11:02:10 PM: Waiting to build. Currently running 1 concurrent builds on your account
11:02:15 PM: Build ready to start
11:02:24 PM: Failing build: Failed to configure caching
11:02:24 PM: Failed to upload heartbeat object: netlify-heartbeat: InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records. status code: 403, request id: xxx, host id: xx

#9

Hi all, we’re aware of the issue and definitely working on it as fast as we can - we know it is the same issue for many of you. More as soon as we have more information.

#10

We just got a temporary fix in place, and the team is monitoring it and working on a longer-term fix. We’ll update the status page soon with an all clear, and post the promised learning review once we’ve had a chance to talk about it as a team tomorrow.

#11

Working now on my end - thanks!

#12

Deploys started to work but have gone down again :frowning:

#13

Is it the same issue, or did you receive a different error message? Any other details?

We are not seeing reports of any more trouble; as far as we can tell, things are up and running again. Do keep in mind that builds might be slow as the system recovers, so delays are still possible.

#14

I get this message when I try to deploy.

#15

Hmm, how annoying, but it looks like it might be unrelated.

Please start a new thread in #topics:deploying-building, unfold the whole log (click the right arrow and you should see more information), and copy in your error message, and we will try to troubleshoot. Thanks.

unpinned #16
closed #17
opened #18
#19

Yup, as Perry mentioned, that is definitely unrelated, and we’d need things like a link to your deploy in our admin UI, rather than a very small screenshot like that, to be able to advise.

#20

Hi folks! Thanks for your patience while my team reviewed the outage yesterday and determined the root cause, proximate cause, and next steps.

The root cause of the outage was some outdated code that attempted to fall back to a build cache storage system that was no longer configured. The proximate cause was a burst of failures to handle build caches correctly at our primary build provider, which triggered that fallback path.

Our immediate fix was to remove that bad configuration to allow our primary provider to resume operating, as their errors had been transient and had passed by that point. Our long-term fix, already partially deployed with the rest in progress, was to implement a no-cache option for our build system to fall back to next time, as we can build without cache if needed.
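
To make the failure mode and the long-term fix concrete, here is a minimal Python sketch of a cache fallback chain that ends in a no-cache build. It is only an illustration of the idea described above, not our actual build system code: `CacheUnavailable`, `configure_cache`, `run_build`, and the three example backends are all hypothetical names invented for this example.

```python
from typing import Callable, Optional, Sequence


class CacheUnavailable(Exception):
    """Hypothetical error raised when a cache backend cannot be used."""


# A backend is modelled as a callable that prepares the cache and
# returns a label for it, or raises CacheUnavailable.
Backend = Callable[[], str]


def configure_cache(backends: Sequence[Backend]) -> Optional[str]:
    """Try each backend in order; return None if every one fails."""
    for backend in backends:
        try:
            return backend()
        except CacheUnavailable as err:
            print(f"cache backend failed, trying next option: {err}")
    # No backend worked: signal "no cache" instead of failing the build.
    return None


def run_build(backends: Sequence[Backend]) -> str:
    """Run a build, with or without a cache."""
    cache = configure_cache(backends)
    if cache is None:
        return "built without cache"
    return f"built with cache from {cache}"


# Hypothetical backends used only to exercise the sketch.
def primary_ok() -> str:
    return "primary"


def primary_transient_failure() -> str:
    raise CacheUnavailable("transient error at primary provider")


def stale_fallback() -> str:
    raise CacheUnavailable("fallback storage is no longer configured")


if __name__ == "__main__":
    print(run_build([primary_ok, stale_fallback]))
    print(run_build([primary_transient_failure, stale_fallback]))
```

In this sketch, a transient primary failure combined with a stale fallback produces a slower, cache-less build rather than a failed one, which is the behavior the no-cache option is meant to provide.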
