New builds fail with "Error fetching branch", but "Clear cache & deploy site" succeeds

I have a public repo for a website which is built and served by Netlify.

It regularly fails to fetch the repo (from reading other posts I guess the issue is related to the git submodule used for the Hugo theme), but this was not the case until recently, AND it always succeeds if I select “Retry deploy” --> “Clear cache & deploy site”.
Of course I cannot keep doing this manually for every commit I push, so I am looking for help/hints/suggestions, because what I found around here in the Netlify community did not help.
FYI, the website repo branch I am working on (the one linked to PR#22) is in sync with the latest commit on master of the theme.
website repo:
working branch:
theme repo:

The theme repo’s latest commit is fbaa99e, the same one the themes/pru-theme submodule in the website repo points to (pru-theme @ fbaa99e).

As said, any help is super welcome!

Hey there!

Have you checked out this useful guide?

Yes @Pierparker I checked all the bullets there.

Unfortunately nothing there applies to my case: what is even more puzzling is that my setup is no different from the typical Hugo setup.
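For reference, the typical Hugo setup pins the theme as a git submodule, so the website repo’s .gitmodules should look roughly like this (the owner in the URL below is a placeholder, not the actual repo):

```ini
[submodule "themes/pru-theme"]
	path = themes/pru-theme
	url = https://github.com/<owner>/pru-theme.git
```

One thing worth double-checking is that the url is a public HTTPS one, since the buildbot cannot authenticate an SSH url.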

The log of the failing build is:

3:41:06 PM: Build ready to start
3:41:08 PM: build-image version: 6dfe19d15f524c85d6f9bf7df9fb30b0a9f0a61a
3:41:08 PM: build-image tag: v3.3.10
3:41:08 PM: buildbot version: 6bb3f784302b4ad90de13035b247a363a8bee34a
3:41:09 PM: Fetching cached dependencies
3:41:09 PM: Starting to download cache of 3.9GB
3:41:32 PM: Finished downloading cache in 23.785336165s
3:41:32 PM: Starting to extract cache
3:42:13 PM: Finished extracting cache in 40.276861973s
3:42:13 PM: Finished fetching cache in 1m4.433310665s
3:42:13 PM: Starting to prepare the repo for build
3:42:13 PM: Preparing Git Reference pull/22/head
3:42:25 PM: Error fetching branch: pull/22/head
3:42:25 PM: Failing build: Failed to prepare repo
3:42:25 PM: Failed during stage ‘preparing repo’: exit status 1
3:42:25 PM: Finished processing build request in 1m17.04258875s

which, once the cache is cleared, becomes a success:

4:25:00 PM: Build ready to start
4:25:02 PM: build-image version: 6dfe19d15f524c85d6f9bf7df9fb30b0a9f0a61a
4:25:02 PM: build-image tag: v3.3.10
4:25:02 PM: buildbot version: 6bb3f784302b4ad90de13035b247a363a8bee34a
4:25:03 PM: Fetching cached dependencies
4:25:03 PM: Starting to download cache of 254.9KB
4:25:03 PM: Finished downloading cache in 103.188999ms
4:25:03 PM: Starting to extract cache
4:25:03 PM: Failed to fetch cache, continuing with build
4:25:03 PM: Starting to prepare the repo for build
4:25:03 PM: No cached dependencies found. Cloning fresh repo
4:25:03 PM: git clone
4:26:42 PM: Preparing Git Reference pull/22/head
4:26:47 PM: Found Netlify configuration file netlify.toml in site root

I have tried the docker image to debug locally, and it works there.
All in all, it looks like the fetch of the relevant pull request from GitHub fails unless the cache is cleared first…

Does anybody in support (maybe @fool ?) know how to debug this type of error?

9:40:50 AM: Error fetching branch: pull/22/head
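For context, “Preparing Git Reference pull/22/head” corresponds to fetching GitHub’s read-only refs/pull/22/head ref rather than a branch. A minimal local imitation of that step, using throwaway stand-in repos (real refs/pull/N/head refs are created by GitHub itself; here we push one by hand just to simulate it):

```shell
# Illustration only: imitate "Preparing Git Reference pull/22/head"
# with throwaway local repos standing in for GitHub.
set -e
tmp=$(mktemp -d) && cd "$tmp"
export GIT_AUTHOR_NAME=t GIT_AUTHOR_EMAIL=t@example.com
export GIT_COMMITTER_NAME=t GIT_COMMITTER_EMAIL=t@example.com

git init -q --bare origin.git                  # stand-in for the GitHub repo
git clone -q "file://$tmp/origin.git" wc
(cd wc && git commit -q --allow-empty -m "pr commit" \
       && git push -q origin HEAD:refs/pull/22/head)

# Roughly what the buildbot does when preparing the PR for build:
git clone -q "file://$tmp/origin.git" build
cd build
git fetch -q origin refs/pull/22/head
git checkout -q FETCH_HEAD
echo "checked out pull/22/head"
```

When this step fails only on a cached checkout, the difference must lie in the state the cache restores, since the fetched ref is identical in both cases.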

Try checking this out!

Thanks @Pierparker, but the commits are OK. In fact, clearing the cache and redeploying works!
And I double-checked the commits and they are aligned: the parent repo branch points to the latest commit of the submodule.

So it is still a mystery!

So, this isn’t a usual error. I think the reason you don’t see it locally is that the local setup doesn’t precisely reproduce what our CI does, since you start from an existing repo while we’re failing during the clone. Here’s exactly why, from some internal logs:

Error fetching branch: pull/22/head: From
 * branch            refs/pull/22/head -> FETCH_HEAD
Fetching submodule themes/pru-theme
   82184b9..fbaa99e  master     -> origin/master
fatal: remote error: upload-pack: not our ref 3c7be4357bb827f8f0e0508834931fc48a1f8f59
Errors during submodule fetch:

I don’t know what that means, but maybe you can tell based on the fact that we get it from running:

git clone && cd pru-theme && git submodule update -f --init

so perhaps it is something around your .gitmodules config? I bet you could reproduce it locally just by running that command in a directory where you don’t already have things checked out.

Thank you @fool I will try to reproduce locally.
Strangely, it seems NOT to fail when you clear the cache and redeploy… What do your internal logs say for the same clone step when the cache is cleared? Each recent failure is followed by a clear-cache/redeploy.

Thanks again.
I’ll report back my findings soon

The cloning step is successful for me; here is the log:

espin@leon ~/kaos                                                                                                                                          [9:33:18]
(base) > $ git clone && cd pru-theme && git submodule update -f --init
Cloning into 'pru-theme'...
remote: Enumerating objects: 34, done.
remote: Counting objects: 100% (34/34), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 1041 (delta 18), reused 24 (delta 11), pack-reused 1007
Receiving objects: 100% (1041/1041), 4.46 MiB | 1.89 MiB/s, done.
Resolving deltas: 100% (604/604), done.

On the specific laptop I am now using, I did not have SSH keys set up, so I generated them and added them to my GitHub settings too. I do not think this is the reason for the failure, because it has worked in the past and it works if you clear the cache… but I mention it just to provide complete info.

@fool anything else I could try?
Thanks again for considering all these requests of mine!

Hey @espinielli,
If you’re still running into this, I wanted to share this suggestion from a comment on the build-image repo in case you want to give it a shot (i.e., string together the submodule command @fool suggested plus your Hugo build command as one long build command).
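In case it helps, a sketch of what that combined build command could look like in netlify.toml — the Hugo invocation and publish directory here are assumptions, not taken from the actual site config:

```toml
[build]
  # Force the submodule into a known-good state before building.
  command = "git submodule update -f --init && hugo"
  publish = "public"
```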

On a separate note, it is surprising to see that a missing 4GB cache solves the problem:
failed deploy

3:41:09 PM: Fetching cached dependencies
3:41:09 PM: Starting to download cache of 3.9GB

successful deploy

4:25:03 PM: Fetching cached dependencies
4:25:03 PM: Starting to download cache of 254.9KB

Dear @jen,
Thanks for the suggestion; at the time I did try stringing together the submodule command, but it did not solve the issue at hand.

I will check in future deploys with a similar configuration whether the problem persists.

Thank you very much for your feedback!