Disabling X-ROBOTS-TAG header on hosted site

Hello, I have a Gatsby site at https://www.brandonbaker.me/ that I’d like to be indexed for SEO. However, I’m seeing that Google is unable to crawl the page because the x-robots-tag header disallows it. Could you please help me remove that header so my site can be indexed? Thank you!

hey bbaker,

To the best of my knowledge this shouldn’t be happening on published sites, but it does happen automatically on deploy previews (we expect that you don’t want those to get indexed).

A little more on that here:

Can you confirm that you are seeing this outside of a deploy preview?

Yep, I can confirm that I’m seeing this on my live, production site.

This is only activated by us on deploy previews (from PRs). When I fetch your site, I don’t see that header:

< Cache-Control: public, max-age=0, must-revalidate
< Content-Type: text/html; charset=UTF-8
< Date: Thu, 13 Feb 2020 00:22:49 GMT
< Etag: "7ded67ff6a28c159755258fdc6d54b08-ssl"
< Strict-Transport-Security: max-age=31536000
< Age: 0
< Transfer-Encoding: chunked
< Connection: keep-alive
< Server: Netlify
< X-NF-Request-ID: 8dcd5b57-441e-43c5-9efa-d691d7bbdea0-1500804

Could you tell me the value of the x-nf-request-id HTTP response header from a request that shows the x-robots-tag header for you? (Check out this article for how to harvest it easily: [Common Issue] Netlify Support asked for the 'x-nf-request-id' header? Why are they asking for it and how do I find it?)
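For anyone who wants to check this from a script rather than the browser, here is a minimal Python sketch using only the standard library. The function names (summarize_headers, fetch_headers) and the User-Agent string are made up for illustration and are not part of any Netlify tooling. It pulls out the two headers discussed above and reports None for any that are absent.

```python
from urllib.request import Request, urlopen


def summarize_headers(headers):
    """Return the X-Robots-Tag and X-NF-Request-ID values from a header mapping.

    Header names are matched case-insensitively; a missing header yields None.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "x-robots-tag": lowered.get("x-robots-tag"),
        "x-nf-request-id": lowered.get("x-nf-request-id"),
    }


def fetch_headers(url):
    """Issue a GET request and return the response headers as a plain dict.

    Needs network access. The User-Agent value is an arbitrary placeholder.
    """
    request = Request(url, headers={"User-Agent": "header-check/0.1"})
    with urlopen(request, timeout=10) as response:
        return dict(response.headers.items())
```

Calling summarize_headers(fetch_headers("https://www.brandonbaker.me/")) should give you both values in one go. If "x-robots-tag" comes back as None, the header was not present on that particular response.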

Hi, I was not seeing this issue on any actual response header, but the issue was reported by Lighthouse on my Gatsby Cloud builds and in Google Search Console.
However, now when I go to Google Search Console it says that indexing is blocked because of “‘noindex’ detected in ‘robots’ meta tag”. When I inspect the ‘robots’ meta tag, however, I see that it is set to all, so I’m wondering if there is some new issue there.
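A quick way to double-check what the ‘robots’ meta tag says in a given HTML document is to parse it directly. The sketch below uses Python's standard html.parser module; the names RobotsMetaParser and robots_directives are hypothetical. It only inspects the HTML string you pass in, so it tells you nothing about what a crawler was actually served.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a document."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # The content attribute is a comma-separated list of directives.
        self.directives.extend(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html):
    """Return the list of robots directives found in the given HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

If "noindex" shows up in the returned list, that copy of the page is asking not to be indexed.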

Thanks for the help

Hi, @bbaker6225. I believe I have found a reason that an old version of the content is being shown. (I’m assuming that this robots meta tag used to be in the HTML and has since been removed. I say this because Netlify wouldn’t modify that HTML so we would not be the reason it was there.)

I see prerendering is enabled for this site. Our documentation on prerendering says this:

Our built-in prerendering service will cache prerendered pages for between 24 and 48 hours; this is not adjustable.

If you have updated the site content but the prerendered version is out of date, it will update within (at most) 48 hours. If it does not update, please let us know.

Hi @luke, I searched the forum for a relevant thread, not wanting to start a new one if possible, and this is the closest I found.

So my question is this. I have recently launched a new website and am about to embark on some PPC and other SEO goodness. I thought I would do a quick inventory of what is indexed on Google already and have discovered that my site is indexed on Google under its Netlify subdomain!! (For at least one page, the contact page!) It appears under both the correct custom domain and its Netlify subdomain. I have gone in and added a redirect rule to make sure anyone finding the site on the Netlify subdomain gets redirected to the correct URL. I have also checked that all canonical URLs use the custom domain, and they do, so no worry about duplicate content. However, I don’t want the site found in Google under its Netlify subdomain. What is the preferred strategy to mitigate this, and why does it happen (for future reference)?

https://unruffled-sammet-a254ca.netlify.com/ is the Netlify subdomain in question.

@DaveHarrison, yes, Google indexes everything it can find.

So, one question might be: “How did they find your site?”

Answer: We don’t know and that would be a great question for Google. We certainly don’t ask them to do so.

We also don’t prevent it except in certain cases. We do include an x-robots-tag: noindex header for deploy previews (but not for branch deploys).

You can prevent Google from indexing whole sites and/or specific pages. If you don’t want your content to be indexed by Google these are their recommended methods:

https://support.google.com/webmasters/answer/93710?hl=en

Regarding not having content served (and therefore not indexed) on the Netlify subdomain, the 301 forced redirect rule (which you already mentioned) is the recommended solution:

https://<SUBDOMAIN>.netlify.com/* https://example.com/:splat 301!
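To confirm a rule like the one above is doing what you expect, you can request a page on the subdomain without following redirects and compare the status and Location header against the target. Below is a rough Python sketch with hypothetical function names. It assumes the rule simply appends the requested path to the custom domain, and it ignores any query string on the Location value.

```python
import http.client
from urllib.parse import urlsplit


def expected_redirect(request_url, target_origin):
    """Map a URL on the subdomain to the URL the splat rule should send it to."""
    path = urlsplit(request_url).path or "/"
    return target_origin.rstrip("/") + path


def is_permanent_redirect_to_target(status, location, request_url, target_origin):
    """True if the response is a 301 whose Location matches the expected target."""
    if status != 301 or not location:
        return False
    got = urlsplit(location)
    want = urlsplit(expected_redirect(request_url, target_origin))
    return (got.scheme, got.netloc, got.path or "/") == (
        want.scheme,
        want.netloc,
        want.path,
    )


def fetch_status_and_location(url):
    """GET the URL over HTTPS without following redirects (needs network access)."""
    parts = urlsplit(url)
    connection = http.client.HTTPSConnection(parts.netloc, timeout=10)
    try:
        connection.request("GET", parts.path or "/")
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()
```

For example, pass the result of fetch_status_and_location("https://<SUBDOMAIN>.netlify.com/contact/") together with that same URL and "https://example.com" into is_permanent_redirect_to_target. A False result means the response was not a 301 or pointed somewhere other than the expected path on the custom domain.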

Now that this subdomain has been indexed, the Google removal process can be found here:

https://support.google.com/webmasters/answer/9689846

If there are other questions about this, please let us know.