Bug (and workaround): 404 page returns HTTP 200

From the Redirect options documentation:

You can set up a custom 404 page for all paths that don’t resolve to a static file. This doesn’t require any redirect rules. If you add a 404.html page to your site, it will be picked up and displayed automatically for any failed paths.

This seems to work just fine for pages that don’t exist and for most special files, but fetching /404 or /404.html returns HTTP 200 instead of HTTP 404. As far as I’m aware, this is bad practice, and it may cause Google Search to be confused as to whether it should index such a soft 404 error page or not.

I did figure out a workaround by adding the following lines in my _redirects file, but it’s a bit silly:

/404.html /404 404!
/404 /404.html 404!

Note that self-redirects are ignored for some reason, hence why I had to write the rules like that.
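
For anyone who wants to reproduce the status codes described above, here is a minimal sketch using only the Python standard library. The helper names `fetch_status` and `check_paths`, the `/no-such-page` path, and the example URL in the comment are assumptions made for illustration; they are not part of any Netlify tooling.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a GET request, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is on the exception.
        return err.code


def check_paths(base_url: str, paths=("/404", "/404.html", "/no-such-page")) -> dict:
    """Map each path to the status code the site returns for it."""
    base = base_url.rstrip("/")
    return {path: fetch_status(base + path) for path in paths}


# Hypothetical usage (replace with your own site URL):
# print(check_paths("https://your-site.example"))
```

On a site showing the behaviour reported here, the first two paths would be expected to come back as 200 and the last one as 404.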

A proper fix would be appreciated.

@SmashManiac Hmm, a 404 page is still a page, and if you request it and it loads, that would seem to be a valid response. The existence of the 404 page does not constitute an error. The 404 page is where visitors are sent when there is an error elsewhere. Therefore, it seems that the best practice would be that visiting the 404 page does not generate an (additional) error, right?

I understand where you’re coming from, but there are a few flaws with that logic.

First, it does make sense at first glance that when a file called 404.html exists on the server and is requested directly, its contents should be returned with no error regardless of context. The issue is that 404.html has a special meaning to Netlify. As such, Netlify should treat this file only as a private configuration file like _headers or _redirects, instead of also making it available as a fetchable resource.

Second, with a normal 404 error, no redirect occurs on the client side. Netlify may have documented that feature under “Redirect options” and made it customizable with _redirects rules, but it’s not what actually happens. Indeed, the user is not redirected to a different URL during this process. In fact, sites that actually do redirect to a different URL to show a 404 error will start running into SEO issues as the bad page may be indexed by search engines. As such, saying “The 404 page is where visitors are sent when there is an error elsewhere.” is simply incorrect.

Finally, and this was my original point, you end up with an HTTP header that contradicts the HTTP body. A human may not see the former, but it causes a lot of problems for bots. In particular, if you were to submit a sitemap containing a soft 404 page to Google Search, it would normally flag it as an error.
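
To make the contradiction between status and body concrete, here is a rough sketch of the kind of check a bot could run. The actual heuristics used by search engines are not public, so the function name `looks_like_soft_404` and the phrase list are purely illustrative assumptions.

```python
import re

# Illustrative phrases only; real crawlers use their own undisclosed heuristics.
ERROR_HINTS = re.compile(
    r"\b404\b|not found|page (?:does not|doesn't) exist",
    re.IGNORECASE,
)


def looks_like_soft_404(status: int, body: str) -> bool:
    """Flag a success status whose body reads like an error page."""
    is_success = 200 <= status < 300
    return is_success and bool(ERROR_HINTS.search(body))
```

With this sketch, a 200 response whose body says "404 - Page Not Found" is flagged, while the same body served with a 404 status is not.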

I hope it clarifies the situation.

@SmashManiac Not even a little. We’re going to have to agree to disagree on this one.

Uh… this is not a matter of opinion.

Even if you don’t personally agree with some of the principles I’ve raised in my previous post, the fact remains that Netlify returns a contradictory HTTP response in this case, which causes an SEO issue, and that should be fixed.

In case you had not noticed, I did include evidence of this problem in my original post; it’s the soft 404 errors link from the Google Search Console documentation, included here again for convenience. The only reason I spoke in the conditional is that such detection mechanisms by bots rely on heuristics.

Here’s another link from the same documentation, which explains how a soft 404 can trigger an error. You can see all possible errors and warnings Google Search Console can flag about indexing coverage on that page.

And here are more details from a third party about soft 404s, including an actual screenshot of said error in Google Search Console.

I hope these extra references clarify the validity of my bug report.

thanks for all this, @SmashManiac! I’ll get some :eyes: on this and we can take a closer look - you raise some valid points. More soon.

Have to stand behind SmashManiac on this one, I think - a 404 is a 404, by definition hard-returns code 404, and then may also present a 404-orientated web page…

Hi, @narrationsd and @SmashManiac. I’ve filed a feature request to change this behavior and always return a 404 for the path /404 or /404.html.

If/when this feature becomes a reality, we’ll post a follow-up here to let you know about it. Please reply anytime if there is more information to share about this feature request and/or if there are any questions.

If you are someone else visiting this page and you would also like to see this feature request become a reality, please add a post below adding your “+1 vote”. If you do this, we will keep the feature request updated so that our project managers have accurate information about the level of interest in this feature.
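
For readers following along, the behaviour requested in this thread (a 404 status for /404 and /404.html as well as for missing paths) can be sketched with Python's built-in http.server. This is only a toy model under stated assumptions, not how Netlify is implemented; the `FILES` table, the `resolve` helper, and the page contents are hypothetical.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical published files for a tiny static site.
FILES = {
    "/index.html": b"<h1>Home</h1>",
    "/404.html": b"<h1>404 - Page not found</h1>",
}
ERROR_PAGE_PATHS = {"/404", "/404.html"}


def resolve(path: str):
    """Return (status, body) for a request path."""
    path = path.split("?", 1)[0]  # ignore any query string
    if path == "/":
        path = "/index.html"
    # The error page itself is never served with a success status.
    if path in ERROR_PAGE_PATHS or path not in FILES:
        return 404, FILES["/404.html"]
    return 200, FILES[path]


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = resolve(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To try it locally:
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```

The only design choice that matters here is the explicit `ERROR_PAGE_PATHS` check: without it, /404.html would be found in `FILES` and served with a 200 status, which is the behaviour reported at the top of this thread.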

I’m having a similar issue, was there any update?

Hi, @amac. The issue is still open. We will post an update here if the feature becomes available.

Just adding my +1 vote here. I would also find this feature helpful.

Sorry for the bump, but I got a PM from SavvyBot thinking that this issue has been solved. I would like to clarify that it hasn’t yet as far as I’m aware.

that’s fine! we are trying to get more people to mark things as solved - she’ll ask twice and then she’ll assume that there isn’t a solution and leave it alone, so i think you can ignore - of course, we’d prefer it if there were a solution :slight_smile:

Hey @SmashManiac

Apache HTTP Server shows exactly the same behaviour. Visit a non-existent page, the 404 file is returned with a 404 status code. Visit the 404 page directly (regardless of the name of the 404 page) and a 200 status code is returned.

To add to this:
If you configure Apache incorrectly by setting the 404 document to 404.html but have called the file missing.html, it then returns the message

Not Found
The requested URL was not found on this server.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

Note the last line. The 404 page needs to return a 200 status otherwise an additional error is returned. Likely this is enshrined in standards.

It seems for Netlify to make a change away from this behaviour would go against general practice.

Hey @coelmay, the behavior you’re describing is specific to Apache web servers, and I could not find any web standard corroborating your assumption. In fact, the Wikipedia page on HTTP 404 even references Apache web server configuration errors as an example of this very issue, along with a solution:

Soft 404s are problematic for automated methods of discovering whether a link is broken. Some search engines, like Yahoo and Google, use automated processes to detect soft 404s.[4] Soft 404s can occur as a result of configuration errors when using certain HTTP server software, for example with the Apache software, when an Error Document 404 (specified in a .htaccess file) is specified as an absolute path (e.g. http://example.com/error.html) rather than a relative path (/error.html).[5]

So yeah… maybe not the best example. :wink:

Note that I cannot confirm if the solution I referenced actually works or not for this specific example though.

As with @gregraven, I am still going to disagree with you on this one.

@coelmay, can you at least explain why you still disagree despite all of the references I shared? As I already explained in a previous post, SEO issues with soft 404 pages are well documented and not a matter of opinion, so I can’t think of any reason why there would be any remaining disagreement.

@coelmay
I am very curious as well how you can disagree with this. It is clear soft 404s are a problem for SEO. Google says this about the matter: "It’s a bad user experience to return a 200 (success) status code, but then display or suggest an error message or some kind of error on the page. Users may think the page is a live working page, but then are presented with some kind of error. Such pages are excluded from Search."

Could you explain a use case for your users going to /404.html on your site? If they go to /404.html manually, do they not expect to see a page showing a 404 error?

I stand with @gregraven and @coelmay. Since Netlify did not make an error in serving /404.html, sending a 404 status code does not seem valid to me.

The soft 404 issue being talked about here is that a 404 page is returned for a page that’s not found but a 200 status code is sent. In other words, the “server” sends an OK status code, but the client-side page shows an error. This could result in the page being excluded from indexing. As for why one would want their 404 page to be indexed, I have yet to understand. If it’s being excluded automatically, that sounds like a good thing to me.