[Common Issue] Understanding and debugging prerendering

With the advent of modern web hosting, aka the JAMstack, some robots and crawlers need “help” navigating client-side links. This problem first presented itself when navigation was implemented in JavaScript instead of HTML, but these days the OpenGraph standard and unfurling by social sharing services often matter even more to our customers. (For more on this, see the further reading link at the bottom of this post.) Prerendering is one solution to this problem.

Netlify provides a prerendering service, described in some detail in this article. This post aims to help if it doesn’t “just work” and you’d like to debug the service or understand how to test it.

Which requests are prerendered?

Prerendered content is only displayed in certain situations. Specifically it:

  • will only be used for a Netlify website once prerendering is enabled on your site’s build & deploy settings page
  • will only be returned in response to a web request when the request carries a User-Agent that needs this special content, or when a special URL is used (see below for details about manual testing and the special URL).

There are a few dozen user agents that will cause a prerendered response, for instance Twitterbot and facebookexternalhit/1.0. All crawlers that we are aware of which need prerendering receive this treatment automatically, but if yours seems to be missing, please mention it below so we can potentially add it to the list!
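As a rough illustration, a prerendering proxy’s decision comes down to a substring match on the User-Agent header, plus a check for the special query parameter. This is a sketch only: the bot names below are examples, and Netlify’s real list is internal and much longer.

```javascript
// Hypothetical, abbreviated list -- Netlify's actual list is internal and longer.
const PRERENDER_AGENTS = ["twitterbot", "facebookexternalhit", "linkedinbot", "slackbot"];

// Decide whether a request should receive the prerendered snapshot.
function shouldPrerender(userAgent, url) {
  // The special query parameter forces prerendering regardless of User-Agent.
  if (url.includes("_escaped_fragment_")) return true;
  const ua = (userAgent || "").toLowerCase();
  return PRERENDER_AGENTS.some((bot) => ua.includes(bot));
}
```

For example, `shouldPrerender("Twitterbot/1.0", "/")` matches, while an ordinary browser User-Agent does not.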

Testing the service with your content

To test things out manually, you can also request a prerendered URL directly - see in particular the section about _escaped_fragment_ in our prerendering documentation.

So, to test prerendering, this is what I do. You can use any User-Agent that we prerender for; I use twitterbot in this example.

    curl -A twitterbot https://www.sitename.com/

or

    curl "https://www.sitename.com/?_escaped_fragment_="

Either should show different output than a request without a special User-Agent or the ?_escaped_fragment_= query string, which demonstrates that the service is in effect.

Some caveats about the prerender service

  1. We cache the prerendered content for 24 to 48 hours, so you will want to make sure to test with different URLs each time if you are making changes and republishing a lot while you get things dialed in. Most importantly, this cache is NOT invalidated by a new deploy of your site!

  2. You may also need to use window.prerenderReady in your JavaScript to tell our service when the page is fully rendered and it is safe to take the snapshot for the services that need it, as mentioned in the prerender.io docs here. If this isn’t set, we take a snapshot after 10 seconds regardless of the state of the page.

  3. When testing with Google Webmaster tools, please be aware that there can be an issue using the “Fetch as Google” functionality. This tool does not check for the fragment meta tag and re-request the page with the ?_escaped_fragment_= query parameter as happens in production; Googlebot itself does not have that issue. So, when using “Fetch as Google” you’ll need to append the ?_escaped_fragment_= query parameter to the end of the URL that you are testing, and that should show your prerendered page. If that works and you have the following code snippet in the <head> of your HTML, then the real Googlebot will work just fine.

        <meta name="fragment" content="!">
    
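The window.prerenderReady handshake from caveat 2 can be sketched as below. The flag name is part of the prerender.io contract; everything else here (the fallback object, the placeholder async work) is illustrative.

```javascript
// In a browser `window` is the real global; the fallback lets this snippet run anywhere.
const win = typeof window !== "undefined" ? window : {};

// Tell the prerender service the page is NOT ready to snapshot yet.
win.prerenderReady = false;

// Stand-in for real async work (data fetching, client-side routing, etc.).
Promise.resolve().then(() => {
  // ...render your content into the DOM here...
  win.prerenderReady = true; // now it is safe to take the snapshot
});
```

Set the flag to false as early as possible (before your async work starts), and flip it to true only once everything a crawler needs is in the DOM.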

Next Steps

That’s a lot of advice, but it’s possible that it still won’t resolve your issue. So what can you do next if your content doesn’t seem to be rendering correctly after enabling the prerender service? After checking the caveats above to make sure it’s not related to one of those, here’s a bit more advice:

First, be aware that your code needs to provide the OpenGraph tags that social sharing services require. A lot of frameworks and plugins (e.g. react-helmet) handle setting them for you, but “a wrong og tag” is generally not the fault of the prerender service but of your code. You can examine the source code that we serve, as described above, to debug what is happening.
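To see which OpenGraph tags the snapshot actually contains, fetch the prerendered HTML with curl as shown above and pull out the og: tags. A quick regex-based sketch, fine for debugging but no substitute for a real HTML parser:

```javascript
// Extract OpenGraph meta tags from an HTML string. Debugging aid only;
// a proper parser (e.g. cheerio) is more robust than this regex.
function extractOgTags(html) {
  const tags = {};
  const re = /<meta[^>]+property="(og:[^"]+)"[^>]+content="([^"]*)"/g;
  let m;
  while ((m = re.exec(html)) !== null) tags[m[1]] = m[2];
  return tags;
}
```

If og:title or og:image is missing from the prerendered output, the problem is in how your code sets those tags, not in the prerender service.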

If instead you’re seeing errors (HTTP 5xx responses), it is likely that your code is not rendering at all. Since we don’t expose prerender logs to customers, running a local copy of Netlify’s prerender code (forked from prerender.io, and different!) and following the local-use instructions in its readme is the best way to debug those issues in depth. This will surface any syntax errors that are encountered and let you examine the output, including any OpenGraph tags.

A customer wrote up his journey in great detail, with a lot of practical advice, in this Medium article here - it explains a lot about the world of rendering, unfurling, and testing with third-party tools. If you’re struggling to understand why prerendering is needed and how it works with social sharing services, this is a great starting place.


This no longer works for the facebookexternalhit/1.1 user agent, or for Bing.
It currently works for Googlebot and the Twitterbot/1.0 user agent.

Please help bring this back.

Hi bryan, I’ve earmarked this for investigation.
We’ll update this thread once we’ve looked into it!

hi guys, i’m not sure if bryan is talking about the same thing, but your managed version of prerender seemed to suffer from this bug (i’ve seen the same issue when examining as the googlebot emulator), and the workaround described in this issue also brings back facebook

this is fixed in prerender 5.5.1

edit: netlify are already tracking this here:


Hi @alexisdray,

Yea, we’re a few versions behind, but we definitely want to get our prerender service updated. Thanks for letting us know!


Another caveat about enabling prerendering is that OG tags may not be read by Facebook’s crawler. It will say something like “the meta tags are in the body” because the HTML is malformed and the crawler cannot identify the head.

We are also experiencing this issue with Facebook. The curl results suffer from an html string prepended to the markup, and when you look at the scraped results in the Facebook debugger, it omits the head completely and places all the meta data/tags within the body, therefore not finding any of the OpenGraph tags.

Here is an example from the “what our scraper sees” section of the Facebook Debugger:

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>
    <p>html</p>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
    <meta charset="utf-8">

I’m assuming, from the funky p-tagged html, that this is all stemming from the same issue.

However, I did find this response post within the react-helmet issues, stating that a GTM trigger was causing an iframe to be injected after the html tag and breaking things. We are using Helmet and GTM, so I might dig a little on that, but thought I’d mention it in case it sparks any ideas/thoughts.

Otherwise any idea on the status of an update/fixes for pre-rendering?

Hey folks, good news! We shipped a fix yesterday at about 5pm Pacific for the rogue html string that appeared before your code, and it has been running smoothly since.


Netlify is Netli-fly :sunglasses:


Pre-rendering appears to remove JavaScript tags from pages. That makes a lot of sense when the pre-rendered page and the original page share a URL (say, my-site.com/post/5), but when using the slightly old-fashioned #! URL structure which Google used to recommend (eg, my-site.com/#!/post/5), it breaks sites completely. Here’s what happens with a proper URL (no hash symbol):

  • Normal browser: my-site.com/post/5 loads with normal JS and looks normal.
  • With GoogleBot: my-site.com/post/5 is sent to the server, pre-rendered, and served without JavaScript.

Now with a hash:

  • Normal browser: my-site.com is the portion of the URL sent to the server. JavaScript reads #!/post/5 portion of the URL and loads it dynamically.
  • GoogleBot, without JavaScript: Sometimes GoogleBot tries to index without JavaScript. In this case, it rewrites the URL to use _escaped_fragment_ and pre-rendering returns the correct page.
  • GoogleBot, with JavaScript: Increasingly, GoogleBot prefers to use normal JavaScript like a client rather than _escaped_fragment_. I know this because my URLs are showing up as “not selected as canonical” in the Search Console despite the fact that the _escaped_fragment_ versions look very different. The pre-rendering service only sees the my-site.com portion of the URL, so pre-renders the homepage regardless of the hash portion of the URL. Because JavaScript is removed, the hash portion of the URL is completely ignored and GoogleBot sees every page exactly the same way.

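The rewrite described in the second bullet follows Google’s (now-deprecated) AJAX crawling scheme: the crawler replaces the #! fragment with an _escaped_fragment_ query parameter. A sketch of that mapping, illustrative rather than Netlify’s actual code:

```javascript
// Rewrite a hash-bang URL the way a fragment-aware crawler would,
// per Google's (now-deprecated) AJAX crawling scheme.
function toEscapedFragment(url) {
  const i = url.indexOf("#!");
  if (i === -1) {
    // No hash-bang: just append an empty _escaped_fragment_ parameter.
    return url + (url.includes("?") ? "&" : "?") + "_escaped_fragment_=";
  }
  const base = url.slice(0, i);
  const fragment = encodeURIComponent(url.slice(i + 2));
  return base + (base.includes("?") ? "&" : "?") + "_escaped_fragment_=" + fragment;
}
```

So my-site.com/#!/post/5 becomes my-site.com/?_escaped_fragment_=%2Fpost%2F5, which the server (and the prerender service) can actually see; a crawler that skips this rewrite and ignores JavaScript only ever sees the homepage.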
Possible solutions:

  • Provide two pre-rendering modes:
    • Without hash; always remove JavaScript from HTML
    • With hash; only remove JavaScript from _escaped_fragment_ URLs.
  • Provide a checkbox to enable or disable <script>-tag removal.

Of course, the real solution is probably for me to move away from the old-style #! URLs.

Yea, I think the real solution is indeed moving away from the #! routing style. One thing to try is adding <meta name="fragment" content="!"> to your page to see if that helps as a possible short-term workaround.