[Support Guide] Understanding and debugging prerendering

Our staff can flush the cache for you, but it's unlikely we'll see your request quickly enough to do so. Unfortunately, you didn't include the URL here, so I couldn't "just do it" when I did happen to see this promptly…

Hello @fool,

It looks like Facebook does not use the pre-rendered page.
When we try the Facebook Sharing Debugger, it retrieves the default page instead of the pre-rendered version with the meta tags. The pre-rendered page works for all other crawlers I tested, and we are within the 1 MB cutoff Facebook suggests.

When we run curl to simulate the Facebook crawler, we get the pre-rendered page. The real Facebook crawler doesn't.

Could you tell us a URL that shows this behavior, please?

Thanks for the fast response! Here is a link you can test:

OK. We are definitely sending the prerendered content to Facebook. You can test it yourself:

curl -v -A facebot https://share.100mentors.com

…returns different content than:

curl -v https://share.100mentors.com

So I guess you'll need to work with Facebook to debug what's happening if it doesn't work as you expect, assuming you had no luck with the debug steps above (which will likely help you expose the problem).

The docs state the prerender is cached for 24-48 hours.

Does this happen for every deploy? Or does the cache only get updated once a day or every other day regardless of the latest deploy?

The cache is created on demand: when we get a request for a prerendered page that isn't already in the cache, we attempt to prerender it and, if successful, store it in that 24-48h cache. The expiry is implemented something like: "is this asset in our prerender cache more than 24 hours old? Then we will schedule it for removal sometime in the next 24 hours."

This cache is handled 100% independently of any deploy: whatever is live at the moment the request is received gets prerendered, and it is cached for 24-48h regardless of how many times you deploy after that.
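
To make that concrete, here is a minimal sketch of the behavior described above. This is not Netlify's actual code; all names are illustrative:

// Illustrative sketch only - not Netlify's actual implementation.
const DAY_MS = 24 * 60 * 60 * 1000

const cache = new Map() // url -> { html, expiresAt }

async function getPrerendered (url, render) {
  const hit = cache.get(url)
  if (hit && Date.now() < hit.expiresAt) {
    return hit.html // still fresh: serve from the cache
  }
  // Not cached (or expired): render whatever is live right now...
  const html = await render(url)
  if (html != null) {
    // ...then keep it for 24h plus removal "sometime in the next 24h",
    // i.e. the 24-48h lifetime described above.
    const expiresAt = Date.now() + DAY_MS + Math.random() * DAY_MS
    cache.set(url, { html, expiresAt })
  }
  return html
}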

Hello and thanks for the reply.
We figured out what was happening.
Facebook was crawling the root page because og:url was set to our root page instead of the full URL.
The other crawlers that worked probably don't check og:url.
Prerendering is working fine!
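
For anyone hitting the same thing: the fix is to make og:url match each page's own canonical URL, e.g. (path illustrative):

<meta property="og:url" content="https://share.100mentors.com/some-page" />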

Are there any plans to support an endpoint like POST /recache?

In some cases, the best approach might be to set the cache time to infinite (instead of 24-48h) and then send a POST /recache request from the API server whenever any relevant data changes. This would ensure the cached page is never out of sync, and it might be more efficient overall too.
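
For illustration only, since no such endpoint exists today, the API server could do something like:

async function recache (path) {
  // Hypothetical endpoint and payload - this is a proposal, not a real API.
  await fetch('https://share.100mentors.com/recache', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ path }),
  })
}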

Or is it expected to use prerender.io if this is needed?

That's an interesting suggestion, but it isn't compatible with our current implementation (it would change the automatic-expiry behavior that tens of thousands of existing sites rely on). I have added the suggestion to our larger feature request around "letting customers manipulate the prerender cache", since it would be a very nice optional feature for folks who are as savvy as you :slight_smile:


Hi all and thanks for this prerender service, the idea is fantastic!
I'm trying to work out whether it can work for my needs, and I've found a strange problem.
I have a React app with a normal browser router (connected-react-router, by the way).
When I load the prerendered home page from Netlify, all works fine.
When I try to load a subpage, I get a 404 error.
curl -A twitterbot https://eloquent-morse-90cfdb.netlify.app/

works, but

curl -A twitterbot https://eloquent-morse-90cfdb.netlify.app/giocatore/128760

gives a 404.

Both URLs work when visited directly; the problem appears only with prerendering.
I hope I’ve added all the info.

Thanks a lot for the help

This URL does not work for me with or without prerendering:

$ curl -v https://eloquent-morse-90cfdb.netlify.app/giocatore/128760 2>&1 | grep HTTP
> GET /giocatore/128760 HTTP/1.1
< HTTP/1.1 404 Not Found

If you could let me know a working URL that returns different status codes with and without -A twitterbot, I'll be happy to dig in for you :slight_smile:

Well, thanks for the answer!
I hadn't checked before because it works like a charm from the browser.
At this point, the problem is something related to the routing setup on Netlify when the request is made by curl.
Do you have any hint as to why a React app can work from a browser but not from curl?
Thanks a lot

Hey @emish89, you might need this redirect rule for history pushstate:
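
(This is the standard Netlify SPA fallback; it goes in a _redirects file in your publish directory.)

/*    /index.html    200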

Let us know if that helps!


All works fine now, thanks :slight_smile:

First – the prerender feature is very cool! Thanks for the hard work.

Second – we've had issues with scraping in the past, so we block AJAX requests from AWS and other hosting-provider IPs. Unfortunately, I think that is also blocking your prerenderer's AJAX requests. Do your prerender AJAX requests come from a fixed set of IPs, so that I could unblock those specifically?

Welcome to our community, @gslecht! We run our prerender workers in Lambda functions, which do not have a consistent IP range. In general, blocking or allowing by IP is not workable with our service, since most of our addresses are dynamic, so I'd guess your analysis is right.

I'm not sure how you'd configure our service to block anything, though. Unless your page loads content from another service that you do control, nothing blocks our prerender workers from fetching our own content :wink:

Thanks @fool, I thought that might be the case – we do the same. And yes, we’re currently loading content from another service that we control the blocking on. Oh well, thanks!

I have Netlify prerendering enabled on my Gatsby app.
I need to hit an API when my page loads. That API returns the URL of the OG image, so the call is necessary. But the same API also counts page visits, so every bot crawl adds to the visit count.
I could work around it if I could somehow get the crawler's user agent in the API call, but apparently the user agent contains info about the prerender.io service instead. Is it possible to get the user agent of the actual caller of the API rather than the prerender service?

The User-Agent for our prerender service is a varying version of HeadlessChrome. Checking for this User-Agent substring and ignoring those requests would probably be a good proxy for "prerender or person?"

HeadlessChrome/ (with the slash, followed by a version string and preceded by some other stuff).

I've seen customers do this in their client-side JS to avoid the fetch happening at all:

// true when the page is being loaded by the prerender service
const isPrerender = /HeadlessChrome/.test(window.navigator.userAgent)

(and then doing different things if prerender or not).

Since the prerender service is making the call rather than the user's browser (there may be no browser at all if the visitor is a bot), we don't have any way to pass that on; it's a completely new request that we make on behalf of $crawler.
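
For completeness, the same substring check works server side. A minimal sketch, assuming an Express-style API (the route, response shape, and counter are hypothetical):

const express = require('express')
const app = express()

let visits = 0 // stand-in for the real visit counter

app.get('/api/og-image', (req, res) => {
  const ua = req.get('user-agent') || ''
  // Our prerender workers identify as HeadlessChrome/<version>; the
  // original crawler's user agent is not available at this point.
  if (!/HeadlessChrome\//.test(ua)) visits += 1
  res.json({ ogImageUrl: 'https://example.com/og.png' })
})

app.listen(3000)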