Paths with accents not found and cause redirect

Hi,

We are building a new Gatsby website, and have some accented characters in some of our paths.

We use a redirect proxy to our previous website, when a path is not found.

In production, none of the paths with accented characters are found. I’ve downloaded the zip archive containing the whole build, and I can confirm that the pages are built and present.

Example of path (not working, redirect to old website) : Les p'tits pâtés de flan, et autres recettes pour enfants par Chefclub Kids | chefclub.tv

(again the file /fr/…/les-ptits-pâtés-de-flan-et-si-on-pique-niquait-sur-la-plage/index.html does exist in the archive)

Example of branch deploy (working) : https://5fc5827f0b512100075b996c--gatsby-chefclub.netlify.app/fr/recettes/kids/03a6a8e8-11fe-473a-ace4-e00ae652d309/les-ptits-pâtés-de-flan-et-si-on-pique-niquait-sur-la-plage

Any idea ?

Thanks !

Hey @brunorzn,
Can you please try URL encoding the path in your redirect rule? No worries if the page name has accents (i.e. héllø.html) but it seems like our redirects engine does not like them as part of the redirect “from” rules. So your rule for the page you shared would have this “from” path:

https://www.chefclub.tv/fr/recettes/kids/03a6a8e8-11fe-473a-ace4-e00ae652d309/les-ptits-p%C3%A2t%C3%A9s-de-flan-et-si-on-pique-niquait-sur-la-plage/les-ptits-p%C3%A2t%C3%A9s-de-flan-et-si-on-pique-niquait-sur-la-plage

Let us know whether that helps!
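
For readers following along, here is a minimal sketch of what percent-encoding a path looks like in JavaScript. The helper name `encodePath` is hypothetical, and the `normalize("NFC")` step is an assumption: it makes sure an accented letter is a single precomposed code point before encoding, so the output matches the `%C3%A2` / `%C3%A9` form seen in the URL above.

```javascript
// Percent-encode each segment of a path, leaving the "/" separators intact.
// normalize("NFC") is an assumption: it turns "a" + combining accent into a
// single precomposed character, so the encoded bytes are predictable.
function encodePath(path) {
  return path
    .normalize("NFC")
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
}

console.log(encodePath("/fr/les-ptits-pâtés-de-flan"));
// prints "/fr/les-ptits-p%C3%A2t%C3%A9s-de-flan"
```

Encoding segment by segment rather than the whole string keeps the slashes as path separators.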

Hello @jen

Thank you for your answer.

Our “from” rule is very simple:

[[redirects]]
  from = "/*"
  to = "https://original.chefclub.tv/:splat"
  status = 200
  [redirects.headers]
    X-From = "Netlify"
    signed = "REDIRECT_JWT_SECRET"
    X-Forwarded-Host = "www.chefclub.tv" 

So no accent in this config file.

I can try this if you think it can help diagnose the problem, but please note that we have thousands of pages (many, many of them with accents).

Also, renaming our pages is not a solution either :frowning:

Any other tips ?

Ah, I wonder if this is actually a trailing slash problem? For me, these two go to different places:
https://www.chefclub.tv/fr/recettes/kids/03a6a8e8-11fe-473a-ace4-e00ae652d309/les-ptits-p%C3%A2t%C3%A9s-de-flan-et-si-on-pique-niquait-sur-la-plage/ (yes slash at the end)
vs.
https://www.chefclub.tv/fr/recettes/kids/03a6a8e8-11fe-473a-ace4-e00ae652d309/les-ptits-p%C3%A2t%C3%A9s-de-flan-et-si-on-pique-niquait-sur-la-plage (no slash at the end)

In this case, I am wondering if you can work around this by adding force = true, so:

[[redirects]]
  from = "/*"
  to = "https://original.chefclub.tv/:splat"
  status = 200
  force = true
...

Using force = true would do the opposite: we would always redirect to our old website, whereas we are trying to deploy new pages (I’ve just tried it, and everything was indeed redirected to our old website).
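
To make the difference concrete, here is a toy model of the behaviour described above. It is only a sketch, not Netlify's implementation: `resolveRequest` and `builtFiles` are hypothetical names, and the model assumes a single catch-all proxy rule like ours.

```javascript
// Toy model of a catch-all proxy rule next to built files.
// `builtFiles` is a hypothetical set of paths present in the deploy.
function resolveRequest(path, builtFiles, { force = false } = {}) {
  // With force = true the rule always wins, even when a file exists.
  if (force) return "proxy";
  // Without force, an existing file is served and the rule is only a fallback.
  return builtFiles.has(path) ? "static" : "proxy";
}

const builtFiles = new Set(["/fr/new-page/"]);
console.log(resolveRequest("/fr/new-page/", builtFiles));                  // "static"
console.log(resolveRequest("/fr/old-page/", builtFiles));                  // "proxy"
console.log(resolveRequest("/fr/new-page/", builtFiles, { force: true })); // "proxy"
```

In this model, our problem is that paths with accents behave as if they were missing from `builtFiles` even though the files exist.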

And when I remove the “/” at the end of the url, it always re-appears, because of that redirect :frowning:

@jen something very strange just happened (just like it probably did before on deploy branches)

After trying (then reverting) the force = true test… on the newly deployed website, the page is now working correctly.

I really don’t know what happened : both the codebase and the config are exactly the same

OK, I’m now adding 2 new pages to the build, one with accent, one without accent. Let’s see what happens.

[UPDATE]

Test results:

  • The page with accent which was previously working was not working anymore for some time (< 5 minutes - was redirected to old website) - and it’s now working again.

  • The new page with an accent, was not working for some time (<5 minutes - was redirected to old website) - and it’s now working

  • The new page without any accent was working immediately

So, we were still able to see the new pages this morning (8 hours have passed), rarely (3 people saw the new pages, in private navigation), but now everyone sees the old ones again.

The pages with no accents are always working (no redirect).

This is most certainly a problem on Netlify side. What worries me a lot is that it’s a very erratic behaviour.

Here is a simple way to test it :

curl -I "$url" 2> /dev/null | head -n 1

With $url being a path without accent:

curl -I https://www.chefclub.tv/fr/recettes/kids/b2529015-c75a-4750-a90d-56e953ded3d1/tiramisu-dalmatiens-101-dalmatiens-qui-donne-la-banane 2> /dev/null | head -n 1
HTTP/2 200

This is the correct behaviour.

But with $url being a path with an accent:

curl -I https://www.chefclub.tv/fr/recettes/kids/03a6a8e8-11fe-473a-ace4-e00ae652d309/les-ptits-p%C3%A2t%C3%A9s-de-flan-et-si-on-pique-niquait-sur-la-plage 2> /dev/null | head -n 1
HTTP/2 302

This is not the correct behaviour. Netlify is redirecting to our old website whereas it should display the new page. Again, this page does exist, I’ve double checked it in the zip archive of the build.

Please note that the 302 code is not produced by the Netlify redirect rule (we always force a code 200): our old website forces a redirect when the URL has no trailing slash. So it’s a very direct way to tell which site answered the request.

We are supposed to roll out our new pages now. Many of them contain accents.

What can we do ?

Thanks a lot !

@jen

So I did a little test last night. Without changing anything to the codebase or config, I’m just triggering actions on our production deployment.
I have a script that runs curl against a URL with some accents (always the same URL, see previous posts) and logs the result, every 10 seconds.

I’ve plotted a little graph. In red the actions I took.

Result values mean:

  • 0 if Netlify doesn’t find the requested paths, and thus does a proxy redirect
  • 1 if Netlify does find the requested path, and displays the correct page

Before the first point and after the last point, it always stays at 0.

For a URL without accents, results are always 1.
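
For anyone who wants to reproduce the measurement, a rough JavaScript equivalent of such a polling script could look like the sketch below. It is not the script actually used. The function names are hypothetical, and it assumes Node 18+ (for the global `fetch`) and that a 200 status means the new page was served while anything else, such as the old site's 302, means the request was proxied.

```javascript
// Map an HTTP status to the 0/1 value plotted in the graph.
// Assumption: 200 means Netlify served the built page; any other status
// (such as the old site's 302) counts as "proxied to the old website".
function toResult(status) {
  return status === 200 ? 1 : 0;
}

// HEAD request that does not follow redirects (global fetch, Node 18+).
async function probe(url) {
  const response = await fetch(url, { method: "HEAD", redirect: "manual" });
  return toResult(response.status);
}

// Log one timestamped result every `intervalMs` milliseconds.
function startPolling(url, intervalMs = 10000) {
  return setInterval(async () => {
    try {
      console.log(new Date().toISOString(), await probe(url));
    } catch (error) {
      console.error(new Date().toISOString(), "request failed:", error.message);
    }
  }, intervalMs);
}

// Usage (not run here): const timer = startPolling("https://example.com/some/path");
```

Using `redirect: "manual"` matters here: following the redirect would hide the 302 that tells the two sites apart.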

Here is the graph.

Hey @brunorzn,
Thanks so much for all of your research. I’ve opened an issue internally about this behavior and hope to get back to you when we know more.

I can confirm we have some issues in our redirect implementation when it comes to characters that would be urlencoded.

Can you try to make Gatsby generate the output files urlencoded? This could help our engine treat them properly.
In the meantime we will look into improving our handling of this.

Hi @marcus,

Thank you for your answer.

We indeed tried to generate the output files urlencoded, just to test, and it does not work :

  • urlencoded files still contain special characters (%) that seem to be badly handled by Netlify
  • if files are urlencoded, then the “%” in the URL must itself be encoded (i.e. %A9%A3 becomes %25A9%25A3), which is clearly not what we want.
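
A short illustration of that double-encoding effect, using a single "é" as an example value (chosen here for illustration, not taken from our build):

```javascript
// Encoding once gives the usual percent-encoded form of "é".
const once = encodeURIComponent("é");
// Encoding the already-encoded name escapes each "%" as "%25".
const twice = encodeURIComponent(once);

console.log(once);  // prints "%C3%A9"
console.log(twice); // prints "%25C3%25A9"

// A single decode of the double-encoded form only gets back to the
// percent-encoded file name, not to the original character.
console.log(decodeURIComponent(twice) === once); // prints true
```

So a file literally named with `%C3%A9` would have to be requested as `%25C3%25A9`, which no existing link does.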

We also tried to use redirects (rewrites), from (urlencoded) path with accents to path without accents and it works.

But this is not a viable solution for us. It will break a lot of things on our (Gatsby) side and addressing those problems will require a lot of (complex / hacky) work.

Do you have any idea on how to improve this on your side, any roadmap ?
Is there anything we can do to help make things go faster ?

Thanks a lot,

Bruno

The only thing i can think about right now is to properly slugify those files. this usually means removing modifier characters so that only ASCII characters are left. You can use a module like slugify to produce those files/urls.
This seems to be what more opinionated SSGs, such as Hugo, do by default.
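
As a rough idea of what such slugification does, here is a minimal self-contained sketch. It does not use the slugify module's API; `slugifyAscii` is a hypothetical helper, and the exact rules (dropping apostrophes, collapsing everything else to hyphens) are assumptions chosen to resemble the existing paths.

```javascript
// Reduce a title to a lowercase ASCII slug.
function slugifyAscii(text) {
  return text
    .normalize("NFD")                 // split letters from their accents
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining accent marks
    .toLowerCase()
    .replace(/['’]/g, "")             // drop apostrophes (assumption)
    .replace(/[^a-z0-9]+/g, "-")      // everything else becomes a hyphen
    .replace(/^-+|-+$/g, "");         // trim leading and trailing hyphens
}

console.log(slugifyAscii("Les p'tits pâtés de flan"));
// prints "les-ptits-pates-de-flan"
```

Note that an ASCII-only rule like this leaves nothing behind for titles written entirely in a non-Latin script, so those would need a different strategy.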

Thanks.

That’s exactly what we did in that “rewrites” solution : we use slugs and/or IDs as paths/filenames.

But if we do that, we will add a lot of unwanted complexity :

  • internally, Gatsby redirects (client side, not in SSR obviously) to the “true” path, so it can easily find everything it needs (like its page-data.json file). We would have to find a way to tell Gatsby the “canonical URL”, and I’m not sure that’s even possible. Maybe we would have to do that manually after the build, in every file, which is not clean at all. It also means a lot more rewrites. At this time we don’t have a clean strategy to handle this.

  • same goes for <Link> and prefetch. In the worst case scenario we can still use <a href> links, but that’s not what we aim for.

  • this will lead to a huge (thousands of lines) redirects file.

this definitely shouldn’t happen via _redirects.

i was under the impression Gatsby lets you freely choose the url for a page when generating them in gatsby-node.js?
that would be the correct way to get proper filenames and url references. it should also play well with the page-data.json files

Yes, we can define any URL in gatsby-node.js and get rid of all special characters but we need the URL to contain special characters.

Again, some context : we are migrating a website with thousands of URLs, many of them containing accents, and changing them is not an option…

If you think this should definitely not happen via _redirects (and I tend to agree with you), I really don’t know what our options are anymore :frowning:

could the redirect to the properly slugified url happen in your old backend?

i’d have recommended a netlify function as a fallback for a not found path to produce a redirect to the proper slug, but that might not be possible with your setup of proxying to the old site

Thanks again for taking the time to reply.

Do you mean, for instance :

/netlify/holà (not found, rewrites) 
   -> /oldsite/holà (found, slugify, rewrites) 
      -> /netlify/hola

If it’s the case, then we kind of lose the interest of using your CDN, don’t you think ?

A Netlify function could work if we were able to detect which paths should call it (that’s doable). We would be in that big “redirect” option, but without the _redirects file. It looks a bit scary performance-wise though.

Do you have any idea if this problem is to be tackled soon in your roadmap ? This may help us make the right decision.

Thanks again !

if you issue a 301 redirect instead of proxying to netlify you only get a very small latency hit. search engines will also cache 301 responses. the content would still come from the CDN.

i’d consider this based on how many of your pages have special characters and i’d recommend adopting a strategy where you gradually move to slugified urls so you wouldn’t need the custom redirect anymore at some point.
i guess the main problem is search engines having indexed the current urls, but the 301 redirect would update their indexes properly so that they should only show the new/slugified urls in the future.
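
a rough sketch of what that fallback could compute, wherever it ends up running (old backend or a function). everything here is an assumption for illustration: the names `stripDiacritics`, `redirectFor` and `handler` are hypothetical, the handler assumes a Lambda-style event with a `path` property, and the slug rule only strips accents.

```javascript
// Remove accents but keep everything else (including non-Latin scripts).
function stripDiacritics(text) {
  return text
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "")
    .normalize("NFC");
}

// Return a 301 response for paths that contain accents, or null otherwise.
function redirectFor(rawPath) {
  let decoded;
  try {
    decoded = decodeURI(rawPath).normalize("NFC");
  } catch {
    return null; // malformed percent-encoding: nothing to redirect
  }
  const stripped = stripDiacritics(decoded);
  if (stripped === decoded) return null;
  return { statusCode: 301, headers: { Location: encodeURI(stripped) } };
}

// Hypothetical fallback handler: redirect when possible, otherwise 404.
async function handler(event) {
  return redirectFor(event.path) ?? { statusCode: 404, body: "Not found" };
}

console.log(redirectFor("/fr/les-ptits-p%C3%A2t%C3%A9s-de-flan"));
```

paths without accents get `null` back, so they would fall through to whatever handling already exists.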

Thanks a lot. We’ll definitely consider this solution.

I don’t know what slugified Korean URLs look like though ¯\_(ツ)_/¯

https://www.chefclub.tv/ko/레시피/kids/d1b9e9fe-315d-4ad4-8fa9-c141d33a82d7/공룡-크림파이-공룡-크림파이-야채듬뿍-상큼-짭짤한-크림-케이크/

Please ping me here if you have any other idea that comes to your mind :slight_smile:

Thanks again Marcus !