Dumb i18n site

I don't know what this sort of website internationalization scheme is called, so I coined a name for my own use. Explaining it gets repetitive, so I wrote this article to link to whenever I want to talk about it.

What is this kind of article doing on a blog about game dev? This is my favorite kind of site to use for a game's home page! It maximizes accessibility, since it is common to publish a game in multiple countries.

It has these characteristics:

No questions asked, the URL chooses the language

It is a site that uses purely the URL to choose which language to display (e.g. example.com/th/about displays the Thai version). No cookie and no IP address detection can override it.

Each language of each page exists for real as a separate HTML file. (Of course, the computer should generate these for you; don't do it manually.)
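To make "every language of every page is a real file" concrete, here is a minimal sketch of how a build step could enumerate the files to generate. The language list, the page list, and the index.html-per-folder layout are my assumptions for illustration; your site generator will have its own way of doing this.

```typescript
// Hypothetical site description; these names and values are assumptions.
const LANGUAGES: string[] = ["en", "th", "jp"];
const PAGES: string[] = ["", "about", "contact"]; // "" is the home page

// Build the output path of one page in one language, e.g. "th/about/index.html".
function outputFile(language: string, page: string): string {
  const segments = [language, page].filter((s) => s.length > 0);
  return segments.concat(["index.html"]).join("/");
}

// Every page in every language becomes its own file.
function allOutputFiles(languages: string[], pages: string[]): string[] {
  const files: string[] = [];
  for (const language of languages) {
    for (const page of pages) {
      files.push(outputFile(language, page));
    }
  }
  return files;
}

const generatedFiles = allOutputFiles(LANGUAGES, PAGES);
```

With three languages and three pages, generatedFiles lists nine separate HTML files, one per combination.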

Direct edits OK

The user can directly edit the URL to go to a different language if they think that language code exists. This does not work on some sites that are "not dumb enough" and insist on displaying a certain language depending on your location, your browser, some setting deep in the cog wheel menu, or a cookie. And usually they will not tell you how the decision is made. (I find this frustrating rather than convenient.)

Unsupported languages

An unsupported language returns a not found error (e.g. example.com/asdf/about) rather than a random redirect to the default language. If all your language pages really are separate HTML files, this should happen automatically.
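The "it happens automatically" part can be sketched as a plain file lookup: if the file for that URL was never built, the answer is 404, with no guessing and no redirect. The file list and the function name below are hypothetical, and a real static host does this lookup for you.

```typescript
// Hypothetical set of files produced by the build (an assumption).
const BUILT_FILES: string[] = [
  "about/index.html",
  "en/about/index.html",
  "th/about/index.html",
];

interface Resolution {
  status: 200 | 404;
  file: string | null;
}

// Map a URL path to a built file. No language detection, no redirect.
function resolvePath(urlPath: string, builtFiles: string[]): Resolution {
  const trimmed = urlPath.replace(/^\/+|\/+$/g, "");
  const candidate = trimmed.length > 0 ? trimmed + "/index.html" : "index.html";
  if (builtFiles.indexOf(candidate) !== -1) {
    return { status: 200, file: candidate };
  }
  return { status: 404, file: null };
}
```

Here resolvePath("/th/about", BUILT_FILES) finds a file, while resolvePath("/asdf/about", BUILT_FILES) is a 404 simply because nothing was ever generated under asdf.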

Honest <a> links

All <a> links inside a specific language should contain the destination in that language for real. For example, example.com/th/about has a link to example.com/th/contact, not to a "neutral" example.com/contact that has to rely on other tricks to finally come back to the right language.

When the user hovers over a link, they should clearly see that destination in the corner of the browser.

This makes the crawler's life as easy as possible.
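A small helper is usually enough to keep links honest. This is only a sketch: honestHref and honestLink are names I made up, and the label is assumed to be already safe to put in HTML.

```typescript
// Build an href that keeps the visitor in the language they are already in.
function honestHref(language: string, pagePath: string): string {
  const page = pagePath.replace(/^\/+|\/+$/g, "");
  return page.length > 0 ? "/" + language + "/" + page : "/" + language;
}

// Render a plain <a> tag. The label is assumed to be already HTML-safe.
function honestLink(language: string, pagePath: string, label: string): string {
  return '<a href="' + honestHref(language, pagePath) + '">' + label + "</a>";
}
```

On the Thai about page, honestLink("th", "/contact", "Contact") produces a link to /th/contact, which is exactly what the browser shows on hover.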

Neutral page

If the user types the "neutral URL" (no language slug) of any page, that page exists for real as an HTML file as well. Its language is a fixed default language (e.g. en). Therefore it is a duplicate of the file for one of the languages you support. Again, the machine should perform this duplication when building the website, not you.

Those neutral URLs may use a rewrite technique (serve another page's content, URL unchanged) based on something like the browser's language or the visitor's IP address to show the content of a language-specific page instead, without touching the URL. If the rewrite fails (no page for that language), show the neutral HTML as a fallback.

Note that with rewrites instead of redirects, the user can copy the URL and share it without their own language appearing in it, as long as they have just landed on the page and haven't navigated anywhere else yet.

But since we use rewrites, all the links in that page go to language-specific pages, following the "honest <a> links" rule. Navigating once from the rewritten page gets you back to an "honest" URL with the language in it.
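Here is a rough sketch of the rewrite decision for a neutral URL, using the browser's Accept-Language header. The mapping from browser language codes to URL slugs is an assumption (browsers send ja for Japanese, while the slug in my examples is jp), and the function names are made up. A hosting service with built-in i18n rewrites would do this for you.

```typescript
// Hypothetical mapping from browser language codes to this site's URL slugs.
const SLUG_BY_BROWSER_LANGUAGE: Record<string, string> = { en: "en", th: "th", ja: "jp" };

// Parse an Accept-Language header into primary language codes, best first.
function preferredLanguages(acceptLanguage: string): string[] {
  const entries = acceptLanguage.split(",").map((part) => {
    const pieces = part.trim().split(";");
    const tag = pieces[0].trim().toLowerCase();
    let quality = 1; // a missing q value means 1
    for (const piece of pieces.slice(1)) {
      const match = /^\s*q=([0-9.]+)\s*$/.exec(piece);
      if (match) quality = parseFloat(match[1]);
    }
    return { code: tag.split("-")[0], quality };
  });
  return entries
    .filter((e) => e.code.length > 0 && e.code !== "*")
    .sort((a, b) => b.quality - a.quality)
    .map((e) => e.code);
}

// Decide which file to serve for a neutral URL. The URL itself never changes.
function fileForNeutralUrl(neutralPath: string, acceptLanguage: string): string {
  const page = neutralPath.replace(/^\/+|\/+$/g, "");
  const suffix = page.length > 0 ? page + "/index.html" : "index.html";
  for (const code of preferredLanguages(acceptLanguage)) {
    if (Object.prototype.hasOwnProperty.call(SLUG_BY_BROWSER_LANGUAGE, code)) {
      return SLUG_BY_BROWSER_LANGUAGE[code] + "/" + suffix;
    }
  }
  return suffix; // fallback: the neutral file in the default language
}
```

A visitor whose browser prefers Japanese gets the content of jp/about/index.html at example.com/about, while a visitor with an unsupported language gets the neutral file.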

The neutral page allows you to put your homepage URL without a language code on printed media (e.g. a poster) and use it in multiple regions.

Also, search engine crawlers may appear to come from multiple countries. If you do this right, the neutral URL will present different content to each of them. This is great, as a Japanese user coming from the search engine will see Japanese content, but the URL is example.com and not example.com/jp. (Though any navigation performed next will add jp.)

  • Some static hosting services expose configuration to help you do this. For example, Firebase Hosting has i18n rewrites.

Rich metadata, crawlers are welcome

Rely on search engines so that a user from a specific country lands on the page in the right language the first time.

It knows all its alter egos

Each page has <link rel="alternate" hreflang="x-default" href="HOST/NEUTRAL_PATH" /> and all the other <link rel="alternate" hreflang="LANGUAGE" href="HOST/LANGUAGE/NEUTRAL_PATH" /> tags, so all pages know each other.

Each page has <meta property="og:url" content="REAL_HOST/LANGUAGE/NEUTRAL_PATH" /> for itself only, with the right language code. This makes it work with Facebook and Twitter.

Read more about it here. Use the HTML tags way.
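The tags above are easy to generate per page. The sketch below assumes a host of https://example.com and a made-up slug-to-hreflang mapping. Note that hreflang values are expected to be ISO 639-1 language codes, so a jp slug should be announced as ja. What og:url should be on a neutral page is my own guess here (the neutral URL itself).

```typescript
// Hypothetical values; the host and the slug-to-hreflang mapping are assumptions.
const HOST = "https://example.com";
const HREFLANG_BY_SLUG: Record<string, string> = { en: "en", th: "th", jp: "ja" };

// Build an absolute URL; a null slug means the neutral URL.
function pageUrl(host: string, slug: string | null, neutralPath: string): string {
  const page = neutralPath.replace(/^\/+|\/+$/g, "");
  const segments = [slug, page].filter((s): s is string => s !== null && s.length > 0);
  return segments.length > 0 ? host + "/" + segments.join("/") : host + "/";
}

// All alternates (the same list on every version of the page) plus this page's own og:url.
function headLinks(host: string, currentSlug: string | null, neutralPath: string): string[] {
  const tags: string[] = [];
  tags.push('<link rel="alternate" hreflang="x-default" href="' + pageUrl(host, null, neutralPath) + '" />');
  for (const slug of Object.keys(HREFLANG_BY_SLUG)) {
    tags.push(
      '<link rel="alternate" hreflang="' + HREFLANG_BY_SLUG[slug] +
      '" href="' + pageUrl(host, slug, neutralPath) + '" />'
    );
  }
  tags.push('<meta property="og:url" content="' + pageUrl(host, currentSlug, neutralPath) + '" />');
  return tags;
}
```

Calling headLinks(HOST, "th", "/about") gives the x-default link, one alternate per language, and an og:url that points at the Thai page only.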

Localized <head> content

  • Localized <meta name="keywords" content="keyword1, keyword2, ..."/>.
  • Localized <title> and <meta property="og:title" content="..."/> and <meta name="twitter:title" content="..."/> (max 70 characters for Twitter).
  • Localized <meta name="description" content="..."/> (for Google and other search engines), <meta property="og:description" content="..."/> (Open Graph, Facebook) and <meta name="twitter:description" content="..."/> (max 200 characters for Twitter).
  • Localized social images, for example if they contain a lot of text, or your product logo if you localized it:
  • The Open Graph image <meta property="og:image" content="..."/>
  • The Twitter image <meta name="twitter:image" content="..."/>
  • Alt image description for the visually impaired: <meta property="og:image:alt" content="..." /> and <meta name="twitter:image:alt" content="..." />.
  • If your page is for an app, localized <meta name="twitter:app:name:iphone" content={appName} />, <meta name="twitter:app:name:ipad" content={appName} />, and <meta name="twitter:app:name:googleplay" content={appName} /> for use with app-type card on Twitter.
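As a sketch of how the title and description tags in the list above could be produced from one set of localized strings: the function names are hypothetical, and the 70 and 200 character limits are the Twitter numbers quoted in the list. Truncation counts UTF-16 code units, which is a simplification.

```typescript
interface HeadStrings {
  title: string;
  description: string;
}

// Escape a value so it is safe inside an HTML attribute or text node.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Cut a string to at most `max` UTF-16 code units, ending with an ellipsis.
function truncate(value: string, max: number): string {
  return value.length <= max ? value : value.slice(0, max - 1) + "…";
}

// Render the localized title and description tags for one language of one page.
function localizedHead(strings: HeadStrings): string[] {
  const title = escapeAttribute(strings.title);
  const description = escapeAttribute(strings.description);
  const twitterTitle = escapeAttribute(truncate(strings.title, 70));
  const twitterDescription = escapeAttribute(truncate(strings.description, 200));
  return [
    "<title>" + title + "</title>",
    '<meta property="og:title" content="' + title + '" />',
    '<meta name="twitter:title" content="' + twitterTitle + '" />',
    '<meta name="description" content="' + description + '" />',
    '<meta property="og:description" content="' + description + '" />',
    '<meta name="twitter:description" content="' + twitterDescription + '" />',
  ];
}
```

You would call localizedHead once per language with that language's strings, and the image and app tags can follow the same pattern.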

What about <html lang="">?

Actually, every tag accepts the lang attribute. It would be ideal if all of your <div> soup had lang, but I think that would be a major pain. (MDN)

So how about just putting the right lang="" on <html>? It may be a good idea, but note that Google recommends nothing about it on this page. Therefore I think it is not an issue if your framework can't do it.

Honest language switcher

If there is no dedicated "switch language" page and each page has a dropdown, modal, or list to select all other versions, all of those are plain <a> links that go to the same page in a different language. (e.g. Clicking "JAPAN" while on example.com/th/about takes you to example.com/jp/about, not resetting to example.com/jp.)

That is the inverse of honest <a> links, which preserve the language but go to a different page.

That means if you go to a different page, the language-switching component, which may look just the same as on the previous page, actually has all its links adjusted.

Again! This might sound painful to do manually, but you should let the front-end framework of your choice do it for you.
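The switcher boils down to "same page, different slug". A minimal sketch, with a hypothetical language list and function names, could look like this:

```typescript
// Hypothetical list of supported language slugs.
const SWITCHER_LANGUAGES: string[] = ["en", "th", "jp"];

// Split "/th/about" into its language slug (null when neutral) and its page path.
function splitPath(urlPath: string, languages: string[]): { language: string | null; page: string } {
  const segments = urlPath.split("/").filter((s) => s.length > 0);
  if (segments.length > 0 && languages.indexOf(segments[0]) !== -1) {
    return { language: segments[0], page: segments.slice(1).join("/") };
  }
  return { language: null, page: segments.join("/") };
}

// One plain href per language, all pointing at the same page.
function switcherHrefs(currentPath: string, languages: string[]): Record<string, string> {
  const page = splitPath(currentPath, languages).page;
  const hrefs: Record<string, string> = {};
  for (const language of languages) {
    hrefs[language] = page.length > 0 ? "/" + language + "/" + page : "/" + language;
  }
  return hrefs;
}
```

On /th/about, the jp entry is /jp/about, and the same component rendered on /th/contact would get /jp/contact instead.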

Send only just enough strings to the client if possible

You might be using an i18n package that accepts an object of roughly this shape:

{
  "en": {
    "term1": "value1",
    "term2": "value2"
  },
  "th": {
    "term1": "value1",
    "term2": "value2"
  },
  "jp": {
    "term1": "value1",
    "term2": "value2"
  }
}

Then, with a "current language" switch, any key request (e.g. format("term1")) gets the right string.
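Stripped of everything a real i18n package adds (plurals, interpolation, fallbacks), the lookup is roughly the sketch below. createFormat is a made-up name, not the API of any particular package, and I changed the placeholder values so the languages can be told apart.

```typescript
type Catalog = Record<string, Record<string, string>>;

// Same shape as the object above, with distinguishable placeholder values.
const CATALOG: Catalog = {
  en: { term1: "value1 (en)", term2: "value2 (en)" },
  th: { term1: "value1 (th)", term2: "value2 (th)" },
  jp: { term1: "value1 (jp)", term2: "value2 (jp)" },
};

// Fix the current language once, then look keys up in that language only.
function createFormat(catalog: Catalog, currentLanguage: string): (key: string) => string {
  if (!Object.prototype.hasOwnProperty.call(catalog, currentLanguage)) {
    throw new Error("Unsupported language: " + currentLanguage);
  }
  const strings = catalog[currentLanguage];
  return (key: string): string => {
    if (!Object.prototype.hasOwnProperty.call(strings, key)) {
      throw new Error("Missing string: " + key);
    }
    return strings[key];
  };
}

const formatThai = createFormat(CATALOG, "th");
```

After that, formatThai("term1") returns the Thai value without the caller ever mentioning the language again.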

If your site is not hydrated and every <a> link traversal really goes to a new page, the HTML can contain only the minimal strings needed for that one page in that one language.
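One way to get "only the strings for this page in this language" at build time is to keep a list of keys per page and cut the catalog down to it. Everything named here (the key list per page, stringsForPage) is an assumption for illustration; many tools figure out the used keys on their own.

```typescript
type PageStrings = Record<string, string>;

// Hypothetical full catalog and a hypothetical list of keys each page uses.
const ALL_STRINGS: Record<string, PageStrings> = {
  en: { term1: "value1 (en)", term2: "value2 (en)" },
  th: { term1: "value1 (th)", term2: "value2 (th)" },
};
const KEYS_BY_PAGE: Record<string, string[]> = {
  about: ["term1"],
  contact: ["term2"],
};

// Keep only the strings that one page needs, in one language.
function stringsForPage(language: string, page: string): PageStrings {
  const source = ALL_STRINGS[language] || {};
  const keys = KEYS_BY_PAGE[page] || [];
  const subset: PageStrings = {};
  for (const key of keys) {
    if (Object.prototype.hasOwnProperty.call(source, key)) {
      subset[key] = source[key];
    }
  }
  return subset;
}
```

The Thai about page would then ship only term1 in Thai, and nothing from the other pages or languages.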

If you need interactivity, for example the site hydrates and changing pages now swaps the DOM instead, then try to add more strings lazily. Because if you consider switching language a hydrated action too, you will need the strings of all pages in all languages ready. A popular solution is to preload more only when the user clicks a link, or even when they hover over it.

Strings are usually small, so I think this rule is not that critical. I doubt that even loading all strings, of all pages, in all languages on first load is that costly for most sites.

Example

Anyway, I have been trying to make the site for my own game follow this pattern as much as possible: https://duelotters.com/. You can take a look as an example, or even view the HTML source to check whether each language is really dumb or not.

Note that the source looks alien since it is built with Svelte + Sapper, and the code in there is designed to perform direct DOM edits, as is characteristic of Svelte. (As opposed to DOM diffing with React or Vue.) You need to figure out how to achieve the "dumb" status on your tooling stack of choice. Next.js, for example, already has i18n routing, which should help.

And that's all! Maybe it doesn't sound "dumb" after you see all the repetitive-but-slightly-different HTML files needed for each page, but it is dumb in the sense that it works in a straightforward way from the visitor's point of view. You, the site's developer, should take the pain... luckily modern site building tools are great.