When I started this series, I hadn’t even planned this particular post. But as I drafted the upcoming posts, I realized that we cannot avoid taking a look at what we are actually dealing with. One should know what one is talking about. Thus, here’s the necessary theoretical background…
The one thing we need to be aware of is that we are dealing with two different contexts: the server side, defined by the .NET Framework, and the client side, defined by HTML et al. These contexts differ in the terms they use, in their conventions, and in their technical scope.
.NET maintains the necessary information via the CultureInfo class:
“The CultureInfo class specifies a unique name for each culture, based on RFC 4646. The name is a combination of an ISO 639 two-letter lowercase culture code associated with a language and an ISO 3166 two-letter uppercase subculture code associated with a country or region.”
In short, we are talking about "en-US", "de-DE", and so on (ignoring special cases). A distinction is made between neutral cultures, which are associated only with the first part (e.g. "en" and "de"), and specific cultures, which are also associated with a country or region. Neutral cultures are nevertheless represented by CultureInfo instances, including information beyond the language. That information generally relies on the "major representative" of the language, i.e. Germany for German (sorry, Austrians ;-)) and the United Kingdom of Great Britain and Northern Ireland … no wait… that former colony of theirs, for English.
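To make the neutral/specific distinction tangible, here is a small sketch. It does not use .NET at all: it assumes an ES2020 JavaScript runtime and uses Intl.Locale, whose maximize() method applies the CLDR "likely subtags" data. That is an analogous mechanism for finding a language's "major representative", not necessarily the exact data CultureInfo uses. The helper describeCulture and its result type are hypothetical.

```typescript
// Hypothetical helper: splits a culture name into its parts, tells neutral
// from specific, and asks CLDR likely-subtags data (via Intl.Locale.maximize)
// which region would "represent" the language.
interface CultureDescription {
  language: string;
  region: string | undefined;       // undefined for neutral names like "de"
  isNeutral: boolean;
  likelyRegion: string | undefined; // region implied by CLDR likely subtags
}

function describeCulture(name: string): CultureDescription {
  const locale = new Intl.Locale(name);
  return {
    language: locale.language,
    region: locale.region,
    isNeutral: locale.region === undefined,
    likelyRegion: locale.maximize().region,
  };
}

for (const name of ["de-AT", "de", "en"]) {
  console.log(name, describeCulture(name));
}
```

For "de" the likely region is "DE"; for "en" CLDR picks "US", the former colony rather than the UK.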
It should be noted that CultureInfo deals with all regional aspects: it acts as the language selector, provides date, time, and number formats, and even addresses the calendar.
Regarding localized content (like strings for labels), .NET uses a system of resources and satellite assemblies, which are accessed via the ResourceManager class, either directly or through generated code. (I will assume that this is basic .NET knowledge and not go into further detail.)
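As an illustration of how much a culture determines beyond the language, the following sketch formats the same date and number for a few cultures. It uses JavaScript's Intl API as a stand-in for CultureInfo (both draw on similar CLDR-style data, but the outputs need not match .NET exactly); formatFor is a hypothetical helper.

```typescript
// Illustration only: the same date and number formatted for several cultures.
// timeZone "UTC" keeps the date independent of the machine's time zone.
const sampleDate = new Date(Date.UTC(2024, 0, 31)); // 31 January 2024
const sampleNumber = 1234.5;

function formatFor(culture: string): { date: string; number: string; calendar: string } {
  const dateFormat = new Intl.DateTimeFormat(culture, { timeZone: "UTC" });
  return {
    date: dateFormat.format(sampleDate),
    number: new Intl.NumberFormat(culture).format(sampleNumber),
    calendar: dateFormat.resolvedOptions().calendar, // the culture's default calendar
  };
}

for (const culture of ["en-US", "en-GB", "de-DE"]) {
  console.log(culture, formatFor(culture));
}
```

With full ICU data (the Node.js default), en-US yields 1/31/2024 and 1,234.5, en-GB yields 31/01/2024 and 1,234.5, and de-DE yields 31.1.2024 and 1.234,5.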
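The culture fallback of resource lookups can be pictured roughly as follows: a lookup for "de-AT" tries the specific culture, then its neutral parent "de", then the invariant (default) resources in the main assembly. The sketch below merely mimics that chain with in-memory dictionaries. The resource table, fallbackChain, and getString are hypothetical; real satellite assembly loading is not modelled, and stripping the last subtag is a simplification of CultureInfo.Parent, which is not always a plain truncation.

```typescript
// Hypothetical in-memory stand-in for resource sets; "" plays the role of the
// invariant/default resources compiled into the main assembly.
type ResourceSet = Record<string, string>;

const resources: Record<string, ResourceSet> = {
  "": { Save: "Save", Cancel: "Cancel", Greeting: "Hello" },
  de: { Save: "Speichern", Cancel: "Abbrechen", Greeting: "Hallo" },
  "de-AT": { Greeting: "Servus" },
};

// Builds the chain specific -> neutral -> invariant, e.g. "de-AT" -> "de" -> "".
function fallbackChain(culture: string): string[] {
  const chain: string[] = [];
  let current = culture;
  while (current !== "") {
    chain.push(current);
    const dash = current.lastIndexOf("-");
    current = dash >= 0 ? current.substring(0, dash) : "";
  }
  chain.push("");
  return chain;
}

// Returns the first value found along the fallback chain, if any.
function getString(key: string, culture: string): string | undefined {
  for (const name of fallbackChain(culture)) {
    const value = resources[name]?.[key];
    if (value !== undefined) return value;
  }
  return undefined;
}

console.log(getString("Greeting", "de-AT")); // found in "de-AT"
console.log(getString("Save", "de-AT"));     // falls back to "de"
console.log(getString("Save", "fr-FR"));     // falls back to the invariant set
```

The point of the design is that a specific culture only needs to contain the strings that actually differ from its neutral parent.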
All in all, a comprehensive and consistent system.
HTML traditionally only addresses languages (not date or number formats):
“The lang attribute’s value is a language code that identifies a natural language spoken, written, or otherwise used for the communication of information among people.”
HTML is also far more open with regard to how a language is identified. Its notion of a language code includes, but goes beyond, what .NET supports:
“Here are some sample language codes:
- "en": English
- "en-US": the U.S. version of English.
- "en-cockney": the Cockney version of English.
- "i-navajo": the Navajo language spoken by some Native Americans.
- "x-klingon": The primary tag "x" indicates an experimental language tag”
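The rules behind these samples can be sketched as a simple classifier of the primary subtag: "x" marks an experimental (private) tag, "i" an IANA-registered one, and anything else is treated as a language code. Both helpers are hypothetical, and the "ll-RR" check is a deliberate simplification of the CultureInfo naming quoted earlier (CultureInfo also knows script subtags such as "sr-Latn-RS").

```typescript
// Hypothetical classifier following the primary-tag rules of the samples above.
type TagKind = "language" | "iana-registered" | "experimental";

function classifyLanguageTag(tag: string): { kind: TagKind; primary: string; subtags: string[] } {
  const [primary, ...subtags] = tag.toLowerCase().split("-");
  let kind: TagKind = "language";
  if (primary === "x") kind = "experimental";
  else if (primary === "i") kind = "iana-registered";
  return { kind, primary, subtags };
}

// Rough check for the simple "language-REGION" shape of names like "de-DE":
// two lowercase letters, a hyphen, two uppercase letters.
function looksLikeLanguageRegion(tag: string): boolean {
  return /^[a-z]{2}-[A-Z]{2}$/.test(tag);
}

for (const tag of ["en", "en-US", "en-cockney", "i-navajo", "x-klingon"]) {
  console.log(tag, classifyLanguageTag(tag).kind, looksLikeLanguageRegion(tag));
}
```

Only "en-US" fits the two-part culture shape; the other tags are perfectly fine in HTML.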
However, HTML’s focus is limited to languages, to the point of actively discouraging any other localization information:
“The golden rule when creating language tags is to keep the tag as short as possible. Avoid region, script or other subtags except where they add useful distinguishing information. For instance, use ja for Japanese and not ja-JP, unless there is a particular reason that you need to say that this is Japanese as spoken in Japan, rather than elsewhere.”
And, indeed, you’ll find that most localized HTML or CSS code you may come across (in samples and documentation) uses two-letter language codes.
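This golden rule can even be applied mechanically. The sketch below assumes an ES2020 runtime and uses Intl.Locale.minimize(), which drops every subtag that CLDR’s likely-subtags data would add back anyway; shortestTag is a hypothetical helper name.

```typescript
// Hypothetical helper: reduces a tag to its shortest equivalent form
// using CLDR likely-subtags data (Intl.Locale.prototype.minimize).
function shortestTag(tag: string): string {
  return new Intl.Locale(tag).minimize().toString();
}

for (const tag of ["ja-JP", "en-US", "en-GB", "de-AT"]) {
  console.log(tag, "->", shortestTag(tag));
}
// ja-JP -> ja, en-US -> en, en-GB -> en-GB, de-AT -> de-AT
```

Note what is lost along the way: "en-US" collapses to "en", so the region is no longer stated explicitly, and anyone reading the tag has to know the implied default.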
Alas, with HTML5 and the new input controls, the focus on "language" is no longer sufficient. A date picker changes not only the weekday names but also the date format and the first day of the week. However, the way the W3C addresses this issue seems a bit helpless, as it simply passes the problem on to the browser vendors:
“Browsers are encouraged to use user interfaces that present dates, times, and numbers according to the conventions of either the locale implied by the input element’s language or the user’s preferred locale.”
Well, a little further down they are refreshingly honest:
“There’s still a risk that the user would end up arriving a month late, of course, but there’s only so much that can be done about such cultural differences…”
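To see why a language alone is not enough for date input, compare the order of the date fields for two English-speaking regions. The sketch uses JavaScript’s Intl API as a stand-in for what a browser might do; how an actual date input renders remains up to the browser, as quoted above. dateFieldOrder is a hypothetical helper.

```typescript
// Returns the order of day/month/year fields in a culture's default
// numeric date format, e.g. ["month", "day", "year"] for en-US.
function dateFieldOrder(culture: string): string[] {
  const parts = new Intl.DateTimeFormat(culture, { timeZone: "UTC" })
    .formatToParts(new Date(Date.UTC(2024, 0, 31)));
  return parts
    .filter((p) => p.type === "day" || p.type === "month" || p.type === "year")
    .map((p) => p.type);
}

for (const culture of ["en", "en-US", "en-GB", "de-DE"]) {
  console.log(culture, dateFieldOrder(culture).join("/"));
}
```

"en-US" puts the month first, while "en-GB" and "de-DE" put the day first; a bare "en" ends up with the US order.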
Regarding localized content, HTML only allows denoting the language via the lang attribute. You can mix different languages in one document, but HTML itself does nothing beyond that. CSS selectors, on the other hand, can be used to attach styles depending on the language (via the :lang() pseudo-class).
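The matching rule behind :lang() in Selectors Level 3 is a simple prefix match: an element’s language matches :lang(C) if it equals C or starts with C immediately followed by "-", compared case-insensitively. A sketch of that rule (Level 4 adds extended matching, which is not modelled here; langMatches is a hypothetical name):

```typescript
// Sketch of the Selectors Level 3 :lang() rule: exact match or
// prefix match followed by "-", ignoring case.
function langMatches(elementLang: string, range: string): boolean {
  const lang = elementLang.toLowerCase();
  const r = range.toLowerCase();
  return lang === r || lang.startsWith(r + "-");
}

console.log(langMatches("de-AT", "de"));    // true: :lang(de) covers de-AT
console.log(langMatches("de", "de-DE"));    // false: a plain "de" is not de-DE
console.log(langMatches("de-CH", "de-DE")); // false: sibling regions do not match
```

This is why tagging elements with specific cultures does little harm in CSS: rules written for the bare language still apply.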
For an LOB (line-of-business) application, using "language" in the limited sense of HTML is far too shortsighted. Thus I will use regions whenever possible: specific cultures on the server, and tags consisting of a language code plus a country or region on the client. This may seem odd in HTML or CSS, but so what?
And the next post will contain some code, promise.
That’s all for now, folks,