en: Facts, argumentation, implementation, and poll related to language codes on Wikipedia.

The Wikipedia community is committed to including any and all languages for which there are Wikipedians willing to do the work. We are aware that many of the world's 6,500 languages are not well-represented on computers or the web, and we are committed to working with language speakers and computing organizations to support as many languages as possible.

One standard for marking the languages used in 'net documents is RFC 3066. For the most part, this specifies using ISO 639-1's two-letter codes where available, ISO 639-2's three-letter codes where two-letter codes are not available, and another set of codes (or regional/dialect specifiers tacked onto the above) where possible.

These codes are used in the HTTP Accept-Language and Content-Language headers, in the HTML 'lang' attribute, and in the XML 'xml:lang' attribute. They're also used as the first element of the hostname for each of Wikipedia's language editions: fr.wikipedia.org, nah.wikipedia.org, etc.



Existing language codes and coverage


There are at least 7,602 languages in the world, going by the number of SIL codes. It is assumed that 90% of the world's languages are likely to disappear by 2050.[1]

  • ISO 639-3, used by SIL/Ethnologue (3-letter)
    • maximum: ca. 17,000 codes
    • current: 7,602 languages (as a reference for approximately how many languages exist)
  • ISO 639-1 (2-letter)
    • maximum: 676 codes
    • current: 180
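
The maximum figures above follow from simple counting, assuming each code is built only from the 26 lowercase letters a-z. A minimal sketch of that arithmetic (the function name `max_codes` is hypothetical, used only for illustration):

```python
# Assumption: a code of length n uses only the 26 letters a-z,
# so there are 26**n possible codes of that length.
ALPHABET_SIZE = 26


def max_codes(length: int) -> int:
    """Return how many distinct letter-only codes of the given length exist."""
    return ALPHABET_SIZE ** length


two_letter = max_codes(2)    # upper bound for ISO 639-1 style codes
three_letter = max_codes(3)  # upper bound for 3-letter codes
print(two_letter, three_letter)
```

By this count the 3-letter space is 26 times larger than the 2-letter space, which is why only the 3-letter system has room for several thousand languages.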

Language codes that look like country codes


There are approximately 50 "conflicting" language and country codes, listed in Language codes/Conflicts. A "conflict" occurs when a country uses the same code as a language that is not widely used in that country. Theoretically, country and language codes are orthogonal so the conflict does not exist.
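
The definition of a "conflict" can be sketched as a small check. This is only an illustration: the two tables below are tiny hand-picked samples, not the full lists from Language codes/Conflicts, and the function name `find_conflicts` is hypothetical.

```python
# Sample data only (an assumption for illustration, not a complete registry).
# 2-letter language codes mapped to language names.
LANGUAGES = {"be": "Belorussian", "nl": "Dutch", "ms": "Malay", "eo": "Esperanto"}

# 2-letter country codes (lowercased) mapped to the country name and the
# language codes widely used in that country.
COUNTRIES = {
    "be": ("Belgium", {"nl", "fr", "de"}),
    "nl": ("Netherlands", {"nl"}),
    "ms": ("Montserrat", {"en"}),
}


def find_conflicts(languages, countries):
    """Return the codes shared by a language and a country where that
    language is not widely used in that country."""
    conflicts = []
    for code in languages:
        if code in countries:
            _country, widely_used = countries[code]
            if code not in widely_used:
                conflicts.append(code)
    return sorted(conflicts)


print(find_conflicts(LANGUAGES, COUNTRIES))
```

With this sample, "nl" is not reported because Dutch is widely used in the Netherlands, while "be" and "ms" are.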



2-letter language codes are too similar to ISO 3166-1 country codes


This could have a mnemonic advantage; however, it could provoke confusion when the country and language codes conflict. For example, be.wikipedia.org could mean the Belorussian Wikipedia or a Wikipedia for Belgium, since 2-letter subdomains are often used by companies to run country-specific websites. See Language codes/Conflicts for a full list.

  • PRO 3-letter-language-code:
    • needless confusion and FAQ-writing can be avoided
    • users will not think that they are getting country-specific content
    • in the long run, the ability to provide country-specific content via a 2-letter system, e.g. nl.wikipedia.org as an entry point for the Netherlands

Small languages


Small languages without a 2-letter code will get a 3-letter code. This is not nice. It's like saying: you are small, you get the longer URL, and everybody will see that you are not in the group of the big languages. Using 3-letter codes throughout is better here.

Most of the world's languages don't have a three-letter code, either! However, even the 2-letter codes cover the vast majority of speakers. I'm in no hurry to mess with things overmuch just yet, but I'd be perfectly willing to make 3-letter codes available in the short term as aliases, and any 3-letter-only languages that people want Wikipedia in can be set up using the 3-letter codes. --Brion VIBBER 05:33 7 May 2003 (UTC)
This seems to me to be a case of excessive political correctness. I am American but speak fluent Swedish, less than fluent Norwegian and read but do not speak Danish. My Swedish and Danish friends often speak of their languages as being small languages, which they are relative to English (Swedish is the largest of the Nordic languages with c. 11 million speakers). They nonetheless have 2-letter codes, as do hundreds of smaller languages. The practical question is whether the speakers of any language which has been assigned a 3-letter code have proposed to start a Wikipedia. Robertgreer 19:09, 30 December 2007 (UTC)

Tags for the Identification of Languages, RFC-3066 language code assignments

Basic gist:

  • Use 2-letter codes from ISO 639-1 where they exist (en, fr, eo)
  • Fall back to 3-letter codes from ISO 639-2 where there isn't one (ger, art, cel)
  • Fall back to IANA-defined tags elsewise (i-tsu)
  • Use country or region/dialect/subgroup subtags where necessary to distinguish some of the more general codes (sgn-US, cel-gaulish, art-loglan)
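
The fallback order in the gist above can be sketched as a lookup that tries each code system in turn. This is a minimal sketch, not how Wikipedia assigns codes: the `CODES` table is a small sample chosen for illustration, and the names `CODES` and `pick_tag` are hypothetical.

```python
from typing import Optional

# Sample entries only (an assumption for illustration, not a full registry).
CODES = {
    "English": {"iso639_1": "en", "iso639_2": "eng"},
    "Esperanto": {"iso639_1": "eo", "iso639_2": "epo"},
    "Nahuatl": {"iso639_2": "nah"},          # no 2-letter code
    "Sign languages": {"iso639_2": "sgn"},   # general code, needs a subtag
    "Tsou": {"iana": "i-tsu"},               # IANA-registered tag only
}

# Order of preference described in the gist: 2-letter, then 3-letter, then IANA.
PREFERENCE = ("iso639_1", "iso639_2", "iana")


def pick_tag(language: str, subtag: Optional[str] = None) -> str:
    """Pick a language tag by falling back through the code systems,
    optionally appending a country/region/dialect subtag."""
    entry = CODES[language]
    for scheme in PREFERENCE:
        if scheme in entry:
            base = entry[scheme]
            break
    else:
        raise LookupError(f"no code known for {language!r}")
    return f"{base}-{subtag}" if subtag else base


for name, sub in [("Esperanto", None), ("Nahuatl", None),
                  ("Tsou", None), ("Sign languages", "US")]:
    print(name, "->", pick_tag(name, sub))
```

A scheme that used only 3-letter codes, as argued in the comments that follow, would drop the first entry of `PREFERENCE`, but would still need the IANA and subtag cases.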

Wikipedia is young and there is no need to repeat the mistake of using ISO 639-1. The fallback rule is nice, but it is easier if one does not need the rule at all, because one uses only one code system and not two as the RFC does. That does not mean aliases should be disallowed. Of course the old 2-letter codes can still be used, but should maybe get the status of deprecated, as we know it from HTML tags. We should allow every outsider to enter Wikipedia by 3-letter code. Aliases would be fine. Tobias Conradi 19:35 8 May 2003 (UTC)

Agreement with Tobias. My modification would be to fall back to 3-letter codes if the 2-letter code is also a country code for a nation that does not (largely) use the language. For example, "be" is Belgium, but also Belorussian, so we should use "bel" instead. See Language codes/Conflicts --Kowey 19:03, 21 Dec 2003 (UTC)

I agree with Brion. Please do not force existing Wikipedias to use a different language code. Giskart 10:54 7 May 2003 (UTC)



Details about aliases/redirects


Will the content be available in two ways, or will there be a server redirect? If so, in which direction? Tobias Conradi 19:35 8 May 2003 (UTC)

I was intending to use redirects, so a visit to e.g. http://epo.wikipedia.org/wiki/Interreto would send you to http://eo.wikipedia.org/wiki/Interreto. This doesn't have to be permanent in that direction, but it would maintain the status quo; note that using redirects rather than just having aliases should cut down on weird things like login cookies not being available when accessed from the alternate URL, or search engines crawling and indexing the site multiple times. --Brion VIBBER 21:02 8 May 2003 (UTC)
I would vote for http://eo.wikipedia.org/wiki/Interreto as a redirect to http://epo.wikipedia.org/wiki/Interreto. Then we have clear interfaces (forever?) and people get used to 3-letter codes. Will there be problems with this way?
Well, it's ugly as heck and inconsistent with usage of language codes elsewhere on the net. I'd rather not do it that way, and others have expressed the same opinion (see Giskart's comment above). And again, 3-letter codes won't cover all possibilities. Some languages will require dialect/region specifiers on top of it, or don't have any 3-letter code to work with at all, so consistency isn't going to be achieved. --Brion VIBBER 00:36 9 May 2003 (UTC)
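
The redirect direction Brion describes (3-letter alias to the existing 2-letter host) can be sketched as a URL rewrite. This is only an illustration of the idea, not the server configuration actually used: the `ALIASES` table holds a single sample entry, and the names `ALIASES` and `redirect_target` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Sample alias table (assumption): 3-letter alias -> canonical 2-letter code.
ALIASES = {"epo": "eo"}


def redirect_target(url: str) -> Optional[str]:
    """Return the canonical URL to redirect to, or None if the host
    is not a known 3-letter alias of a wikipedia.org language edition."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    label, _, rest = host.partition(".")
    canonical = ALIASES.get(label)
    if canonical is None or rest != "wikipedia.org":
        return None
    # Keep scheme, path, query and fragment; only the first host label changes.
    return urlunsplit(
        (parts.scheme, f"{canonical}.{rest}", parts.path, parts.query, parts.fragment)
    )


print(redirect_target("http://epo.wikipedia.org/wiki/Interreto"))
```

Reversing the direction, as proposed in the reply above, would only require inverting the table; the cookie and search-engine argument for redirecting rather than serving both hosts applies either way.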

Where to use the codes


Thoughts on language integration proposes that language codes be put as part of the URL path and not the domain name:

Personally, I feel this would create a lot less confusion over country codes. --Kowey 10:14, 6 Jan 2004 (UTC)

This also would make the URL shorter because the "wiki" in the path is abandoned. Tobias Conradi

The HTML code would be smaller for links to other wikis, because the domain is always the same. Tobias Conradi 14:19, 20 Sep 2004 (UTC)

Informal Poll


Please make it known if you belong to one of the Wikipedias on Language codes/Conflicts because this likely affects you more than people on en, fr, etc.



Things that everybody agrees on

  • Wikipedias are for languages, not countries
  • We should use redirects to keep things compatible
  • HTTP headers and XML/HTML attributes should definitely follow the RFC

Keep 2 letter codes (RFC)


Switch to 3 letter codes

  1. Kowey - Malay (ms); no confusion, more user-friendly (or don't stick it in the domain name)
  2. Tobias Conradi; ("or don't stick it in the domain name" is NOT a solution)
  3. Nightstallion (?) 16:28, 8 August 2006 (UTC)

Don't care


See also
