Wikipedia:Language recognition chart

WP:LRC

This language recognition chart presents a variety of clues one can use to help determine the language in which a text is written.

Characters

The language of a foreign text can often be identified by looking up characters specific to that language.
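As a first step before consulting the chart, the writing system itself can be identified mechanically. The following is a minimal sketch, not a definitive method: it assumes that the first word of a character's Unicode name (for example `CYRILLIC`, `GREEK`, `HEBREW`) is a good enough stand-in for its script, and the function name `dominant_script` is hypothetical.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most frequent script prefix among the letters of text.

    Assumption: the first word of the Unicode character name
    (e.g. "CYRILLIC CAPITAL LETTER A") approximates the script.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces, combining marks
        try:
            name = unicodedata.name(ch)
        except ValueError:
            continue  # character has no name in this Unicode database
        counts[name.split()[0]] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for sample in ["Привет, мир", "Ελληνικά", "hello"]:
        print(sample, "->", dominant_script(sample))
```

Knowing the script only narrows the search; the individual letters listed in the chart are still needed to tell apart languages that share a script.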

ABCDEFGHIJKLMNOPQRSTUVWXYZ (Latin alphabet)
- and no other – English, Indonesian, Latin, Malay, Swahili, Zulu
- àäèéëïĳöü – Dutch (Except for the ligature ĳ, these letters are very rare in Dutch. Even fairly long Dutch texts often have no diacritics.)
- áêéèëïíîôóúû – Afrikaans
- êôúû – West Frisian
- ÆØÅæøå – Danish, Norwegian
- single diacritics, mostly umlauts
  - ÄÖäö – Finnish (BCDFGQWXZÅbcfgqwxzå are found only in names and loanwords, occasionally also ŠšŽž)
  - ÅÄÖåäö – Swedish (occasionally é)
  - ÄÖÕÜäöõü – Estonian (BCDFGQWXYZcfqwxyz are found only in names and loanwords, occasionally also ŠšŽž)
  - ÄÖÜẞäöüß – German
- Circumflexes
  - ÇÊÎŞÛçêîşû – Kurdish
  - ĂÂÎȘȚăâîșț – Romanian
  - ÂÊÎÔÛŴŶÁÉÍÏâêîôûŵŷáéíï – Welsh (ÓÚẂÝÀÈÌÒÙẀỲÄËÖÜẄŸóúẃýàèìòùẁỳäëöüẅÿ are also used, but much less commonly)
  - ĈĜĤĴŜŬĉĝĥĵŝŭ – Esperanto
- Three or more types of diacritics
  - ÇĞİÖŞÜçğıöşü – Turkish
  - ÁÐÉÍÓÚÝÞÆÖáðéíóúýþæö – Icelandic
  - ÁÐÍÓÚÝÆØáðíóúýæø – Faroese
  - ÁÉÍÓÖŐÚÜŰáéíóöőúüű – Hungarian
  - ÀÇÉÈÍÓÒÚÜÏàçéèíóòúüï· – Catalan
  - ÀÂÆÇÉÈÊËÎÏÔŒÙÛÜŸàâæçéèêëîïôœùûüÿ – French (Ÿ and ÿ are found only in certain proper names)
  - ÁÀÇÉÈÍÓÒÚËÜÏáàçéèíóòúëüï (· only in Gascon dialect) – Occitan
  - ÁÉÍÓÚÂÊÔÀãõçáéíóúâêôà (ü Brazilian and k, w and y not in native words) – Portuguese
- ÁÉÍÑÓÚÜáéíñóúü ¡¿ – Spanish
- ÀÉÈÌÒÙàéèìòù – Italian
- ÁÉÍÓÚÝÃẼĨÕŨỸÑG̃áéíóúýãẽĩõũỹñg̃ – Guarani (the only language to use g̃)
- ÁĄĄ́ÉĘĘ́ÍĮĮ́ŁŃ áąą́éęę́íįį́łń (FQRVfqrv not in native words) –Southern Athabaskan languages
  - ’ÓǪǪ́ āą̄ēę̄īį̄óōǫǫ́ǭúū –Western Apache
  - 'ÓǪǪ́ óǫǫ́ –Navajo
  - ’ÚŲŲ́ úųų́ –Chiricahua/Mescalero
- ąłńóż – Lechitic languages
  - ąćęłńóśźż – Polish
  - ćśůź – Silesian
  - ãéëòôù – Kashubian
- A, Ą, Ã, B, C, D, E, É, Ë, F, G, H, I, J, K, L, Ł, M, N, Ń, O, Ò, Ó, Ô, P, R, S, T, U, Ù, W, Y, Z, Ż –Kashubian
- ČŠŽ
  - and no other – Slovene
  - ĆĐ – Bosnian, Croatian, Serbian Latin
  - ÁĎÉĚÍŇÓŘŤÚŮÝáďéěíňóřťúůý – Czech
  - ÁÄĎÉÍĽĹŇÓÔŔŤÚÝáäďéíľĺňóôŕťúý – Slovak
  - ĀĒĢĪĶĻŅŌŖŪāēģīķļņōŗū – Latvian (ŌŖ and ōŗ are no longer used in most modern-day Latvian)
  - ĄĘĖĮŲŪąęėįųū – Lithuanian
- ĐÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬÈẺẼÉẸÊỀỂỄẾỆÌỈĨÍỊÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢÙỦŨÚỤƯỪỬỮỨỰỲỶỸÝỴ đàảãáạăằẳẵắặâầẩẫấậèẻẽéẹêềểễếệìỉĩíịòỏõóọồổỗốơờởỡớợùủũúụưừửữứựỳỷỹýỵ –Vietnamese
  - ꞗĕŏŭo᷄ơ᷄u᷄ –Middle Vietnamese
- ā ē ī ō ū – May be seen in some Japanese texts in Rōmaji or transcriptions (see below), or in Hawaiian and Māori texts.
- é – Sundanese
- ñ – Basque
أ ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه ؤ و ئ ى ي ءArabic script
- Arabic, Malay (Jawi), Kurdish (Soranî), Panjabi / Punjabi, Pashto, Sindhi, Urdu, others.
- پ چ ژ گ – Persian (Farsi)
Brahmic family of scripts
- Bengali script
  - অ আ কা কি কী উ কু ঊ কূ ঋ কৃ এ কে ঐ কৈ ও কো ঔ কৌ ক্ কত্‍ কং কঃ কঁ ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ৰ ল ৱ শ ষ স হ য় ড় ঢ় ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯
  - used to write Bengali and Assamese.
- Devanāgarī
  - अ आ इ ई उ ऊ ऋ ॠ ऌ ॡ ऍ ऎ ए ऐ ऑ ऒ ओ ओ क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल ळ व श ष स ह ० १ २ ३ ४ ५ ६ ७ ८ ९ प् पँ पं पः प़ पऽ
  - used to write, either along with other scripts or exclusively, several Indian languages including Sanskrit, Hindi, Maithili, Magahi, Marathi, Kashmiri, Sindhi, Bhili, Konkani and Bhojpuri, as well as Nepali from Nepal.
- Gurmukhi
  - ਅਆਇਈਉਊਏਐਓਔਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਲ਼ਵਸ਼ਸਹ
  - primarily used to write Punjabi, as well as Braj Bhasha, Khariboli (and other Hindustani dialects), Sanskrit and Sindhi.
- Gujarati script
  - અ આ ઇ ઈ ઉ ઊ ઋ ઌ ઍ એ ઐ ઑ ઓ ઔ ક ખ ગ ઘ ઙ ચ છ જ ઝ ઞ ટ ઠ ડ ઢ ણ ત થ દ ધ ન પ ફ બ ભ મ ય ર લ ળ વ શ ષ સ હ ૠ ૡૢૣ
  - used to write Gujarati and Kachchi
- Tibetan script
  - ཀ ཁ ག ང ཅ ཆ ཇ ཉ ཏ ཐ ད ན པ ཕ བ མ ཙ ཚ ཛ ཝ ཞ ཟ འ ཡ ར ལ ཤ ས ཧ ཨ
  - used to write Standard Tibetan, Dzongkha (Bhutanese), and Sikkimese
АБВГДЕЖЗИКЛМНОПРСТУФХЦЧШ (Cyrillic alphabet)
- ЙЩЬЮЯ
  - Ъ – Bulgarian
  - ЁЫЭ
    - Ў, no Щ, І instead of И (Ґ in some variants) – Belarusian
    - rarely Ъ – Russian
  - ҐЄІЇ – Ukrainian
- ЉЊЏ, Ј instead of Й (Vuk Karadžić's reform)
  - ЃЌЅ – Macedonian
  - ЋЂ – Serbian
- ЄꙂꙀЗІЇꙈОуꙊѠЩЪꙐЬѢЮꙖѤѦѨѪѬѮѰѲѴҀ –Old Church Slavonic,Church Slavonic
- Ӂ – Romanian in Transnistria (elsewhere in Latin)
ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ αβγδεζηθικλμνξοπρσςτυφχψω (Greek alphabet) – Greek
אבגדהוזחטיכלמנסעפצקרשת (Hebrew alphabet)
- and maybe some odd dots and lines above, below, or inside characters –Hebrew
- פֿ; dots/lines below letters appearing only with א, י, and ו – Yiddish
- no dots or lines around the letters, and more than a few words end with א (i.e., they have it at the leftmost position) –Aramaic
- Ladino
漢字文化圈 (Chinese characters) – Some East Asian languages
- and no other –Chinese
- with あいうえおの Hiragana and/or アイウエオノ Katakana – Japanese
위키백과에 (note commonplace ellipses and circles) – Korean
ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏ etc. -- ㄓㄨˋㄧㄣㄈㄨˊㄏㄠˋ (Bopomofo)
- ㄪㄫㄬ -- not Mandarin
កខគឃងចឆជឈញដឋឌឍណតថទធនបផពភមសហយរលឡអវអ្កអ្ខអ្គអ្ឃអ្ងអ្ចអ្ឆអ្ឈអ្ញអ្ឌអ្ឋអ្ឌអ្ឃអ្ណអ្តអ្ថអ្ទអ្ធអ្នអ្បអ្ផអ្ពអ្ភអ្មអ្សអ្ហអ្យអ្រអ្យអ្លអ្អអ្វ អក្សរខ្មែរ (Khmer Alpha bet) -Khmer
Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք Օ Ֆ (Armenian alphabet) – Armenian
ა ბ გდ ევ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ (Georgian Alpha bet) –Georgian
กขฃคฅฆงจฉชซฌญฎฏฐฑฒณดตถทธนบปผฝพฟภมยรฤลฦวศษสหฬอฮฯะา฿เแโใไๅๆ๏๐๑๒๓๔๕๖๗๘๙๚๛ (Thai script) -Thai
AEIOUHKLMNPW' (Hawaiian alphabet) – Hawaiian
ⴰⴱⴲⴳⴴⴵⴶⴷⴸⴹⴺⴻⴼⴽⴾⴿⵀⵁⵂⵃⵄⵅⵆⵇⵈⵉⵊⵋⵌⵍⵎⵐⵑⵒⵓⵔⵕⵖⵗⵘⵙⵚⵛⵜⵝⵞⵠⵡⵢⵣⵤⵥⵦⵧTifinagh,a script used forTamazight(Berber)
ꦄꦅꦆꦇꦈꦉꦊꦋꦌꦍꦎꦏꦐꦑꦒꦓꦔꦕꦖꦗꦘꦙꦚꦛꦜꦝꦞꦟꦠꦡꦢꦣꦤꦥꦦꦧꦨꦩꦪꦫꦬꦭꦮꦯꦰꦱꦲJavanese Script,also written in Arabic and English script- very similar toBalinese scriptin letters
ᮃᮄᮅᮆᮇᮈᮉᮊᮋᮌᮍᮎᮏᮐᮑᮒᮓᮔᮕᮖᮗᮘᮙᮚᮛᮜᮝᮞᮟᮠSundanese script,also written in Arabic and English script

Latin alphabet (possibly extended)

Romance languages

Lots of Latin roots.

French (Français)

Accented letters: â ç è é ê î ô û, rarely ë ï; ù only in the word où, à only at the ends of a few words (including à). Never á í ì ó ò ú.
Angle quotation marks: « » (though "curly-Q" quotation marks are also used); dialogue traditionally indicated by means of dashes.
Common short words: la, le, les, un, une, des, de, du, à, au, et, ou, où, sur, il, elle, ils, se, je, vous, que, qui, y, en, si, ne, est, sont, a, ont.
Many apostrophised contractions for common pronouns and particles, e.g. l' or d', less often c', j', m', n', s', t', or rarely z'; these occur only before a word starting with a vowel or, in some cases, an h.
Common digraphs and trigraphs:
- Vowel digraphs: au, ai, ei, ou. Word-final -ez.
- Vowel digraphs (nasals): an, en, in, on, rarely un. For all of these, the n becomes m before b, p or m (e.g. embouchure, never *enbouchure).
- Vowel trigraphs: eau, ein, ain, oin.
- Consonant digraphs: ch, gu-. Rarely sh. Semi-consonant -ill-.
Letters w and k are rare and used only in loanwords, most often from Germanic languages (e.g. whisky).
Ligatures œ and æ are conventional but rarely used (a few words are well known, e.g. œil, œuf(s), bœuf(s); most others are scientific/technical and borrowed from Latin).
Words ending in -aux, -eux, or -oux.

Spanish (Español)

Characters: ¿ ¡ (inverted question and exclamation marks), ñ
All vowels may take an acute accent (á, é, í, ó, ú)
The letter u can take a diaeresis (ü), but only after the letter g
Some words frequently used: de, el, del, los, la(s), uno(s), una(s), y
No apostrophised contractions
No use of grave accent
Letters k and w are rare and only used in loanwords (e.g. walkman)
Word beginnings: ll- (double L; check that the text is not Welsh or Catalan)
Word endings: -o, -a, -ción, -miento, -dad
Angle quotation marks: « » (though "curly-Q" quotation marks are also used); dialogue often indicated by means of dashes

Italian (Italiano)

Almost every native word ends in a vowel. Example exceptions include non, il, per, con, del.
Common one-letter word: è.
Common word: perché.
Letter sequences: gli, gn, sci.
Letters j, k, w, x and y are rare and used only in loanwords (e.g. whisky).
Word endings: -o, -a, -zione, -mento, -tà, -aggio.
Grave accent (e.g., on à) almost always occurs on the last letter of words.
Double consonants (tt, zz, cc, ss, bb, pp, ll, etc.) are frequent.

Catalan (Català)

Characters: à, è, é, í, ï, ò, ó, ú, ü, ç, ·
Character combination tz (also common in Basque, however) and l·l
Syllables and words ending in -aig, -eig, -oig, -uig, -aix, -eix, -oix, -uix
Letter sequences: tx (also common in Basque, however) and tg
Letter y is only used in the combination ny and in loanwords
Letters k and w are rare and only used in loanwords (e.g. walkman)
Word endings: -o, -a, -es, -ció, -tat, -ment
Word beginning: ll- (also common in Spanish and Welsh, however)
Common words: això, amb, mateix, tots, que

Romanian (Română)

Characters: ă â î ș ț
Common words: și, de, la, a, ai, ale, alor, cu
Word endings: -a, -ă, -u, -ul, -ului, -ție (or -țiune), -ment, -tate; names ending in -escu
Double and triple i: copii, copiii
Note that Romanian is sometimes written online with no diacritics, making it harder to identify. A cedilla is sometimes used on S (ş) and on T (ţ) instead of the correct diacritic, the comma below (ș, ț).

Portuguese (Português)

Characters: ã, õ, â, ê, ô, á, é, í, ó, ú, à, ç
Common one-letter words: a, à, e, é, o
Common two-letter words: ao, as, às, da, de, do, em, os, ou, um
Common three-letter words: aos, com, das, dos, ele, ela, mas, não, por, que, são, uma
Common endings: -ção, -dade, -ismo, -mente
Common digraphs: ch, nh, lh; examples: chave, galinha, baralho.
The letters k, w and y are rare. They are found mostly in loanwords, e.g. keynesianismo, walkie-talkie, nylon.
Most singular words end in a vowel, l, m, r, or z.
Plural words end in -s.

Walloon (Walon)

Characters: å, é, è, ê, î, ô, û
Common digraphs and trigraphs: ai, ae, én, -jh-, tch, oe, -nn-, -nnm-, xh, ou
Common one-letter words: a, å, e, i, t', l', s', k'
Common two-letter words: al, ås, li, el, vs, ki, si, pô, pa, po, ni, èn, dj'
Common three-letter words: dji, nén, rén, bén, pol, mel
Common endings: -aedje, -mint, -xhmint, -ès, -ou, -owe, -yî, -åcion
Apostrophes are followed by a space (preferably a non-breaking one), e.g. l' ome instead of l'ome.

Galician (Galego)

Similar to Portuguese; the indefinite article "unha" (fem. sing.), the suffix -ción and heavier use of the letter "x" usually indicate Galician.
Definite articles o (masc. sing.), os (masc. plural), a (fem. sing.), as (fem. plural)
Common digraphs: nh (ningunha)
The letters j, k, w and y are not in the alphabet, and appear only in loanwords

Germanic languages

English

words: a, an, and, in, of, on, the, that, to, is, what, I (I is always capitalized when referring to oneself)
letter sequences: th, ch, sh, ough, augh, qu
word endings: -ing, -tion, -ed, -age, -s, -’s, -’ve, -n’t, -’d
the vast majority of words end with a consonant, or sometimes with an e. Some common exceptions: who, to, so, no, do, a, and a few names like Julia.
diacritics or accents only in loanwords (piñata)

Dutch (Nederlands)

letter sequences ij (capitalized as IJ, and also found as a ligature, Ĳ or ĳ), ei, ou, au, oe, doubled vowels (but not ii), kw, ch, sch, oei, ooi, aai and uw (especially eeuw, ieuw, auw, and ouw).
all consonants except h, j, q, v, w, x and z can be doubled.
the letters c (except in the sequence (s)ch), q, x and y are almost only found in loanwords.
words: het, op, en, een, voor (and compounds of voor).
word endings: -tje, -sje, -ing, -en, -lijk
at the start of words: z-, v-, ge-
t/m occasionally occurs between two points in time or between numbers (e.g. house numbers).

Afrikaans (Afrikaans)

Words: 'n, as, vir, nie.
Similar to Dutch, but:
- the common Dutch letters c and z are rare and used only in loanwords (e.g. chalet);
- the common Dutch vowel ij is not used; instead, i and y are used (e.g. -lik, sy);
- the common Dutch word ending -en is rare, being replaced by -e.

German (Deutsch)

umlauts (ä, ö, ü), ess-zett (ß)
letter sequences: ch, sch, tsch, tz, ss
common words: der, die, das, den, dem, des, er, sie, es, ist, ich, du, aber
common endings: -en, -er, -ern, -st, -ung, -chen, -tät
rare letters: x, y (except in loanwords)
letter c rarely used except in the sequences listed above and in loanwords
long compound words
a period (.) after ordinal numbers, e.g. 3. Oktober
many capitalised words in the middle of sentences since German capitalizes all nouns.

Swedish (Svenska)

letters å, ä, ö, rarely é
common words: och, i, att, det, en, som, är, av, den, på, om, inte, men
common endings: -ning, -lig, -isk, -ande, -ade, -era, -rna
common surname endings: -sson, -berg, -borg, -gren, -lund, -lind, -ström, -kvist/qvist/quist
long compound words
letter sequences: stj, sj, skj, tj, ck, än
no use of the characters w, z except for foreign proper nouns and some loanwords, but x is used, unlike Danish and Norwegian, which replace it with ks
doubling of consonants common, but doubling of vowels very rare

Danish (Dansk)

letters æ, ø, å
common words: af, og, til, er, på, med, det, den;
common endings: -tion, -ing, -else, -hed;
long compound words;
no use of the characters q, w, x and z except for foreign proper nouns and some loanwords;
to distinguish from Norwegian: uses the letter combination øj; frequent use of æ; spellings of borrowed foreign words are retained (in particular use of c), such as centralstation.
doubling of consonants common, but doubling of vowels very rare

Norwegian (Norsk)

letters æ, ø, å
common words: av, ble, er, og, en, et, men, i, å, for, eller;
common endings: -sjon, -ing, -else, -het;
long compound words;
no use of the characters c, w, z and x except for foreign proper nouns and some loanwords;
two versions of the language: Bokmål (much closer to Danish) and Nynorsk, for example ikke, lørdag, Norge (Bokmål) vs. ikkje, laurdag, Noreg (Nynorsk); Nynorsk uses the word òg; printed materials are almost always published in Bokmål only;
to distinguish from Danish: uses the letter combination øy; less frequent use of æ; spellings of borrowed foreign words are ‘Norsified’ (in particular removing use of c), such as sentralstasjon.
doubling of consonants common, but doubling of vowels very rare
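The common-word lists given for Swedish, Danish and Norwegian overlap heavily, so a single word rarely settles the question, but counting hits across a whole sentence often does. The sketch below is only an illustration of that idea: the word sets are copied from the lists above, while the function names `score_common_words` and `best_guess` are hypothetical, and ties are resolved arbitrarily in favour of the first language in the table.

```python
import re

# Word lists taken from the Swedish, Danish and Norwegian entries of this chart.
COMMON_WORDS = {
    "Swedish": {"och", "i", "att", "det", "en", "som", "är", "av",
                "den", "på", "om", "inte", "men"},
    "Danish": {"af", "og", "til", "er", "på", "med", "det", "den"},
    "Norwegian": {"av", "ble", "er", "og", "en", "et", "men", "i",
                  "å", "for", "eller"},
}


def score_common_words(text):
    """Count, per language, how many words of text are on its list."""
    words = re.findall(r"\w+", text.lower())
    return {lang: sum(word in vocab for word in words)
            for lang, vocab in COMMON_WORDS.items()}


def best_guess(text):
    """Return the language with the highest count (first one wins a tie)."""
    scores = score_common_words(text)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(score_common_words("Jag vet inte om det är sant och jag tror att han kommer"))
```

On short fragments the scores are often tied or misleading; the spelling clues (øj vs. øy, af vs. av, ä/ö vs. æ/ø) remain the more reliable signal.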

Icelandic (Íslenska)

letters á, ð, é, í, ó, ú, ý, þ, æ, ö
common beginnings: fj-, gj-, hj-, hl-, hr-, hv-, kj-, and sj-
common endings: -ar (especially -nar), -ir (especially -nir), -ur, -nn (especially -inn)
no use of the characters c, q, w, or z except for foreign proper nouns, some loanwords, and, in the case of z, older texts.
doubling of consonants common, but doubling of vowels very rare

Faroese (Føroyskt)

letters á, ð, í, ó, ú, ý, æ, ø
letter combinations: ggj, oy, skt
to distinguish from Icelandic: does not use é or þ, uses ø instead of ö (occasionally rendered as ö on road signs, or even ő).
doubling of consonants common, but doubling of vowels very rare

Baltic languages

Latvian (Latviešu)

uses diacritics: ā, č, ē, ģ, ī, ķ, ļ, ņ, ō, ŗ, š, ū, ž
does not have letters: q, w, x, y
no longer uses ō or ŗ in modern language
extremely rare doubling of vowels
rare doubling of consonants
a period (.) after ordinal numbers, e.g. 2005. gads
common words: ir, bija, tika, es, viņš

Lithuanian (Lietuvių)

visual abundance of letters ą, č, ę, ė, į, š, ų, ū, ž
does not have letters q, w, x
extremely rare doubling of vowels and consonants
many varying forms (usually endings) of the same word, e.g. namas, namo, namus, namams, etc.
generally long words (absence of articles and fewer prepositions in comparison to Germanic languages)
common words: ir, yra, kad, bet.

Slavic languages

Polish (Polski)

consonant clusters rz, sz, cz, prz, trz
includes: ą, ę, ć, ś, ł, ń, ó, ż, ź
words w, z, we, i, na (several one-letter words)
words jest, się
words beginning with był, będzie, jest (forms of the copula być, "to be").

Czech (Čeština)

visual abundance of letters ž š ů ě ř
words je, v
to distinguish from Slovak: does not use ä, ľ, ĺ, ŕ or ô; ú only appears at the beginning of words.

Slovak (Slovenčina)

visual abundance of letters ž š č;
uses: ä, ľ, and ô and (very rarely) ĺ and ŕ;
typical suffixes: -cia, -ť;
to distinguish from Czech: does not use ě, ř or ů.
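The Czech and Slovak entries each name letters that the other language does not use, which makes a mechanical check straightforward. A minimal sketch, assuming the input is already known to be one of the two languages; the function name `czech_or_slovak` is hypothetical, and a short text may contain none of the telltale letters.

```python
# Letters the chart lists as absent from the other language.
CZECH_ONLY = set("ěřů")
SLOVAK_ONLY = set("äľĺŕô")


def czech_or_slovak(text):
    """Return "Czech", "Slovak", or None when the letters do not decide."""
    letters = set(text.lower())
    has_czech = bool(letters & CZECH_ONLY)
    has_slovak = bool(letters & SLOVAK_ONLY)
    if has_czech and not has_slovak:
        return "Czech"
    if has_slovak and not has_czech:
        return "Slovak"
    return None  # no telltale letters, or a mixed text


if __name__ == "__main__":
    print(czech_or_slovak("Příliš žluťoučký kůň"))
    print(czech_or_slovak("Kôň je veľmi rýchly"))
```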

Croatian (Hrvatski)

similar to Serbian
letters-digraphs dž, lj, nj
does not have q, w, x, y
typical suffixes: -ti, -ći
special letters: č, ć, š, ž, đ
common words: a, i, u, je
to distinguish from Serbian: sequences -ije- and -je- are common; verbs ending in -irati, -iran

Serbian (Srpski / Српски)

Serbian Latin

similar to Croatian
letters-digraphs dž, lj, nj (lj and nj are somewhat more common than dž, although not by much)
no q, w, x, y
typical verb suffixes -ti, -ći (the infinitive is much less used than in Croatian)
foreign words might end in -tija, -ovan, -ovati, -uje
special letters: đ (rare), č, š (common), ć, ž (less common)
common words: a, i, u, je, jeste
future tense suffixes -iće, -ićeš, -ićemo, -ićete (not found in Croatian)
vowel sequences -ije- and -je- are very common in Serbian as spoken in Bosnia and Herzegovina, Montenegro and Croatia (ijekavica), but do not appear in Serbia, where each of those sequences is replaced with -e- (ekavica).

Serbian Cyrillic

uses Џ, Ј, Љ, Њ, Ђ, Ћ
does not use Щ, Ъ, Ы, Ь, Э, Ю, Я, Ё, Є, Ґ, Ї, І, Ў
to distinguish from Macedonian: does not use Ѕ, Ѓ, Ќ
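The Cyrillic clues in this chart (the tree under the Cyrillic alphabet and the Serbian Cyrillic notes above) amount to a small decision procedure. The sketch below is one possible reading of it, not an authoritative classifier: the function name `guess_cyrillic_language` is hypothetical, the order of the checks is an assumption, and the Bulgarian rule in particular can misfire on short Russian texts that happen to contain Ъ but none of Ё, Ы, Э.

```python
def guess_cyrillic_language(text):
    """Guess a language from telltale Cyrillic letters; None if undecided."""
    letters = set(text.upper())

    def has(chars):
        return bool(letters & set(chars))

    # Escapes are used for letters that look like Latin ones.
    if has("\u0405\u0403\u040c"):          # Ѕ Ѓ Ќ
        return "Macedonian"
    if has("\u0402\u040b"):                # Ђ Ћ
        return "Serbian"
    if has("\u0409\u040a\u040f\u0408"):    # Љ Њ Џ Ј
        return "Serbian or Macedonian"
    if has("\u0490\u0404\u0407"):          # Ґ Є Ї
        return "Ukrainian"
    if has("\u040e"):                      # Ў
        return "Belarusian"
    if has("\u0401\u042b\u042d"):          # Ё Ы Э
        return "Russian"
    if has("\u042a"):                      # Ъ without Ё Ы Э
        return "Bulgarian"
    return None


if __name__ == "__main__":
    for sample in ["България", "Это мы", "Укра\u0457на"]:
        print(sample, "->", guess_cyrillic_language(sample))
```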

Celtic languages

Welsh (Cymraeg)

letters Ŵ, ŵ used in Welsh
words y, yr, yn, a, ac, i, o
letter sequences wy, ch, dd, ff, ll, mh, ngh, nh, ph, rh, th, si
letters not used: k, q, v, x, z
letter only used rarely, in loanwords: j
commonly accented letters: â, ê, î, ô, û, ŵ, ŷ, although acute (´), grave (`), and dieresis (¨) accents can hypothetically occur on all vowels
word endings: -ion, -au, -wr, -wyr
y is the most common letter in the language
w between consonants (w in fact represents a vowel in the Welsh language)
circumflex accent (^) is by far the commonest diacritical mark, although diacritics are often omitted altogether

Irish (Gaeilge)

vowels with acute accents: á é í ó ú
words beginning with letter sequences bp dt gc bhf
letter sequences sc cht
no use of the letters J, K, Q, V, W.
frequent bh, ch, dh, fh, gh, mh, th, sh
to distinguish from (Scottish) Gaelic: there may be words or names with the second (or even third) letter capitalized instead of the first: hÉireann.

Scottish Gaelic (Gàidhlig)

vowels with grave accents: à è ì ò ù (é and ó are still occasionally seen, but their use is now discouraged)
letter sequences sg chd
frequent bh, ch, dh, fh, gh, mh, th, sh
to distinguish from Irish: prefixes are hyphenated, so capitals in the middle of words generally do not occur: an t-Oban.
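Since Irish marks vowels with acute accents and Scottish Gaelic with grave accents, comparing the two counts gives a quick first indication. This is a rough sketch under the assumption that the text is already known to be one of the two languages; the function name `irish_or_scottish_gaelic` is hypothetical, and older Scottish Gaelic texts with é and ó can skew the count.

```python
ACUTE_VOWELS = set("áéíóú")  # typical of Irish
GRAVE_VOWELS = set("àèìòù")  # typical of Scottish Gaelic


def irish_or_scottish_gaelic(text):
    """Compare acute and grave vowel counts; None when they are equal."""
    lowered = text.lower()
    acute = sum(ch in ACUTE_VOWELS for ch in lowered)
    grave = sum(ch in GRAVE_VOWELS for ch in lowered)
    if acute > grave:
        return "Irish"
    if grave > acute:
        return "Scottish Gaelic"
    return None


if __name__ == "__main__":
    print(irish_or_scottish_gaelic("Poblacht na hÉireann"))
    print(irish_or_scottish_gaelic("Gàidhlig"))
```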

Albanian (Shqip)

unique letters: ë, ç.
ë is the most common letter in the language.
the letter w is not used except in loanwords.
dh, gj, ll, nj, rr, sh, th, xh, and zh are considered one letter instead of two.
common words: po, jo, dhe, i, të, me

Maltese (Malti)

unique letters: ċ, ġ, għ, ħ, ż
Semitic origin, fairly intelligible with Arabic
uses il-xxx for the definite article

Iranian languages

Kurdish (Kurdî / كوردی)

uses circumflex ( ^ ): ê, î, û and cedilla ( ¸ ): ç, ş
the word xwe (oneself, myself, yourself etc.) appears frequently and is highly specific (xw combination)
i ( I, i ) is the most common letter in the language
uses eight vowels (a, e, ê, i, î, o, u, û)
impossible to find a word without any vowel
has lots of compound words

Finno-Ugric languages

Finnish (Suomi)

distinct letters å, ä and ö; but never õ or ü (y takes the place of ü)
b, f, z, š and ž appear in loanwords and proper names only; the last two are substituted with sh or zh in some texts
c, q, w, x, å appear in (typically foreign) proper names only
outside of loanwords, d appears only between vowels or in hd
outside of loanwords, g only appears in ng
outside of loanwords, words do not begin with two consonants; this is reflected in the general syllable structure, where consonant clusters only occur across syllable boundaries, except in some loanwords
common words: sinä, on
common endings: -nen, -ka/-kä, -in, -t (plural suffix)
common vowel combinations: ai, uo, ei, ie, oi, yö, äi
unusually high degree of letter duplication; both vowels and consonants are geminated, for example aa, ee, ii, kk, ll, ss, yy, ää
frequent long words

Estonian (Eesti)

distinct letters: õ, ä, ö and ü; but never ß or å
similar to Finnish, except:
- letter y is not used, except in loanwords (ü is the corresponding vowel)
- letters b and g (without preceding n) are found outside of loanwords
- occasional use of š and ž, mainly in loanwords (plus combination tš)
- loanwords more common generally than in Finnish, mainly loaned from German
- words end in consonants more frequently than in Finnish, word-final b, d, v being particularly typical
- letter d is much more common in Estonian than in Finnish, and in Estonian it is often the last letter of the word (plural suffix), which it never is in Finnish
- double öö more common than in Finnish; other doubles can include õõ, üü, rarely hh (for German ch) and even šš
common words: ja, on, ei, ta, see, või.

Hungarian (Magyar)

letters ő and ű (double acute accent) unique to Hungarian
accented letters á and é frequent
letter combinations: cs, dz, dzs, gy, ly, ny, sz, ty, zs (all classed as separate letters), leg‐, ‐obb (note: sz is also common in Polish)
common words: a, az, ez, egy, és, van, hogy
letter k very frequent (plural suffix)

Eskimo–Aleut languages

Greenlandic (Kalaallisut)

long polysynthetic words (a single word can number 30+ letters)
relatively abundant n, q (not necessarily followed by u), u
ubiquitous double consonants and vowels (aa, ii, qq, uu, more rarely ee, oo)
vowels a, i, u conspicuously more frequent than e, o (which are only found before q and r)
no diphthongs except occasional word-final ai; the only consonant combinations besides double consonants and (n)ng consist of r + consonant
old spellings (abolished by a spelling reform) sometimes included acute accent, circumflex and/or tilde: Qânâq vs. Qaanaaq.

Southern Athabaskan languages

vowels with acute accent, ogonek (nasal hook), or both: á, ą, ą́
doubled vowels: aa, áá, ąą, ą́ą́
slashed l: ł (check not Polish!)
n with acute accent: ń
quotation mark: ' or ’
sequences: dl, tł, tł’, dz, ts’, ií, áa, aá
may have rather long words

Navajo (Diné bizaad)

In addition to the above,

does not use u, ú, or ų

(Mescalero / Chiricahua) (Mashgaléń / Chidikáágo)

In addition to the above,

uses: u, ú, ų
does not use o, ó, or ǫ

Guaraní

lots of tildes over vowels (including y) and n
tilde over g: g̃; it is the only language in the world to use it. Example words: hagũa and g̃uahẽ.
b, d, and g usually do not occur without a preceding m or n (mb, nd, ng) unless they are in Spanish loanwords.
f, l, q, w, x, z extremely rare outside loan words
does not use c without h: ch

Japanese in Romaji (Nihongo / 日本語)

words: desu, aru, suru, esp. at end of sentences;
word endings: -masu, -masen, -shita;
letters: Japanese almost always alternates between a consonant and a vowel. Exceptions are the digraphs shi and chi, the affricate tsu, gemination (two of the same consonant in a row) and palatalization (a consonant followed by the letter y).
a macron or circumflex may be used to indicate doubled vowels, e.g. Tōkyō
common words: no, o, wa, de, ni

(Note: Romaji is not often used in Japanese script. It is most often used for foreigners learning the pronunciation of the Japanese language.)

Hmong (Hmoob) written in Romanized Popular Alphabet

Almost all written words are quite short (one syllable).
Syllables (unless they are pronounced with mid tone) end in a tone letter: one of b s j v m g d, leading to apparent "consonant clusters" such as -wj
w can be the main vowel of a syllable (e.g. tswv)
Syllables can begin with sequences such as hm-, ntxh-, nq-.
Syllables ending in double vowels (especially -oo, -ee), possibly followed by a tone letter (as in Hmoob "Hmong").

Vietnamese (tiếng Việt)

Roman characters with more than one diacritical mark on the same vowel. See above.
Almost all written words are quite short (one syllable, mostly less than six characters long).
Words beginning with ng or ngh
Words ending with nh
common words: cái, không, có, ở, của, và, tại, với, để, đã, sẽ, đang, tôi, bạn, chúng, là

Vietnamese Quoted-Readable (VIQR)

The following characters (often in combination) after vowels: ^ ( + ' ` ? ~ .
DD, Dd, or dd
The following character before punctuation: \

Vietnamese VNI encoding

The digits 1-8 after vowels
The digit 9 after a D or d
The following character before numbers: \

Vietnamese Telex

The following characters after vowels: s f r x j
The following vowels, doubled up: a e o
The letterwafter the following characters: a o u
DD, Dd, or dd
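The VIQR and VNI clues above can be checked with simple pattern counts. The following is a hedged sketch, not a real decoder: the function name `input_convention_hints` is hypothetical, the period is deliberately left out of the VIQR pattern because it is ordinary punctuation, and Telex is not covered since its letters (s f r x j, doubled vowels) are too common in plain text to count naively.

```python
import re

# A vowel followed by a digit 1-8, or d followed by 9 (VNI clues above).
VNI_PATTERN = re.compile(r"[aeiouy][1-8]|d9", re.IGNORECASE)
# A vowel followed by one of ^ ( + ' ` ? ~ (VIQR clues above, minus the period).
VIQR_PATTERN = re.compile(r"[aeiouy][\^(+'`?~]", re.IGNORECASE)


def input_convention_hints(text):
    """Count pattern hits for each convention; higher means more likely."""
    return {
        "VNI": len(VNI_PATTERN.findall(text)),
        "VIQR": len(VIQR_PATTERN.findall(text)),
    }


if __name__ == "__main__":
    print(input_convention_hints("Tie61ng Vie65t"))
    print(input_convention_hints("Tie^'ng Vie^.t"))
```

Ordinary prose also produces occasional hits (an apostrophe or question mark after a vowel, or numbered Pinyin such as a1), so the counts are only meaningful relative to the length of the text.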

Chinese, Romanized

Standard Mandarin (現代標準漢語)

In general, Mandarin syllables end only in vowels or n, ng, r; never in p, t, k, m

Pinyin

Words beginning with x, q, zh
Tone marks on vowels, such as ā, á, ǎ, à
- For convenience while using a computer, these are sometimes substituted with numbers, e.g. a1, a2, a3, a4
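To compare a numbered text with one that uses tone marks, the digits can be converted back mechanically. This is a minimal sketch that assumes the digit directly follows the vowel carrying the tone, as in the a1, a2, a3, a4 example above; texts that place the digit at the end of the syllable (for instance after a final n or ng) would need a fuller converter. The function name `numbers_to_tone_marks` is hypothetical.

```python
import re
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}


def numbers_to_tone_marks(text):
    """Replace vowel+digit pairs such as a3 with the marked vowel (ǎ)."""
    def replace(match):
        vowel, digit = match.group(1), match.group(2)
        # Compose vowel + combining mark into a single character where possible.
        return unicodedata.normalize("NFC", vowel + TONE_MARKS[digit])

    return re.sub(r"([aeiou\u00fcAEIOU\u00dc])([1-4])", replace, text)


if __name__ == "__main__":
    print(numbers_to_tone_marks("a1 a2 a3 a4"))
```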

Wade–Giles

Words do not begin with b, d, g, z, q, x, r
Words beginning with hs
Many hyphenated words
Apostrophes after initial letters or digraphs, e.g. t'a, ch'i

Gwoyeu Romatzyh

Many unusual vowel combinations such as ae, eei, ii, iee, oou, yy, etc.
Insertion of r, e.g. arn, erng, etc.
Words ending in nn, nq

Southern Min / Min-Nan (Bân-lâm-gí / Bân-lâm-gú) in Pe̍h-ōe-jī

Many hyphenated words.
Words can end in p, t, k, m, n, ng, h; never r
Roman characters with many diacritical marks on vowels. Unlike Vietnamese, each character has at most one such mark.
Unusual combining characters, namely · (middle dot, always after o) and | (vertical bar). ¯ (macron) is also common.

Austronesian languages

Malay (bahasa Melayu) and Indonesian (bahasa Indonesia)

May contain the following:
Prefixes: me-, mem-, memper-, pe-, per-, di-, ke-
Suffixes: -kan, -an, -i
Others (these are almost always written in lowercase): yang, dan, di, ke, oleh, itu

Malay and Indonesian are mutually intelligible to proficient speakers, although translators and interpreters will generally be specialists in one or the other language. See Comparison of Standard Malay and Indonesian.

Frequent use of the letter 'a' (comparable to the frequency of the English 'e').

Polynesian languages

Most Polynesian languages use A E F G H I K L M N O P R S T U V and ʻ (sometimes written ' or Q)

- L: Nuclear Polynesian languages (Tongan, Samoan, Tuvaluan, Tokelauan...) as in fale
- R: Eastern Polynesian languages (NZ Māori, Tahitian, Cook Islands Māori, Rapa Nui...) as in fare
- K: most Polynesian languages except Hawaiian, Samoan, Tahitian
- H: most Polynesian languages except Samoan
- WH: NZ Māori (whenua)
Consonants always separated by one or more vowels (fenua, Haʻapai, ʻolelo)
Short and long vowels, written either with a macron (āēīōū) or by replication (aa, ee, ii, oo, uu)
Frequent diphthongs (oiaue, māori)
Words always end with a vowel
Loanwords are transliterated (as in Japanese): Sesu Kilisito = Jesus Christ, polokalama = program
Frequent English or French loanwords (depending on colonial history)

Tongan (lea fakatonga)

A E F H I K L M N NG O P S T U V ʻ
ng (Tonga), h, endings in -onua (fonua)
article te
frequent words: 'o, te, ki, mei, i, faka-
English loanwords

Samoan (gagana samoa)

A E F G I L M N O P S T U V ʻ
no K letter, uses okina (ʻ) or nothing instead (faka in Tongan is faʻa in Samoan)
frequent use of L (le)
frequent words: o, e, le, se, a, i, ma

Wallisian (lea faka'uvea)

A E F G H I K L M N O P S T U V ʻ
distinguish from Tongan: g instead of ng (tokaga)
article te
h is more frequent than s (tahi)
frequent words: ko, te, ki, mai, i, o, ne'e, e, mo, faka-
French loanwords

East Futunan (lea fakafutuna)

A E F G H I K L M N O P S T U V ʻ
article le
frequent words: ko, le, ki, mei, i, o, mo, faka-
distinguish from Wallisian: S is more frequent than H (tasi)
distinguish from Samoan: letter K
French loanwords

Turkic languages

Note that some Turkic languages like Azeri and Turkmen use a similar Latin alphabet (often Jaŋalif) and similar words, and might be confused with Turkish. Azeri has the letters Əə, Xx and Qq, which are not present in the Turkish alphabet, and Turkmen has Ää, Žž, Ňň, Ýý and Ww. Latin characters uniquely (or nearly uniquely) used for Turkic languages: Əə, Ŋŋ, Ɵɵ, Ьь, Ƣƣ, Ğğ, İ, and ı. All Turkic languages can form long words by adding multiple suffixes.

Turkish (Türkçe / Türkiye Türkçesi)

Turkish Alphabet

Lowercase: a b c ç d e f g ğ h ı i j k l m n o ö p r s ş t u ü v y z

Uppercase: A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z

Common words

bir – one, a
bu – this
ancak – but
oldu – was (happened)
şu – that

Misc.

The letter "j" is only used in loanwords.
Words never begin with "ğ"
Look for common word endings. Tense changes in Turkish verbs are created by adding suffixes to the end of the verb. Plurals are formed by adding -lar or -ler.
- Common tense changes: -yor, -mış, -muş, -sun
- Possession/person: -im, -un, -ın, -in, -iz, -dur, -tır
- Example: Yapmıştır, "[He] has done it"; Yap is the verb stem meaning "to do", -mış indicates the perfect tense, -tır indicates the third person (he/she/it).
- Example: Adalar, "Islands"; Ada is a noun meaning "island", -lar makes it plural.
- Example: Evimiz, "Our house"; Ev is a noun meaning "house", -im indicates the first-person possessor, which -iz then makes plural.
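Looking for the plural endings -lar and -ler can be sketched as a simple suffix check. This is only an illustration of the clue, not a morphological analyser: the function name `strip_plural` is hypothetical, and any word that merely happens to end in these letters (in Turkish or in another language) will also match.

```python
PLURAL_SUFFIXES = ("lar", "ler")


def strip_plural(word):
    """Split off a trailing -lar/-ler; return (stem, suffix or None)."""
    lowered = word.lower()
    for suffix in PLURAL_SUFFIXES:
        # Require a non-empty stem so that "lar" itself is not split.
        if lowered.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)], suffix
    return word, None


if __name__ == "__main__":
    print(strip_plural("Adalar"))
    print(strip_plural("evler"))
```

Counting how many words in a passage split this way gives a weak signal that is shared by other Turkic languages (Azeri also uses -lar and -lər), so it should be combined with the letter clues above.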

Azeri (Azərbaycanca)

Azeri can be easily recognized by the frequent use of ə. This letter is not used in any other officially recognized modern Latin alphabet. In addition, it uses the letters x and q, which are not used in Turkish.

Common words: və, ki, ilə, bu, o, isə, görə, da, də
Frequent use of diacritics: ç, ğ, ı, İ, ö, ş, ü
Words ending in -lar, -lər, -ın, -in, -da, -də, -dan, -dən
Words never beginning with ğ or ı
Words rarely beginning with two or more consonants
Transliteration of foreign words and names, e.g. Audrey Hepburn = Odri Hepbern

Chinese (中文)

No spaces, except between punctuation marks and (sometimes) foreign words.
Arabic numerals (0-9) sometimes used
Punctuation:
- Period 。 (not .)
- Serial comma 、 (distinguished from the regular comma ，)
- Ellipsis …… (six dots)
No hiragana, katakana, or hangul
May be written vertically

Simplified Chinese (简体) vs Traditional Chinese (繁體)

Note: Many characters were not simplified. As a result, it is common for a short word or phrase to be identical between Simplified and Traditional, but it is rare for an entire sentence to be identical as well.

Common radicals different between Traditional and Simplified:

Simplified: 讠 钅 饣 纟 门 (e.g. 语 银 饭 纪 问)
Traditional: 訁 釒 飠 糹 門 (e.g. 語 銀 飯 紀 問)

Common characters different between Traditional and Simplified:

Simplified: 国 会 这 来 对 开 关 门 时 个 书 长 万 边 东 车 爱 儿
Traditional: 國 會 這 來 對 開 關 門 時 個 書 長 萬 邊 東 車 愛 兒

Standard written Chinese (based on Mandarin) vs written Vernacular Cantonese

Note: Apart from Hong Kong, there are also Cantonese speakers in southern Mainland China, Malaysia and Singapore^[1], so written Cantonese can be written in either Simplified or Traditional characters.

Common characters in Vernacular Cantonese that do not occur or seldom occur in Mandarin:

嘅 咗 咁 嚟 啲 唔 佢 乜 嘢 嗰 冇 睇

Some of the above characters are not supported in all character encodings, so sometimes the 口 radical on the left is substituted with a 0 or o, e.g.

o đã 0 đã

Sometimes, different Chinese characters are used to express the same meaning in Cantonese and Mandarin. If you use the one commonly used in Cantonese to express the same meaning when you are speaking or writing Mandarin, a native speaker may be confused or even find it difficult to understand, and vice versa. Some examples are: (Cantonese vs Mandarin)

Thực vs ăn (eat) uống vs uống (drink) xí vs trạm (stand) đông lạnh vs lãnh (cold) lạc vs hạ (down) vs xuyên (wear) đọc vs niệm (read) nháo vs mắng (scold) kế vs tính (calculate) mễ vs đừng (do not) hành vs đi (walk/go) trước vs mới (then)

Some Chinese words used to build Cantonese vocabulary are not used, or are seldom used, in modern Mandarin. Some examples are: (Cantonese vs Mandarin)

Suốt ngày vs cả ngày (always) khuynh kế vs nói chuyện phiếm (talk) làm lại vs đi làm (go to work) ôn thư vs ôn tập (study) phim nhựa vs video (video) cách ly vs bên cạnh (nearby) khởi phòng vs cái lâu (build a house) nghe ngày vs ngày mai (tomorrow) ba bế vs kiêu ngạo (arrogant) làm ước lượng vs hoàn thành (finished) định hệ vs vẫn là (or) anh đẹp trai vs soái ca (handsome male) chung ý vs thích (like) sắc bén vs lợi hại (powerful) cùng chôn vs cùng / cập (and) li tuyến vs điên (crazy) tuyết quầy vs tủ lạnh (fridge)

Cantonese vocabularies constructed by Cantonese words are used in daily life in southern China and are not used in modern Mandarin. Some examples are:

Mễ cám (don't be like this) hảo mão (ok?) chơi dã (to play tricks) làm dã (to work) liếc diễn (to watch a film/movie) ngô biết (don't know) chôn lê (come) 嗰 cái (that) cám khái dã (such thing) cừ địa (they) mị sự / miết sự (what?) mão dã (nothing) 嗰 trận (at that moment) càng lê càng nhiều (more and more) ta khái (mine) ngạnh hệ (of course) 𥄫(to peek) lâm cừ (love him/her) xách tí ta (take it to me) 嘥 phơi (everything is wasted) ngươi 啱(you are right) 𢫏 trụ (to cover something) 冚 phủng 唥(all) khấm thật (to press something tightly) 瞓 giác (to sleep) 掟 thạch tử (to throw a tiny stone) xa [a modal word to express comtemption] 噃[a modal word for reminding or warning someone] 詏 giao (to argue) hảo điểu (very angry) tâm ấp (feeling depressed in heart) 𧨾 nữ tử (to please a girl) đến cám nhiều sao (only this much) làm tốt tả (done something well)

Finally, when terms are borrowed from other countries (especially the US and the UK), Cantonese and Mandarin often adopt different translations: Cantonese tends to transliterate the English pronunciation, while Mandarin tends to translate the meaning. Some examples (Cantonese vs Mandarin):

的士 (dik1 si2, no literal meaning; transliterated from the English pronunciation) vs 出租車 (chū zū chē, meaning "cars for hire"), from "taxi".
巴士 (baa1 si2, no literal meaning; transliterated from the English pronunciation) vs 公車 (gōng chē, meaning "public car"), from "bus".
多士 (do1 si2, no literal meaning; transliterated from the English pronunciation) vs 吐司 (tǔ sī, no literal meaning; also transliterated from the English pronunciation), from "toast".
騷 (sou1, no literal meaning; transliterated from the English pronunciation) vs 秀 (xiù, no literal meaning; also transliterated from the English pronunciation), from "show".
士多 (si2 do1, no literal meaning; transliterated from the English pronunciation) vs 小店 (xiǎo diàn, meaning "small shop"), from "store".
𨋢 (lip1, no literal meaning; transliterated from the English pronunciation) vs 升降機 (shēng jiàng jī, meaning "machine that rises and descends"), from "lift" (elevator).
拜拜 (baai1 baai3, no literal meaning; transliterated from the English pronunciation) vs 再見 (zài jiàn, meaning "see you again"), from "bye-bye" (goodbye).

Japanese (日本語)

Katakana (カタカナ) and hiragana (ひらがな) characters mixed with kanji (漢字)
No spaces
Number system: Arabic numerals (1, 2, 3, etc.)
Punctuation:
- Period 。
- Comma 、 (， also used in double byte)
- Quotation marks 「」
Occasional small characters beside large ones, e.g. しゃ りゅ しょ って シャ リュ ショ ッテ
Double tick marks (known as dakuten) appearing at the upper right of characters, e.g. で が ず デ ガ ズ
Empty circles (known as handakuten) appearing at the upper right of characters, e.g. ぱ ぴ パ ピ
Frequent characters: の を は が
Originally written vertically (books, school, etc.), but mostly appears horizontally online.
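Because kana do not occur in Chinese text, their presence is a quick way to tell Japanese from Chinese. The following Python sketch counts characters in the Unicode Hiragana and Katakana blocks; the function names and the 10% threshold are assumptions for illustration, not a tested rule.

```python
def kana_ratio(text: str) -> float:
    """Share of non-space characters that are hiragana or katakana."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    # Hiragana is U+3040..U+309F and Katakana is U+30A0..U+30FF.
    kana = sum(1 for ch in chars if "\u3040" <= ch <= "\u30ff")
    return kana / len(chars)


def looks_japanese(text: str, threshold: float = 0.1) -> bool:
    """Heuristic: treat text as Japanese if enough of it is kana."""
    return kana_ratio(text) >= threshold
```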

Korean (한국어/조선말)

Western-style punctuation marks
Western-style spacing
Hangul letters (phonetic), e.g. ㅂ (b in book), ㅈ (j in jump), ㅅ (s in sock), ㅊ (ch in champion), ㅍ (p in pox)
Hangul letters are combined into syllable blocks, e.g. ㅅ s + ㅓ eo + ㅇ ng = 성 seong
Circles and ellipses are commonplace in Hangul but exceedingly rare in Chinese.
General appearance has relatively uniform complexity, as contrasted with Chinese or Japanese.
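The way letters combine into syllable blocks can be sketched with the arithmetic of the Unicode Hangul Syllables block, where each block's code point is computed from the positions of its initial consonant, vowel and optional final consonant. The function name `compose_syllable` and the use of compatibility jamo as input are choices made for this illustration.

```python
HANGUL_BASE = 0xAC00  # code point of the first precomposed syllable, 가

# Orders follow the Unicode layout: 19 initials, 21 vowels, 27 finals + "none".
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = ["", *"ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"]


def compose_syllable(lead: str, vowel: str, tail: str = "") -> str:
    """Combine an initial, a vowel and an optional final into one block."""
    l = LEADS.index(lead)
    v = VOWELS.index(vowel)
    t = TAILS.index(tail)
    return chr(HANGUL_BASE + (l * 21 + v) * 28 + t)


def is_hangul_syllable(ch: str) -> bool:
    """True if `ch` is a precomposed Hangul syllable block."""
    return 0xAC00 <= ord(ch) <= 0xD7A3
```

For example, `compose_syllable("ㅅ", "ㅓ", "ㅇ")` builds the block 성 used in the example above.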

Khmer language ភាសារខ្មែរ

Khmer is written using the distinctive Khmer alphabet.

Rarely uses spaces
Letters have a distinctively "taller" shape than other Brahmic scripts.
Uses Khmer numerals in writing: ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩
Has smaller versions of consonants placed below the main consonants, which may appear clustered
Has 24 diacritics denoting syllable rhymes - ា ិ ី ឹ ឺ ុ ូ ួ ើ ឿ ៀ េ ែ ៃ េា ៅ ុំ ំ ាំ ះ ុះ េះ ោះ
Uses this as a full stop: ។
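A minimal Python sketch of these clues: it measures how much of a text falls in the Unicode Khmer block (U+1780 to U+17FF) and reads Khmer numerals, relying on the fact that Python's built-in `int` accepts any Unicode decimal digits. The function names are assumptions made for this example.

```python
def khmer_ratio(text: str) -> float:
    """Share of non-space characters that lie in the Khmer Unicode block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    khmer = sum(1 for ch in chars if 0x1780 <= ord(ch) <= 0x17FF)
    return khmer / len(chars)


def parse_khmer_number(digits: str) -> int:
    """Convert a run of Khmer numerals to an int (int() handles Unicode digits)."""
    return int(digits)


def split_khmer_sentences(text: str) -> list[str]:
    """Split on the Khmer full stop, dropping empty pieces."""
    return [part.strip() for part in text.split("\u17d4") if part.strip()]
```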

Greek (Ελληνικά)

Modern Greek is written with the Greek alphabet in monotonic, polytonic or atonic form, following either Demotic (Triantafyllidis) grammar or Katharevousa grammar. Some people write in Greeklish (Greek in Latin script), which may be visual-based, orthographic, phonetic or a mixture of these. The only official orthographic forms of the Greek language are monotonic and polytonic.

Normal Modern Greek (Greek Monotonic)

words: και, είναι;
Each multi-syllable word has one accent/tone mark (oxia): ά έ ή ί ό ύ ώ
The only other diacritic ever used is the tréma: ϊ/ΐ, ϋ/ΰ, etc.

Pre-1980s Greek (Greek Polytonic)

Katharevousa, Dimotiki (Triantafyllidis' grammar)

Diacritics: ά, ᾶ, ἀ, ἁ, and combinations, also with other vowels.
Some texts, especially in Katharevousa, also have ὰ, ᾳ, in combination with other diacritics.

Ancient Greek

Diacritics: ά, ὰ, ᾶ, ἀ, ἁ, ᾳ, and combinations, also with other vowels; ῥ; tilde (ᾶ) often appears more like a rounded circumflex
some texts feature lunate sigma (looks like c) instead of σ/ς

Greek Atonic

Was common in some Greek media (television);
You will see Greek characters without accents/tones;
words: και, ειναι, αυτο.
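The three accent systems described above can be told apart by decomposing the text and inspecting its combining marks. The Python sketch below does this with the standard `unicodedata` module; the function name and the three-way labels are assumptions for illustration, and a short polytonic sample that happens to contain only acute accents would be reported as monotonic.

```python
import unicodedata

# Combining marks that occur only in polytonic spelling.
POLYTONIC_MARKS = {
    "\u0300",  # grave accent (varia)
    "\u0342",  # perispomeni (circumflex)
    "\u0313",  # smooth breathing (psili)
    "\u0314",  # rough breathing (dasia)
    "\u0345",  # iota subscript (ypogegrammeni)
}
ACUTE = "\u0301"  # tonos / oxia


def greek_accent_system(text: str) -> str:
    """Classify Greek text as 'polytonic', 'monotonic' or 'atonic'."""
    # NFD splits precomposed letters such as U+1F00 into base letter + marks.
    decomposed = unicodedata.normalize("NFD", text)
    marks = {ch for ch in decomposed if unicodedata.combining(ch)}
    if marks & POLYTONIC_MARKS:
        return "polytonic"
    if ACUTE in marks:
        return "monotonic"
    return "atonic"
```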

Greek in Greeklish

Software for automated Greeklish-to-Greek conversion exists. If you come across a Greeklish text, it may be useful for the Greek Wikipedia (el.wikipedia) after conversion.
Keep in mind: in Greeklish, more than one character may be used for one letter (example: th for Θ (theta)).

Orthographic Greeklish

words: kai, einai.

Phonetic Greeklish

words: ke, ine;
Omega appears as o;
ei, oi appear as i;
ai appears as e.

Visual-based Greeklish

Omega (Ω or ω) may appear as W or w;
epsilon (E) may appear as 3;
Alpha (A) may appear as 4;
theta (Θ) may appear as 8;
upsilon (Y) may appear as \|/;
gamma (γ) may appear as y
More than one character may be used for one letter.
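The single-character substitutions listed above can be undone with a simple lookup table. This Python sketch covers only W/w, 3, 4 and 8; the choice of lowercase Greek letters for the digits and the function name are assumptions, and the multi-character and ambiguous forms (such as y for gamma) are deliberately left out.

```python
# Visual look-alikes mapped back to Greek letters.
# Mapping digits to lowercase letters is an assumption of this sketch.
VISUAL_SUBSTITUTES = {
    "w": "ω",
    "W": "Ω",
    "3": "ε",
    "4": "α",
    "8": "θ",
}


def replace_visual_substitutes(text: str) -> str:
    """Replace visual-based Greeklish look-alikes with Greek letters."""
    return "".join(VISUAL_SUBSTITUTES.get(ch, ch) for ch in text)
```

Other Latin letters pass through unchanged, so the result is only a first step toward a full conversion.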

Messed-up (Mixed) Greeklish

words: kai, eine;
combines principles of phonetic, visual-based and orthographic Greeklish according to the writer's idiosyncrasy;
The most commonly used form of Greeklish.

Armenian (Հայերեն)

Armenian can be recognized by its unique 39-letter alphabet:

Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք ԵՎ(և) Օ Ֆ

Georgian (ქართული)

Georgian can be recognised by its unique alphabet (note that some characters have fallen out of use).

ა ბ გ დ ე ვ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ

Cyrillic alphabet

Bolding denotes letters unique to the language

Slavic languages

Belarusian (беларуская)

uses: ё, і, й, ў, ы, э, ’
features: шч used instead of щ
the only Cyrillic language not to feature и.

Bulgarian (български)

uses: ъ, щ, я, ю, й
words: със, в
features: many words end in definite article –ът, –ят, –та, –то, –те

Macedonian (македонски)

uses: ј, љ, њ, џ, ѓ, ќ, ѕ
words: во, со
features: р is usually found between consonants, for example првин

Russian (русский)

uses: ё, й, ъ (rarely), ы, э, щ

Serbian (српски)

uses: ј, љ, њ, џ, ђ, ћ
does not use: ъ, щ, я, ю, й
words: је, у
features: large consonant clusters, for example српски

Ukrainian (українська)

uses: є, и, і, ї, й, ґ, щ, ’
does not use: ъ, ё, ы, э

Mongolian

uses: ө, ү
does not use: ё, й, к, щ, ъ, ы, ь, ю, я
used only in names or borrowed words: в, е, з, ф, ц

Montenegrin

uses: З́, С́

Ossetian

uses: ӕ
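The distinctive letters listed in this section lend themselves to a simple lookup. The Python sketch below votes for a language each time one of those letters appears; the table covers only the languages named above, some of these letters also occur in languages not listed here, and the function name is an assumption made for illustration.

```python
from collections import Counter
from typing import Optional

# Letters treated as distinctive among the languages listed in this section.
DISTINCTIVE_LETTERS = {
    "ў": "Belarusian",
    "є": "Ukrainian", "ї": "Ukrainian", "ґ": "Ukrainian",
    "ѓ": "Macedonian", "ќ": "Macedonian", "ѕ": "Macedonian",
    "ђ": "Serbian", "ћ": "Serbian",
    "ө": "Mongolian", "ү": "Mongolian",
    "ӕ": "Ossetian",
}


def guess_cyrillic_language(text: str) -> Optional[str]:
    """Return the language with the most distinctive-letter hits, or None."""
    votes = Counter(
        DISTINCTIVE_LETTERS[ch] for ch in text.lower() if ch in DISTINCTIVE_LETTERS
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Texts without any of these letters (for example most Russian or Bulgarian text) return `None` and need the word-level clues instead.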

Arabic alphabet

All languages using the Arabic alphabet are written right-to-left.
A number of other languages were written in the Arabic alphabet in the past but are now more commonly written in Latin characters; examples include Turkish, Somali and Swahili.

Arabic (العربية)

reversed question mark: ؟
short vowels are not written, so many words are written with no vowel at all
common prefix: -الـ
common suffix: ة -ـة-
words: إلى، من، على

Persian (فارسی)

Except in very rare cases, verbs come at the end of a phrase.

common verbs: کرد، بود، شد، است، میشود
uses: پ، چ، ژ، گ
words: که، به

Urdu (اردو)

uses: ٹ، ڈ، ڑ، ں، ے
many words ending in ے
words: اور، ہے
to distinguish from Arabic: in many texts, Urdu is written stylistically with words ‘slanting’ downwards from top-right to bottom-left (unlike the ‘linear’ style of Arabic, Persian etc.).
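The extra letters listed for Persian and Urdu can be turned into a rough three-way check. In the Python sketch below the letters are written as escapes to avoid right-to-left display problems in source code. Checking the Urdu letters first is an assumption of this sketch (Urdu text may also contain the letters listed for Persian), and any other Arabic-script text falls through to "Arabic", which is only a default.

```python
from typing import Optional

# Letters listed above for Urdu: tte, ddal, rre, noon ghunna, yeh barree.
URDU_LETTERS = set("\u0679\u0688\u0691\u06ba\u06d2")
# Letters listed above for Persian: peh, tcheh, jeh, gaf.
PERSIAN_LETTERS = set("\u067e\u0686\u0698\u06af")


def guess_arabic_script_language(text: str) -> Optional[str]:
    """Very rough guess among Urdu, Persian and Arabic; None if no Arabic script."""
    chars = set(text)
    if chars & URDU_LETTERS:
        return "Urdu"
    if chars & PERSIAN_LETTERS:
        return "Persian"
    # Basic Arabic block; other languages using the script also land here.
    if any("\u0600" <= ch <= "\u06ff" for ch in chars):
        return "Arabic"
    return None
```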

Syriac Alphabet

Syriac (ܐܬܘܪܝܐ)

short vowels are not usually written, so many words are written with no vowel at all
three styles of writing (Estrangela, Serto, Madnhaya) and two different ways of representing vowels
basic alphabet in Estrangela style is: ܐ ܒ ܓ ܕ ܗ ܘ ܙ ܚ ܛ ܝ ܟ ܠ ܡ ܢ ܣ ܥ ܦ ܨ ܩ ܪ ܫ ܬ
basic alphabet in Serto style is: ܐ ܒ ܓ ܕ ܗ ܘ ܙ ܚ ܛ ܝ ܟ ܠ ܡ ܢ ܣ ܥ ܦ ܨ ܩ ܪ ܫ ܬ
basic alphabet in Madnhaya style is: ܐ ܒ ܓ ܕ ܗ ܘ ܙ ܚ ܛ ܝ ܟ ܠ ܡ ܢ ܣ ܥ ܦ ܨ ܩ ܪ ܫ ܬ

Dravidian languages

All Dravidian languages are written from left to right.
All Dravidian languages have different scripts, but similarities can be found in their orthography.

Kannada

Kannada has a 49-letter alphabet.

Tamil

common word endings: ள்ளது, கிறது, கின்றன, ம்
common words: தமிழ், அவர், உள்ள, சில
Tamil has a unique 30-letter alphabet. With the help of diacritics, as many as 247 letters can be written.

அ ஆ இ ஈ உ ஊ எ ஏ ஐ ஒ ஓ ஔ க ங ச ஞ ட ண த ந ப ம ய ர ல வ ழ ள ற ன

Telugu

Telugu has 56 characters (Aksharamulu) including vowels (Achchulu) and consonants (Hallulu). Telugu uses eighteen vowels, each of which has both an independent form and a diacritic form used with consonants to create syllables. The language makes a distinction between short and long vowels.

అ ఆ ఇ ఈ ఉ ఊ ఋ ౠ ఌ ౡ ఎ ఏ ఐ ఒ ఓ ఔ అం అః క ఖ గ ఘ ఙ చ ఛ జ ఝ ఞ ట ఠ డ ఢ ణ త థ ద ధ న ప ఫ బ భ మ య ర ఱ ల ళ వ శ ష స హ

౦ ౧ ౨ ౩ ౪ ౫ ౬ ౭ ౮ ౯

Bengali

The Bengali alphabet or Bangla alphabet (Bengali: বাংলা বর্ণমালা, bangla bôrnômala), also called the Bengali script (Bengali: বাংলা লিপি, bangla lipi), is the writing system, originating in the Indian subcontinent, for the Bengali language and is the fifth most widely used writing system in the world. The script is also used for other languages such as Assamese, Maithili, Meithei and Bishnupriya Manipuri, and has historically been used to write Sanskrit within Bengal.

Bengali

Bengali has a unique 50-letter alphabet.

The Bengali script has a total of 11 vowel graphemes, each of which is called a স্বরবর্ণ swôrôbôrnô "vowel letter". The swôrôbôrnôs represent six of the seven main vowel sounds of Bengali, along with two vowel diphthongs. All of them are used in both the Bengali and Assamese languages.

অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ

The Bengali script has a total of 39 Consonants. Consonant letters are called ব্যঞ্জনবর্ণ bænjônbôrnô "consonant letter" in Bengali. The names of the letters are typically just the consonant sound plus the inherent vowel অ ô. Since the inherent vowel is assumed and not written, most letters' names look identical to the letter itself (the name of the letter ঘ is itself ghô, not gh).

ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ল শ ষ স হ ড় ঢ় য় ৎ ঃ ং ঁ

has 10 diacritics denoting syllable rhymes -

া ি ী ু ূ ৃ ে ৈ ো ৌ

Assamese

The Assamese script also has a total of 11 vowel graphemes, each of which is called a স্বরবর্ণ swôrôbôrnô "vowel letter".

অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ

has a total of 39 Consonants. Consonant letters are called ব্যঞ্জনবর্ণ bænjônbôrnô "consonant letter" in Bengali.

ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য ৰ ল শ ষ স হ ড় ঢ় য় ৎ ঃ ং ঁ

has 10 diacritics denoting syllable rhymes -

া ি ী ু ূ ৃ ে ৈ ো ৌ
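The Bengali and Assamese letter lists above are identical except for one consonant: Assamese writes ৰ (U+09F0) where Bengali writes র (U+09B0). A minimal Python sketch of that single clue follows; the function name is an assumption, and real texts would need more evidence than one letter.

```python
from typing import Optional

ASSAMESE_RA = "\u09f0"  # the ra form shown in the Assamese consonant list
BENGALI_RA = "\u09b0"   # the ra form shown in the Bengali consonant list


def guess_bengali_or_assamese(text: str) -> Optional[str]:
    """Guess the language from which form of the letter ra appears."""
    if ASSAMESE_RA in text:
        return "Assamese"
    if BENGALI_RA in text:
        return "Bengali"
    return None  # no ra found, so this clue says nothing
```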

Canadian Aboriginal syllabics

In modern writing, Canadian Aboriginal syllabics are indicative of Cree languages, Inuktitut, or Ojibwe, though the latter two are also written in alternative scripts. The basic glyph set is ᐁ ᐱ ᑌ ᑫ ᒉ ᒣ ᓀ ᓭ ᔦ, each of which may appear in any of four orientations, boldfaced, superscripted, and with diacritics including ᑊ ᐟ ᐠ ᐨ ᒼ ᐣ ᐢ ᐧ ᐤ ᐦ ᕽ ᓫ ᕑ. This abugida has also been used for Blackfoot.

Other North American syllabics

Cherokee

Cherokee writing features a unique syllabary consisting of the following characters:

ᎡᎢᎣᎤᎥᎦᎧᎨᎩᎪᎫᎬᎭᎮᎯᎰᎱᎲᎳᎴᎵᎶᎷᎸᎹᎺᎻᎼᎽᎾᎿᏀᏁᏂᏃᏄᏅᏆᏇᏈᏉᏊᏋᏌᏍᏎᏏᏐᏑᏒᏓᏔᏕᏖᏗᏘᏙᏚᏛᏜᏝᏞᏟᏠᏡᏢᏣᏤᏥᏦᏧᏨᏩᏪᏫᏬᏭᏮᏯᏰᏱᏲᏳᏴ.

Artificial languages

Esperanto (Esperanto)

words: de, la, al, kaj
Six accented letters: ĉ Ĉ ĝ Ĝ ĥ Ĥ ĵ Ĵ ŝ Ŝ ŭ Ŭ, their corresponding H-system representation ch Ch gh Gh hh Hh jh Jh sh Sh u U, or their corresponding X-system representation cx Cx gx Gx hx Hx jx Jx sx Sx ux Ux
words ending in o, a, oj, aj, on, an, ojn, ajn, as, os, is, us, u, i, aŭ
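Since the X-system is a fixed two-letter encoding, it can be converted back to the accented letters mechanically. The Python sketch below does so with a regular expression; the function name is an assumption, and the H-system is not handled because its pairs (such as sh, or plain u) are ambiguous.

```python
import re

# X-system digraphs and the accented letters they stand for.
X_SYSTEM = {"cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ"}


def x_system_to_unicode(text: str) -> str:
    """Replace X-system digraphs with accented Esperanto letters."""

    def replace(match: re.Match) -> str:
        pair = match.group(0)
        letter = X_SYSTEM[pair.lower()]
        # Keep the capitalisation of the first letter of the digraph.
        return letter.upper() if pair[0].isupper() else letter

    return re.sub(r"[cghjsu]x", replace, text, flags=re.IGNORECASE)
```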

Klingon (tlhIngan Hol)

When written in the Latin alphabet, Klingon has the unusual property of distinguishing case: q and Q are different letters, and other letters are either always (e.g. D, I, S) or never (e.g. ch, tlh, v) written in upper case. This produces many words that look quite strange to people who are not used to it, for example yIDoghQo' and tlhIngan Hol (with mixed case).
The apostrophe is fairly frequent, especially at the end of a word or syllable.
Common suffixes: -be', -'a'
Common words: 'oH, Qapla'
May use one or more apostrophes in the middle of a word: SuvwI''a'

Lojban (lojban.)

(almost) all lowercase;
common words: lo, mi, cu, la, nu, do, na, se;
paragraphs delimited with ni'o and sentences delimited with .i (or i);
many five-letter words in consonant-vowel shape CCVCV or CVCCV;
many short words with apostrophes between vowels, like ko'a, pi'o, etc.;
usually no punctuation except for dots;
may use commas in the middle of words (typically proper nouns).
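The CCVCV and CVCCV word shapes mentioned above are easy to test with a regular expression. In this Python sketch the consonant and vowel classes are the Lojban letters as understood here, and the function name is an assumption; the pattern checks only the shape, not whether the consonant cluster is actually permitted in Lojban.

```python
import re

CONSONANT = "[bcdfgjklmnprstvxz]"
VOWEL = "[aeiou]"

# Five-letter shapes: CCVCV (e.g. klama) or CVCCV (e.g. gismu).
WORD_SHAPE = re.compile(
    f"(?:{CONSONANT}{CONSONANT}{VOWEL}{CONSONANT}{VOWEL}"
    f"|{CONSONANT}{VOWEL}{CONSONANT}{CONSONANT}{VOWEL})"
)


def has_five_letter_shape(word: str) -> bool:
    """True if `word` matches the CCVCV or CVCCV pattern exactly."""
    return WORD_SHAPE.fullmatch(word) is not None


def shape_ratio(text: str) -> float:
    """Share of whitespace-separated words with one of the two shapes."""
    words = text.split()
    if not words:
        return 0.0
    return sum(has_five_letter_shape(w) for w in words) / len(words)
```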

Toki Pona (toki pona)

The alphabet is all lowercase except in names/loanwords
no diacritics
uses only unvoiced stops and fricatives in writing, e.g. p, t, k

Full alphabet: p, t, k, s, m, n, l, j, w, a, e, i, o, u

common words: li, mi, e, sina, ona, jan
often sounds like a simplified and phonetic form of English or Swedish
many two-syllable words
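Because the full alphabet has only 14 letters, a quick first test is whether a text uses any letter outside it. The Python sketch below lowercases the text so that capitalised names are treated the same as other words; the function name is an assumption for illustration, and passing this test does not prove the text is Toki Pona.

```python
# The 14 letters of the alphabet listed above.
TOKI_PONA_LETTERS = set("ptksmnljwaeiou")


def uses_only_toki_pona_letters(text: str) -> bool:
    """True if every alphabetic character in `text` is one of the 14 letters."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return False  # nothing to judge
    return all(ch in TOKI_PONA_LETTERS for ch in letters)
```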

External links

Language Identification Web Service, a language detection API, 100+ languages supported
Google Translate, Google's translation service
Xerox, an online language identifier, 47 languages supported
Language Guesser, a statistical language identifier, 74 languages recognized
NTextCat, a free language identification API for .NET (C#): 280+ languages available out of the box. Recognizes the language and encoding (UTF-8, Windows-1252, Big5, etc.) of text. Mono compatible.

[1] https://oakton.edu/user/4/billtong/chinaclass/Language/cantonese.htm
