
CJK characters

From Wikipedia, the free encyclopedia
Translation of "That old man is 72 years old" in Vietnamese, Cantonese, Mandarin (in simplified and traditional characters), Japanese, and Korean.

In internationalization, CJK characters is a collective term for graphemes used in the Chinese, Japanese, and Korean writing systems, which each include Chinese characters. The term CJKV also includes Chữ Nôm, the Chinese-origin logographic script formerly used for the Vietnamese language.

Character repertoire

Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. Over 3,000 characters are required for general literacy, with up to 40,000 characters for reasonably complete coverage. Japanese uses fewer characters: general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is increasingly rare, although idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. Even today, however, South Korean students are taught 1,800 characters.

Other scripts used for these languages, such as bopomofo and the Latin-based pinyin for Chinese, hiragana and katakana for Japanese, and hangul for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of the target languages.
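The distinction between Han characters and the accompanying phonetic scripts can be seen in the Unicode character names. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `script_of` and the sample characters are illustrative choices, and taking the first word of the character name is only a rough heuristic, not an official script classification.

```python
import unicodedata

def script_of(ch: str) -> str:
    """Return a coarse script label: the first word of the Unicode name.

    Han characters are named "CJK UNIFIED IDEOGRAPH-XXXX", so they yield "CJK".
    This is a heuristic for illustration, not the Unicode Script property.
    """
    name = unicodedata.name(ch, "UNKNOWN")
    return name.split()[0]

# One Han character, then hiragana, katakana, hangul, bopomofo,
# and a Latin letter with a tone mark as used in pinyin.
samples = ["中", "あ", "カ", "한", "ㄅ", "ā"]

for ch in samples:
    print(f"U+{ord(ch):04X} {ch} {script_of(ch)}")
```

Only the first sample is reported as `CJK`; the others are labelled with their own script names even though CJK character sets typically carry them as well.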

The sinologist Carl Leban (1971) produced an early survey of CJK encoding systems.

Until the early 20th century, Classical Chinese was the written language of government and scholarship in Vietnam. Popular literature in Vietnamese was written in the chữ Nôm script, consisting of Chinese characters with many characters created locally. Since the 1920s, the script used for recording literature has been the Latin-based Vietnamese alphabet.[1][2]


The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit character encodings, requiring at least a 16-bit fixed-width encoding or multi-byte variable-length encodings. The 16-bit fixed-width encodings, such as those from Unicode up to and including version 2.0, are now deprecated for two reasons: the requirement to encode more characters than a 16-bit encoding can accommodate (Unicode 5.0 has some 70,000 Han characters), and the requirement by the Chinese government that software in China support the GB 18030 character set.
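The code-space arithmetic above can be checked directly. The sketch below uses Python's built-in codecs to compare how many bytes two Han characters occupy in three variable-length encodings; the helper `encoded_sizes` and the choice of sample characters are assumptions made for illustration.

```python
# Code-space sizes: 8 bits give 256 values, 16 bits give 65,536,
# which is fewer than the roughly 70,000 Han characters in Unicode 5.0.
EIGHT_BIT_SPACE = 2 ** 8
SIXTEEN_BIT_SPACE = 2 ** 16

bmp_han = "\u4e2d"          # U+4E2D (中), fits within 16 bits
ext_b_han = "\U00020000"    # U+20000, a Han character beyond the 16-bit range

def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes `ch` occupies in several encodings."""
    codecs = ("utf-8", "utf-16-be", "gb18030")
    return {codec: len(ch.encode(codec)) for codec in codecs}

print(EIGHT_BIT_SPACE, SIXTEEN_BIT_SPACE)
print(hex(ord(bmp_han)), encoded_sizes(bmp_han))
print(hex(ord(ext_b_han)), encoded_sizes(ext_b_han))
```

The second character needs four bytes in each encoding, including a surrogate pair in UTF-16, which is why a fixed 16-bit width is no longer sufficient.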

Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible. Unicode has attempted, with some controversy, to unify the character sets in a process known as Han unification.
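As a rough illustration of this incompatibility, the following sketch encodes one shared Han character with four legacy codecs that ship with Python and then decodes the bytes of one encoding as another. The codec selection and the variable names are illustrative assumptions; they are not a complete list of CJK encodings.

```python
# The same Han character, shared by all four national character sets.
char = "中"

# Legacy encodings from mainland China, Taiwan, Japan and Korea.
legacy_codecs = ["gb2312", "big5", "shift_jis", "euc_kr"]

# Each codec assigns the character a different byte sequence.
encoded = {codec: char.encode(codec) for codec in legacy_codecs}
for codec, raw in encoded.items():
    print(f"{codec:10s} {raw.hex()}")

# Reading GB 2312 bytes as if they were Big5 does not recover the character.
misread = encoded["gb2312"].decode("big5", errors="replace")
print(repr(misread))

# In Unicode the character has a single code point regardless of origin.
print(f"U+{ord(char):04X}")
```

Because the byte values differ per encoding, text must be labelled with its encoding or converted to a common one before it is exchanged.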

CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana and hangul.

CJK character encodings include:

The CJK character sets take up the bulk of the assigned Unicode code space. There is much controversy among Japanese experts of Chinese characters about the desirability and technical merit of the Han unification process used to map multiple Chinese and Japanese character sets into a single set of unified characters.[citation needed]

All three languages can be written both left-to-right and top-to-bottom (right-to-left and top-to-bottom in ancient documents), but are usually considered left-to-right scripts when discussing encoding issues.

Legal status

Libraries cooperated on encoding standards for JACKPHY characters in the early 1980s. According to Ken Lunde, the abbreviation "CJK" was a registered trademark of Research Libraries Group[3] (which merged with OCLC in 2006). The trademark owned by OCLC between 1987 and 2009 has now expired.[4]



Works cited

  • Coulmas, Florian (1991). The writing systems of the world. Blackwell. ISBN 978-0-631-18028-9.
  • DeFrancis, John (1977). Colonialism and language policy in Viet Nam. The Hague: Mouton. ISBN 978-90-279-7643-7.

