Gujarati/How to use Unicode in creating Gujarati script


Introduction

This page contains an Indic script. Without sufficient text support you may see irregular vowel placements and no conjuncts.

Gujarati script is used to write the Gujarati language. The topic first started as a stub, then became a sub-page under Gujarati script, and finally was flagged as a candidate for Wikibooks. Here, we attempt to explain the slightly complicated typography of the Gujarati script for non-native users of the script, and the slightly complicated manner in which it is implemented in Unicode. By virtue of standardization, Unicode has tried to implement twelve South Asian scripts with a similar set of rules. This means that once you know how to use Unicode in creating Gujarati script, you may apply a broadly similar methodology to other Indic scripts such as Devanagari, Bengali, Gurumukhi, and others, provided you have basic knowledge of, and the required fluency in, the respective writing system.

The Basics

The Gujarati alphabet mainly includes 34 consonants (ornamented sounds), 2 compound characters that are treated as consonants (though not lexically), and 14 vowels (pure sounds). Overall, the writing system comprises 94 legitimate and recognized distinct symbols or shapes. In the current Unicode 4.1 implementation, however, only some of these symbols have been incorporated as glyphs or shapes. The remaining shapes are created by conjunctions.

Introductory knowledge of Gujarati language and script can be obtained from

Framework of a Gujarati symbol

Given a constructed Gujarati syllable, it can be logically divided into the following parts based on the position of the shapes involved.

1. Baseline area – this is the placeholder for consonants and independent vowels
2. Area below and above the baseline – used for placing lower (below-base) and upper (above-base) dependent vowels respectively
3. Area before and after the baseline – used for writing left (pre-base) and right (post-base) dependent vowels respectively

Examples (clock-wise from top-left): 1. Post-based (Right) 2. Below-based (Lower) 3. Pre-based (Left) 4. Above-based (Upper). We will use these conventions in our further discussion.
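The four positions can be made concrete with a small lookup table. The sketch below is a hand-written, partial mapping for a few dependent vowel signs that appear later on this page; the `POSITION` dictionary and the `describe` helper are illustrative names, not part of any library. It also reports the Unicode general category of each sign: for the signs listed here, the pre-base and post-base signs are spacing marks (`Mc`) and the above-base and below-base signs are non-spacing marks (`Mn`).

```python
import unicodedata

# Hand-written, partial table: dependent vowel sign -> position relative
# to the base consonant. Illustrative only; not taken from a library.
POSITION = {
    "\u0ABE": "post-base (right)",   # vowel sign AA
    "\u0ABF": "pre-base (left)",     # vowel sign I
    "\u0AC0": "post-base (right)",   # vowel sign II
    "\u0AC1": "below-base (lower)",  # vowel sign U
    "\u0AC2": "below-base (lower)",  # vowel sign UU
    "\u0AC7": "above-base (upper)",  # vowel sign E
    "\u0AC8": "above-base (upper)",  # vowel sign AI
}


def describe(sign: str) -> str:
    """Return 'U+XXXX <general category> <position>' for one sign in the table."""
    return f"U+{ord(sign):04X} {unicodedata.category(sign)} {POSITION[sign]}"


if __name__ == "__main__":
    for sign in POSITION:
        print(describe(sign))
```

The table covers only a handful of signs; a complete treatment would need every dependent vowel in the block.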

What is Substitution?

Substitution, in the sense applicable here, means replacing a set or group of characters or shapes with a single character or shape. In practical terms, this translates as – 1) multiple key-strokes will generate a single shape; and 2) the resultant shape will keep transforming itself (based on certain rules) in accordance with the user's key-strokes or inputs.

Substitution can happen when you add one or more shapes in any of the positions other than the baseline area (see illustration above).

Unicode Code-set

The Unicode range for Gujarati script is from U+0A80 to U+0AFF. The ISCII code-page identifier for Gujarati script is 57010.
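As a minimal sketch, the range above can be used to test whether a character belongs to the Gujarati block. The function names `in_gujarati_block` and `gujarati_only` are made up for this illustration. Note that block membership only says the code point lies in the range; it does not say the code point is assigned.

```python
GUJARATI_FIRST = 0x0A80  # start of the Gujarati block
GUJARATI_LAST = 0x0AFF   # end of the Gujarati block


def in_gujarati_block(ch: str) -> bool:
    """True if the single character ch lies in U+0A80..U+0AFF."""
    return GUJARATI_FIRST <= ord(ch) <= GUJARATI_LAST


def gujarati_only(text: str) -> str:
    """Keep only the characters of text that fall in the Gujarati block."""
    return "".join(ch for ch in text if in_gujarati_block(ch))


if __name__ == "__main__":
    print(in_gujarati_block("\u0A95"))  # GUJARATI LETTER KA
    print(in_gujarati_block("k"))       # a Latin letter
```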

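The table below shows the glyphs that are implemented in Unicode standard 4.0.0. Empty cells indicate code points that are reserved or unused.

| x= | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| U+0A8x |  | ઁ | ં | ઃ |  | અ | આ | ઇ | ઈ | ઉ | ઊ | ઋ | ઌ | ઍ |  | એ |
| U+0A9x | ઐ | ઑ |  | ઓ | ઔ | ક | ખ | ગ | ઘ | ઙ | ચ | છ | જ | ઝ | ઞ | ટ |
| U+0AAx | ઠ | ડ | ઢ | ણ | ત | થ | દ | ધ | ન |  | પ | ફ | બ | ભ | મ | ય |
| U+0ABx | ર |  | લ | ળ |  | વ | શ | ષ | સ | હ |  |  | ઼ | ઽ | ા | િ |
| U+0ACx | ી | ુ | ૂ | ૃ | ૄ | ૅ |  | ે | ૈ | ૉ |  | ો | ૌ | ્ |  |  |
| U+0ADx | ૐ |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| U+0AEx | ૠ | ૡ | ૢ | ૣ |  |  | ૦ | ૧ | ૨ | ૩ | ૪ | ૫ | ૬ | ૭ | ૮ | ૯ |
| U+0AFx |  | ૱ |  |  |  |  |  |  |  |  |  |  |  |  |  |  |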

For further details regarding Unicode code points and standards, you may refer to Unicode Code-chart, Standard 4.1.
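The code points of any Gujarati string can also be inspected programmatically. The following sketch uses Python's standard `unicodedata` module; `code_point_report` is a hypothetical helper name. The names returned depend on the Unicode version bundled with the interpreter, so characters added after the version described here may also resolve. Note that the Unicode character name for the Halant is VIRAMA.

```python
import unicodedata


def code_point_report(text: str) -> list:
    """Return one 'U+XXXX NAME' entry per code point in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}"
        for ch in text
    ]


if __name__ == "__main__":
    # KA + Halant + SSA, the sequence behind the conjunct ksha
    for line in code_point_report("\u0A95\u0ACD\u0AB7"):
        print(line)
```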

Examples

Note: In the examples shown in the sections below, the "+" sign denotes the combination of key-strokes or user inputs.

Half-form of consonants

Half-forms of consonants are used in pre-base position. For consonants that do not have a distinct glyph for their half-form, a Halant (્) is used to create the half-form as follows:

મ +્ + ય = મ્ય

— as in રમ્ય (pleasant)

(Note the half-form of મ, which is used here in conjunction with ય.) Note: A half-form is not created for the base glyph even if the syllable ends with a Halant.
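In terms of stored text, the half-form is not a separate character: the string simply contains the consonant, the Halant, and the next consonant, and the font and shaping engine choose the half-form when rendering. A minimal sketch, where `conjunct` is an illustrative helper name:

```python
HALANT = "\u0ACD"  # GUJARATI SIGN VIRAMA, called Halant on this page


def conjunct(*consonants: str) -> str:
    """Join consonants with a Halant between each pair (none after the last)."""
    return HALANT.join(consonants)


if __name__ == "__main__":
    mya = conjunct("\u0AAE", "\u0AAF")  # MA + Halant + YA
    print(mya, [f"U+{ord(c):04X}" for c in mya])
```

How the result looks on screen depends on the font and the text-rendering support available, as noted in the introduction.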

Application of Upper-based form of Ra (Reph)

Applying Ra with a Halant (the half-form of Ra, as seen above) before a full-form consonant produces a Reph on that consonant. This affects the pronunciation of Ra in conjunction with that consonant. A Reph can be created as follows:

ર +્	= Ra + Halant
ર +્ + થ = ર્થ	— as in અર્થ (meaning)

(Ra + Halant + થ = Reph effect on થ)

Application of Lower-based form of Ra (Vattu)

Applying a consonant with a Halant (the half-form of that consonant) to a full-form Ra produces a Vattu for that consonant. This affects the pronunciation of Ra in conjunction with that consonant. A Vattu can be created as follows:

પ +્ + ર = પ્ર

— as in પ્રજા (people)

(પ + Halant + Ra = Vattu effect on પ)
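The only difference between the Reph and Vattu sequences is where Ra sits relative to the Halant. The sketch below classifies a simple three-code-point cluster on that basis; `ra_form` is a hypothetical helper, and it deliberately ignores longer clusters and vowel signs.

```python
HALANT = "\u0ACD"  # GUJARATI SIGN VIRAMA
RA = "\u0AB0"      # GUJARATI LETTER RA


def ra_form(cluster: str) -> str:
    """Classify a cluster of the form C1 + Halant + C2 by the position of Ra."""
    if len(cluster) != 3 or cluster[1] != HALANT:
        return "not a two-consonant cluster"
    first, _, last = cluster
    if first == RA and last != RA:
        return "reph"   # Ra + Halant + consonant
    if last == RA and first != RA:
        return "vattu"  # consonant + Halant + Ra
    return "other"


if __name__ == "__main__":
    print(ra_form("\u0AB0\u0ACD\u0AA5"))  # Ra + Halant + THA
    print(ra_form("\u0AAA\u0ACD\u0AB0"))  # PA + Halant + Ra
```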

Vattu variants

Vattu variants (half and full) are formed when consonants with a vattu mark are combined. In some cases, a special glyph is required to represent the vattu when particular consonants are combined.

ડ +્ + ર = ડ્ર

— as in ડ્રમ (drum)

(special glyph ડ્ર. Notice the two lower-based marks, as compared to only one in the previous example.)

Special Marks, Characters and Nukta

Above-based marks

All above-based marks and post-based matra are created as follows:

ક +ં = કં

— as in કંપન (vibration)

Below-based marks

The below-based marks and post-based matra are created as follows:

ક +ુ = કુ	— as in કુતરો (dog)
ભ +ૂ = ભૂ	— as in ભૂકંપ (earthquake)

Characters શ્ર, ક્ષ and જ્ઞ

The following characters are part of the Gujarati alphabet but are not explicitly created as glyphs in the Unicode character set. They can be generated as indicated below:

શ +્ + ર = શ્ર

ક +્ + ષ = ક્ષ

જ +્ + ઞ = જ્ઞ
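Because these three characters have no code points of their own, each is stored as a three-code-point sequence, and Unicode normalization does not collapse it into a single character. A small check, assuming Python's standard `unicodedata` module; the dictionary keys (`shra`, `ksha`, `gnya`) are informal labels chosen for this example:

```python
import unicodedata

# Consonant + Halant + consonant sequences for the three characters above.
CONJUNCTS = {
    "shra": "\u0AB6\u0ACD\u0AB0",  # SHA + Halant + RA
    "ksha": "\u0A95\u0ACD\u0AB7",  # KA + Halant + SSA
    "gnya": "\u0A9C\u0ACD\u0A9E",  # JA + Halant + NYA
}


def stays_a_sequence(text: str) -> bool:
    """True if NFC normalization leaves text unchanged (no precomposed form)."""
    return unicodedata.normalize("NFC", text) == text


if __name__ == "__main__":
    for label, seq in CONJUNCTS.items():
        print(label, seq, len(seq), stays_a_sequence(seq))
```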

Application of Nukta

Nukta affects the pronunciation of the (preceding) consonant to which it is applied. A Nukta form of a consonant can be created in Unicode as follows:

ય +઼ = ય઼
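The same composition can be sketched in code. The helper name `with_nukta` is an assumption made for this example; the properties printed come from Python's standard `unicodedata` module. The Nukta is a non-spacing combining mark, so it is stored immediately after the consonant it modifies.

```python
import unicodedata

NUKTA = "\u0ABC"  # GUJARATI SIGN NUKTA


def with_nukta(consonant: str) -> str:
    """Return the consonant followed directly by the Nukta."""
    return consonant + NUKTA


if __name__ == "__main__":
    ya_nukta = with_nukta("\u0AAF")  # YA + Nukta
    print(ya_nukta, [f"U+{ord(c):04X}" for c in ya_nukta])
    print(unicodedata.name(NUKTA))       # character name
    print(unicodedata.category(NUKTA))   # general category
    print(unicodedata.combining(NUKTA))  # canonical combining class
```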

Substitutions for specific typography of the script

Following are the main character substitutions which are required to address the complexity of the language and to generate various character forms of the script:

Pre-base substitutions

The half-form conjunctions, one of the most common occurrences of the script, are created by pre-base substitutions.

ન +્ + ન = ન્ન

— as in પ્રસન્ન (happy)

Also, a special use of this substitution is in creating the I-Matra (and its appropriately aligned shape), as shown below:

ત +િ = તિ

— as in તિર (arrow)
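Although the I-Matra is drawn to the left of the consonant, it is typed and stored after it; the move to the pre-base position happens only at rendering time. The sketch below checks that a syllable starts with its base letter rather than with a mark. `first_is_base` is a hypothetical helper built on the standard `unicodedata` module.

```python
import unicodedata

I_MATRA = "\u0ABF"  # GUJARATI VOWEL SIGN I, displayed before the consonant


def first_is_base(syllable: str) -> bool:
    """True if the first stored code point is a letter, not a combining mark."""
    return unicodedata.category(syllable[0]).startswith("L")


if __name__ == "__main__":
    ti = "\u0AA4" + I_MATRA        # TA, then I-Matra: the stored (logical) order
    print(ti, first_is_base(ti))
    wrong = I_MATRA + "\u0AA4"     # visual order; not how the text is stored
    print(first_is_base(wrong))
```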

Post-base substitutions

Consonants of the Gujarati script do not have post-based forms. Primarily, post-based substitution is used to create visarga out of vowels, and is also applied for "I-Matra" substitutions as follows (which will precede any above-based substitution, if applied as well):

જ +ી = જી

— as in જીવન (life)

(Compare the special shape જી – a result of post-based substitution – with the result of a similar combination using a character like લ, which will generate: લ +ી = લી)

Above-base substitutions

Above-based substitution is mainly applied for Matra, Reph, vowel modifications and for stress and tone marks. Consider the following examples:

વ +ૈ = વૈ	— as in વૈભવ (pompousness)
ર +્ + ગ +ે = ર્ગે	— as in સ્વર્ગે (in heaven)
મ +ે +ં = મેં	— as in મેંઢક (frog)

Below-base substitutions

Mainly used for below-based matra, the below-based substitution could produce a conjunction, or change the whole shape of the glyph. This substitution is also used for producing special tone effects like anudatta.

More details on Gujarati Unicode

For further details on Gujarati Unicode, you may refer to:
Unicode Std 4.0.0 - Chapter 9
TDIL: Ministry of Communication & Information Technology, India

If you are creating a web page while the OS language is not Gujarati, save the file as UTF-8 Unicode HTML. The code points may be lost otherwise.
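The advice about saving as UTF-8 can be followed from a script as well as from an editor. The following is a minimal sketch, not a complete page generator: `write_gujarati_page` and `survives_encoding` are hypothetical helper names, and the page template is an assumption. It writes the file explicitly as UTF-8, declares that encoding in the markup, and shows that a legacy single-byte encoding cannot hold Gujarati code points.

```python
from pathlib import Path


def write_gujarati_page(path: str, body_text: str) -> None:
    """Write a minimal HTML page as UTF-8 and declare that encoding in the markup."""
    # body_text is inserted as-is; HTML escaping is left out to keep the sketch short.
    html = (
        "<!DOCTYPE html>\n"
        '<html lang="gu">\n'
        '<head><meta charset="utf-8"><title>Gujarati</title></head>\n'
        f"<body><p>{body_text}</p></body>\n"
        "</html>\n"
    )
    Path(path).write_text(html, encoding="utf-8")


def survives_encoding(text: str, encoding: str) -> bool:
    """True if text can be encoded in the given encoding without loss."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    write_gujarati_page("gujarati.html", "\u0A97\u0AC1\u0A9C\u0AB0\u0ABE\u0AA4\u0AC0")
    print(survives_encoding("\u0A95", "utf-8"))
    print(survives_encoding("\u0A95", "latin-1"))
```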