[INDOLOGY] Precomposed characters vs combining characters

Marco Franceschini franceschini.marco at fastwebnet.it
Tue Jan 14 00:01:30 UTC 2014

Dear friends,

I’m devising a keyboard layout (on OS X) for the Italian "physical" keyboard that allows the user to type all the combinations of a base character with one or more diacritics used in the transliteration of many Indian scripts, as well as Arabic and Perso-Arabic scripts, in conformity with the main standards and transliteration schemes used in scholarly publications. I’m using Ukelele for this purpose.

My keyboard layout makes extensive use of dead keys: it allows the user to combine up to three diacritics with one base character, so that she/he can add Vedic tone signs (represented by grave/acute or vertical stroke above/underbar) to the transliterated text. Diacritics can be typed in any order, and the base character must be typed after them. The complete list of the allowed combinations is available here:

My question is: should I encode the output as precomposed characters (or, where no fully precomposed character is available, as a precomposed character plus added combining diacritics), or should I use combining characters throughout (that is, sequences of the codes of the base character and of each diacritic that together constitute the final character)?
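To make the two options concrete, here is a minimal Python sketch using the standard `unicodedata` module. The sample sequences (r with macron and dot below, a with macron and acute) and the helper name `codepoints` are chosen only for illustration and are not taken from the list of combinations mentioned above. NFC is the Unicode normalization form that composes as far as precomposed characters exist; NFD decomposes everything into base character plus combining marks.

```python
import unicodedata


def codepoints(text):
    """Return the code points of `text` as a readable string."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Illustrative keyboard output: r + combining macron + combining dot below
typed = "r\u0304\u0323"

nfc = unicodedata.normalize("NFC", typed)  # composed as far as possible
nfd = unicodedata.normalize("NFD", typed)  # fully decomposed, canonical order

print(codepoints(nfc))  # U+1E5D
print(codepoints(nfd))  # U+0072 U+0323 U+0304

# a + macron + acute has no single precomposed code point,
# so NFC yields a precomposed character plus one combining mark.
partly = unicodedata.normalize("NFC", "a\u0304\u0301")
print(codepoints(partly))  # U+0101 U+0301
```

Note that marks of different combining classes (here, one below and one above the letter) are put into a canonical order by normalization, whereas two marks stacked on the same side keep the order in which they were entered.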

My keyboard is based on the “Italiano - Pro” keyboard layout that comes with OS X, in which just a few combinations of base character + diacritic are provided. With a few exceptions, they are not used in the transliteration of Indian/Arabic scripts, but they are widely used in Italian (e.g. è é ì ò ù etc.). All of these combinations are encoded by the “Italiano - Pro” keyboard layout as precomposed characters.

I’m tempted to use combining characters throughout (and to convert the encoding of the combinations inherited from the “Italiano - Pro” keyboard accordingly). But I hesitate, because I know that only a few word processors (e.g. Nisus, which I'm using) are able to recognize the two different encodings (precomposed and combining characters) as equivalent for Find/Replace and Sort purposes, while the most widespread applications (Word for Mac, NeoOffice, OpenOffice) are not. This would create problems if one adds text typed with my keyboard layout to an old file typed with the “Italiano - Pro” keyboard layout, or mixes the two.
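The mismatch can be sketched in a few lines of Python: a plain comparison treats the two encodings of the same word as different, while normalizing both sides first makes them match. This is only an illustration of the principle, not a description of how Nisus or any other word processor works internally; the helper names `same_text`, `contains` and `sort_words` are hypothetical.

```python
import unicodedata


def nfc(text):
    """Normalize text to NFC so both encodings compare equal."""
    return unicodedata.normalize("NFC", text)


def same_text(a, b):
    """True if a and b are canonically equivalent."""
    return nfc(a) == nfc(b)


def contains(haystack, needle):
    """Search that ignores the precomposed/combining difference."""
    return nfc(needle) in nfc(haystack)


def sort_words(words):
    """Sort so that both encodings of a word land in the same place.

    This orders by code point after normalization; it is not a
    language-aware collation.
    """
    return sorted(words, key=nfc)


old_word = "perch\u00e9"    # precomposed é, as typed with "Italiano - Pro"
new_word = "perche\u0301"   # e + combining acute

print(old_word == new_word)                    # False
print(same_text(old_word, new_word))           # True
print(contains("non so " + new_word, old_word))  # True
```

Running a file through a normalization step of this kind once, before mixing old and new text, would sidestep the problem regardless of which encoding the keyboard layout emits.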

Precomposed characters or combining characters? This is the dilemma. Have any of you already faced such a quandary?


Marco Franceschini

