[INDOLOGY] Precomposed characters vs combining characters
rajam at earthlink.net
Tue Jan 14 00:29:11 UTC 2014
Using computing technology for academic publishing is a challenging endeavor. I have been dealing with it since 1978 and have learned quite a lot the hard way!
Things we may have to keep in mind are:
1. Keyboard input and display on the screen depend upon the specific operating system one uses (IBM, Windows, Mac OS).
2. The most important complication in all of this is word processing with indexing. I have gone through a lot with that!
No matter how you encode your input/output … some kind of macro would be needed to convert your input/output when you want to send it to a publisher.
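As a rough illustration of such a conversion step, here is a minimal Python sketch that switches a text between precomposed and combining forms using the standard Unicode normalization forms (NFC and NFD). The function names are made up for this example, and a real publisher workflow may well need more than this (font-specific remapping, for instance).

```python
import unicodedata


def to_precomposed(text: str) -> str:
    """Return text in NFC: precomposed characters wherever Unicode defines them."""
    return unicodedata.normalize("NFC", text)


def to_combining(text: str) -> str:
    """Return text in NFD: base characters followed by combining marks."""
    return unicodedata.normalize("NFD", text)


def code_points(text: str) -> str:
    """List the code points of text, to show which encoding is in use."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    # e-grave (precomposed) and a + macron + acute (combining sequence)
    for sample in ("\u00e8", "a\u0304\u0301"):
        print(code_points(sample),
              "| NFC:", code_points(to_precomposed(sample)),
              "| NFD:", code_points(to_combining(sample)))
```

Note that NFC can only compose what Unicode provides: a with macron and acute has no single precomposed code point, so it remains a precomposed ā (U+0101) followed by a combining acute (U+0301).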
On Jan 13, 2014, at 4:01 PM, Marco Franceschini <franceschini.marco at fastwebnet.it> wrote:
> Dear friends,
> I’m devising a keyboard layout (on OS X) for the Italian "physical" keyboard that allows the user to type all the combinations of a base character with one or more diacritics that are used for the transliteration of many Indian scripts as well as Arabic and Perso-Arabic scripts, in conformity with the main standards and transliteration schemes used in scholarly publications. I’m using Ukelele for this purpose.
> My keyboard layout makes extensive use of dead keys: it allows the user to combine up to three diacritics with one base character, so that she/he can add Vedic tone signs (represented by grave/acute or vertical stroke above/underbar) to the transliterated text. Diacritics can be typed in any order, and the base character must be typed after them. The complete list of the allowed combinations is available here:
> My question is: should I encode the output as precomposed characters (or as combinations of a precomposed character plus added diacritics, as far as precomposed characters are available, of course), or should I use combining characters throughout (that is, sequences of the codes of all the glyphs that constitute the final character)?
> My keyboard is based on the “Italiano - Pro” keyboard layout that comes with OS X, in which just a few combinations of a base character + diacritic are provided. With a few exceptions, they are not used in the transliteration of Indian/Arabic scripts, but they are widely used in Italian (e.g. è é ì ò ù etc.). All of these combinations are encoded by the “Italiano - Pro” keyboard layout as precomposed characters.
> I’m tempted to use combining characters throughout (and to convert the encoding of the combinations inherited from the “Italiano - Pro” keyboard accordingly). But I hesitate, because I know that only a few word processors (e.g. Nisus, which I'm using) are able to recognize the two different encodings (precomposed and combining characters) as equivalent for Find/Replace and sorting purposes, while the most widespread programs are not (Word for Mac, Neo Office, Open Office). This would create problems if one adds text typed with my keyboard layout to an old file typed with the “Italiano - Pro” keyboard layout, or mixes the two.
> Precomposed characters or combining characters? This is the dilemma. Have any of you already faced such a quandary?
> Marco Franceschini
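The Find/Replace and sorting concern raised in the quoted message can be sketched in Python as follows. This is only an illustration under the assumption that both texts are normalized to the same form (NFC here) before comparing; the helper names are hypothetical and do not describe what any of the word processors mentioned actually do.

```python
import unicodedata


def nfc(text: str) -> str:
    """Normalize to NFC so that equivalent encodings become identical strings."""
    return unicodedata.normalize("NFC", text)


def same_text(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent, whatever their encoding."""
    return nfc(a) == nfc(b)


def contains(haystack: str, needle: str) -> bool:
    """Encoding-independent substring search (a stand-in for Find)."""
    return nfc(needle) in nfc(haystack)


def sort_words(words: list[str]) -> list[str]:
    """Sort by normalized form (code point order, not a linguistic collation)."""
    return sorted(words, key=nfc)


if __name__ == "__main__":
    precomposed = "K\u1e5b\u1e63\u1e47a"     # Kṛṣṇa typed with precomposed characters
    combining = "Kr\u0323s\u0323n\u0323a"    # the same word typed with combining dots
    print(precomposed == combining)          # raw comparison of the two encodings
    print(same_text(precomposed, combining)) # comparison after normalization
    print(contains(precomposed, "r\u0323"))  # search with a combining-form needle
```

One caveat for a layout that accepts diacritics in any order: normalization reorders marks of different combining classes (a dot below and a macron end up in the same order however they were typed), but two marks above the letter, such as macron and acute, keep their typed order. "a + macron + acute" and "a + acute + macron" are therefore not treated as equivalent, so the layout would presumably need to emit such marks in one fixed order.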