Is CSX the best solution?

DR.S.KALYANARAMAN MDSAAA48 at giasmd01.vsnl.net.in
Sat Oct 21 09:05:56 UTC 1995


On Fri, 20 Oct 1995, Dominik Wujastyk wrote:
> 
> You say that any 8-bit character set can "provide for the
> transliteration and text exchange functions"; well this can only work if
> the charset encoding is exchanged along with the text, and separately
> from it.  If that isn't done, you obviously get in a mess.  CSX is a
> publicly available standard, aimed at solving this problem.
> 
> CSX is an 8-bit character set too, and gives access not only to
> pre-formed accented characters for Sanskrit etc., but also to those
> required for the major European languages (a-circumflex, e-acture,
> etc.).
> 
> NB: CSX is not a font.  It is a character-set definition, in the same
> way as the many definitions published under ISO 8859 (ECMA-94) for
> "8-bit single-byte coded graphic character sets".
> 
Re: On sounds, sights and computers

I assumed that many would be familiar with ISCII standard. 
However, it may be useful to recount, in a rather long note, the status 
of the
evolution of standards for Indian script processing on computers.

ON SOUNDS

After a decade of protracted deliberations, a standard was adopted 
in India in 1991 to 'process' languages on computers: ref. Bureau
of Indian Standards (BIS), Indian Script Code for Information
Interchange (ISCII); cf. BIS document 13194:1991.

This was a standard which laid out the keyboard layout phonetically
for all Indian languages. This enabled a concordance being established
with the 7-bit ASCII standard used on most personal computers. The
phonetic keyboard on QWERTY keys for example would be: 

au, ai, A, I, U, (vowels), bh

and on 'qwerty' keys would be:

au, ai, A, I, U, (corresponding vowel diacriticals), b

Other examples:
Nasals such as En, an were keyed to the ASCII '!', '@'; jn, tr, kS, zr
were keyed to the ASCII '%', '^', '&', '*'; the unique Tamil consonants
'n' (vallina nakaram), 'ZH' were keyed to the ASCII 'V', 'B'

The basic design objective of this ASCII key was to provide for
transliteration from one Indian language to another Indian language.
Electronics Commission of India tried to promote the design and
production of terminals based on this standard. (Jus as King Canute
failed to roll back the waves of the sea, ECI also failed, unable
to withstand the tidal waves of applications which washed around
the Windows-based platforms.)

ON SIGHTS

This ISCII standard was a partial computing solution since the glyphs
corresponding to the phonetics of the 'ISCII keyboard' were not
identified and keyed to a standard layout. This glyph layout was
not possible owing to the limiting space of a 7-bit character set
which could accommodate only 128 glyphs. This led to 'programmed'
software solutions, to compose or typeset a limited repertoire of
glyphs based on keyed-in sequences. These solutions compounded the
problem; specific applications (wp, spreadsheet etc.)
had to be developed which incorporated these composition programming
routines.

A new standard was, therefore, evolved on the 8-bit (256 character set)
which had become the standard to accommodate for example, the Roman
extended character set (with 'a' umlauts, accents etc.) most dominantly
in MS Windows. This 8-bit standard (exemplified by CDAC- Centre for
Development of Advanced Computing) is unique for each script of each
language. For example, for Sanskrit, ANSI numbers 202, 203, 204, 205 
are keyed to the glyphs 'i', 'im', 'r-i' (as in: a-r-m-i, movement; 
da-r-v-i, ladle), 'r-i-m'(as in: a-r-k-i-m, radiant with light).

COMPUTERS

The ISCII phonetic 91-key structure PLUS the CDAC 8-bit glyph structure 
are thus the Indian language standards to depict the sounds and sights
of the Indian languages on the computers which conform to the ANSI
8-bit standard.

Writing systems: with touch-screen or mouse-based 'text entry' devices,
using MS Windows, the phonetic keyboard can be substituted by
an ANSI (8-bit) chart displayed on the screen. In fact, Windows 95
uses this as a standard feature. The chart pops up in one window,
superimposed on the application screen (which may be a WP text, or an 
ACCESS database or an EXCEL spreadsheet).

The advantage of this method of 'keying' is the elimination of 
a phonetic keyboard interface (like the ISCII keyboard). The ANSI
display chart becomes the direct graphical user interface (GUI).

UNICODE

Unicode is the specification (evolving) for a unique character set
encoding for all of the world's languages, using a double byte (16-bit)
encoding system defined by the Unicode consortium which includes
Mcrosoft, Apple, IBM etc.

I believe that the CDAC glyphs standard for Indian scripts are excellent
for incorporation in the Unicode specification. The same standard is used
on the CDROM with 1500+ fonts for South Asian languages published
by Scanrom Publications, NY, USA. These fonts are truetype/postscript
type and portable to Windows, Mac, OS/2 and Unix 8-bit platforms. Since
the diacritical glyphs are designed to 'intelligently' ligature themselves
at appropriate locations of the preceding 'principal' glyphs, no 
programming intervention is required to do 'composing' or 'typesetting'.

For those who prefer the use of QWERTY or DVORAK keyboard for 'keying',
without waiting for the Godot Unicode, MSWindows 95 makes available a 
simple ANSI solution: 
any ANSI chart character can be keyed to any desired keyboard 
key sequences. This can be defined for each font or a group of language
fonts.

The advantage of this method is the portability of the glyphs chart
on to ALL APPLICATIONS which run on the Windows platform. Thus the
Indian language scripts can be depicted directly on wp, dtp, teleprocessing,
spreadsheed, database, transliteration etc. etc. applications.


Dr. S. Kalyanaraman
Indus Sarasvati Research Centre,
20 Warren Road, Mylapore, Madras 600004
Tel. 91-44-493-6288; Fax. 91-44-499-6380
email: mdsaaa48 at giasmd01.VSNL.net.inM
 






More information about the INDOLOGY mailing list