Feedback on CSX+

John Smith jds10 at CUS.CAM.AC.UK
Wed Jun 24 11:46:58 UTC 1998

A couple of weeks ago I posted details of a proposed new encoding, CSX+,
based on and almost wholly compatible with CSX, and requested feedback. I
have received a number of comments, which I summarise below, together with
my responses.  Two of them concern bad-mannered behaviour on the part of
this or that specific piece of software. My overall feeling here is that
it would not be wise to take too much notice of such problems, so long as
they are relatively minor: if we do, (a) we may introduce
incompatibilities, (b) we may fail to include all the characters we want.
Further, (c) future releases of the same programs may anyway behave
differently (for better or for even worse).

(1) B.Philip Jonsson pointed out that y-dieresis is in fact used in
Afrikaans, and John Clews added that it is "used in French, and very
frequently in Belgium". However, neither they nor anyone else suggested
that it is worth retaining it in CSX+, and I continue to propose that it
should be eliminated in favour of something more generally useful for

(2) Birgit Kellner remarked that (Japanese versions of) Word for Windows
assume that character 150 is a dash, and freely introduce line-breaks
after it. In CSX and CSX+ the character in question is u-circumflex; my
hope would be that this is sufficiently rarely used for the problem to be
marginal. (I have not heard of any complaints from existing CSX users.)

(3) Anthony Stone reported that WordPerfect 6.1 for Windows replaces
character 171 (a-macron-tilde) by left guillemet and character 187
(currently r-underdieresis) by right guillemet. (The guillemets are, I
think, used in WP's "reveal codes" mode.) He himself added that WP is
anyway less than ideal for other reasons for dealing with Indological-type
material; the two characters affected are relatively uncommon ones and I
propose to leave them where they are (but see next paragraph).
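
The clashes reported in (2) and (3) are what one would expect if the
programs concerned read these byte values according to Windows code page
1252; that is an assumption on my part, not something the reports state.
The following minimal Python sketch looks up what that code page (and
Mac Roman, for the Macintosh non-breaking space noted in the definition
file) assigns to the slots in question. The helper name `describe` and
the choice of codecs are illustrative only.

```python
import unicodedata

def describe(slot, codec):
    """Return the Unicode name of the character a codec assigns to a byte value."""
    char = bytes([slot]).decode(codec)
    return unicodedata.name(char)

# Slots mentioned in the post, paired with the codec ASSUMED for each platform.
CHECKS = [
    (150, "cp1252"),     # u circumflex in CSX and CSX+
    (171, "cp1252"),     # amacron tilde in CSX and CSX+
    (187, "cp1252"),     # r-underdieresis, to become R underring
    (160, "cp1252"),     # vacated by a acute in CSX+
    (202, "mac_roman"),  # left as a space in CSX+
]

for slot, codec in CHECKS:
    print(slot, codec, describe(slot, codec))
```

Running the same lookup with the codec "cp437" on slots such as 152, 155
or 157 shows the characters that the "Was ... in CSX" comments in the
definition file refer to.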

A change has been made in the draft standard for transliteration of Indian
languages: it is proposed to drop r-underdieresis in favour of l-underbar.
l-underbar already exists in CSX and CSX+ (character 215), so the
elimination of r-underdieresis frees up one more slot (187). I propose to
devote that slot to R-underring, which will be needed by Sanskritists
using the standard transliteration.

I am pleased -- and surprised -- to find that CSX+ now contains capital
versions of all the characters that are at all likely to require them. At
least, I *think* it does!

A revised definition file for the whole CSX+ encoding is attached. I shall
allow a few days for final comments to come in; after that a set of fonts
implementing the new encoding will be built, and an announcement will
appear on both the Indology and Conv-dev mailing lists.

John Smith

Dr J. D. Smith                *  jds10 at
Faculty of Oriental Studies   *  Tel. 01223 335140 (Switchboard 01223 335106)
Sidgwick Avenue               *  Fax  01223 335110
Cambridge CB3 9DA             *

# CSX+ encoding for mkt1font and vpl2vpl
# Enhanced version of CSX (Classical Sanskrit eXtended encoding)
# for the representation of Indian languages in Roman script
# CSX+ aims to be downward compatible with CSX, save for moving aacute
# away from the slot (decimal 160) used as non-breaking space on PCs.
# It also seeks to implement the (draft) ISO/TC46/SC2 standard, while
# retaining a useful set of European accented characters and adding
# dashes and directional double quotes.

128	C cedilla
129	u dieresis
130	e acute
131	a circumflex
132	a dieresis
133	a grave
134	a ring
135	c cedilla
136	e circumflex
137	e dieresis
138	e grave
139	i dieresis
140	i circumflex
141	i grave
142	A dieresis
143	A ring
144	E acute
145	ae
146	AE
147	o circumflex
148	o dieresis
149	o grave
150	u circumflex
151	u grave
152	ae macron		# Was y dieresis in CSX
153	O dieresis
154	U dieresis
155	u breve			# Was cent in CSX
156	sterling
157	r underring		# Was yen in CSX
158	a acute
159	r underbar
160	space			# Non-breaking space on PC: was a acute in CSX
161	i acute
162	o acute
163	u acute
164	n tilde
165	N tilde
166	l tilde
167	m overdot
168	amacron breve
169	imacron breve
170	umacron breve
171	amacron tilde
172	imacron tilde
173	n underbar
174	runderring macron	# Was guillemotleft in CSX
175	l underring		# Was guillemotright in CSX
176	lunderring macron
177	runderring acute
178	runderring grave
179	runderringmacron acute
180	lunderring acute
181	amacron acute
182	amacron grave
183	imacron acute
184	imacron grave
185	e macron
186	o macron
187	R underring
188	y overdot
189	umacron acute
190	umacron grave
191	r breve
192	M overdot
193	m candrabindu
194	t underbar
195	E macron
196	O macron
197	n breve
198	runderdot acute
199	runderdot grave
200	K h			# Overwritten by next definition
200	Kh underbar
201	k underbar
202	space			# Non-breaking space on Macintosh
203	AE macron
204	k h			# Overwritten by next definition
204	kh underbar
205	g overdot
206	c circumflex
207	runderdotmacron acute
208	a tilde
209	i tilde
210	u tilde
211	e tilde
212	o tilde
213	e breve
214	o breve
215	l underbar
216	umacron tilde
217	G overdot
218	C circumflex
219	h underbar
220	h underbreve
221	endash
222	emdash
223	quotedblleft
224	a macron
225	germandbls
226	A macron
227	i macron
228	I macron
229	u macron
230	U macron
231	r underdot
232	R underdot
233	runderdot macron
234	Runderdot macron
235	l underdot
236	L underdot
237	lunderdot macron
238	Lunderdot macron
239	n overdot
240	N overdot
241	t underdot
242	T underdot
243	d underdot
244	D underdot
245	n underdot
246	N underdot
247	s acute
248	S acute
249	s underdot
250	S underdot
251	quotedblright
252	m underdot
253	M underdot
254	h underdot
255	H underdot
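
For anyone wishing to inspect the table programmatically, here is a
minimal Python sketch that reads lines laid out as above: a decimal slot
number, a glyph description, and an optional "#" comment, with a later
definition of a slot overwriting an earlier one (as at 200 and 204). It
is based only on the layout visible in this file; the actual parsing
rules of mkt1font and vpl2vpl may well differ, and the names
`parse_encoding` and `SAMPLE` are hypothetical.

```python
def parse_encoding(text):
    """Map each decimal slot to its glyph description; later lines win."""
    slots = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue  # blank line or comment-only line
        code, glyph = line.split(None, 1)    # slot number, then description
        slots[int(code)] = " ".join(glyph.split())  # normalise whitespace
    return slots

# A short excerpt of the definition file, copied from the table above.
SAMPLE = """\
# CSX+ encoding (excerpt)
187\tR underring
200\tK h\t\t\t# Overwritten by next definition
200\tKh underbar
202\tspace\t\t\t# Non-breaking space on Macintosh
"""

table = parse_encoding(SAMPLE)
for slot in sorted(table):
    print(slot, table[slot])
```

Because the dictionary is keyed by slot number, the second definition of
slot 200 simply replaces the first, which mirrors the "Overwritten by
next definition" comments.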
