Feedback on CSX+
John Smith
jds10 at CUS.CAM.AC.UK
Wed Jun 24 11:46:58 UTC 1998
A couple of weeks ago I posted details of a proposed new encoding, CSX+,
based on and almost wholly compatible with CSX, and requested feedback. I
have received a number of comments, which I summarise below, together with
my responses. Two of them concern bad-mannered behaviour on the part of
this or that specific piece of software. My overall feeling here is that
it would not be wise to take too much notice of such problems, so long as
they are relatively minor: if we do, (a) we may introduce
incompatibilities, (b) we may fail to include all the characters we want.
Further, (c) future releases of the same programs may anyway behave
differently (for better or for even worse).
(1) B. Philip Jonsson pointed out that y-dieresis is in fact used in
Afrikaans, and John Clews added that it is "used in French, and very
frequently in Belgium". However, neither they nor anyone else suggested
that it is worth retaining it in CSX+, and I continue to propose that it
should be eliminated in favour of something more generally useful for
Indologists.
(2) Birgit Kellner remarked that (Japanese versions of) Word for Windows
assume that character 150 is a dash, and freely introduce line-breaks
after it. In CSX and CSX+ the character in question is u-circumflex; my
hope would be that this is sufficiently rarely used for the problem to be
marginal. (I have not heard of any complaints from existing CSX users.)
(3) Anthony Stone reported that WordPerfect 6.1 for Windows replaces
character 171 (a-macron-tilde) by left guillemet and character 187
(currently r-underdieresis) by right guillemet. (The guillemets are, I
think, used in WP's "reveal codes" mode.) He himself added that WP is
anyway less than ideal for other reasons for dealing with Indological-type
material; the two characters affected are relatively uncommon ones and I
propose to leave them where they are (but see next paragraph).
A change has been made in the draft standard for transliteration of Indian
languages: it is proposed to drop r-underdieresis in favour of l-underbar.
l-underbar already exists in CSX and CSX+ (character 215), so the
elimination of r-underdieresis frees up one more slot (187). I propose to
devote that slot to R-underring, which will be needed by Sanskritists
using the standard transliteration.
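
The clashes reported in (2) and (3) are what one would expect if the software read these byte values under the Western European Windows code page. The sketch below is illustrative only: it assumes code page 1252 (neither report says which code page was actually in force, and Japanese Windows uses others), and the function name windows_reading and the REPORTED table are made up for the example.

```python
import unicodedata

# CSX+ assignments for the three slots mentioned in the reports above.
REPORTED = {
    150: "u circumflex",
    171: "amacron tilde",
    187: "R underring (formerly r underdieresis)",
}


def windows_reading(code: int) -> str:
    """Return the Unicode name of a byte value as decoded under cp1252.

    cp1252 is an assumption made for illustration, not something
    stated in the reports themselves.
    """
    ch = bytes([code]).decode("cp1252")
    return unicodedata.name(ch)


for code, csx_name in REPORTED.items():
    print(f"{code}: CSX+ {csx_name!r} -> cp1252 {windows_reading(code)}")
```

Under that assumption, slot 150 decodes as a dash and slots 171 and 187 as the two angle quotation marks, which matches the behaviour described.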
I am pleased -- and surprised -- to find that CSX+ now contains capital
versions of all the characters that are at all likely to require them. At
least, I *think* it does!
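
Anyone wishing to check the capitals for themselves could do so mechanically. What follows is a naive, name-based sketch, not part of any existing tool: ENTRIES is a small excerpt of glyph names copied from the attached definition file, and missing_capitals is a hypothetical helper. A name it reports is not necessarily an oversight, since the claim above covers only characters likely to need a capital.

```python
# Excerpt of glyph names from the attached CSX+ definition file.
ENTRIES = [
    "r underring", "R underring",
    "r underdot", "R underdot",
    "l underdot", "L underdot",
    "l underring",
    "m overdot", "M overdot",
]


def missing_capitals(entries):
    """List lowercase entries whose capitalised counterpart is absent."""
    present = set(entries)
    missing = []
    for name in entries:
        base, _, accents = name.partition(" ")
        if base[0].islower():
            # Capitalise only the first letter of the base glyph name.
            capital = (base[0].upper() + base[1:] + " " + accents).strip()
            if capital not in present:
                missing.append(name)
    return missing


print(missing_capitals(ENTRIES))  # ['l underring']
```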
A revised definition file for the whole CSX+ encoding is attached. I shall
allow a few days for final comments to come in; after that a set of fonts
implementing the new encoding will be built, and an announcement will
appear on both the Indology and Conv-dev mailing lists.
John Smith
--
Dr J. D. Smith * jds10 at cam.ac.uk
Faculty of Oriental Studies * Tel. 01223 335140 (Switchboard 01223 335106)
Sidgwick Avenue * Fax 01223 335110
Cambridge CB3 9DA * http://bombay.oriental.cam.ac.uk/index.html
# CSX+ encoding for mkt1font and vpl2vpl
#
# Enhanced version of CSX (Classical Sanskrit eXtended encoding)
# for the representation of Indian languages in Roman script
#
# CSX+ aims to be downward compatible with CSX, save for moving aacute
# away from the slot (decimal 160) used as non-breaking space on PCs.
# It also seeks to implement the (draft) ISO/TC46/SC2 standard, while
# retaining a useful set of European accented characters and adding
# dashes and directional double quotes.
128 C cedilla
129 u dieresis
130 e acute
131 a circumflex
132 a dieresis
133 a grave
134 a ring
135 c cedilla
136 e circumflex
137 e dieresis
138 e grave
139 i dieresis
140 i circumflex
141 i grave
142 A dieresis
143 A ring
144 E acute
145 ae
146 AE
147 o circumflex
148 o dieresis
149 o grave
150 u circumflex
151 u grave
152 ae macron # Was y dieresis in CSX
153 O dieresis
154 U dieresis
155 u breve # Was cent in CSX
156 sterling
157 r underring # Was yen in CSX
158 a acute
159 r underbar
160 space # Non-breaking space on PC: was a acute in CSX
161 i acute
162 o acute
163 u acute
164 n tilde
165 N tilde
166 l tilde
167 m overdot
168 amacron breve
169 imacron breve
170 umacron breve
171 amacron tilde
172 imacron tilde
173 n underbar
174 runderring macron # Was guillemotleft in CSX
175 l underring # Was guillemotright in CSX
176 lunderring macron
177 runderring acute
178 runderring grave
179 runderringmacron acute
180 lunderring acute
181 amacron acute
182 amacron grave
183 imacron acute
184 imacron grave
185 e macron
186 o macron
187 R underring
188 y overdot
189 umacron acute
190 umacron grave
191 r breve
192 M overdot
193 m candrabindu
194 t underbar
195 E macron
196 O macron
197 n breve
198 runderdot acute
199 runderdot grave
200 K h # Overwritten by next definition
200 Kh underbar
201 k underbar
202 space # Non-breaking space on Macintosh
203 AE macron
204 k h # Overwritten by next definition
204 kh underbar
205 g overdot
206 c circumflex
207 runderdotmacron acute
208 a tilde
209 i tilde
210 u tilde
211 e tilde
212 o tilde
213 e breve
214 o breve
215 l underbar
216 umacron tilde
217 G overdot
218 C circumflex
219 h underbar
220 h underbreve
221 endash
222 emdash
223 quotedblleft
224 a macron
225 germandbls
226 A macron
227 i macron
228 I macron
229 u macron
230 U macron
231 r underdot
232 R underdot
233 runderdot macron
234 Runderdot macron
235 l underdot
236 L underdot
237 lunderdot macron
238 Lunderdot macron
239 n overdot
240 N overdot
241 t underdot
242 T underdot
243 d underdot
244 D underdot
245 n underdot
246 N underdot
247 s acute
248 S acute
249 s underdot
250 S underdot
251 quotedblright
252 m underdot
253 M underdot
254 h underdot
255 H underdot
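
For readers who want to process the definition file themselves, here is a minimal sketch of a reader for it. The format is inferred from the file alone (a decimal slot number, a base glyph name, optional accent names, and an optional "#" comment), and the rule that a later definition for the same slot replaces an earlier one is taken from the file's own comments on slots 200 and 204. This is not the parser used by mkt1font or vpl2vpl; parse_encoding and SAMPLE are names invented for the example.

```python
def parse_encoding(text):
    """Map each slot number to a tuple (base glyph, accent, ...).

    Assumed format, inferred from the file: "<code> <base> [accents] [# comment]".
    A later line for the same slot overwrites the earlier one.
    """
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        table[int(fields[0])] = tuple(fields[1:])
    return table


# A few lines copied from the definition file above.
SAMPLE = """\
# CSX+ encoding (excerpt)
187 R underring
200 K h # Overwritten by next definition
200 Kh underbar
215 l underbar
"""

table = parse_encoding(SAMPLE)
for code in sorted(table):
    print(code, " ".join(table[code]))
```

With the excerpt shown, slot 200 ends up as "Kh underbar", the second of its two definitions.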