
Kellner kellner at ue.ipc.hiroshima-u.ac.jp
Tue Dec 20 18:32:07 UTC 1994


indology at liverpool.ac.uk

Re: Input of e-texts

I am not aware of any ongoing discussions on e-texts, files & formats, as
I have been reading INDOLOGY only for a few months now. 

Anyway - the problem of diacritics in e-texts mainly arises as soon as
e-texts are transferred from one e-mail account (or ftp site) to another. This
raises the following issues:

1. File format

Can we basically agree on the point that ASCII is the most widely used
text standard? That it is most likely to be read by all kinds of computers?
I noticed that some people distribute e-texts in TEX-format. Given that I
never heard of TEX in my life before, this seemed rather odd to me, and
if my present institute (Institute of Indian Philosophy, Hiroshima Univ.)
had not incidentally possessed TEX-conversion tools, my search for
wisdom would have had to stop at that point. I am not going to go on
about all the frustrating attempts at converting file formats other than
ASCII to something either a Mac or a PC can work with. Frankly, I am
sick of it, to the extent that, IMHO, all e-texts should be converted to
ASCII.

A few months ago, somebody on this list mentioned that cheap indological
publications should be made available. This also applies to e-texts: Please
prepare _cheap_ e-texts that most people around the globe can easily
incorporate in whatever system or software they use, without having to
buy additional software, and without wasting days and days on conversion.


2. Transcription
O.K., there is no _best_ transcription system, but there are better and
worse ones. The main (and grave) drawback of Scharf's system (or that of
the Mahabharata-files) is that diacritics are transcribed with ordinary
English letters. Obviously, this works if the e-text is in Sanskrit ONLY.
If, however, articles, papers etc., which combine English (or any other
language) and Sanskrit, are to be e-mailed, this will result in utter
confusion and a lot of unnecessary work.

Hiroshima (and a few other Japanese institutes, as far as I know) uses the
following transcription system:

        vowels:     a, a@, i, i@, u, u@, r@, y@, l@, e, e@, o, o@
        gutturals:  k, kh, g, gh, g@
        palatals:   c, ch, j, jh, j@
        linguals:   t@, t@h, d@, d@h, n@
        dentals:    t, th, d, dh, n
        labials:    p, ph, b, bh, m
        semivowels: y, r, l, v
        sibilants:  c@, s@, s
        aspiration: h
        anusva@ra:  m@
        visarga:    h@

[For transcription of Tibetan, z@ is also used]
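As a rough illustration of how mechanically such a file can be converted for display or printing, here is a minimal Python sketch. It rests on several assumptions: that the target is Unicode letters with diacritics, that the mapping in `BASE` is the intended one, and that y@, e@, o@ and z@ can be left untouched, since the table above does not say which printed characters they stand for. The names `BASE` and `to_diacritics` are purely illustrative.

```python
import re

# Assumed mapping from the base letter of an "@" pair to a Unicode letter.
# y@, e@, o@ and z@ are deliberately omitted: their printed equivalents are
# not stated in the table, so they pass through unchanged.
BASE = {
    "a": "ā", "i": "ī", "u": "ū", "r": "ṛ", "l": "ḷ",
    "g": "ṅ", "j": "ñ", "t": "ṭ", "d": "ḍ", "n": "ṇ",
    "c": "ś", "s": "ṣ", "m": "ṃ", "h": "ḥ",
}

def to_diacritics(text):
    """Replace every letter+@ pair that appears in BASE; keep the rest as is."""
    def repl(match):
        letter = match.group(1)
        mapped = BASE.get(letter.lower())
        if mapped is None:
            return match.group(0)  # unknown pair, e.g. z@: leave untouched
        # A capital base letter yields a capital diacritical letter (A@ etc.).
        return mapped.upper() if letter.isupper() else mapped
    return re.sub(r"([A-Za-z])@", repl, text)

print(to_diacritics("Dharmaki@rti, Prama@n@ava@rttika"))
```

Because only letter+@ pairs are touched, ordinary English words in the same file are left alone. E-mail addresses in the text would have to be protected separately, since they contain the same character.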

Of course, Scharf's system is better in that it only needs ONE keystroke
to produce the corresponding diacritical letter, but, as most
word-processing programs can use macros to produce TWO characters with one
keystroke, this disadvantage is easily overcome.

Moreover, capital letters can be _diacriticized_, too, e.g. as _A@_ etc.
This is important for proper names in titles etc. (and again, probably more
often used in papers & articles which combine Sanskrit with other
languages). 

Another main advantage is that the system, apart from letters such as _j@_
etc., which would be ambiguous otherwise, by and large corresponds to
a _common sense_ understanding of Sanskrit transcription (for example,
the aspirated consonants) as used in printed matter. This makes it much
easier to memorize.

3. Sandhi
A specific comment to Scharf's postulate that Sandhi should be undone
in all cases:
In some cases, Sandhi is ambiguous as to whether the respective padas form
a compound or are two different words. Not only does the separation of the
Sandhi already presuppose an interpretation of the text (which, given the
amount of text which is generally typed in, is hardly likely to be reliable);
moreover, sometimes such ambiguous Sandhis are the source of
commentarial battles in philosophical schools. Such information might be
blurred if the person who types already precludes alternative interpretations.

Finally, how far do you go when undoing Sandhi? Do you also undo
word-internal Sandhi? Of course, this makes it easier to use search tools,
but it produces MUCH, MUCH more work when inputting text - I
basically leave the Sandhi _as it is_ in the edition or manuscript I use,
apart from minor adjustments (obvious misprints), which are, of course, 
mentioned in the e-text itself. 

Lastly, undoing Sandhi can be quite a source of mistakes, and neither do
I want to trouble other users of my e-texts with my silly mistypings, nor
do I want to be bothered with others'.

Suffice it to add that there are quite a few search tools that can do _fuzzy_
searches. I use one called WPSMOUS, developed by Johannes Prandstetter
at the Austrian Academy of Sciences, Vienna. It can search files,
directories etc. and writes the output (embedded in its syntactical context)
into a special file, giving also page number etc. of the respective passage
in the respective file. (BTW, Prandstetter also developed ASCII-type
diacritical fonts which work both with Windows and DOS applications
(keyboard and screen fonts)).
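To make the idea of a _fuzzy_ search concrete, here is a minimal Python sketch. It is NOT how WPSMOUS works (its internals are not described here) and only assumes the standard `difflib` module. The function name `fuzzy_find` and the threshold values are illustrative. A line counts as a hit if some stretch of it is sufficiently similar to the search term, so that a form altered by Sandhi can still be found.

```python
from difflib import SequenceMatcher

def fuzzy_find(lines, term, threshold=0.8):
    """Return (line_number, line) for lines containing an approximate match.

    Every window of len(term) characters is compared with the term; the
    line is reported if any window reaches the similarity threshold.
    """
    hits = []
    width = len(term)
    for number, line in enumerate(lines, start=1):
        last_start = max(1, len(line) - width + 1)
        windows = (line[i:i + width] for i in range(last_start))
        if any(SequenceMatcher(None, term, w).ratio() >= threshold
               for w in windows):
            hits.append((number, line))
    return hits

text = ["dharmasya tattvam", "tac ca na", "dharmo 'sti"]
for number, line in fuzzy_find(text, "dharmas", threshold=0.7):
    print(number, line)
```

With the lower threshold, both the exact form in the first line and the Sandhi variant in the third line are reported together with their line numbers. A real tool would also have to report page numbers and surrounding context.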


4. The moral side

One of my main concerns, when I put texts on file and distribute them to
more or less any individual who asks for them, is copyright. We downloaded
Liverpool's As@t@adhya@yi and noted that it was taken word for word
from Katre's edition, without any mention of the source. Given that Katre's
edition is _not_ out of print, would that not be a rather gross violation of
copyright? As I myself am pretty ignorant of such legal aspects, I would
welcome more information from others on that point (also, I heard
that there is a lawsuit going on about the Pali Text Society's editions which
were put on file somewhere in Thailand, allegedly without any kind of
permission...).

Moreover, e-texts should include as complete information as possible on
the edition(s) they are based on etc. Page-numbers of edition(s), editor's
remarks or verse-numbers should be included, too. I had lots of fun with
ACIP's Tibetan version of Dharmaki@rti's Prama@n@ava@rttika, as they
did not bother to include ANY verse-numbers at all. Verse-numbers can,
of course, be a matter of scholarly dispute (as is still the case with PV I),
but they are absolutely necessary for at least a rough orientation.
Moreover, ANY kind of convention (e.g. _page-numbers are given in
square brackets_) should be clarified. 
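A stated convention also makes an e-text usable by programs. As a small sketch, assuming the convention quoted above (page-numbers in square brackets, each marker preceding the text of its page) and assuming Python, a file could be split by page as follows. The function name `split_by_page` and the sample string are illustrative only.

```python
import re

def split_by_page(text):
    """Map each page number to the text following its [N] marker."""
    pages = {}
    # With a capturing group, re.split returns:
    # [text before first marker, "12", text, "13", text, ...]
    parts = re.split(r"\[(\d+)\]", text)
    for number, body in zip(parts[1::2], parts[2::2]):
        pages[int(number)] = body.strip()
    return pages

print(split_by_page("[12] atha yogah@ [13] tad eva"))
```

If the convention is not documented, a program (or a reader) cannot tell such markers apart from editorial brackets in the edition itself.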

Lastly, and this is almost self-evident, despite my nagging & complaining
about various details of existing e-texts, I thank all those individuals who
dedicate their time to typing Indian and Tibetan texts on file. Without their
enthusiasm, my work would be twice as slow and cumbersome as it
already is (sometimes).


Birgit Kellner
Institute for Indian Philosophy
Hiroshima University



 





