compound analysis in e-texts
kellner at ipc.hiroshima-u.ac.jp
Tue Aug 27 16:58:13 UTC 1996
Lars Martin Fosse wrote:
> This is not a good argument against compound analysis as such. I have worked
> with people who typed text for me, some of them analysing compounds very
> competently, others not. When I got non-analysed text, I sent it to another
> person who was able to analyse the compounds. Using the TZ-format, I never
> have any problem recreating a sandhi-text by means of macros, thus getting
> the best of two worlds. The compound-analysed text is essential for a number
> of analytical tasks performed by computer, e.g. language statistics, word
> collocation studies etc.
That's all very well, as long as you have competent people to analyze the
compounds, and as long as the time spent on producing e-texts is not a major
One problem, which Jakub Cejka mentioned before, is that of ambiguous compounds
which are read in different ways by the tradition itself, and I would like
to second his question whether you have a policy on such cases.
Add to which, my current experience with preparing an e-text version of the
complete works of J~naana'sriimitra (short: JNA) tells me that "competence"
is a very, very relative concept. I have typed in quite a lot of his texts
by now, and I am virtually "living" with two of his treatises, but his style
is so intricately difficult that, more often than not, I have to give up on
compound analysis. Another problem is my lack of competence outside the very
narrow field of pramaan.a-studies. An author like J~naana'srii, who
frequently uses vocabulary/illustrations taken from poetics or at least not
conforming to the "standards" of the poor man's pramaan.a-terminology in
general, requires constant lexicographical investigation, and a lot of
reading experience in other subject areas. I don't have this experience, and
if I had to gain it simply to TYPE in the text, it would take at least ten
more years for me to come up with the preliminary electronic version of JNA,
which is not really in anybody's interest.
Hence, I have formed the opinion that (a) we can never be sure about the
competence required for the analysis, and (b) if I personally have to choose
between probably flawed compound-analysis and no compound-analysis at all, I
would prefer the latter, as far as texts published for the general audience
are concerned. This, of course, does not prevent one from preparing
compound-analyzed texts for the tasks you mentioned (indexing, collocations
etc.). Maybe one should differentiate different target-audiences for
different types of e-texts in the first place.
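As a rough illustration of how a single analyzed source could serve both audiences, here is a minimal Python sketch. It assumes (this is my assumption, not a description of the TZ-format or of anybody's actual macros) that compound members are marked with plain hyphens at the surface level, so that deleting the hyphens gives back the unanalyzed reading text. The function names reading_text and member_counts are hypothetical.

```python
import re
from collections import Counter


def reading_text(analyzed: str) -> str:
    """Drop compound-internal hyphens to get a text for general readers.

    Assumes hyphens were inserted without undoing sandhi, so removing
    them restores the continuous form. A hyphen surrounded by spaces
    is left alone.
    """
    return re.sub(r"(?<=\S)-(?=\S)", "", analyzed)


def member_counts(analyzed: str) -> Counter:
    """Count individual compound members, e.g. for word statistics."""
    members = [
        member
        for word in analyzed.split()
        for member in word.split("-")
        if member
    ]
    return Counter(members)


sample = "svabhaava-hetu kaarya-hetu"
plain = reading_text(sample)
counts = member_counts(sample)
```

The point of the sketch is only that the analyzed version can stay a working file for indexing, while the published version is derived from it mechanically.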
Another question I would like to ask is what principles people apply when
carrying out compound-analysis. Motoi Ono, Jun'ichi Oda and Jun Takashima,
for example, separated compounds with hyphens in their recently published
KWIC-Index to Dharmakiirti's works. Their policy is as follows: (1) words
with the prefixes a-, dur- and nih.- are not separated; (2) possessive adjectives
with -vat/-mat are separated, while adverbs with -vat meaning "such as" are
not; (3) a numeral with -dha/-vidha/-prakaara remains unseparated; (4)
compounds with -taa/-tva or with the elements -bhaava/-bhuuta are not
separated; (5) compounds starting with evam-, tat-, tathaa-, para-, yathaa-,
su-, sva- are not separated; (6) some compounds which are considered as
technical terms are not separated, e.g. padaartha, agnihotra,
ayogavyavaccheda, prasajyapratis.edha, svabhaavapratibandha.
I would be very interested in getting opinions on this policy.
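To make the six rules easier to discuss, they can be written down as a small Python sketch. This is only my reading of the policy as summarized above, not the procedure the authors of the index actually used. It assumes that a human analyst has already cut the compound into surface segments (no sandhi is resolved), and that the analyst says whether a final -vat is adverbial, since rule (2) cannot be decided mechanically. The names hyphenate, adverbial_vat and the short NUMERALS list are hypothetical and purely illustrative.

```python
# Rule sets taken from the policy as summarized in this message.
NO_SPLIT_AFTER_PREFIX = {"a", "dur", "nih."}                # rule 1
NUMERAL_SUFFIXES = {"dha", "vidha", "prakaara"}             # rule 3
NO_SPLIT_BEFORE = {"taa", "tva", "bhaava", "bhuuta"}        # rule 4
NO_SPLIT_AFTER_INITIAL = {"evam", "tat", "tathaa", "para",
                          "yathaa", "su", "sva"}            # rule 5
TECHNICAL_TERMS = {"padaartha", "agnihotra", "ayogavyavaccheda",
                   "prasajyapratis.edha", "svabhaavapratibandha"}  # rule 6

# Illustrative only: the policy does not list the numerals.
NUMERALS = {"eka", "dvi", "tri", "catur", "pa~nca"}


def hyphenate(segments, adverbial_vat=False):
    """Join analyst-supplied segments, hyphenating only where allowed."""
    if not segments:
        return ""
    whole = "".join(segments)
    if whole in TECHNICAL_TERMS:                            # rule 6
        return whole
    out = segments[0]
    for i in range(1, len(segments)):
        left, right = segments[i - 1], segments[i]
        keep_joined = (
            left in NO_SPLIT_AFTER_PREFIX                   # rule 1
            or (right == "vat" and adverbial_vat)           # rule 2
            or (left in NUMERALS and right in NUMERAL_SUFFIXES)  # rule 3
            or right in NO_SPLIT_BEFORE                     # rule 4
            or (i == 1 and left in NO_SPLIT_AFTER_INITIAL)  # rule 5
        )
        out += right if keep_joined else "-" + right
    return out
```

For example, hyphenate(["pramaan.a", "vaarttika"]) gives "pramaan.a-vaarttika", while hyphenate(["a", "nitya", "tva"]) stays "anityatva". Writing the rules out this way also shows where the policy depends on judgment: rules (2) and (6) both need information that is not in the string itself.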
As to Lars' argument that compound-analyzed texts facilitate students'
efforts - this leads on to another discussion, namely whether facilitating
the reading of Sanskrit for students should be made into a general policy for
e-texts, and whether it is such a good thing to facilitate too many things
for students in the first place. I personally don't like romanization at
all, and I think romanized textual editions should die out as soon as
possible. This opinion is not based on a somewhat sadistic dislike of
students as such, but on the assumption that Sanskrit is a foreign language
with its own distinct writing style, and that it should be taught as such.
But, as I said, this leads on to another discussion, which is why I shall
Department for Indian Philosophy
University of Hiroshima