order of letters
Gérard Huet
Gerard.Huet at INRIA.FR
Wed May 19 22:48:57 UTC 2010
Dear Pr Tull,
This matter is one of those small problems which create lots of
headaches for the proper design of computer-processing tools
for Sanskrit. One wants to avoid encoding redundancies, in order to
keep overgeneration low, while preserving completeness.
Also clear consistent rules must be applied concerning lexicographic
ordering in computerized lexicons.
Here is what I do at present. I normalize non-genuine anusvAra to the
nasal homophonic to the following nasal. Both in the parsed
input and in the inflected forms databanks. Remark that the inverse
normalization would be incorrect, since we want to keep
forms of root sa~nj and not turn them to *sa.mj.
Until recently I applied a similar normalization to visarga. When it
is followed by a sibilant, I would replace it by the same sibilant.
Thus e.g. a.hsu would normalize to assu. Similarly to the anusvaara
treatment, such visarga would appear in the alphabetical order
of the corresponding sibilant, and only visarga preceding "k" and "p"
would be preserved, and be listed just before the consonants.
But recently I noticed an improper side-effect of this normalization,
in an example such as:
api tvam adya prAtaH svasuH g.rham agacchaH
since normalizing input prAtaHsvasuH to prAtassvasuH would prevent its
parsing. Here prAtaH must be preserved, in order for
sandhiviccheda to find the proper solution with prAtar. Because of
this, I now treat all visargas in the same way, and I do not
replace them before sibilants.
Another place where a similar problem occurs is that words ending in
"r" such as prAtar, antar, punar, catur but also all the
verbal forms in -ur must be stored intact in the forms databanks, and
terminal sandhi should not be used.
This is needed to parse chunks such as punarapi, since puna.h+api
would yield *puno'pi.
This difficulty is discussed nowhere I know of, and pandits protest at
my listing punar rather than puna.h in tables.
Many other mundane problems arise with avagraha, with marking hiatus
in words such as pra-ucya or pra-uga, etc.
Another problem which seems to be below the threshold of interest for
most linguists is the optional gemination/degemination: forms
such as karmma, kiirttyate, vAggmI which are indeed Paninian (the last
one using the very ad-hoc 5.2.124 sUtra), but also forms such as
chatra, chAtra, patrikA, vArtA, satra, satva, which may not be
Paninian, but which occur in corpus nonetheless.
Regards
G. Huet
Le 19 mai 10 à 15:38, Herman Tull a écrit :
> I am putting together some simple rules for first year students. In
> dealing with the order of the letters, there is the ever present
> confusion over the placement of anusvAra and visarga. I notice that
> Monier Williams and Macdonell are consistent in their placement of
> the anusvara (placing it after the vowels when it precedes the semi-
> vowels or the sibilants). But, they seem not to agree on the
> placement of the visarga. Macdonell follows the rule that he states
> in his student grammar (p. 3), that the visarga follows the vowels
> when it precedes "k" and "p" and that it when it precedes a sibilant
> it is placed in the consonantal order of the sibilants. E.g., on p.
> 17 of his dictionary he has an article for antaH-ka... and then on
> p. 18 he has the article for antaH-sa...
>
> Monier-Williams, on the other hand, combines this all into one
> article (p. 43, "antaH"), and so does not distinguish between
> visarga before "k" and "p" on the one hand, and before the sibilants
> on the other, treating all of it as if it precedes "antar" (which
> seems incorrect, since antaH-sa would not precede antar according to
> the rule).
>
> A small matter (especially with the advent of the electronic
> dictionary), perhaps, but can anyone shed light on this?
>
> Herman Tull
More information about the INDOLOGY
mailing list