[INDOLOGY] Extension of Sanskrit Heritage tools with connection to Amba Kulkarni's parser

Fri Feb 1 15:50:32 UTC 2019

Thank you Jan for your remarks.

For clarification, let me say again that what I announced on the list was not so much a rehaul of my generation devices than a new interface of my segmenter with Amba Kulkarni’s dependency parser, which supersedes my old rather unsatisfactory shallow parser.

Now concerning my « non-paninian generative device », I must reiterate a warning, since this is a recurrent motive of misunderstanding. The tool accessible for declension of nominal stems and verbal roots under the link « Grammar » is not something that should be trusted blindly,
since it expects stems consistent with their spelling in my dictionary. Thus throwing « anaḍuh » to the tool yields arbitrary forms, since its lexicalized stem is « anaḍvah », consistent with its etymology as a compound of anas and vah#2. If you click on the « m. » masculine link
on its defining entry page <https://sanskrit.inria.fr/DICO/2.html#ana.dvah> you will get the following table <https://sanskrit.inria.fr/cgi-bin/SKT/sktdeclin.cgi?q=ana.dvah;g=Mas;font=roma> which should correspond to your expectations, or at least is consistent with Whitney§404. And this is all that matters for me, since I am not interested in drilling students with vibhakti generation, but on the contrary in providing a reader tool that assists users in analyzing classical Sanskrit text without extensive grammatical training. 

May I make the remark that this situation is not so different from trying to use Aṣṭādhyāyī with a random Dhātupāṭha ? As you very well know, one may get very weird results if one does not use the precise Dhātupāṭha used by Pāṇini himself.
Similarly, if you throw a root to my conjugation engine with arbitrary spelling, without including its homophony index, and with a non-expected gaṇa, you will get tons of wrong forms, but this is without consequences on the reader tool, since it uses only forms generated through the lexicon parameters.
Actually, in view of its misleading usages, the Grammar link to my generating morphology tools ought probably to be hidden from casual users sight, and reserved for debugging mode. It seems mostly to attract Sanskrit teachers like a magnet, to try anaḍuh, path, dos and other weird consonantal stems, as to get reassured in their opinion that it generates junk. So once in a while I spend a few improductive hours fixing generation of improbable items. Scratching my head as to whether I should generate the nominative of vivikṣ as viviṭ (following Monier-Williams) or as vivik (following Kane§97). Should I consult the pandits to know their opinion as to what is the correct Paninian form ? My own experience at this sort of test is frustrating. What really matters for me is whether the form appears in corpus, probably under the two variants…

This said, I am always happy to fix blatant mistakes signaled by seasoned users of the tool (i.e. having read the manual with its numerous caveats). But anantaṃśāstram alpaṃjīvitam !

Best
Gérard

> Le 1 févr. 2019 à 13:14, Jan E.M. Houben <jemhouben at gmail.com> a écrit :
> 
> Dear Gérard, 
> Congratulations on updating your non-paninian generative device!
> Something is indeed still going wrong with 
> "anaḍuh"
> However, in the case of "ubhaya", the lexical information that "ubhaya" in dual is very rare (though acc. to MW not entirely unattested) is a lexicographic annotation which does not affect the grammatical generation of its forms. 
> And, logically, "two" is the lower limit of the number of separate things that can be "of two kinds"... which ubhau would express as being simply two in number, not also as "of two kinds"... 
> Best, Jan
> 
> On Fri, 18 Jan 2019 at 19:17, Walter Slaje via INDOLOGY <indology at list.indology.info <mailto:indology at list.indology.info>> wrote:
> Certainly a welcome development, but
> 
> > this wonderful grammatical analysis
> 
> really calls for a thorough revision before activating it. The very few searches I could make yielded inexplicable, in parts unbelievably wrong results. Anyone may see for themselves if they just test the declension of "adas", "idam", "ubhaya" (displaying dual forms, but no fem.), "sarvā" (fem.), "dos / doṣan", "anaḍuh", "asthi", and many more. In the hands of Sanskrit beginners who lack grammatical training this database will do more harm than good. I can only warn against its use until it has seen a very careful revision by experts.
> 
> Kind regards,
> WS
> _______________________________________________
> INDOLOGY mailing list
> INDOLOGY at list.indology.info <mailto:INDOLOGY at list.indology.info>
> indology-owner at list.indology.info <mailto:indology-owner at list.indology.info> (messages to the list's managing committee)
> http://listinfo.indology.info <http://listinfo.indology.info/> (where you can change your list options or unsubscribe)
> 
> 
> -- 
> Jan E.M. Houben
> Directeur d'Études, Professor of South Asian History and Philology
> Sources et histoire de la tradition sanskrite
> École Pratique des Hautes Études (EPHE, PSL - Université Paris)
> Sciences historiques et philologiques 
> 54, rue Saint-Jacques, CS 20525 – 75005 Paris
> johannes.houben at ephe.sorbonne.fr <mailto:johannes.houben at ephe.sorbonne.fr>
> johannes.houben at ephe.psl.eu <mailto:johannes.houben at ephe.psl.eu>
> https://ephe-sorbonne.academia.edu/JanEMHouben <https://ephe-sorbonne.academia.edu/JanEMHouben>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://list.indology.info/pipermail/indology/attachments/20190201/fb0c3bef/attachment.htm>