[INDOLOGY] Sandhi and compound splitting model

Wed Aug 29 10:02:23 UTC 2018

Dear Jan,

Thanks for the positive feedback on the word splitter. I hope it turns out
to be useful for our research community.

Regarding RV 1.1.1: the analysis does not imply that dhAtama is
morphologically derived from dA "to give", although the term "giving" in
that line may give this impression. "giving" is just a coarse
word-semantic annotation of dhAtama. It is meant to be coarse, and it is
not too far from Jamison + Brereton 2014 ("most richly conferring
treasure"). The same holds for the English terms (if any) in other lines.
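
To illustrate the point, here is a minimal Python sketch that splits the
line you quoted on '#' and looks at the last field. The column layout is an
assumption made for this example only (this thread does not document it);
the sketch relies on nothing more than the lemma and the English tag
sitting in separate fields.

```python
# Record exactly as quoted in this thread (RV 1.1.1, ratnadhātamam).
record = ("1#1#1#2#2ratnadhātamam#2#dhātamam#dhātama###219609#4443604"
          "#1#ADJ#3#1#1#_##giving~130047~2")

fields = record.split("#")

# Assumption: the last '#'-separated field carries the semantic tag,
# itself split by '~', with the English gloss as its first part.
semantic_tag = fields[-1].split("~")
gloss = semantic_tag[0]

# Locate the lemma by value rather than by a guessed column number.
lemma_index = fields.index("dhātama")
lemma = fields[lemma_index]

print("lemma field:", lemma)
print("gloss field:", gloss)
```

The lemma field and the gloss field are read independently of each other, so
the gloss "giving" says nothing about which root the lemma is derived from.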

Best wishes, Oliver

On 29/08/2018 09:58, Jan E.M. Houben wrote:
> Dear Oliver,
> Congratulations and thanks for sharing again a very useful research tool.
> Also for the tool you shared earlier (see below),
> which, incidentally, contains a mistake in the very first line:
> 1#1#1#2#2ratnadhātamam#2#dhātamam#dhātama###219609#4443604#1#ADJ#3#1#1#_##giving~130047~2
> The mistake -- and you are not the only one to make it -- is that the 
> adjectival word part -dhātama- (you have chosen to neglect tama, 
> probably consciously) is not derived from dā (cp. Gk. didōmi "I give, 
> confer") but from dhā (cp. Gk. tithēmi "I establish").
> Warm regards,
> Jan
>
> ***
> I would like to announce the release of a full annotation of the Rigveda
> with morphological, lexical and verb-argument information.
>
> Data are stored in a publicly accessible repository at
> https://git.adwmainz.net/open/rigveda
>
> Details of the annotation process are described in the LREC paper, which is
> stored at the top level of the repository.
>
> On Wed, 29 Aug 2018 at 07:24, Oliver Hellwig via INDOLOGY
> <indology at list.indology.info> wrote:
>
>     Dear all,
>
>     Sebastian Nehrdich and I have developed a machine learning model that
>     splits Sandhis and compounds in "raw" Sanskrit text.
>
>     You find further details, model, code and the data it was built with
>     (~600,000 lines of Sanskrit text from the DCS) at
>     https://github.com/OliverHellwig/sanskrit/tree/master/papers/2018emnlp
>
>     The pdf in the github directory contains further technical
>     information.
>
>     If you know researchers who work on this topic and may be interested in
>     the model or the data, it would be great if you could forward this mail
>     to them.
>
>     Oliver
>
>     ---
>     Oliver Hellwig
>     IVS Zurich / SFB 991, Düsseldorf
>
> -- 
> Jan E.M. Houben
> Directeur d'Études, Professor of South Asian History and Philology
> Sources et histoire de la tradition sanskrite
> École Pratique des Hautes Études (EPHE, PSL - Université Paris)
> Sciences historiques et philologiques
> 54, rue Saint-Jacques, CS 20525 – 75005 Paris
> johannes.houben at ephe.sorbonne.fr
> johannes.houben at ephe.psl.eu
> https://ephe-sorbonne.academia.edu/JanEMHouben