[INDOLOGY] Sandhi and compound splitting model
Oliver Hellwig
hellwig7 at gmx.de
Wed Aug 29 10:02:23 UTC 2018
Dear Jan,
thanks for the positive feedback on the word splitter. I hope it turns out
to be useful for our research community.
Regarding RV 1.1.1: the analysis does not imply that dhAtama is
morphologically derived from dA "to give", although the term "giving" in
that line may give this impression. "giving" is just a coarse semantic
annotation of the word dhAtama. It is meant to be coarse, and it is not
too far from Jamison and Brereton 2014 ("most richly conferring
treasure"). The same holds for the English terms (if any) in other lines.
Best wishes, Oliver
On 29/08/2018 09:58, Jan E.M. Houben wrote:
> Dear Oliver,
> Congratulations, and thanks for once again sharing a very useful research tool.
> Thanks also for the tool you shared earlier (see below),
> which, incidentally, contains a mistake in the very first line:
> 1#1#1#2#2ratnadhātamam#2#dhātamam#dhātama###219609#4443604#1#ADJ#3#1#1#_##giving~130047~2
> The mistake -- and you are not the only one to make it -- is that the
> adjectival word part -dhātama- (you have chosen to neglect tama,
> probably consciously) is not derived from dā (cp. Gk. didoomi "I give,
> confer") but from dhā (cp. Gk. tithēmi "I establish").
> Warm regards,
> Jan
>
> ***
> I would like to announce the release of a full annotation of the Rigveda
> with morphological, lexical and verb-argument information.
>
> Data are stored in a publicly accessible repository at
> https://git.adwmainz.net/open/rigveda
>
> Details of the annotation process are described in the LREC paper,
> which is stored at the top level of the repository.
>
> On Wed, 29 Aug 2018 at 07:24, Oliver Hellwig via INDOLOGY
> <indology at list.indology.info> wrote:
>
> Dear all,
>
> Sebastian Nehrdich and I have developed a machine learning model that
> splits Sandhis and compounds in "raw" Sanskrit text.
>
> You can find further details, the model, the code and the data it was
> built with (~600,000 lines of Sanskrit text from the DCS) at
> https://github.com/OliverHellwig/sanskrit/tree/master/papers/2018emnlp
>
> The PDF in the GitHub directory contains further technical information.
>
> If you know researchers who work on this topic and may be interested in
> the model or the data, it would be great if you could forward this mail
> to them.
>
> Oliver
>
> ---
> Oliver Hellwig
> IVS Zurich / SFB 991, Düsseldorf
>
> --
> Jan E.M. Houben
> Directeur d'Études, Professor of South Asian History and Philology
> Sources et histoire de la tradition sanskrite
> École Pratique des Hautes Études (EPHE, PSL - Université Paris)
> Sciences historiques et philologiques
> 54, rue Saint-Jacques, CS 20525 – 75005 Paris
> johannes.houben at ephe.sorbonne.fr
> johannes.houben at ephe.psl.eu
> https://ephe-sorbonne.academia.edu/JanEMHouben