[INDOLOGY] Sandhi and compound splitting model

Jan E.M. Houben jemhouben at gmail.com
Wed Aug 29 07:58:29 UTC 2018


Dear Oliver,
Congratulations, and thanks for once again sharing a very useful research tool.
Thanks also for the tool you shared earlier (see below),
which, incidentally, contains a mistake in the very first line:
1#1#1#2#2ratnadhātamam#2#dhātamam#dhātama###219609#4443604#1#ADJ#3#1#1#_##giving~130047~2
The mistake -- and you are not the only one to make it -- is that the
adjectival word part -dhātama- (you have chosen to neglect -tama-, probably
consciously) is derived not from dā (cp. Gk. didōmi "I give, confer"), as the
gloss "giving" implies, but from dhā (cp. Gk. tithēmi "I establish").
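
For readers who want to look at such a record programmatically, here is a minimal sketch in Python. It only assumes what is visible in the line quoted above: fields are separated by "#", and the last field starts with an English gloss followed by "~". The record layout is not documented in this message, so the field positions and the helper names (split_record, gloss_of) are hypothetical and chosen by inspecting this one line.

```python
# Hypothetical helpers: the record layout is not documented in this message,
# so the field positions used here are guesses from inspecting a single line.
RECORD = (
    "1#1#1#2#2ratnadhātamam#2#dhātamam#dhātama###219609#4443604"
    "#1#ADJ#3#1#1#_##giving~130047~2"
)


def split_record(line: str) -> list[str]:
    """Split one '#'-delimited record, keeping empty fields."""
    return line.rstrip("\n").split("#")


def gloss_of(fields: list[str]) -> str:
    """Return the text before the first '~' in the last field (assumed gloss)."""
    return fields[-1].split("~")[0]


fields = split_record(RECORD)
stem = fields[7]           # assumed position of the stem, here "dhātama"
gloss = gloss_of(fields)   # assumed gloss, here "giving"
print(stem, gloss)
```

A loop over all records using these two helpers could list every stem containing "dhā" that is glossed as "giving", so that the candidates can be reviewed by hand. Whether a given gloss is actually wrong remains a philological judgment that the script cannot make.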
Warm regards,
Jan

***
I would like to announce the release of a full annotation of the Rigveda
with morphological, lexical and verb-argument information.

Data are stored in a publicly accessible repository at
https://git.adwmainz.net/open/rigveda

Details of the annotation process are described in the LREC paper, which is
stored at the upper level of the repository.




On Wed, 29 Aug 2018 at 07:24, Oliver Hellwig via INDOLOGY <
indology at list.indology.info> wrote:

> Dear all,
>
> Sebastian Nehrdich and I have developed a machine learning model that
> splits Sandhis and compounds in "raw" Sanskrit text.
>
> You can find further details, the model, the code and the data it was built
> with (~600,000 lines of Sanskrit text from the DCS) at
> https://github.com/OliverHellwig/sanskrit/tree/master/papers/2018emnlp
>
> The pdf in the github directory contains further technical information.
>
> If you know researchers who work on this topic and may be interested in
> the model or the data, it would be great if you could forward this mail
> to them.
>
> Oliver
>
> ---
> Oliver Hellwig
> IVS Zurich / SFB 991, Düsseldorf


-- 

*Jan E.M. Houben*

Directeur d'Études, Professor of South Asian History and Philology

*Sources et histoire de la tradition sanskrite*

École Pratique des Hautes Études (EPHE, PSL - Université Paris)

*Sciences historiques et philologiques*

54, rue Saint-Jacques, CS 20525 – 75005 Paris

*johannes.houben at ephe.sorbonne.fr*

*johannes.houben at ephe.psl.eu*

*https://ephe-sorbonne.academia.edu/JanEMHouben*
