Dear all,
Sebastian Nehrdich and I have developed a machine learning model that
splits Sandhis and compounds in "raw" Sanskrit text.
You can find further details, the model, the code, and the data it was built with
(~600,000 lines of Sanskrit text from the DCS) at
https://github.com/OliverHellwig/sanskrit/tree/master/papers/2018emnlp
The PDF in the GitHub directory contains further technical information.
If you know researchers who work on this topic and may be interested in
the model or the data, it would be great if you could forward this mail
to them.
Oliver
---
Oliver Hellwig
IVS Zurich / SFB 991, Düsseldorf
_______________________________________________
INDOLOGY mailing list
INDOLOGY@list.indology.info
Jan E.M. Houben
Directeur d'Études, Professor of South Asian History and Philology
Sources et histoire de la tradition sanskrite
École Pratique des Hautes Études (EPHE, PSL - Université Paris)
Sciences historiques et philologiques
54, rue Saint-Jacques, CS 20525 – 75005 Paris
johannes.houben@ephe.sorbonne.fr
https://ephe-sorbonne.academia.edu/JanEMHouben