This is great indeed.
I could not run and test it.
I am just curious to know how compound words such as पटप्रतियोगिकघटानुयोगिकाभावः or पीताम्बरकृष्णः would be split.

Thanks

On Wed, Aug 29, 2018 at 9:01 PM Jan E.M. Houben via INDOLOGY <indology@list.indology.info> wrote:
Dear Oliver,
I hope to be able to use the sandhi and word splitter, it will definitely be useful. 
As for RV 1.1.1: in no way does it affect your syntactic analysis which is your main aim, but even in a coarse annotation dA and dhA should not and need not be confounded; in fact, not J&B but, half a century earlier, Geldner showed the way to a more correct interpretation, not in his translation but in his note ad loc... 
Best, 
Jan

On Wed, 29 Aug 2018 at 12:02, Oliver Hellwig <hellwig7@gmx.de> wrote:

Dear Jan,

thanks for the positive feedback on the word splitter. Hope it turns out to be useful for our research community.

Regarding RV 1.1.1: The analysis does not imply that dhAtama is morphologically derived from dA "to give", although one may get this impression from the term "giving" in that line. "giving" is just a coarse word-semantic annotation of dhAtama, which is - it's meant to be coarse! - not too far away from Jamison + Brereton 2014 ("most richly conferring treasure"). The same holds for the English terms (if any) in other lines.

Best wishes, Oliver


On 29/08/2018 09:58, Jan E.M. Houben wrote:
Dear Oliver,  
Congratulations and thanks for sharing again a very useful research tool. 
Also for the tool you shared earlier (see below), 
which, incidentally, contains a mistake in the very first line:
1#1#1#2#2ratnadhātamam#2#dhātamam#dhātama###219609#4443604#1#ADJ#3#1#1#_##giving~130047~2
The mistake -- and you are not the only one to make it -- is that the adjectival word part -dhātama- (you have chosen to neglect tama, probably consciously) is not derived from dā (cp. Gk. didōmi "I give, confer") but from dhā (cp. Gk. tithēmi "I establish"). 
Warm regards, 
Jan

*** 
I would like to announce the release of a full annotation of the Rigveda 
with morphological, lexical and verb-argument information.

Data are stored in a publicly accessible repository at
https://git.adwmainz.net/open/rigveda

Details of the annotation process are described in the LREC paper, which is 
stored at the upper level of the repository.




On Wed, 29 Aug 2018 at 07:24, Oliver Hellwig via INDOLOGY <indology@list.indology.info> wrote:
Dear all,

Sebastian Nehrdich and I have developed a machine learning model that
splits Sandhis and compounds in "raw" Sanskrit text.

You find further details, model, code and the data it was built with
(~600,000 lines of Sanskrit text from the DCS) at
https://github.com/OliverHellwig/sanskrit/tree/master/papers/2018emnlp

The pdf in the github directory contains further technical information.

If you know researchers who work on this topic and may be interested in
the model or the data, it would be great if you could forward this mail
to them.

Oliver

---
Oliver Hellwig
IVS Zurich / SFB 991, Düsseldorf


_______________________________________________
INDOLOGY mailing list
INDOLOGY@list.indology.info
indology-owner@list.indology.info (messages to the list's managing committee)
http://listinfo.indology.info (where you can change your list options or unsubscribe)


--

Jan E.M. Houben
Directeur d'Études, Professor of South Asian History and Philology
Sources et histoire de la tradition sanskrite
École Pratique des Hautes Études (EPHE, PSL - Université Paris)
Sciences historiques et philologiques
54, rue Saint-Jacques, CS 20525 – 75005 Paris
johannes.houben@ephe.sorbonne.fr
johannes.houben@ephe.psl.eu
https://ephe-sorbonne.academia.edu/JanEMHouben
