[INDOLOGY] New version of the Digital Corpus of Sanskrit
hellwig7 at gmx.de
Sun Jul 17 14:36:10 UTC 2016
After several years, a new version of the Digital Corpus of Sanskrit has
come out. It contains, among other texts, the complete morphological and
lexical annotation of the Mahabharata except for three prose chapters.
Although you are still redirected from the old URL, you may note the new web
A few notes on the new release:
(1) I find the multi-word search rather useful: You can now search for text
lines that must contain two or more lemmata (click on the "Add to multi-word
q." links after a search result on the query page). To start with, try
something popular such as rāma and sītā; this will display all text lines
that contain inflected forms of both rāma and sītā.
(2) Global and text dictionaries have been merged into one. Contrary to
former versions, the lexicographic database now contains all lemmata given
in my digital dictionary, even if they don't occur in a text.
(3) You should, in principle, be able to type IAST Unicode directly in the
(4) The information contained under "Similar and related words" is only a
gimmick at the moment, at least for less popular words. It displays the
cosine similarity between neural embeddings built with word2vec
(see https://en.wikipedia.org/wiki/Word_embedding for more information). They
seem to capture some semantic similarities; check, for instance, 'rāma' or
unresponsive when JS is deactivated in your browser.
(6) Not sure why I chose the former design. The readability of the site
should now be better, esp. on small screens.
(7) Access to parts of the semantic annotation layer will be added in the
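For readers unfamiliar with the measure mentioned in (4), here is a minimal sketch of how cosine similarity between word vectors can be computed and used to rank related words. The three-dimensional vectors and the helper names (cosine_similarity, most_similar) are made up for illustration; they are not taken from the DCS, whose embeddings are built with word2vec and have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, invented for this example only.
embeddings = {
    "rāma": [0.9, 0.1, 0.3],
    "sītā": [0.8, 0.2, 0.4],
    "ratha": [0.1, 0.9, 0.2],
}


def most_similar(word, vectors, top_n=2):
    """Rank all other words by cosine similarity to `word`, best first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    for other, score in most_similar("rāma", embeddings):
        print(f"{other}\t{score:.3f}")
```

With real embeddings, a list of "similar and related words" would simply be the top entries of such a ranking, which is also why the results tend to be less reliable for rare words with few training contexts.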
I'm seriously considering making this version of the DCS open source.
If you are interested in collaborating, please send me your GitHub user
name, so that I can invite you to the project.
Finally, my special thanks go to the patient and helpful team of the KJC at
the University of Heidelberg!
Oliver Hellwig, University of Düsseldorf, Germany