[INDOLOGY] FYI: New version of SARIT website released

Birgit Kellner kellner at asia-europe.uni-heidelberg.de
Tue May 12 13:53:11 UTC 2015


Dear colleagues,

(apologies for cross-posting)

The SARIT project announces the release of a new version of the SARIT
web application at http://sarit.indology.info. SARIT – short for “Search
and Retrieval of Indic Texts”, but also meaning “river” in Sanskrit –
offers electronic texts in Sanskrit and other Indian languages.

With major funding from the DFG/NEH Bilateral Digital Humanities
Programme, we have created a new website whose searching and browsing
features are designed specifically for use with texts in Sanskrit and
other Indian languages.

The website offers full Unicode support, searching with Devanāgarī and
transliterated search terms, and an NGram index to handle texts without
clearly and consistently marked word-boundaries. Search results are
returned in a Key Word in Context (KWIC) display. All texts are
available for free download in PDF and TEI-XML formats.
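
For readers unfamiliar with the idea, the following is a minimal Python
sketch of how a character n-gram index can find a search term in text
that has no marked word boundaries, and how hits can be shown in a KWIC
layout. It is an illustration only and not SARIT's actual
implementation: the function names, the trigram size, the bracket
notation for hits, and the sample line (a transliterated verse with the
spaces removed) are all assumptions made for this example.

```python
import unicodedata
from collections import defaultdict


def build_ngram_index(text, n=3):
    """Map every character n-gram in text to the offsets where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - n + 1):
        index[text[i:i + n]].append(i)
    return index


def search(text, index, query, n=3):
    """Return the start offsets of query in text.

    The index supplies candidate positions from the query's first
    n-gram; each candidate is then verified against the full query.
    """
    query = unicodedata.normalize("NFC", query)
    if not query:
        return []
    if len(query) < n:
        # Query shorter than one n-gram: fall back to a plain scan.
        return [i for i in range(len(text) - len(query) + 1)
                if text.startswith(query, i)]
    candidates = index.get(query[:n], [])
    return [i for i in candidates if text.startswith(query, i)]


def kwic(text, offsets, length, width=10):
    """Format each hit with up to `width` characters of context per side."""
    lines = []
    for i in offsets:
        left = text[max(0, i - width):i]
        hit = text[i:i + length]
        right = text[i + length:i + length + width]
        lines.append(f"{left}[{hit}]{right}")
    return lines


# Hypothetical sample: transliterated text written without word spaces.
# NFC normalization keeps each accented letter as a single code point.
TEXT = unicodedata.normalize(
    "NFC", "dharmakṣetrekurukṣetresamavetāyuyutsavaḥ")
INDEX = build_ngram_index(TEXT)

hits = search(TEXT, INDEX, "kṣetre")
for line in kwic(TEXT, hits, len("kṣetre"), width=6):
    print(line)
```

Because the index is built over characters rather than words, the term
is found both inside "dharmakṣetre" and inside "kurukṣetre" even though
neither is delimited in the text. Devanāgarī input would need further
handling (for example of combining vowel signs) that this sketch does
not attempt.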

Programming and development have been driven by Wolfgang Meier
(http://www.existsolutions.com/), Jens Petersen and Claudius Teodorescu,
under the umbrella of the Heidelberg Research Architecture
(http://www.asia-europe.uni-heidelberg.de/de/hra-portal.html). With its
heterogeneous corpus of texts, SARIT offers an ideal test case for
developers to work towards a more general – open-source – framework that
can be reused for other TEI-based corpora.

The website has been designed for a growing corpus of texts encoded
according to the standards of the Text Encoding Initiative (TEI). Aiming
to foster the adoption of TEI among scholars and students working with
Indic texts, we have also provided detailed guidelines
(http://sarit.indology.info/exist/apps/sarit/docs/encoding-guidelines.html)
for adding TEI encoding to texts in Indian languages, and we hope that
the public will contribute texts to this initiative.

SARIT's corpus of texts is available on GitHub
(https://github.com/sarit/SARIT-corpus), where it is easy for anyone to
make additions, changes, and suggestions. To date, this corpus consists
of 28 texts in Sanskrit and Prakrit, some of them voluminous. We plan to add
support for other South Asian languages and scripts, including Tamil,
Kannada, and Sinhala, to the SARIT web application in the near future.

SARIT is and always will be free and open-source. All of the texts are
made available under a Creative Commons license.

The DFG/NEH project includes teams at Columbia University and the
University of Heidelberg, directed respectively by Profs. Sheldon
Pollock and Birgit Kellner.

Links:
* The SARIT application: http://sarit.indology.info/
* The SARIT text collection: https://github.com/sarit/SARIT-corpus
* The SARIT encoding guidelines:
http://sarit.indology.info/exist/apps/sarit/docs/encoding-guidelines.html
* To suggest improvements to the search application:
https://github.com/eXistSolutions/sarit/issues/
* To add new texts: https://github.com/sarit/SARIT-corpus/issues
* Information on the DFG/NEH project:
http://www.asia-europe.uni-heidelberg.de/de/forschung/hcts-professuren/buddhismusstudien/research0/sarit.html

----------
Prof. Dr. Birgit Kellner
Chair of Buddhist Studies
Cluster of Excellence "Asia and Europe in a Global Context - The
Dynamics of Transculturality"
University of Heidelberg
Karl Jaspers Centre
Voßstraße 2, Building 4400
D-69115 Heidelberg
Phone: +49(0)6221 - 54 4301 (Office Ina Chebbi: 4363)
Fax: +49(0)6221 - 54 4012





