Dear INDOLOGYists,
As you may know, the SARIT service, running at http://sarit.indology.info, is a project that aims to offer freely downloadable electronic editions of Indic texts, together with a menu-driven interface that makes it easy to search, index, and do other useful things with the e-texts. E-texts in SARIT are editions in the scholarly sense: they have a transparent history and provenance, they can be cited accurately, and they represent a fixed point of reference that can be used in footnotes and bibliographies.
SARIT is the beginning of a scholarly language corpus for Sanskrit and related languages. As such, it aims to set an example of doing things "the right way." That is to say, the base e-texts are encoded ("marked-up") using the Guidelines of the Text Encoding Initiative. A big part of that is the addition of structured metadata at the top of each file, the "TEI Header," which gives a full account of the provenance of the e-text, who has done what to it, and a history of changes as it evolves and is corrected or updated.
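For those who like to see such things concretely, here is a minimal sketch in Python of how the information in a TEI Header might be read back out by a program. The sample header is invented for illustration and is not taken from any SARIT file; the function name summarize_header is likewise hypothetical. The element names (teiHeader, fileDesc, titleStmt, respStmt, sourceDesc, revisionDesc, change) and the namespace are those of the TEI Guidelines.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample: every name, date, and description here is invented.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example Text (hypothetical)</title>
        <respStmt>
          <resp>data entry</resp>
          <name>A. Contributor</name>
        </respStmt>
      </titleStmt>
      <publicationStmt>
        <p>Hypothetical sample, not a real SARIT file.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Transcribed from a hypothetical printed edition.</p>
      </sourceDesc>
    </fileDesc>
    <revisionDesc>
      <change when="2013-01-15" who="#ac">Initial encoding.</change>
      <change when="2013-02-02" who="#bd">Corrected typos in chapter 1.</change>
    </revisionDesc>
  </teiHeader>
  <text><body><p>...</p></body></text>
</TEI>"""


def summarize_header(xml_text):
    """Return the title, source, responsibilities, and change log of a TEI file."""
    root = ET.fromstring(xml_text)
    header = root.find("tei:teiHeader", TEI_NS)

    title = header.findtext(
        "tei:fileDesc/tei:titleStmt/tei:title", default="", namespaces=TEI_NS
    ).strip()
    source = header.findtext(
        "tei:fileDesc/tei:sourceDesc/tei:p", default="", namespaces=TEI_NS
    ).strip()

    # Who has done what to the e-text.
    responsibilities = [
        (
            stmt.findtext("tei:resp", default="", namespaces=TEI_NS).strip(),
            stmt.findtext("tei:name", default="", namespaces=TEI_NS).strip(),
        )
        for stmt in header.findall("tei:fileDesc/tei:titleStmt/tei:respStmt", TEI_NS)
    ]

    # History of changes, in document order.
    changes = [
        (change.get("when"), change.get("who"), (change.text or "").strip())
        for change in header.findall("tei:revisionDesc/tei:change", TEI_NS)
    ]

    return {
        "title": title,
        "source": source,
        "responsibilities": responsibilities,
        "changes": changes,
    }


if __name__ == "__main__":
    info = summarize_header(SAMPLE)
    print(info["title"])
    for when, who, note in info["changes"]:
        print(when, who, note)
```

The point is only that, because the header is structured, questions such as "where did this e-text come from?" and "who changed what, and when?" can be answered mechanically rather than by reading a free-form README.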
There's more than this to the design of the SARIT project, but I'll stop there for the moment. Files prepared "the SARIT way" are useful for many purposes in computing and the humanities, far beyond SARIT itself. TEI encoding is, in fact, now the accepted standard for all serious textual work in humanities computing. There are all sorts of fascinating things one can do with e-texts, once one has materials in the TEI format.
We have been adding e-texts to SARIT for a few years now, slowly and carefully, learning all the time. There is already a sizeable corpus of materials and SARIT is already more than merely a test system. However, this week the SARIT text corpus took a giant step forward, through the addition of a new text of the complete Mahabharata, the "Southern recension."
During the last few years, Professor Shrinivasa Varakhedi, Dean and Director, Karnataka Sanskrit University, has created an e-text of the seventeen-volume "Kumbakonam" Mahabharata, i.e.,
At that location, anyone can use Git to take a copy of the XML files, and also to submit corrections and updates. This is, obviously, for more advanced users: people who understand version control systems, are able to use Git, and also have a good knowledge of Sanskrit. But in principle, it is all open to the public. If anyone makes a mess, it can easily be rolled back, since Git keeps a history of all changes.
The purpose of doing this is to be able to track the history of all future changes to the SARIT files, down to the byte level, with documentation and the ability to fork (and merge) versions, should the need arise.