SARIT's flawless flow into philology /// Re: [INDOLOGY] SARIT updates
wujastyk at GMAIL.COM
Thu Jan 31 19:34:32 UTC 2013
On 31 January 2013 16:57, Jan E.M. Houben <jemhouben at gmail.com> wrote:
> When the level of perfection is so high:
> - at one place I saw "Rao" instead of "Rau" in connection with VP.
Thanks for spotting this. It's fixed. The raw XML file in "downloads" is
updated, but the copy in the SARIT/Philologic system won't be updated for a
couple of weeks.
Incidentally, while I'm glad to help, in the long run the work of updating
files and making corrections is open to everyone. You can do this yourself
at the SARIT Github home, as described on the SARIT front page. It takes some
computer-savvy, but SARIT is potentially a community project, at least in
some key regards. If you feel like taking the Mahabharata and tagging all
the geographical names, for example, feel free. You can then feed that
updated file back into Github, and the new tagging will be there for all to
> - I regret that words remain improperly joined following devanagari
> consonant-vowel mergers as in uktaH and evamuktaH which need to be searched
> separately (wildcards possible but leads to other problems: cp. evamuktaH
> and compounds with -muktaH).
Yes, this is a real question. In SARIT, we mostly host files that are in
Devanagari-script style spacing. At
Amba Kulkarni and her team demonstrate that such files can be
algorithmically parsed and word-separations can be inserted automatically,
and rather successfully. A future release of SARIT may incorporate this
technology, which is Open-Access. We also want to run a Romanized and a
Devanagari service side by side, so that we serve our audience in India
in an appropriate manner. The technologies for all of these things exist,
and they work. For the present, however, we have concentrated on building
up the size of the corpus.
In defence of the current situation, in my experience if one really masters
the syntax of the Grep <http://en.wikipedia.org/wiki/Grep> searching that
SARIT supports, surprisingly sophisticated and precise searches can be
achieved, even with Devanagari-style files.
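To give a rough idea of the kind of pattern I mean, here is a minimal sketch
in Python's "re" module. It is only an illustration: SARIT's own search
syntax may differ in detail, and the sample string and the function name
find_ukta are made up for the example. The pattern finds uktaH either as a
separate word or joined to a preceding evam, while skipping compounds that
end in -muktaH.

```python
import re

# Hypothetical sample text in the same transliteration as the examples above.
SAMPLE = "sa evamuktaH prAha | uktaH | vimuktaH | muktaH | jIvanmuktaH"

# \b anchors the match at a word boundary, so "uktaH" is only found when it
# starts a word or directly follows an optional "evam" that starts a word.
# The tail of "muktaH", "vimuktaH" etc. is therefore never matched.
PATTERN = re.compile(r"\b(?:evam)?uktaH\b")


def find_ukta(text):
    """Return every whole-word hit for uktaH or evamuktaH in text."""
    return PATTERN.findall(text)


print(find_ukta(SAMPLE))  # ['evamuktaH', 'uktaH']
```

Other sandhi-joined forms would need further alternatives inside the
optional group, so this approach suits targeted searches rather than
exhaustive ones.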