Re: Convert Devanagari into romanized text: any automatic tools?
Somadevah at AOL.COM
Sat Mar 20 19:16:16 UTC 2004
In a message dated 20/3/04 5:30:43 pm, birgit.kellner at UNIVIE.AC.AT writes:
> I was wondering whether there are any utilities which can convert
> Devanagari writing into romanized text and also implement proper
> word-separation. I doubt that something like this would be possible, but
> maybe I'm thinking along the wrong lines.
Going from Roman to Devanaagarii is comparatively easy. Probably the best way
is to write a Perl text filter. For instance, if you take the huge and
fabulous (despite many typos) collection of "saastric etexts available from the
Raashtriya Sanskrita Vidyaapeetha (http://sansknet.org/), you will need to convert
them to whatever encoding you use. They have helpfully provided a couple of
conversion programs to do this. For me (Mac OS X) these produced a number of
unexpected errors (tp for pt, etc.), because there are often several ways to input
conjunct characters, so I ended up spending an afternoon writing a Perl
filter. It is still not perfect, but the remaining errors can be removed by hand
when reading through the texts.
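To give an idea of what such a filter involves, here is a minimal sketch in Python rather than Perl. It is not the Sansknet filter mentioned above. It assumes the input is Unicode Devanagari (the Sansknet files use their own encoding and would need a different table) and that Velthuis-style romanization is wanted. The function name romanize and the tables are illustrative only, and they cover just a subset of characters.

```python
# Sketch only: assumes Unicode Devanagari input and Velthuis-style output.
# Tables are partial; anything not listed is passed through unchanged.

CONSONANTS = {
    "\u0915": "k",  "\u0916": "kh",  "\u0917": "g",  "\u0918": "gh",  "\u0919": '"n',
    "\u091A": "c",  "\u091B": "ch",  "\u091C": "j",  "\u091D": "jh",  "\u091E": "~n",
    "\u091F": ".t", "\u0920": ".th", "\u0921": ".d", "\u0922": ".dh", "\u0923": ".n",
    "\u0924": "t",  "\u0925": "th",  "\u0926": "d",  "\u0927": "dh",  "\u0928": "n",
    "\u092A": "p",  "\u092B": "ph",  "\u092C": "b",  "\u092D": "bh",  "\u092E": "m",
    "\u092F": "y",  "\u0930": "r",   "\u0932": "l",  "\u0935": "v",
    "\u0936": '"s', "\u0937": ".s",  "\u0938": "s",  "\u0939": "h",
}

# Independent vowel letters.
VOWELS = {
    "\u0905": "a", "\u0906": "aa", "\u0907": "i", "\u0908": "ii",
    "\u0909": "u", "\u090A": "uu", "\u090B": ".r",
    "\u090F": "e", "\u0910": "ai", "\u0913": "o", "\u0914": "au",
}

# Dependent vowel signs (matras), which replace the inherent "a".
MATRAS = {
    "\u093E": "aa", "\u093F": "i", "\u0940": "ii",
    "\u0941": "u",  "\u0942": "uu", "\u0943": ".r",
    "\u0947": "e",  "\u0948": "ai", "\u094B": "o", "\u094C": "au",
}

SIGNS = {"\u0902": ".m", "\u0903": ".h"}  # anusvara, visarga
VIRAMA = "\u094D"                         # suppresses the inherent "a"


def romanize(text):
    """Romanize Unicode Devanagari, one code point at a time."""
    out = []
    pending_a = False  # True while the last consonant still owes its inherent "a"
    for ch in text:
        if ch in CONSONANTS:
            if pending_a:
                out.append("a")
            out.append(CONSONANTS[ch])
            pending_a = True
        elif ch in MATRAS:
            out.append(MATRAS[ch])
            pending_a = False
        elif ch == VIRAMA:
            pending_a = False
        else:
            if pending_a:
                out.append("a")
                pending_a = False
            out.append(VOWELS.get(ch) or SIGNS.get(ch) or ch)
    if pending_a:
        out.append("a")
    return "".join(out)


if __name__ == "__main__":
    # The word "Devanagari" written in Devanagari.
    print(romanize("\u0926\u0947\u0935\u0928\u093E\u0917\u0930\u0940"))  # devanaagarii
```

Because conjuncts in Unicode are stored as consonant + virama + consonant, the only state the loop needs is whether the inherent "a" is still owed. A font-based encoding with separate conjunct glyphs, entered in varying orders, is where errors of the tp-for-pt kind would have to be handled, and this sketch does not attempt that.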
I have never tried word-separation, but it is not impossible. There are
electronic Sanskrit parsers available; see for instance "Desika" at
(I am happy to send the Perl filter for Sansknet to anyone who needs it.)