Convert Devanagari into romanized text: any automatic tools?

Birgit Kellner birgit.kellner at UNIVIE.AC.AT
Sat Mar 20 17:30:05 UTC 2004


Dear fellow list-members,

I was wondering whether there are any utilities which can convert
Devanagari writing into romanized text and also implement proper
word-separation. I doubt that something like this would be possible, but
maybe I'm thinking along the wrong lines.

A main difficulty I see is that spacing in Devanagari and romanized text
is not identical, and that it is not possible to automatically convert
Devanagari spacing into romanized spacing.

Take, for instance, DharmakIrti's PramANavArttika verse 301:

kriyAsAdhanam ity eva sarvaM sarvasya karmaNaH |
sAdhanaM na hi tat tasyAH sAdhanaM yA kriyA yataH ||

If I were to write this up for Devanagari presentation in LaTeX, for
instance, I would join strings of words like this:

kriyAsAdhanamityeva sarvaM sarvasya karmaNaH |
sAdhanaM na hi tattasyAH sAdhanaM yA kriyA yataH ||

If a program were to convert the text written for Devanagari into text
suitable for romanized presentation, it would have to have the full
Sanskrit lexicon available, with all paradigms and forms of inflection,
and it would have to compute all Sandhi rules. (Most likely even this
would produce garbage, there being too many possibilities in the
lexicon, but perhaps with a more context-sensitive lexicon this might
work.) But even then, how is the program to know that "kriyAsAdhanam" is
a compound and not a phrase "kriyA sAdhanam"?

In systems where text is entered in romanized form and then converted to
Devanagari through some processor (like LaTeX), this could be
circumvented with a hyphenation convention. One could write
"kriyA-sAdhanam" and write a program that (a) ignores dashes for
Devanagari conversion and (b) takes dashes into consideration when
converting from Devanagari to proper romanized text ("leave hyphenated
words together, and perform separations on all other strings").

But when you have e.g. a straightforward Devanagari text, not hyphenated
or tagged in any form, how could you possibly handle such ambiguities?

Thanks in advance for any suggestions and comments,

best regards,

Birgit Kellner





More information about the INDOLOGY mailing list