Sanskrit OCR

Fri Jul 11 18:18:58 UTC 1997

At 09:51 7.7.1997, Lars Martin Fosse wrote:
>At 07:48 7.07.97 BST, you wrote:
>>I think I am safe in assuming that no-one has yet developed an optical
>>character recognition (OCR) system for the devanagari script.
>>
>>I know that DHH Ingalls and his son had worked on the problem some 20 years
>>ago, but I have not heard any more about it.
>>
>>In any case, does anyone know of an OCR system that is accurate for roman
>>transliteration? Has anyone tried to input text in this way?
>
>I remember when I was in Tuebingen some years ago (1992, I think) I was told
>that they had managed to "teach" the OCR program Optopus (spelling correct)
>to scan devanagari. It had been fairly difficult, and there had been rather a
>lot of errors. But in principle, Optopus could be taught to recognize any
>pattern as a letter or a combination of letters. I used it to scan a
>romanized Sanskrit text and taught it to give cerebral t's, d's etc. as .t,
>.d etc. It worked reasonably well.
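The mapping described above outputs cerebral ṭ as ".t", ḍ as ".d", and so on. The Python sketch below shows the same idea as a small post-processing table that converts Unicode IAST letters into dotted-ASCII strings. The table, its Velthuis-style choices and the name `to_dotted` are assumptions for illustration. They are not Optopus's actual configuration, because Optopus learns from glyph shapes and not from Unicode characters.

```python
import unicodedata

# Hypothetical table: IAST dot-below letters -> dotted-ASCII (Velthuis-style) strings.
IAST_TO_DOTTED = {
    "ṭ": ".t", "ḍ": ".d", "ṇ": ".n", "ṣ": ".s",
    "ḥ": ".h", "ṃ": ".m", "ṛ": ".r", "ḷ": ".l",
}

def to_dotted(text: str) -> str:
    """Replace each IAST dot-below letter with its dotted-ASCII equivalent."""
    # NFC composes e.g. "t" + COMBINING DOT BELOW into the single character "ṭ",
    # so both precomposed and decomposed input hit the table.
    text = unicodedata.normalize("NFC", text)
    return "".join(IAST_TO_DOTTED.get(ch, ch) for ch in text)

print(to_dotted("aṣṭau"))  # a.s.tau
```

Other letters, such as long vowels, pass through unchanged. Extending the table would follow the same pattern.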

I have been told that OmniPage Professional for the Macintosh can be
'trained' to convert practically anything to any desired character or
string of characters. Since OCR programs often confuse "i" and "l",
I imagine the same would apply to, e.g., ".h" versus "b". As far as
Devanagari is concerned, the nature of the script, in which long strings of
characters are 'amalgamated' by the headstroke, would probably cause
some problems, unless the program could be taught to disregard the
headstroke!
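One way a program might "disregard the headstroke" is sketched below. It treats the densest horizontal row(s) of a binarized word image as the headstroke, blanks them, and then splits the word at empty columns. This is a minimal illustration under stated assumptions, not a description of OmniPage or Optopus. It assumes a clean binary image (1 = ink) and a 0.8 threshold chosen arbitrarily. It ignores vowel signs above the headstroke and conjuncts that still touch after removal.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed binary image: 1 = ink, 0 = background

def find_headline_rows(img: Image, frac: float = 0.8) -> List[int]:
    """Rows whose ink count is at least frac * the densest row (assumed headstroke)."""
    counts = [sum(row) for row in img]
    peak = max(counts, default=0)
    if peak == 0:
        return []
    return [i for i, c in enumerate(counts) if c >= frac * peak]

def remove_headline(img: Image, frac: float = 0.8) -> Image:
    """Return a copy of img with the assumed headstroke rows blanked out."""
    rows = set(find_headline_rows(img, frac))
    return [[0] * len(row) if i in rows else list(row) for i, row in enumerate(img)]

def segment_columns(img: Image) -> List[Tuple[int, int]]:
    """Split on ink-free columns; return half-open (start, end) column ranges."""
    width = len(img[0]) if img else 0
    col_ink = [sum(row[x] for row in img) for x in range(width)]
    segments: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(col_ink):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, width))
    return segments

# Toy "word": a full-width headstroke joining two glyph-like shapes.
word = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 1],
]
print(segment_columns(word))                   # [(0, 7)]: one blob
print(segment_columns(remove_headline(word)))  # [(1, 3), (5, 7)]: two pieces
```

With the headstroke in place, the whole word is one connected run of columns. Blanking that row is what lets the pieces separate.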