OCR of (transcribed) indic texts

Peter Schreiner pesch at INDOGER.UNIZH.CH
Thu Nov 26 09:47:05 UTC 1998

Dear J. Neuss and list-members,

Concerning the request for information about OCR, I may mention that I used
an (expensive!!) programme called proLektor (by improx GmbH, Rennweg 83,
A-2345 Brunn am Gebirge) to scan the Sanskrit-Deutsches Woerterbuch by K.
Mylius (cf. my project report in "Langue, style et structure dans le monde
indien : Centenaire de Louis Renou", ed. N. Balbir et G.-J. Pinault, Paris
1996, p. 413-426). The programme has to be "taught" what to interpret (any
pattern surrounded by white space); the distinction of small i, capital I,
small l, numeral 1 in normal and bold fonts remained a problem. When used
in "interactive" mode I had to enter ca. 100 letters per page (out of ca.
4000). The result is far from error-free. I also tested the programme with
Gujarati text, which works all right (but I did not scan enough text to be
able to say whether manual transliteration would be not MUCH slower but
less error-prone).

Mr. Neuss, please keep me informed about your experiences and any other
feedback addressed to you directly. With thanks and best wishes,

Peter Schreiner
Abt. fuer Indologie
Universitaet Zuerich
Raemistr. 68
CH-8001 Zuerich


At 20:37 24.11.98 +0100, you wrote:
>Dear list-members,
>maybe some of you are aware of the difficulties which arise if one wants
>to scan transcribed (not to speak of original) indic texts. Of course
>the scanning itself is not the problem but the subsequent transformation
>of the image-file into a text-file by means of OCR (Optical character
>recognition) programs. These programs often recognise only the usual
>set of ASCII characters. Some of them include extended features, which
>means that in certain cases the user may direct the program to read a
>certain difficult character in a certain way. As far as I know
>diacritical signs are a problem for most, if not all, of these programs.
>If any of you has experience with OCR programs in this respect I would
>be grateful for your recommendations. Moreover I would like to know
>whether there are any OCR programs available which recognise Indian
>characters of any kind. I hope this message does not provoke any
>response which violates the non-commercial spirit of this list.
>Thanks for reading.
>jneuss at zedat.fu-berlin.de
>Juergen Neuss
>Freie Universitaet Berlin
>Institut für Indische Philologie und Kunstgeschichte
>Königin-Luise-Str. 34a
>14195 Berlin
