OCR for Romanized Sanskrit with Diacritics

Alexander von Rospatt rospatt at BERKELEY.EDU
Tue May 18 16:14:05 UTC 2010

Dear Computer-Literati, 

I have been in contact with Dominik Wujastyk regarding the application of OCR to romanized Sanskrit.

Dominik responded: 

Several software packages will do that quite well, even Acrobat 9.  It's critical that the exemplar is good and that the scan is not at too low a resolution.  300dpi minimum, 400dpi+ better.  ...
If you choose one of the better contemporary OCR packages, and really learn how to use it, I believe you can get good results even for romanized Sanskrit.  The advent of Unicode has changed everything, and many software packages are now more or less obliged to be strongly multilingual and recognise a wide range of diacritical marks...
Acrobat is the only one with ClearScan font technology, I believe, which is very good if you can use it.

I wonder about others' experiences in using OCR for this purpose. Which programs are most user-friendly, and which programs did you have the best results with?

Many thanks,

Alex Rospatt

Alexander von Rospatt, Professor and Chair
Department of South and Southeast Asian Studies
Head Graduate Adviser of the Group in Buddhist Studies 
University of California 
7233 Dwinelle Hall # 2540
Berkeley, CA 94720-2540
Phone: +1-510-6421610
Fax: +1-510-6432959
Email:  rospatt at berkeley.edu
