Finding Indological full-book PDFs on Google Books

Sun Jun 17 08:40:45 UTC 2007

On Sun, 17 Jun 2007 10:00:28 +0200, Peter Friedlander
<P.Friedlander at LATROBE.EDU.AU> wrote:

> Dear List members,
> (...)
> To be able to search text meaningfully, it needs to be carefully
> proofread as it is digitised. Many of the current projects do not do
> this; it is time consuming and expensive.
> Plus, OCR of diacritical marks, let alone Indic scripts, is not
> straightforward.

What do you mean by straightforward? Diacritical marks are usually not
recognized by any of the standard (office) OCR software products such as
FineReader.
As regards Devanagari, I don't know of any reliable program except the
marvellous one my colleague Oliver Hellwig has created. It works well for
Devanagari and already contains the standard font faces used by the old
Indian printing presses (e.g. Venkateshvara Press, Bangabasi Steam
Press).
Apart from certain unavoidable intricacies (such as consonant clusters
vocalized with short i), it works very well, but, as is the case with every
OCR, proofreading remains indispensable. Digitizing texts is time consuming
and will remain so, but at the same time it is very rewarding, as the
Göttingen Register of Electronic Texts in Indian Languages (GRETIL,
http://www.sub.uni-goettingen.de/ebene_1/fiindolo/gretil.htm )
demonstrates.
The OCR I mentioned reads Devanagari and transcribes it according to
Sanskrit phonology into Roman characters with diacritical marks. The
program is freely available at: http://www.sanskritreader.de/ (> software).
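
To give a rough idea of what the transcription step involves, here is a
minimal Python sketch that maps Devanagari text (already recognized and
stored as Unicode) to Roman characters with diacritical marks in the IAST
convention. It is not the code of the program mentioned above; the name
to_iast and the abridged character tables are assumptions made for
illustration only, and the recognition of the printed page is left out.

```python
import unicodedata

# Hypothetical, abridged tables for illustration (not a complete inventory).
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "ङ": "ṅ",
    "च": "c", "छ": "ch", "ज": "j", "झ": "jh", "ञ": "ñ",
    "ट": "ṭ", "ठ": "ṭh", "ड": "ḍ", "ढ": "ḍh", "ण": "ṇ",
    "त": "t", "थ": "th", "द": "d", "ध": "dh", "न": "n",
    "प": "p", "फ": "ph", "ब": "b", "भ": "bh", "म": "m",
    "य": "y", "र": "r", "ल": "l", "व": "v",
    "श": "ś", "ष": "ṣ", "स": "s", "ह": "h",
}
VOWELS = {
    "अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
    "ऋ": "ṛ", "ए": "e", "ऐ": "ai", "ओ": "o", "औ": "au",
}
# Dependent vowel signs, written as escapes because they are combining marks.
MATRAS = {
    "\u093e": "ā", "\u093f": "i", "\u0940": "ī", "\u0941": "u",
    "\u0942": "ū", "\u0943": "ṛ", "\u0947": "e", "\u0948": "ai",
    "\u094b": "o", "\u094c": "au",
}
SIGNS = {"\u0902": "ṃ", "\u0903": "ḥ"}  # anusvara, visarga
VIRAMA = "\u094d"  # suppresses the inherent vowel "a"


def to_iast(text):
    """Transliterate Devanagari (Unicode, logical order) into IAST."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt == VIRAMA:
                i += 2  # bare consonant, e.g. inside a cluster
                continue
            if nxt in MATRAS:
                out.append(MATRAS[nxt])  # explicit vowel replaces "a"
                i += 2
                continue
            out.append("a")  # inherent vowel
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch in SIGNS:
            out.append(SIGNS[ch])
        else:
            out.append(ch)  # punctuation, spaces, unknown characters
        i += 1
    return unicodedata.normalize("NFC", "".join(out))


print(to_iast("शक्ति"))  # śakti
```

In the stored text the sign for short i follows the whole cluster (k,
virama, t, i), although in print it stands to the left of it. That mismatch
between printed and logical order may be one reason why such clusters are
awkward for any OCR, but this is my guess, not a statement about how the
program works.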

cheers

jn

_________________
Jürgen Neuß, M.A.

Freie Universität Berlin
Institut für die Sprachen und Kulturen Südasiens
Königin-Luise-Str. 34 a
D-14195 Berlin