Finding Indological full-book PDFs on Google Books
JN
jneuss at ARCOR.DE
Sun Jun 17 08:40:45 UTC 2007
On Sun, 17 Jun 2007 10:00:28 +0200, Peter Friedlander
<P.Friedlander at LATROBE.EDU.AU> wrote:
> Dear List members,
> (...)
> To be able to search text meaningfully it needs to be carefully
> proofread as it is digitised,
> many of the current projects do not do this, it is time consuming and
> expensive.
> plus, OCR of diacritical marks, let alone indic scripts, is not
> straightforward.
what do you mean by straightforward? diacritical marks are usually not
recognized by any of the standard (office) OCR software products such as
FineReader etc.
as regards devanagari, i don't know of any reliable program except the
marvellous one my colleague Oliver Hellwig has created. it works well for
devanagari and already contains the standard font faces used by the old
indian printing presses (e.g. Venkateshvara Press, Bangabasi Steam
Press).
apart from certain unavoidable intricacies (like consonant clusters
vocalized with short i) it works very well, but, as is the case with every
OCR, proofreading remains inevitable. digitizing texts is time consuming
and will remain so, but at the same time it is very rewarding, as the
Göttingen Register of Electronic Texts in Indian Languages (GRETIL,
http://www.sub.uni-goettingen.de/ebene_1/fiindolo/gretil.htm )
demonstrates.
the OCR I mentioned reads devanagari and transcribes it according to
sanskrit phonology into roman characters with diacritical marks. the
program is freely available at: http://www.sanskritreader.de/ (> software).
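to illustrate only the transcription step (not the character recognition itself), here is a minimal Python sketch that maps a Unicode devanagari string to roman characters with diacritical marks. it is not taken from the program mentioned above: the function name to_iast and the small character tables are assumptions made for this example, and they cover just a handful of letters.

```python
# Hypothetical, deliberately incomplete tables: a small subset of
# Devanagari letters mapped to roman characters with diacritics.
CONSONANTS = {
    "क": "k", "ग": "g", "त": "t", "द": "d", "ध": "dh", "न": "n",
    "म": "m", "य": "y", "र": "r", "व": "v", "स": "s", "ह": "h",
}
INDEPENDENT_VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū"}
VOWEL_SIGNS = {"ा": "ā", "ि": "i", "ी": "ī", "ु": "u", "ू": "ū", "ृ": "ṛ", "े": "e", "ो": "o"}
OTHER_SIGNS = {"ं": "ṃ", "ः": "ḥ"}
VIRAMA = "्"  # suppresses the inherent vowel "a" of the preceding consonant


def to_iast(text):
    """Transliterate a Devanagari string using the tables above.

    A consonant carries an inherent "a" unless it is followed by a
    vowel sign (which replaces the "a") or a virama (which removes it).
    Characters not in the tables are copied through unchanged.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == VIRAMA:
                i += 1  # consume the virama: no vowel follows
            elif nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 1  # consume the vowel sign
            else:
                out.append("a")  # inherent vowel
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        elif ch in OTHER_SIGNS:
            out.append(OTHER_SIGNS[ch])
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_iast("धर्म"))
```

a complete table would also need the remaining consonants, the diphthongs and special signs; the sketch only shows how the inherent vowel and the virama are handled.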
cheers
jn
_________________
Jürgen Neuß, M.A.
Freie Universität Berlin
Institut für die Sprachen und Kulturen Südasiens
Königin-Luise-Str. 34 a
D-14195 Berlin