Finding Indological full-book PDFs on Google Books

Paul G. Hackett ph2046 at COLUMBIA.EDU
Mon Jun 18 12:34:23 UTC 2007


Dear Colleagues,

At 2:17 PM +1000 6/18/07, Antonio Ferreira-Jardim wrote:
>A much better scanned version of Burnouf's "Histoire" is available
>from the DLI here: http://tinyurl.com/2q88jj

There are actually several "hosts" for the DLI ("Digital Library 
of India"):

http://tera-2.ul.cs.cmu.edu/
http://dli.iiit.ac.in/
http://www.new.dli.ernet.in/
http://www.dli.ernet.in/
etc.

which, to no surprise, are *not* actually mirrored hosts, but rather 
contain different books, meta-data, interfaces, etc. which 
*sometimes* overlap in content.  Also, some appear to be up at all 
times; some appear to go offline from time to time. Moreover, I would 
point out that the quality of scans varies there as well 
(inconsistent page sizes, cropped pages, missing or corrupted page 
images, etc.), and that offering books only as individual page-image 
TIFFs (presumably for bandwidth reasons) poses another usability 
challenge.

   Nonetheless, having spent countless hours scanning books myself, 
I would much rather scan one or two pages to fix an otherwise 
complete book, whether it comes as individual TIFF images from the 
DLI or as a PDF from Google, than have to scan the entire book myself.

    I say this because while I think it is easy to find fault with any 
initiative of this scope, and as a librarian I completely agree with 
the sentiments expressed thus far about quality control issues and 
all that, I would, however, *still* defend Google in their 
enterprise, because I think it's still better than *not* having it. 
Google Books, for all its flaws, *is* e-text searchable, and I 
have already found numerous references to materials that I would 
never have known existed, much less been able to locate or access.
    To give a small example: I have been working for some time now on 
issues related to early twentieth century Sikkim/India/Tibet border 
communities.  By simply performing a keyword search for "Darjeeling" 
(or "Darjiling" or "Kalimpong" or "Lachen" or "Yatung", etc.) I have 
located countless memoirs and travelogs of individuals who visited 
these areas in the late nineteenth and early twentieth centuries and 
whose accounts of places and individuals I would never have found 
through conventional means.  This fact alone makes Google Books an 
invaluable resource for me.

    Similarly, if you explore the DLI, you will find an unbelievably 
large quantity of scanned Sanskrit books.  A few years ago I began 
working with some of the engineers at ABBYY and have been training 
ABBYY FineReader to do OCR for Devanagari.  For me, despite their 
flaws, the DLI page images have been yet another very convenient 
resource for training the recognition engine, above and beyond just 
having the content available.  The fact that Oliver Hellwig has 
created a Devanagari OCR program (which I look forward to testing) 
offers yet another reason to celebrate an abundance of such data 
given the potential for rendering all of those materials e-text 
searchable as well.

Perhaps I'm belaboring the obvious, but I would just like to plead 
for some "perspective" here, even if Google or DLI or anyone else 
doesn't actually hit the mark of perfection in their efforts.

Paul Hackett
Columbia University





More information about the INDOLOGY mailing list