Finding Indological full-book PDFs on Google Books

David Magier magier at COLUMBIA.EDU
Mon Jun 18 11:08:07 UTC 2007

Dear All,
another angle on mass digitization projects (perhaps less true of Google 
than of other, unnamed mass projects whose name begins with a large number) 
that has librarians and scholars very concerned is the impact of the 
digitization process on fragile original books and manuscripts. SOME such 
projects are now well known to march into libraries and archives (particularly 
underfunded ones in the Subcontinent), to "wow" the local staff and 
administrators with high-end equipment and high-profile publicity, secure 
agreements and grandiose press releases, and then proceed to do shoddy 
digitization work (including horrible metadata) while literally destroying 
the books in the process. As librarians, we are concerned about preserving 
original materials and content as part of the effort of preserving 
knowledge for future generations. In the rush to digitize, many people 
(certainly the general public) have lost sight of what for us is a basic 
principle:

  "Digitization is a wonderful medium for DISSEMINATION. It is not a method 
of long-term PRESERVATION."

For the latter, one must use conservation techniques to extend the life of 
the book, and/or duplication of the content onto a proven long-term storage 
medium (and format) that will be usable many generations into the future. 
This latter category includes microfilm (chemical/physical studies say 
microfilm, properly produced and stored, will last and be readable -- 
without any technology other than a lens -- up to 500 years from now), as 
well as archival-quality preservation photocopying onto acid-free paper 
(which, properly bound, creates a new copy of the book that should last 
many hundreds of years). Does anyone really believe that the thousands of 
books being digitized now are going to still be usable, as current-standard 
PDFs on their hard drives or CD-ROMs, even 50 years from now??

Digital file content can (and some probably will) be carried forward in 
usable formats into the future only by very active, very expensive, and 
ONGOING permanent intervention via constant "refreshment" of the data into 
each successive wave of current file formats and storage devices, as the 
technologies involved continue to change at ever-increasing rates. Who is 
going to make that investment, continually, into the future? For which 
specific materials? At a reasonable guess, there will be lots of attrition 
and lots of content will fall behind. (And if the original books from which 
it was derived are not *preserved* as above, then the books and their 
content are lost forever). It is really only the most commercially valuable 
content that will continue to get the digital preservation investment 
needed to refresh the data and keep the content viable. Do we really 
believe that our indological books fall into that category?

I'm a strong advocate of digitization and dissemination, but I am 
constantly fighting against a widespread, general misunderstanding under 
which people feel that once a book has been digitized we can rest easy: it 
has been "taken care of". Particularly given the dismal actual record of 
what happens to books getting digitized, I feel scholars everywhere must 
take much more active notice of the distinction between digitization and 
preservation, and must make sure that appropriate attention is given to the 
latter, even if it is so much less "sexy" than the former.

David Magier
South Asia Librarian
Columbia University


President, Center for South Asia Libraries

--On June 17, 2007 9:00:46 PM -0700 Jonathan Silk <silk at> 

> Dear Tim,
> Just a quick note: I do understand the logic that something can be better
> than nothing. But I think the concern is that if one is going to do
> something, it should be done right (not so much as a philosophical stance
> as a practical one). And if something is done by Google, even if badly,
> is it then likely that it will be done later better? Is it not a case of
> bad coin driving out good?
> Then, specifically:
>> .  I did look at the pages Jonathan mentioned, and although several
>> had distortions along the left edge of the pages, they were quite
>> legible.
> Page 211--left side distorted, but yes, legible
> 212: approximately 1/5 of the [right side of the] page missing because it
> was placed on the scanner at a diagonal--to me, this does not count as
> 'quite legible'
> 213 more or less = 211
> 214 --the page was moved during scanning, such that a large part of the
> right side is indeed not legible.
> All of these problems and worse can be found throughout the whole
> book--at a very rough guess about every second or third page has this
> type of trouble, which almost systematically leaves part of the text
> illegible-- the rate of trouble is astonishing (and far beyond that even of
> the old Indian reprints, or the work of even sloppy student assistants).
> I would not fear contradiction to say that as now available Burnouf's
> book is unreadable in the Google version. And this does not even address
> the issue of huge portions of some books missing, the book scanned not
> being the book catalogued (e.g., who in the Google group would scan PW
> when their records indicate that they already have it? yet, as I said,
> it is not PW at all...) etc
> Sorry--my quick note was not so quick. With this, I'm done with this
> topic, with the wish that those of us with a professional interest in a
> relatively narrow field might profitably discuss (in future, in a
> different forum?) how to prepare the relatively limited corpus of key
> materials we all are likely to find useful to have on our hard-drives.
> --
> Jonathan Silk
> Department of Asian Languages & Cultures
> Center for Buddhist Studies
> 290 Royce Hall
> Box 951540
> Los Angeles, CA 90095-1540
> phone: (310) 206-8235
> fax:  (310) 825-8808
> silk (at)
>  From July 15, 2007:
> Prof. Dr. Jonathan Silk
> Instituut Kern / Universiteit Leiden
> Postbus 9515
> 2300 RA Leiden
