A misconception regarding the PDF format (Re: Text processing in Unicode

rajam rajam at EARTHLINK.NET
Fri Mar 26 21:33:30 UTC 2010

PDF documents are searchable--but we have to abide by the rules of
the PDF technology, or we should devise our own technique to get
around them.

We need to respect the technology (PDF or other), which has its own
characteristics, like any other software in the industry.

I agree with JLC that PDF files are "E-paper" and the format was not  
"primarily invented for being a text storage format and it has never  
been guaranteed that round-trip conversions is always possible  
between PDF files and text files."

I'd like to add that expecting something "post-inventional" (a
capability the format was not invented for) won't help us unless we
do something about it -- for example, tell the creators/inventors of
the software what we want to see the software do for us now or in
the future. That's why the IT world has "tech support" departments
and "feedback" channels.

Most importantly, I feel that wishes like this one (that PDF
documents should "be searchable") would be more effective if we
directed them to the IT industry (for example, to Adobe or other PDF
developers) rather than expressing them only here in an academic
forum, as if we were just complaining about technology.


On Mar 26, 2010, at 7:16 AM, George Hart wrote:

> I have been playing around with unicode in both Tamil and  
> Devanagari.  On the Mac (Snow Leopard), it is not possible to  
> search pdf's in either writing system -- nor is it possible to use  
> Acrobat to export such files into rtf or other editable format.   
> Using Nisus on the Mac, searching works perfectly for both writing  
> systems, and Rajam's problem does not appear.  Many documents are  
> available as pdf's, and it is quite important that they be  
> searchable.  Unfortunately, that is not the case at this point with  
> at least two important Indic writing systems.  George Hart

More information about the INDOLOGY mailing list