A misconception regarding the PDF format (Re: Text processing in Unicode)
rajam
rajam at EARTHLINK.NET
Fri Mar 26 21:33:30 UTC 2010
PDF documents are searchable--but we have to abide by the rules of
the PDF technology, or else devise our own technique to get
around them.
We need to respect the technology (PDF or other), which, like any
other software in the industry, has its own characteristics.
I agree with JLC that PDF files are "E-paper" and the format was not
"primarily invented for being a text storage format and it has never
been guaranteed that round-trip conversions is always possible
between PDF files and text files."
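To make the round-trip point concrete: a PDF records positioned glyphs, not the sequence of characters that was typed. The Python sketch below is a simplified, hypothetical model, not a description of how Acrobat or any particular viewer works. It assumes an extractor that maps each glyph back to one character and emits the characters in left-to-right visual order. In Devanagari and Tamil, some vowel signs are drawn before the consonant they follow in typed text, so such an extractor returns reordered characters that a search for the typed word cannot match. The names `PRE_BASE_SIGNS`, `visual_order`, and `naive_search` are illustrative only.

```python
import unicodedata

# Vowel signs drawn to the LEFT of their consonant (pre-base signs).
# Illustrative subset only, not a complete table for either script.
PRE_BASE_SIGNS = {
    "\u093F",  # DEVANAGARI VOWEL SIGN I
    "\u0BC6",  # TAMIL VOWEL SIGN E
    "\u0BC7",  # TAMIL VOWEL SIGN EE
    "\u0BC8",  # TAMIL VOWEL SIGN AI
}


def visual_order(logical: str) -> str:
    """Hypothetical toy model of the glyph order seen on the page.

    NFD splits Tamil two-part vowels (e.g. U+0BCA -> U+0BC6 U+0BBE); each
    pre-base sign is then moved in front of the single character before it.
    Real shaping engines also handle conjuncts, which this toy ignores.
    """
    chars = list(unicodedata.normalize("NFD", logical))
    for i in range(1, len(chars)):
        if chars[i] in PRE_BASE_SIGNS:
            chars[i - 1], chars[i] = chars[i], chars[i - 1]
    return "".join(chars)


def naive_search(query: str, extracted: str) -> bool:
    """Plain substring search after NFC-normalizing both strings."""
    return unicodedata.normalize("NFC", query) in unicodedata.normalize("NFC", extracted)


if __name__ == "__main__":
    # Devanagari "ki" and Tamil "ko", typed in logical order.
    for word in ("\u0915\u093F", "\u0B95\u0BCA"):
        extracted = visual_order(word)
        codepoints = [f"U+{ord(c):04X}" for c in extracted]
        # The search against the visually ordered text fails for both words.
        print(word, codepoints, naive_search(word, extracted))
```

The same query succeeds against the logically ordered string itself, which is what a word processor such as Nisus keeps. Whether a real PDF behaves like this toy depends on how the file was produced, for instance whether it carries a usable ToUnicode mapping or ActualText for the reordered clusters.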
I'd like to add that expecting something "post-inventional" won't
help us unless we do something about it -- for example, tell the
creators/inventors of the software what we want to see the software
do for us now or in the future. That's why the IT world has "tech
support" departments and "feedback" channels.
Most importantly, I feel that wishes like this one (that PDF
documents should "be searchable") would be more effective if we
directed them to the IT industry (for example, to Adobe or other PDF
developers) rather than expressing them only here in an academic
forum, as if we were just complaining about technology.
--vsr
(<www.letsgrammar.org>)
On Mar 26, 2010, at 7:16 AM, George Hart wrote:
> I have been playing around with unicode in both Tamil and
> Devanagari. On the Mac (Snow Leopard), it is not possible to
> search pdf's in either writing system -- nor is it possible to use
> Acrobat to export such files into rtf or other editable format.
> Using Nisus on the Mac, searching works perfectly for both writing
> systems, and Rajam's problem does not appear. Many documents are
> available as pdf's, and it is quite important that they be
> searchable. Unfortunately, that is not the case at this point with
> at least two important Indic writing systems. George Hart