A misconception regarding the PDF format (Re: Text processing in Unicode
rajam at EARTHLINK.NET
Fri Mar 26 19:27:56 EDT 2010
Just wanted to make sure that we understand the principle behind the
acronym PDF ("Printable Document Format").
So, if we want to have a searchable PDF, we should ask the powers
that be in the IT industry to develop something like an
"SPDF" ("Searchable Printable Document Format").
Hope you can understand what I mean.
On Mar 26, 2010, at 2:33 PM, rajam wrote:
> PDF documents are searchable--but we have to abide by the rules of
> the PDF technology or we should devise our own technique to get
> around them.
> We need to respect the technology (PDF or other) which has its own
> characteristics as any other software in the industry.
> I agree with JLC that PDF files are "E-paper" and the format was
> not "primarily invented for being a text storage format and it has
> never been guaranteed that round-trip conversions is always
> possible between PDF files and text files."
> I'd like to add that expecting something "post-inventional" won't
> help us unless we do something about it -- for example, tell the
> creators/inventors of the software what we want to see the software
> do for us now or in the future. That's why the IT world has "tech
> support" departments and "feedback" channels.
> Most importantly, I feel that our wishes like this one (that PDF
> documents should "be searchable") would be more effective if we
> direct them to the IT industry (for example to Adobe or any PDF
> developers) rather than expressing them only here in an academic
> forum as if we are just complaining about technology.
> On Mar 26, 2010, at 7:16 AM, George Hart wrote:
>> I have been playing around with Unicode in both Tamil and
>> Devanagari. On the Mac (Snow Leopard), it is not possible to
>> search PDFs in either writing system -- nor is it possible to use
>> Acrobat to export such files into RTF or another editable format.
>> Using Nisus on the Mac, searching works perfectly for both writing
>> systems, and Rajam's problem does not appear. Many documents are
>> available as PDFs, and it is quite important that they be
>> searchable. Unfortunately, that is not the case at this point
>> with at least two important Indic writing systems. George Hart