A misconception regarding the PDF format (Re: Text processing in Unicode
jean-luc.chevillard at UNIV-PARIS-DIDEROT.FR
Fri Mar 26 17:27:57 EDT 2010
You write "Many documents are available as pdf's, and it is quite
important that they be searchable."
It is not very clear (to me) to whom YOUR QUERY is addressed.
Is it ADDRESSED to the millions of users (on this planet) who have
created those (printable) PDF-s
(and who may also have simultaneously created searchable HTML or XML files)?
Are you telling them to REFRAIN FROM creating PDF-s as long as PDFs are
not all GOOGLE-searchable?
(although they may have created those PDFs simply for the purpose of
OR is your message addressed to the creators of the Acrobat/PDF format?
[it is not absolutely clear to me whether the ADOBE company owns the
definition of the format]
Do you want the creators of the PDF format to set as a future goal
[for future releases of the PDF format]
that there should be an UNFAILING possibility to make round trips
between TEXT and PDF formats?
Do you want the Unicode Consortium
to suspend its existence until everybody on the planet
is able to see how INCREDIBLY useful they have been?
I have been trying to express this jokingly
[no offense is meant to anyone!],
but the truth is that I am really WORRIED to see the word "E-governance"
making its appearance on an academic list
(SEE the (almost) last line
On 3/26/2010 3:16 PM, George Hart wrote:
> I have been playing around with unicode in both Tamil and Devanagari.
> On the Mac (Snow Leopard), it is not possible to search pdf's in either
> writing system -- nor is it possible to use Acrobat to export such files
> into rtf or other editable format. Using Nisus on the Mac, searching
> works perfectly for both writing systems, and Rajam's problem does not
> appear. Many documents are available as pdf's, and it is quite important
> that they be searchable. Unfortunately, that is not the case at this
> point with at least two important Indic writing systems.
>
> George Hart