A misconception regarding the PDF format (Re: Text processing in Unicode)

Sun Mar 28 01:16:07 UTC 2010

Dear George,

I agree with you wholeheartedly that PDF documents in Tamil, Sanskrit, and other Indian languages should be searchable.

Whatever might have been the origin of PDF, today across all levels of government and business in the US, PDFs in English are used to store textual materials. Most solicitations, RFPs, and RFQs, and their supporting documents, are posted and stored as PDFs. These are used by the responding businesses to develop proposals. Without the ability to search these PDFs, many business development and sales functions in the private sector would slow down significantly. Almost inevitably, text from these PDFs is copied and pasted into Word to create new Word and PDF documents such as proposals. Business developers do it routinely. The text from such PDFs is also copied and inserted into email correspondence between business firms that are team members responding to an RFP. It is done all the time. Portability of data between PDFs and other applications is a must these days. PDF, Word, and email do not function as silos in government and business. Here is a case study of a Nuance implementation at the US Department of Defense: http://WWW.NUANCE.COM/imaging/pdf/cs_PDF_DefenseContract.pdf . The following text has been copied and pasted from the PDF document.

"To streamline their information workflows, one of the major U.S. DoD agencies looked to the power of PDF. They needed an affordable tool that would allow them to turn paper into fully searchable digital documents that could be easily re-purposed, secured and archived. They also needed the ability to work with static PDF forms (paper forms which have been scanned) and make them fillable as well as annotate and edit PDF documents if necessary." 

Calling PDF 'E-Paper' is not accurate any more in the real world.

Of course, I reached the PDF document using Google search. 

Regards,
Palaniappan

On Mar 26, 2010, at 9:16 AM, George Hart wrote:

> I have been playing around with unicode in both Tamil and Devanagari.  On the Mac (Snow Leopard), it is not possible to search pdf's in either writing system -- nor is it possible to use Acrobat to export such files into rtf or other editable format.  Using Nisus on the Mac, searching works perfectly for both writing systems, and Rajam's problem does not appear.  Many documents are available as pdf's, and it is quite important that they be searchable.  Unfortunately, that is not the case at this point with at least two important Indic writing systems.  George Hart