Text processing in Unicode
rajam
rajam at EARTHLINK.NET
Fri Mar 26 04:40:51 UTC 2010
This may also depend on the specific operating system. I use an
old OS X (10.4)-based Mac PowerBook G4.
I copied a small paragraph and have pasted it below. It seems to have
turned out fine (except for a couple of minor glitches -- with
respect to the letters r and ai). The latest OS may have resolved
this problem:
===============
நன்றியுைர
ெசன்ைன வாெனாலி
நிைலயத்தான்
ஏற்பாட்டின்படி
"ெசால்வன்ைம"
என்னும் ெபாருள்பற்றிப்
பள்ளி மாணவரக்காக, ௧௯௫௨
ஜூைல௧௪, ௨௮,
ஆகஸ்ட் 11,25, ெசப்ெடம்பர் 15
ஆகிய ஐந்து நாட்களில்
ஐந்து தைலப்பில்
வாெனாலியில் ேபச ே
நர்ந்தது. ேபசியவற்ைற
நூல்வடிவில் ெவளியிட
வாெனாலி நிைலயத்தார்
அனுமதி தந்தனர்.
அவர்கட்கு நன்றி
கூறுகின்ேறன்.
==================
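The glitches in the sample above look like vowel signs stored in visual
order: ெ, ே and ை appear before the consonant they belong to instead of
after it. Below is a minimal Python sketch of a repair, assuming the whole
input is in visual order. The function name `visual_to_logical` is made up
for this illustration, and running it on text that is already correctly
ordered would scramble it.

```python
import re
import unicodedata

# Vowel signs drawn to the left of the consonant: e, ee, ai (U+0BC6..U+0BC8).
PREBASE = "\u0BC6-\u0BC8"
# Tamil consonant range ka..ha (U+0B95..U+0BB9).
CONSONANT = "\u0B95-\u0BB9"

_VISUAL_PAIR = re.compile(f"([{PREBASE}])([{CONSONANT}])")


def visual_to_logical(text: str) -> str:
    """Move each left-side vowel sign behind the consonant that follows it.

    Assumption: `text` is entirely in visual (glyph) order.
    """
    swapped = _VISUAL_PAIR.sub(r"\2\1", text)
    # NFC then fuses the pairs e + aa and ee + aa into the single signs o and oo.
    return unicodedata.normalize("NFC", swapped)


# First word of the sample's second line, written as pasted (visual order).
print(visual_to_logical("\u0BC6\u0B9A\u0BA9\u0BCD\u0BC8\u0BA9"))
```

The swap is purely mechanical, so it is only safe on text known to come
from such a paste; it cannot tell visual order from logical order by itself.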
As for searching, typing into the search field may pose a problem,
since the keyboard layout may be different and the character codes may
not be read in properly. The best way is to copy an instance of the
desired item, paste it into the search field, and hit the Enter/Return
key as we go along.
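One possible explanation for the false hit reported in the quoted message
(a search for தான் also finding தோன்) is that the PDF's text layer keeps
glyphs in visual order, where the two-part vowel sign ோ is split into ே
before the consonant and ா after it. This is an assumption about that
file, not something verified here. A small Python sketch (the helper
`to_visual_order` is hypothetical) shows that a plain substring search
then matches:

```python
import re
import unicodedata

TAAN = "\u0BA4\u0BBE\u0BA9\u0BCD"  # taan: ta + sign aa + na + pulli
TOON = "\u0BA4\u0BCB\u0BA9\u0BCD"  # toon: ta + sign oo + na + pulli

# A consonant (ka..ha) followed by a left-side vowel sign (e, ee, ai).
_LOGICAL_PAIR = re.compile("([\u0B95-\u0BB9])([\u0BC6-\u0BC8])")


def to_visual_order(text: str) -> str:
    """Simulate glyph order: split two-part vowel signs, then put the
    left-side part (e, ee, ai) in front of its consonant."""
    decomposed = unicodedata.normalize("NFD", text)  # sign oo -> ee + aa
    return _LOGICAL_PAIR.sub(r"\2\1", decomposed)


print(TAAN in TOON)                   # False: no match in logical order
print(TAAN in to_visual_order(TOON))  # True: match in simulated visual order
```

If this is indeed the cause, searching text that is in logical order and
NFC-normalized should not produce the extra hit.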
Best,
--vsr
On Mar 25, 2010, at 9:19 PM, Dipak Bhattacharya wrote:
> I can speak of two problems. The characters/signs are sometimes
> dispersed when saved in MS Word, and there is a systematic
> replacement of the signs on the keyboard. The second problem arises
> because of unintended bad programming during processing and can be
> removed by restarting the computer. The first problem arises only
> when some editing is attempted in the MS Word format, i.e. not in the
> Indian script processor. I avoid editing such files in MS Word format.
> Best
> DB
>
> --- On Fri, 26/3/10, Sudalaimuthu Palaniappan <palaniappa at AOL.COM>
> wrote:
>
>
> From: Sudalaimuthu Palaniappan <palaniappa at AOL.COM>
> Subject: Text processing in Unicode
> To: INDOLOGY at liverpool.ac.uk
> Date: Friday, 26 March, 2010, 8:41 AM
>
>
> Dear Indologists,
>
> I am seeing some problems in text processing in Tamil texts created
> using Unicode fonts.
> Consider the following text in Project Madurai.
> http://www.projectmadurai.org/pm_etexts/pdf/pm0323.pdf
>
> According to the cover page, "This pdf file is based on Unicode
> with corresponding Latha font embedded in the file. Hence this file
> can be viewed and printed on all computer platforms: Windows,
> Macintosh and Unix without the need to have the font installed in
> your computer."
>
> When I searched the text for the string தான் (tAn2), I hit
> not only தான் but also தோன் (tOn2) !
>
> Has anyone processing (searching, sorting) Unicode texts in
> Sanskrit or other Indian languages encountered any problems like
> the above?
>
> (Needless to say, when one copies the text from PDF and pastes in
> email, one gets messed up text like this. நாககம்
> இல்லாத மிகப் பழங்
> காலத்தில் மனிதர்கள்
> வடுீ கட்டத் ெதயாமல்
> குைககளில்
> வாழ்ந்தார்களாம். அந்தப்
> பழஙகாலத்ைதக் கற்காலம்
> என்று ெசால்லுகிேறாம்.)
>
> (However, a draft report by an Expert Committee on Technology
> Standards for Indian Languages
> (http://egovstandards.gov.in/apex-review/egscontent.2009-06-10.5999916108/at_download/file) claims:
> All major operating systems, browsers, editors, word processors
> and other applications & tools are supporting Unicode.
> It is possible to use Indian languages and scripts in the
> Unicode environment, which will resolve the compatibility issue.
> The documents created using Unicode may be searched very easily
> on the web.
> As Unicode is widely recognized all over the world and also
> supporting Indian languages, it will ease Localization applications
> including e-Governance application
> for all the constitutionally recognized Indian languages.
> Since Indian languages are also used in the other part of the
> world, it is possible to have Global data exchange.)
>
> Thanks in advance.
>
> Regards,
> Palaniappan
>
>
>