[INDOLOGY] Transcoding

Peter Scharf scharfpm7 at gmail.com
Sat Jun 18 13:37:12 UTC 2016

Dear Peter and others,

The most convenient way to transform Sanskrit text from one encoding to another is to use the transcoding software developed by the Sanskrit Library.  This transcoding software can be used in one of two ways:

1. Strings of text of any length can be transcoded in toto by pasting them into the transcoding window on the web at: http://sanskritlibrary.org/transcodeText.html
Simply select the input and output encodings from the two menus at the bottom of the page.

2. Download the transcoding software, install it locally on your own machine, and run it under Unix to transcode from and to a great number of encodings.  On a Mac or Linux system this is easy; I don't know how to do it on a PC.  The downloaded software permits very sophisticated delineation of which strings to transcode within a document of mixed text.  One can tag strings in a certain way, for example with specific start and end character strings or XML tags, and then transcode all strings with one tag in one way and all strings with another tag in another, e.g. transcode <s>kfzRa</s> to Devanagari and <r>kfzRa</r> to Roman.  Alternatively, one can select the text within a document to transcode using regular expressions.  The software is available for download near the bottom of the alphabetical list of downloadable software on the Sanskrit Library downloads page, http://sanskritlibrary.org/downloads.html.  Look for TranscodeFile (Java program).
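
The tag-based selection described in item 2 can be illustrated with a short sketch. The Python below is not TranscodeFile and does not reproduce its rules. It assumes the input is in SLP1 (the encoding of the kfzRa example), covers only the basic Sanskrit alphabet without accents or Vedic signs, and uses a hypothetical convention in which <s>...</s> spans become Devanagari and <r>...</r> spans become IAST, with the tags dropped. The regular expression that finds the tagged spans could be swapped for any other pattern to select text for transcoding.

```python
import re
import unicodedata

# Hypothetical, partial SLP1 tables for illustration only: basic Sanskrit
# letters, no accents or Vedic signs. Not the Sanskrit Library's rule files.
SLP1_VOWELS = "aAiIuUfFxXeEoO"
IAST_VOWELS = ["a", "ā", "i", "ī", "u", "ū", "ṛ", "ṝ", "ḷ", "ḹ",
               "e", "ai", "o", "au"]
DEVA_VOWELS = [chr(c) for c in (0x905, 0x906, 0x907, 0x908, 0x909, 0x90A,
                                0x90B, 0x960, 0x90C, 0x961, 0x90F, 0x910,
                                0x913, 0x914)]
# Dependent vowel signs; "a" is inherent in a consonant, hence "".
DEVA_MATRAS = [""] + [chr(c) for c in (0x93E, 0x93F, 0x940, 0x941, 0x942,
                                       0x943, 0x944, 0x962, 0x963, 0x947,
                                       0x948, 0x94B, 0x94C)]

SLP1_CONSONANTS = "kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh"
IAST_CONSONANTS = ["k", "kh", "g", "gh", "ṅ", "c", "ch", "j", "jh", "ñ",
                   "ṭ", "ṭh", "ḍ", "ḍh", "ṇ", "t", "th", "d", "dh", "n",
                   "p", "ph", "b", "bh", "m", "y", "r", "l", "v",
                   "ś", "ṣ", "s", "h"]
# U+0915..U+0939 minus letters outside the basic Sanskrit set (U+0929,
# U+0931, U+0933, U+0934), which leaves the 33 consonants in SLP1 order.
DEVA_CONSONANTS = [chr(c) for c in range(0x915, 0x93A)
                   if c not in (0x929, 0x931, 0x933, 0x934)]

VIRAMA = "\u094d"
DEVA_SIGNS = {"M": "\u0902", "H": "\u0903"}  # anusvara, visarga
IAST_SIGNS = {"M": "ṃ", "H": "ḥ"}


def slp1_to_iast(text):
    """SLP1 has one character per sound, so a per-character lookup suffices."""
    table = dict(zip(SLP1_VOWELS, IAST_VOWELS))
    table.update(zip(SLP1_CONSONANTS, IAST_CONSONANTS))
    table.update(IAST_SIGNS)
    # NFC keeps diacritics as precomposed characters (e.g. U+1E5B for ṛ).
    return unicodedata.normalize("NFC", "".join(table.get(ch, ch) for ch in text))


def slp1_to_devanagari(text):
    vowels = dict(zip(SLP1_VOWELS, DEVA_VOWELS))
    matras = dict(zip(SLP1_VOWELS, DEVA_MATRAS))
    consonants = dict(zip(SLP1_CONSONANTS, DEVA_CONSONANTS))
    out = []
    for i, ch in enumerate(text):
        if ch in consonants:
            out.append(consonants[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt not in matras:
                out.append(VIRAMA)  # no vowel follows: cancel the inherent a
        elif ch in vowels:
            prev = text[i - 1] if i > 0 else ""
            # After a consonant a vowel is written as a dependent sign.
            out.append(matras[ch] if prev in consonants else vowels[ch])
        else:
            out.append(DEVA_SIGNS.get(ch, ch))  # spaces, punctuation pass through
    return "".join(out)


# Hypothetical tag convention following the example in the message:
# <s>...</s> becomes Devanagari, <r>...</r> becomes Roman (IAST); tags dropped.
CONVERTERS = {"s": slp1_to_devanagari, "r": slp1_to_iast}
TAGGED = re.compile(r"<(s|r)>(.*?)</\1>", re.DOTALL)


def transcode_tagged(document):
    return TAGGED.sub(lambda m: CONVERTERS[m.group(1)](m.group(2)), document)


print(transcode_tagged("<s>kfzRa</s> and <r>kfzRa</r>"))  # prints: कृष्ण and kṛṣṇa
```

Because SLP1 uses one character per sound, the Roman conversion is a simple lookup; Devanagari needs one character of lookahead after each consonant to choose between a vowel sign and a virāma.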

I have made a number of transcoding rules for my own use which I'm glad to share if you want help getting started.

Peter M. Scharf
scharfpm7 at gmail.com

On 18 Jun 2016, at 5:18 AM, Peter Flugel wrote:

> Dear Peter
> Thank you for this really interesting information.
> I have a question which you may be able to answer as well: what is the best way for transforming texts written in Nagari characters into roman script? I am trying to integrate two data bases. 
> Yours
> Peter 
> On 17 Jun 2016, at 20:28, Peter Scharf <scharfpm7 at gmail.com> wrote:
>> Dear Indologists,
>> I have just completed a comparison of the ligature formation produced by several Devanagari fonts and thought it might be useful to share the results of the comparison.  I compared 1260 ligatures formed by the LaTeX Skt package with seven Unicode fonts.  The ligatures compared were the combined set of all those listed by Ulrich Stiehl in his document, Conjunct Consonants in Sanskrit, Heidelberg, 21 April 2003, pp. 4–34, and those listed in the Skt package documentation Sanskrit for LaTeX2e, pp. 22–35.
>> 1. LaTeX Skt package
>> 2. Chandas
>> 3. Uttara
>> 4. Sanskrit2003
>> 5. Praja
>> 6. Arial Unicode MS
>> 7. Devanagari MT
>> 8. Mangal
>> The LaTeX Skt package comes with the TeX Live installation available at https://www.tug.org/texlive/.  The Chandas and Uttara fonts were produced by Mihail Bayaryn and are available at http://www.sanskritweb.net/cakram/.  The Sanskrit2003 font was produced by Ulrich Stiehl and is available at http://www.omkarananda-ashram.org/Sanskrit/itranslator2003.htm.  These fonts are all available free of cost.  Praja was produced by Peter Freund and is available for $35 at https://secure.bmtmicro.com/servlets/Orders.ShoppingCart?CID=5115&PRODUCTID=51150002.  Arial Unicode MS is available with Microsoft Office, FrontPage, and Publisher when international support is installed.  Devanagari MT comes with Mac systems that have Asian language support installed.  Mangal comes with Windows systems that have supplemental language support installed.
>> The comparison showed that Chandas and Uttara are able to form all conjuncts correctly, without the interruption of an inappropriate virāma, with the exception of seven sequences: ṅkṣṇva, ṅrvya, ṭhthya, dḍḍa, ddbra, ddvra, and l̃la.  The LaTeX Skt package handles all but 29.  Sanskrit2003 lacked 80, Praja 187, Arial Unicode MS 201, Devanagari MT 232, and Mangal 236.  I also checked how the fonts handle the accents in the Devanagari Extended and Vedic Extensions Unicode pages.  Only the Praja font handled them all properly; the LaTeX Skt package handles most Vedic accentuation, while most fonts handled only the common accentual system.  A test of Vedic accents with any font can be performed by visiting the Sanskrit Library's interactive Vedic Unicode character phonetic value table at http://sanskritlibrary.org/accents.html.  Simply set your browser to use the font you would like to test.
>> The first five fonts listed are therefore commendable; the last three are inadequate for Sanskrit.  It would be desirable for Mihail Bayaryn and Ulrich Stiehl to upgrade their fonts, which otherwise handle conjuncts very comprehensively, to handle the Vedic characters in the two Unicode pages mentioned including in particular the combining candrabindu with semivowels l, y, and v.
>> Other Indic fonts not tested are described on the University of Chicago's South Asia Language Resource Center page at http://salrc.uchicago.edu/resources/fonts/available/hindi/.
>> Yours,
>> Peter
>> *************************
>> Peter M. Scharf
>> scharfpm7 at gmail.com
>> *************************
>> _______________________________________________
>> INDOLOGY mailing list
>> INDOLOGY at list.indology.info
>> indology-owner at list.indology.info (messages to the list's managing committee)
>> http://listinfo.indology.info (where you can change your list options or unsubscribe)
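
The conjunct comparison quoted above turns on how Unicode encodes a Devanagari cluster: the consonants are joined by virāma (U+094D), and the last one carries the inherent a.  A font either renders such a sequence as a single ligature or falls back to a visible virāma.  As a minimal sketch for preparing test strings, the Python below builds the code-point sequence for some of the clusters that Chandas and Uttara did not form; its IAST-to-Devanagari table is a hypothetical, partial one covering only these consonants.

```python
# Minimal sketch: spell a Devanagari conjunct as Unicode code points.
# The IAST table is hypothetical and covers only the consonants used below.
VIRAMA = "\u094d"  # DEVANAGARI SIGN VIRAMA

CONSONANTS = {
    "k": "\u0915", "ṅ": "\u0919", "ṭh": "\u0920", "ḍ": "\u0921",
    "ṇ": "\u0923", "th": "\u0925", "d": "\u0926", "b": "\u092c",
    "y": "\u092f", "r": "\u0930", "v": "\u0935", "ṣ": "\u0937",
}


def conjunct(consonants):
    """Join consonants with virama; the last keeps its inherent vowel a."""
    return VIRAMA.join(CONSONANTS[c] for c in consonants)


def code_points(text):
    """List each character as U+XXXX for inspection."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Clusters from the comparison: ddbra, dḍḍa, ṭhthya, ṅkṣṇva.
for cluster in (["d", "d", "b", "r"], ["d", "ḍ", "ḍ"],
                ["ṭh", "th", "y"], ["ṅ", "k", "ṣ", "ṇ", "v"]):
    text = conjunct(cluster)
    print(text, code_points(text))
```

Pasting the printed strings into a document set in the font under test shows whether each cluster is rendered as a ligature or with a visible virāma.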
