[INDOLOGY] Diacriticals in unicode, single or multiple glyphs

Dominik Wujastyk wujastyk at gmail.com
Thu Nov 24 16:21:01 UTC 2016


Dear Harry, the documentation in the "document header" of each SARIT file
notes that the encoding is IAST, and notes where that differs from the ISO
standard.  Thus, the document header of the Astangahrdayasamhita file, to
pick one at random, has this statement:

Editorial Description

The published edition from which this e-text was transcribed is printed in
the Devanāgarī script. The electronic text below is in a lossless
transliteration using the Latin alphabet. The transliteration scheme used
is the IAST (The International Alphabet of Sanskrit Transliteration
<http://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration>).
IAST differs in small ways from ISO 15919, but is preferred by most working
Sanskrit scholars. Conversion of this file to ISO 15919 can be achieved by
performing the following replacements throughout the file: ṛ -> r̥ and ṃ ->
ṁ

Text divison is as Devanāgarī ("ityevam" not "ity evam".)

Initial vowel elision for avagraha is reversed and marked with a + sign:
e.g., "prathamo+adhyāyaḥ"


The principle behind the SARIT repository is that the e-texts should be
documented, so that any quirks or editorial decisions are explict and
up-front.   In addition, there is a revision history for each file.  So if
you take the file and do something to it, you brieflly note what you've
done before re-uploading the file.

Best,
Dominik


​
--
Professor Dominik Wujastyk <http://ualberta.academia.edu/DominikWujastyk>
​,​

Singhmar Chair in Classical Indian Society and Polity
​,​

Department of History and Classics <http://historyandclassics.ualberta.ca/>
​,​
University of Alberta, Canada
​.​

South Asia at the U of A:

​sas.ualberta.ca​
​​


On 18 November 2016 at 18:23, Harry Spier <hspier.muktabodha at gmail.com>
wrote:

> I've just looked at the Sanskrit Library, SARIT and GRETIL texts and I see
> that these use IAST transliteration for anusvara (dot under m) while
> Muktabodha digital library uses ISO15919 for anusvara (dot on top of m).
>
> Harry Spier
>
> On Fri, Nov 18, 2016 at 11:57 AM, Peter Scharf <scharfpm7 at gmail.com>
> wrote:
>
>> Dear list members,
>> The Sanskrit Library transcoding facility on line at
>> http://sanskritlibrary.org/transcodeText.html does indeed transcode to
>> Romanization using the preferred Unicode composites of characters plus
>> diacritics.  Our off-line transcoding software
>>
>>    - TranscodeFile (Java program)
>>    <http://sanskritlibrary.org/software/transcodeFile.html>
>>
>>  which is downloadable from http://sanskritlibrary.org/downloads.html has
>> a large array of transcoders one of which transcodes to Romanization using
>> precomposed Unicode characters that include diacritics.  The problem with
>> searching that Harry Spier mentions is just one of a number of reasons why
>> Malcolm Hyman and I designed the Sanskrit Library phonetic encoding for all
>> our linguistic programming, including both the encoding of texts and
>> searching, and use Unicode only for display, and data input if desired
>> (though for the latter purpose SLP and most other meta-encodings are
>> preferable).  Our book *Linguistic Issues in Encoding Sanskrit*
>> available at http://sanskritlibrary.org/publications.html discusses the
>> issues comprehensively.
>>
>> Yours,
>> Peter
>>
>> On Fri, Nov 18, 2016 at 6:28 PM, Harry Spier <hspier.muktabodha at gmail.com
>> > wrote:
>>
>>> Dear list members,
>>>
>>> In unicode you can write characters with diacriticals with either a
>>> single glyph or you can combine the character with the diacritical writing
>>> it in two glyphs.
>>>
>>> This is a problem when one searchs sanskrit etexts.
>>>
>>> For example, the letters with diacriticals in the Muktabodha digital
>>> library are written with one glyph and as far as I can see GRETIL does the
>>> same thing.  But the transcoding utility at  "The Sanskrit Library"
>>> http://sanskritlibrary.org/transcodeText.html
>>> combines letters with their diacriticals in two glyphs.
>>>  So if you used the Sanskrit Library utility to create a transliterated
>>> word such as for example: *śākti* and then searched texts from either
>>> GRETIL or Muktabodha for that word your search wouldn't find anything.
>>>
>>> Thanks,
>>> Harry Spier
>>>
>>>
>>>
>>> _______________________________________________
>>> INDOLOGY mailing list
>>> INDOLOGY at list.indology.info
>>> indology-owner at list.indology.info (messages to the list's managing
>>> committee)
>>> http://listinfo.indology.info (where you can change your list options
>>> or unsubscribe)
>>>
>>
>>
>>
>> --
>> *******************
>> Peter M. Scharf
>> scharfpm7 at g <peter.scharf at univ-paris-diderot.fr>mail.com
>> *******************
>>
>
>
> _______________________________________________
> INDOLOGY mailing list
> INDOLOGY at list.indology.info
> indology-owner at list.indology.info (messages to the list's managing
> committee)
> http://listinfo.indology.info (where you can change your list options or
> unsubscribe)
>


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://list.indology.info/pipermail/indology/attachments/20161124/b4d4d8ce/attachment.htm>


More information about the INDOLOGY mailing list