[INDOLOGY] Sinhala half nasal plus m

Suresh Kolichala suresh.kolichala at gmail.com
Wed Aug 10 18:47:50 UTC 2016


Hans,

I think this conversation is slightly tangential to the original question,
but I believe it is useful for all members of the group. I hope it is okay
to continue the discussion a little further.

I should have spoken about charsets in my previous email. Use of the wrong
charset is probably the most likely cause of garbled text, much more so
than the use of non-Unicode applications. Most modern word-processing
applications are Unicode compatible, but if the default character encoding
is set to anything other than Unicode (UTF-8), you are likely to see
garbled text when you try to edit a document that was saved with a Unicode
encoding.
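
To see the mechanism, here is a minimal Python sketch of such a mismatch,
assuming a document saved as UTF-8 but reopened by an application whose
default charset is Latin-1:

    # The bytes are valid UTF-8, but the application decodes them with the
    # wrong charset, so each two-byte vowel turns into two wrong characters.
    text = "ā ī ū ē ō"
    utf8_bytes = text.encode("utf-8")       # saved correctly as UTF-8
    garbled = utf8_bytes.decode("latin-1")  # reopened with the wrong charset
    print(garbled)                          # e.g. 'ī ū' come out as 'Ä« Å«'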


Some recent versions of Microsoft Word don't let you change the default
encoding from "Unicode (UTF-8)" easily, but earlier versions of Microsoft
Word had the default encoding set to windows-1252 (a superset of ISO/IEC
8859-1). If the default encoding is set to one of those pre-Unicode
standards, then a Unicode document saved in that application will retain
common characters such as ā, ī, ū, ē, ō, but will have its Unicode-specific
characters replaced with question marks, garbled text, or blank spaces. I
think this is the primary cause of most of the issues with your volume.
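
A small Python sketch of that lossy save (windows-1252 is "cp1252" in
Python's codec names):

    # Anything windows-1252 cannot represent is replaced with a question
    # mark; the Unicode-specific characters are silently destroyed.
    text = "Ḃ m\u0306 a\u0301"  # Ḃ; m + combining breve; a + combining acute
    print(text.encode("cp1252", errors="replace"))  # b'? m? a?'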

For a long time, Yahoo Mail and Yahoo Groups also had their default
encoding set to ISO-8859-1, although they supported Unicode (UTF-8)
encoding as well. That meant anyone receiving an email in Unicode could
read it fine, but if they forwarded or replied to it, the new recipient
would get garbled text in place of the Unicode characters. I think Yahoo
had switched to Unicode all the way through by 2010, albeit a bit later
than most other popular web portals.

As for non-Unicode applications, I have found Adobe PageMaker to be the
biggest offender. Several publishing houses in India still use Adobe
PageMaker to prepare material for publication, although PageMaker cannot
accept Unicode characters. Recently, we had to write a program to map all
Unicode diacritics to their equivalents in a specific legacy font in order
to publish a book in India.
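
The idea of that program, as a minimal Python sketch (the actual legacy
font is not named here, so the target code points below are invented for
illustration):

    # Map each Unicode diacritic character to the slot where the legacy
    # (non-Unicode) font happens to draw the same glyph.
    LEGACY_FONT_MAP = str.maketrans({
        "\u0101": "\u00e0",  # ā -> assumed legacy slot 0xE0
        "\u012b": "\u00ec",  # ī -> assumed legacy slot 0xEC
        "\u016b": "\u00f9",  # ū -> assumed legacy slot 0xF9
    })

    def to_legacy_font(text):
        # The output displays correctly only with the legacy font applied.
        return text.translate(LEGACY_FONT_MAP)

    print(to_legacy_font("ā ī ū"))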

I believe there is virtually no excuse in this day and age not to use
Unicode all the way through. Some specialized encodings may be more
efficient than the Unicode encodings for certain languages, but unless you
are storing terabytes upon terabytes of very specialized text, there is
usually no reason to change your encoding from the default Unicode (UTF-8).
If you have older documents that use other character encodings, it is
better to convert them to the "Unicode (UTF-8)" encoding.
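
The conversion itself is simple once the old encoding is known; for
example, in Python (windows-1252 and the file names here are placeholders):

    # Re-encode a legacy document as UTF-8.
    with open("old.txt", encoding="cp1252") as src:
        content = src.read()
    with open("new.txt", "w", encoding="utf-8") as dst:
        dst.write(content)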

Regards,
Suresh.

On Wed, Aug 10, 2016 at 1:26 PM, Hock, Hans Henrich <hhhock at illinois.edu>
wrote:

> Thanks, Suresh.
>
> A few things—
>
> 1. Your B́ did survive in the message that I received.
>
> 2. I’m not so sure whether it will survive (across platforms) if
> incorporated in a Word document (Apple’s Pages is, I believe, readable only
> on Macs; I don’t know about Open Office).
>
> 3. So here’s perhaps the most important question: What word processing
> applications are fully Unicode-compliant or, to use a double negative, are
> not non-Unicode applications? (The font that I am using, the old war horse
> Times New Roman, is Unicode-compliant, as far as I can tell.)
>
> Cheers,
>
> Hans
>
> On 10 Aug 2016, at 12:08, Suresh Kolichala <suresh.kolichala at gmail.com>
> wrote:
>
> Thanks, Hans. Combining character sequences such as *m̆* should survive
> multiple transmissions across different platforms, as long as the
> applications involved are fully compatible with Unicode.
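>
> You can inspect such a sequence yourself; a quick Python sketch:
>
>     # m̆ is not one code point but a sequence: m + combining breve
>     # (U+0306). No precomposed form exists, so normalization cannot
>     # collapse it; both code points must survive intact.
>     import unicodedata
>     seq = "m\u0306"
>     print([hex(ord(c)) for c in seq])                # ['0x6d', '0x306']
>     print(unicodedata.normalize("NFC", seq) == seq)  # True: stays decomposed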
>
> Technically speaking, I believe the problems you have had with your
> wonderful volume are not really a matter of single-glyph vs. multi-glyph
> characters, but of Unicode vs. non-Unicode. As you may know, standards for
> extended Latin characters predate Unicode: the ISO 8859 family of 8-bit
> encodings, dating from the 1980s, covers many symbols commonly used in
> Indological transliteration, such as Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū (these
> macron vowels are in ISO 8859-4 rather than 8859-1). Many non-Unicode
> applications support these extended Latin characters, which is why you
> find them surviving multiple transmissions (Unicode is compatible with
> these Latin extensions as well).
>
> However, many of the characters newly added in the Unicode standard will
> not survive passage through non-Unicode applications, whether they are
> single-glyph or multi-glyph. For example, Ḃ (LATIN CAPITAL LETTER B WITH
> DOT ABOVE) is a single-glyph character, but I am pretty sure it will not
> survive transmission through non-Unicode applications (let's see how many
> of you find the Ḃ in this email rendered correctly).
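>
> A quick Python check of the two forms of Ḃ, for the curious:
>
>     # Ḃ exists precomposed (U+1E02) and as B + combining dot above
>     # (U+0307); NFC folds the sequence into the single code point, but
>     # neither form fits in an 8-bit pre-Unicode charset.
>     import unicodedata
>     print(unicodedata.normalize("NFC", "B\u0307") == "\u1e02")  # True
>     print("\u1e02".encode("cp1252", errors="replace"))          # b'?'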
>
> Alas, until the computing world rids itself of all pestiferous non-Unicode
> applications, we will perhaps have to keep dealing with the headaches of
> diacritics!
>
> Suresh.
>
>
> On Wed, Aug 10, 2016 at 11:46 AM, Hock, Hans Henrich <hhhock at illinois.edu>
> wrote:
>
>> Thanks, Suresh.
>>
>> Would that things were that simple. The proper transcription, *m̆*, with
>> candra, but no bindu, can be produced through what are called rendering
>> machines; but when sending files back and forth between different platforms
>> for the volume *The Languages and Linguistics of South Asia*, we found
>> that this character, as well as characters such as *ā́*, did not always
>> survive intact but instead showed up as blanks. Unlike single-glyph
>> characters, these are not stable. We therefore always made sure to add a
>> PDF version, so that messed-up characters could be restored.
>>
>> All the best,
>>
>> Hans
>>
>>
>> On 10 Aug 2016, at 09:13, Suresh Kolichala <suresh.kolichala at gmail.com>
>> wrote:
>>
>> Dear Rolf and others,
>>
>> Sorry for the late response, but I have only just seen this thread. Since
>> I have some experience with Unicode as a member of the Unicode Consortium,
>> here are my quick remarks:
>>
>>    1. Please don't use ṁ for transcribing the half nasal: ṁ is used for
>>    anusvāra in ISO 15919, and Sinhala should be using ISO 15919, not IAST.
>>    ṃ is not used in ISO 15919.
>>    2. I don't know what work/application you need this symbol for, but
>>    the use of combining diacritical marks is very common in Unicode, and
>>    they are widely used for Indological purposes, as Hans pointed out.
>>    3. *m̐* should serve your purposes for representing the labial
>>    prenasalized consonant ඹ (U+0DB9 SINHALA LETTER AMBA BAYANNA).
>>    4. In the world of Unicode, nobody should have to bother about
>>    individual glyphs. If your work requires it, then it is probably using
>>    an outdated technology. Abandon it :).
>>    5. If you use Unicode, it should not matter whether you are using a
>>    Mac, a PC, or any other publishing platform. The text should be as
>>    transportable as ordinary English text (which follows the ASCII
>>    standard). Unicode is an international standard, and almost all
>>    applications and operating systems developed in the last decade should
>>    fully support it.
>>
>> Regards,
>> Suresh.

