Hans,

I think this conversation is slightly tangential to the original question, but I believe it is useful for all members of the group. I hope it is okay to continue the discussion a little further.

I should have spoken about charsets in my previous email. Use of the wrong charset is perhaps the most likely cause of garbled text, much more so than the use of non-Unicode applications. Most modern word processing applications are Unicode compatible, but if the default character encoding is set to anything other than Unicode (UTF-8), you are likely to see garbled text when you edit a document that was saved with Unicode encoding.
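To make that concrete, here is a minimal Python sketch of the failure mode (the sample text is arbitrary, and ISO-8859-1 stands in for any wrongly chosen legacy charset):

    # Text saved to disk as UTF-8 bytes...
    text = "Ḃ ā ī ū ē ō"
    raw = text.encode("utf-8")

    # ...but read back by an application whose default charset is
    # ISO-8859-1 (the base of windows-1252). Each multi-byte UTF-8
    # sequence turns into two or three wrong characters.
    print(raw.decode("iso-8859-1"))   # prints mojibake like 'á¸...'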


Recent versions of Microsoft Word don't let you change the default encoding from "Unicode (UTF-8)" easily, but earlier versions of Word had the default encoding set to windows-1252 (a superset of ISO/IEC 8859-1). If the default encoding is set to one of those pre-Unicode standards, then saving a Unicode document in that application retains only the characters the legacy charset happens to support and replaces the rest with question marks, garbled text, or blank spaces. I think this is the primary cause of most of the issues with your volume.
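Here is a sketch of that replacement behavior, again in Python (the sample word is arbitrary):

    # Saving Unicode text under a legacy default encoding: characters
    # outside the charset's repertoire are replaced, here with '?'.
    text = "saṃskṛtā vāk"
    lossy = text.encode("windows-1252", errors="replace")
    print(lossy.decode("windows-1252"))   # -> 'sa?sk?t? v?k'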

For a long time, Yahoo Mail and Yahoo Groups also had their default encoding set to ISO-8859-1, although they supported Unicode (UTF-8) as well. That meant anyone receiving an email in Unicode could read it fine, but if they forwarded or replied to it, the next recipient would get garbled text in place of the Unicode characters. I think Yahoo had switched fully to Unicode by 2010, albeit a bit later than most other popular web portals.

As for non-Unicode applications, the biggest offender in my experience is Adobe PageMaker. Several publishing houses in India still use PageMaker to prepare material for publication, although PageMaker cannot accept Unicode characters. Recently, we had to write a program to map Unicode diacritics to their equivalents in a specific legacy font to facilitate publishing a book in India.
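The program was essentially a character-for-character substitution; a simplified Python sketch follows (the legacy code points below are invented for illustration, since every pre-Unicode Indological font had its own arbitrary assignments):

    import unicodedata

    # Hypothetical slots in the target legacy font; real fonts differ.
    LEGACY_FONT_MAP = {
        "ā": "\u00E0",
        "ī": "\u00EC",
        "ū": "\u00F9",
        "ṃ": "\u00B5",
        "ṛ": "\u00AE",
    }

    def to_legacy_font(text):
        # Normalize first, so that 'a' + combining macron maps the
        # same way as the precomposed 'ā'.
        text = unicodedata.normalize("NFC", text)
        return "".join(LEGACY_FONT_MAP.get(ch, ch) for ch in text)

    print(to_legacy_font("saṃskṛtā"))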

I believe there is virtually no excuse in this day and age not to use Unicode all the way. Some specialized encodings may be more space-efficient than the Unicode encodings for certain languages, but unless you are storing terabytes upon terabytes of very specialized text, there is usually no reason to change your encoding from the default Unicode (UTF-8). If you have older documents that used other character encodings, it is better to convert them to Unicode (UTF-8).
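Converting is straightforward once you know (or can guess) the original charset. A minimal sketch, assuming an ISO-8859-1 source file and hypothetical file names:

    # Re-encode a legacy file as UTF-8. The hard part in practice is
    # identifying the source charset; libraries such as chardet can
    # help guess it.
    with open("old_document.txt", encoding="iso-8859-1") as src:
        text = src.read()
    with open("old_document_utf8.txt", "w", encoding="utf-8") as dst:
        dst.write(text)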

Regards,
Suresh.

On Wed, Aug 10, 2016 at 1:26 PM, Hock, Hans Henrich <hhhock@illinois.edu> wrote:
Thanks, Suresh.

A few things—

1. Your Ḃ did survive in the message that I received.

2. I’m not so sure whether it will survive (across platforms) if incorporated in a Word document (Apple’s Pages is, I believe, readable only on Macs; I don’t know about Open Office).

3. So here’s perhaps the most important question: What word processing applications are fully Unicode-compliant or, to use a double negative, are not non-Unicode applications? (The font that I am using, the old war horse Times New Roman, is Unicode-compliant, as far as I can tell.)

Cheers,

Hans

On 10 Aug 2016, at 12:08, Suresh Kolichala <suresh.kolichala@gmail.com> wrote:

Thanks, Hans. Combining characters such as  should survive multiple transmissions across different platforms, as long as the applications used are fully Unicode compatible.
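For the technically inclined, a stacked character like ā́ is stored as an ordinary sequence of code points; a small Python sketch:

    import unicodedata

    # 'a' + combining macron + combining acute
    s = "a\u0304\u0301"
    # NFC composes 'a' + macron into the precomposed 'ā'; no precomposed
    # form exists for the full stack, so the acute stays as a combining
    # mark. Unicode-compliant software renders the sequence as one
    # accented letter either way.
    print(unicodedata.normalize("NFC", s))   # -> 'ā́' (two code points)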

Technically speaking, I believe the problems you have had with your wonderful volume are not really a matter of single-glyph vs. multi-glyph characters, but of Unicode vs. non-Unicode. As you may know, standards for extended Latin characters predate Unicode: the ISO 8859 family of 8-bit charsets was first published in the mid-1980s, and part 4 (Latin-4) includes many commonly used symbols for Indological transliteration, such as Ā, ā, Ē, ē, Ī, ī, Ō, ō, Ū, ū. Many non-Unicode applications support these extended Latin characters, and that is why you find that they survive multiple transmissions (Unicode incorporates all of these extended Latin characters as well).

However, many of the characters newly added in the Unicode standard will not survive the use of non-Unicode applications, whether they are single-glyph or multi-glyph. For example, Ḃ (LATIN CAPITAL LETTER B WITH DOT ABOVE) is a single-glyph character, but I am pretty sure it will not survive transmission through non-Unicode applications (let's see how many of you find the Ḃ in this email rendered correctly).
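It is easy to check in advance whether a given character can survive a legacy charset. A Python sketch, using ISO 8859-4 (Latin-4) as an example of a pre-Unicode charset that covers the macron vowels:

    # 'ā' has a slot in ISO 8859-4 and so can pass through applications
    # limited to that charset; Ḃ exists only in Unicode.
    for ch in ("ā", "Ḃ"):
        try:
            ch.encode("iso-8859-4")
            print(ch, "fits in ISO 8859-4; likely to survive")
        except UnicodeEncodeError:
            print(ch, "is Unicode-only; expect '?' or a blank")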

Alas, until the computing world rids itself of all pestiferous non-Unicode applications, we will perhaps have to deal with the headaches of diacritics!

Suresh.


On Wed, Aug 10, 2016 at 11:46 AM, Hock, Hans Henrich <hhhock@illinois.edu> wrote:
Thanks, Suresh.

Would that things were that simple. The proper transcription, , with candra but no bindu, can be produced through what are called rendering engines; but when sending files back and forth between different platforms for the volume The Languages and Linguistics of South Asia, we found that this character, as well as characters such as ā́, did not always survive intact but instead showed up as blanks. Unlike single-glyph characters, these are not stable. We therefore always made sure to add a PDF version, so that messed-up characters could be restored.

All the best,

Hans


On 10 Aug 2016, at 09:13, Suresh Kolichala <suresh.kolichala@gmail.com> wrote:

Dear Rolf and others,

Sorry for the late response; I just saw this thread. Since I have some experience with Unicode as a member of the Unicode Consortium, here are my quick remarks:
  1. Please don't use ṁ for transcribing the half nasal: ṁ is used for anusvāra in ISO 15919, and Sinhala should be using ISO 15919, not IAST. (ṃ is not used in ISO 15919.)
  2. I don't know what work/application you need this symbol for, but the use of combining diacritical marks is very common in Unicode, and they are widely used for Indological purposes, as Hans pointed out.
  3. m̐ should serve your purposes for representing the labial prenasalized consonant ඹ (U+0DB9 SINHALA LETTER AMBA BAYANNA); see the sketch after this list.
  4. In the world of Unicode, nobody should have to bother about individual glyphs. If your work requires it, then it is perhaps using outdated technology. Abandon it :).
  5. If you use Unicode, it should not matter whether you are using a Mac, a PC, or any other publishing software. The text should be as transportable as ordinary English text (which follows the ASCII standard). Unicode is an international standard, and almost all applications and operating systems developed in the last decade fully support it.
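Regarding point 3, here is a small Python sketch showing how m̐ is put together:

    import unicodedata

    # m̐ is the letter 'm' followed by U+0310 COMBINING CANDRABINDU.
    m_candrabindu = "m\u0310"
    print([unicodedata.name(c) for c in m_candrabindu])
    # -> ['LATIN SMALL LETTER M', 'COMBINING CANDRABINDU']

    # There is no precomposed form, so normalization leaves the
    # two-code-point sequence alone; any Unicode-compliant renderer
    # must draw it as a single letter with the candrabindu on top.
    assert unicodedata.normalize("NFC", m_candrabindu) == m_candrabindu
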
Regards,
Suresh.