(Xe)(La)TeX, (L)Edmac, and Velthuis query

Sun Jul 4 10:27:21 UTC 2010

On 04.07.2010 05:59, Arlo Griffiths wrote:
>
> Dear Tim,
> I am not sure which publisher you are dealing with, but the expressed preference for Velthuis may be due to his ignorance of other available options. I personally find Devangari MT more elegant, and I believe it is more in conformity of the shape of letters expected by an Indian readership.
> There is nothing wrong with plain TeX in itself: it will just make things harder because plain TeX wizards are few and far between, and you are bound to run into situations where you need to call on technical advice. I would strongly disadvise launching a project based on plain TeX if you do not have a very helpful plain-TeXie near at hand. Anyhow, Velthuis certainly does work with LaTeX, so I believe that factor is no reason to reject the combination Velthuis-LEDMAC.
> Shilpa Sumant and I have been having rather positive experiences with XeLaTeX and LEDMAC for our Karmapa~njikaa edition, using the Devanagari MT font that is native to Mac OS. We started working directly in Devanagari, which means that now that we have substantial part of text in edited form, we face the problem of extracting an easily searchable romanized text from the Devanagari. I heard from Somdev Vasudeva that with some simple tricks, one can keep working in romanized transliteration or in code (as with Velthuis), to get the Devanagari only at the output stage. This might have been a better procedure for us in hindsight.
> This is all the perspective of a technical simpleton. Dominik and others will be able to give you sophisticated technical advice.
> Good luck!
> Arlo
>

If what is aimed for is (a) the production of high-quality print 
editions and simultaneously (b) the ability to extract searchable 
Sanskrit text without annotation (and have it searchable), then I would 
suggest moving up an abstraction layer - start encoding the edition in 
XML. If you know any scripting language that can deal with XML (perl, 
php, python), getting a TEI-compliant XML source text for a critical 
edition to LaTeX source code for printing with ledmac, or to HTML for 
online publication, is quite simple (time-consuming, but simple).

If you work in a (La)TeX environment, you may well be able to convert 
romanized to Devanagari and vice versa - but I don't think you will be 
able to extract searchable text without annotation, at least not without 
a lot of manual intervention and correction.

The problem is that closing braces in (La)TeX source code don't tell you 
what they close (whereas <note> in XML always has a matching </note>, 
for instance). I did encounter cases where it was formally not possible 
anymore to extract the actual text from a jungle of notes just with 
scripting or programming.

In the larger picture, if more editorial projects decided to work with 
XML from the start and were kind enough to share the resultant e-texts 
with repositories like SARIT, this would really help towards building 
larger digital corpora of better philological quality ... producing 
editions with Word or other specialized software that has its own 
proprietary file-format and does not follow any open standards is in my 
opinion a kind of technical imprisonment and rather solipsistic. The 
learning-curve towards XML is a little steep, but I think it really pays 
off.

Best,

Birgit