Peter_Scharf at Peter_Scharf at
Thu Dec 15 01:03:39 EST 1994

Michio Yano quietly mentioned at the end of his message that he had entered
the whole text of the Brhatsamhita.  He wrote:

>I have digitalized
>the whole text of the BRhatsamhitaa and it is already on
>the net.  Recently we have placed the updated version of
>the e-text at the following ftp site.
>Michio YANO
>Kyoto Sangyo University
>yanom at

The top of his file reads:

              VarAhamihira's BRhatsaMhitA
              (Version 4, June 8, 1994)
                     degitalized by
              Michio YANO and Mizue Sugita
              based on the edition of A.V.TripAThI
              (SarasvatI Bhavan GranthamAlA Edition)
              with reference to H.Kern's text and his translation
              [variants marked by K. & K's tr.]
              and Utpala's commentary [marked by U.]

I fetched it and it looks great.  His scheme of transliteration is precise
so that there are no ambiguities.  It can easily be transformed to any
format anyone would want.  The tremendous work done by Michio Yano and his
colleagues in Kyoto recently leads me to take this opportunity to discuss
some general principles for inputting text.  It would be useful for the
whole scholarly community to note these principles in order to make their
work most useful with the least amount of effort.

Michio Yano's scheme for inputting text, which leaves no ambiguities and is
easily transformable, is as follows:

Text Input System
  (1)   Sanskrit characters which should bear diacritical marks
        when Romanized have been input mostly by capitals.

        vowels:     a, A, i, I, u, U, R, RR, L, e, ai, o, au
        gutturals:  k, kh, g, gh, G
        palatals:   c, ch, j, jh, J
        linguals:   T, Th, D, Dh, N
        dentals:    t, th, d, dh, n
        labials:    p, ph, b, bh, m
        semivowels: y, r, l, v
        sibilants:  z, S, s
        aspiration: h
        anusvAra:   M
        visarga:    H
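
As an illustration of how easily such a scheme can be transformed, here is a
minimal Python sketch that maps the capital-letter scheme above to the usual
Roman diacritics.  The function name and the table are assumptions made for
this example; it is not one of the programs mentioned in this message.  It
assumes lower-case running text, so capitals used only to mark a proper name
(such as the S of SarasvatI in the file header) would be converted wrongly.

```python
# Characters of the scheme above that need a diacritic when Romanized.
# "RR" is the only two-character token, so it is tried before plain "R".
YANO_TO_DIACRITICS = {
    "RR": "ṝ", "A": "ā", "I": "ī", "U": "ū", "R": "ṛ", "L": "ḷ",
    "G": "ṅ", "J": "ñ", "T": "ṭ", "D": "ḍ", "N": "ṇ",
    "z": "ś", "S": "ṣ", "M": "ṃ", "H": "ḥ",
}

def yano_to_diacritics(text):
    """Convert text in the capital-letter scheme to Roman diacritics."""
    keys = sorted(YANO_TO_DIACRITICS, key=len, reverse=True)  # longest first
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(YANO_TO_DIACRITICS[key])
                i += len(key)
                break
        else:
            # Plain letters, spaces and markers such as ^ pass through.
            out.append(text[i])
            i += 1
    return "".join(out)

print(yano_to_diacritics("bRhatsaMhitA"))  # bṛhatsaṃhitā
print(yano_to_diacritics("vizeSa^ukti"))   # viśeṣa^ukti
```

Aspirates need no entries of their own here, because Th and Dh come out
correctly from T and D followed by an unchanged h.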

This meets the necessary fundamentals.  There are a couple of modifications
I would suggest to make it a little simpler.  They are shown below.  The
simplest and fastest method of inputting text is to hit one key for one
Sanskrit sound.  Between upper and lower case keys there are plenty of keys
to cover the whole sound catalogue.  On that principle I came up with the
following scheme, for which I have made a screen font for the mac showing
the standard diacritic marks and for which I have written programs to
transliterate to a couple of popular Roman diacritic fonts and to Jaipur, a
Devanagari font.  (If you have fonts you regularly use and would like
transliteration programs for them, let me know and I'll make them available
as well.  My apologies for the delays to the few who have already asked.  I
just finished teaching and this is the first opportunity I've had to get
back to these pursuits.)

My text entering scheme:

        vowels:     a, A, i, I, u, U, f, F, x, X, e, E, o, O
        gutturals:  k, K, g, G, N
        palatals:   c, C, j, J, Y
        linguals:   w, W, q, Q, R
        dentals:    t, T, d, D, n
        labials:    p, P, b, B, m
        semivowels: y, r, l, v
        sibilants:  S, z, s
        aspiration: h
        anusvAra:   M
        visarga:    H

        acute accent for udatta
        caret (circumflex) for svarita
        grave accent for anudatta

The Y and R for the palatal and retroflex nasals, and capitals for long
vowels and aspirated stops, are intuitive.  The f, F, x, X for vocalic r,
long vocalic r, vocalic l and long vocalic l (the last shows up in some
grammatical texts) are not, but those keys are available.
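
To show that the two schemes carry the same information, the following Python
sketch converts text from Michio Yano's scheme into the one-key-per-sound
scheme.  It is only an illustration under stated assumptions: the table is
read off the two lists above, the function name is made up for the example,
and digraphs such as kh or ai are always read as single sounds.  The
conversion has to be done in one left-to-right pass, because several target
letters (N, R, G, J, S, z) are also source letters with a different value.

```python
# Yano's scheme -> one-key scheme; letters not listed are the same in both.
YANO_TO_ONE_KEY = {
    "RR": "F", "ai": "E", "au": "O",
    "kh": "K", "gh": "G", "ch": "C", "jh": "J",
    "Th": "W", "Dh": "Q", "th": "T", "dh": "D", "ph": "P", "bh": "B",
    "R": "f", "L": "x", "G": "N", "J": "Y",
    "T": "w", "D": "q", "N": "R", "z": "S", "S": "z",
}

def yano_to_one_key(text):
    """Convert in a single pass, trying two-character tokens first."""
    keys = sorted(YANO_TO_ONE_KEY, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(YANO_TO_ONE_KEY[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # unchanged letters and punctuation
            i += 1
    return "".join(out)

print(yano_to_one_key("kRSNa"))    # kfzRa
print(yano_to_one_key("bauddha"))  # bOdDa
print(yano_to_one_key("vizeSa"))   # viSeza
```

A chain of simple search-and-replace steps would not work here: replacing G
by N and then N by R, for instance, would turn the guttural nasal into the
retroflex one.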

Michio Yano undid vowel but not consonant sandhi.  He writes:

  (2)   Sandhi
        For the convenience of word search, internal and external
        vowel Sandhis are decomposed by ^.
        eg.  vizeSa^ukti < vizeSokti
             ca^iti < ceti
             horA^anyo < horAnyo
             ko +api < ko'pi
        Consonantal sandhis are retained.

  (3)   Compounds
        Members of compound words are sometimes separated by ^,
        but not consistent.
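
The kind of word search these markers make possible can be sketched in a few
lines of Python.  This is an assumption-laden illustration, not part of the
e-text or of any program mentioned here: it treats spaces, ^ and + as
boundaries, and the function names are invented for the example.

```python
import re

def units(line):
    """Split a line into searchable units at spaces, ^ and +."""
    return [u for u in re.split(r"[\s^+]+", line) if u]

def find_word(lines, word):
    """Return the indices of the lines that contain `word` as a whole unit."""
    return [i for i, line in enumerate(lines) if word in units(line)]

# The decomposed forms quoted in the examples above.
sample = ["vizeSa^ukti", "ca^iti", "horA^anyo", "ko +api"]

print(find_word(sample, "iti"))    # [1]
print(find_word(sample, "api"))    # [3]
print(find_word(sample, "anyaH"))  # []
```

The last search finds nothing because anyo is left in its sandhi form, so a
search for the pausa form of the word misses it.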

        For future reference, concerning the sandhi, it is definitely a
good idea to undo sandhi while entering a text.  One should undo ALL sandhi
though, not just vowel sandhi.  I have written a program to put together
sandhi.  Computers cannot take it apart without reference to enormous
lexical searches and syntax checking.  No one has even attempted to
automate such a process.  Therefore, human beings must take it apart.  Once
taken apart, a samhita text can be produced virtually instantaneously by
using my sandhi program.  The file with undone sandhi will allow for word
searching, etc.  The samhita text produced by using a sandhi program will
allow analyses of meter etc. where undone sandhi is not desired.
Therefore, whoever enters text should undo sandhi to permit the greatest
usefulness with the least human effort.  One should indicate by using a
different separation mark whether the undone sandhi is within a compound or
between padas.  For example, I put a space between words and a hyphen
within a compound.  My sandhi program can put together sandhi at one type
of boundary at a time, e.g. all sandhi at hyphens but not at spaces.  I
have also allowed for options of spacing, e.g. to leave a space after
word-final consonants after the European custom or to remove all spaces not
permitted by Devanagari script.
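
The idea of applying sandhi at one type of boundary at a time can be sketched
as follows in Python.  This is not the sandhi program described above, only a
toy under stated assumptions: it uses the one-key scheme, knows just five
vowel rules (a/A with a/A, i/I, u/U, plus like i and like u vowels), and
leaves every other combination, including all consonant sandhi, untouched.
The function names are invented for the example.

```python
import re

def join_vowels(first, second):
    """Return the merged vowel for a few simple cases, else None."""
    if first in "aA":
        if second in "aA":
            return "A"
        if second in "iI":
            return "e"
        if second in "uU":
            return "o"
    if first in "iI" and second in "iI":
        return "I"
    if first in "uU" and second in "uU":
        return "U"
    return None

def apply_sandhi(text, boundary):
    """Join vowels across one boundary mark only, e.g. "-" or " "."""
    pattern = re.compile(r"([aAiIuU])" + re.escape(boundary) + r"([aAiIuU])")

    def merge(match):
        merged = join_vowels(match.group(1), match.group(2))
        # Combinations this sketch has no rule for stay as typed.
        return merged if merged is not None else match.group(0)

    return pattern.sub(merge, text)

print(apply_sandhi("ca iti viSeza-ukti", "-"))  # ca iti viSezokti
print(apply_sandhi("ca iti viSeza-ukti", " "))  # ceti viSeza-ukti
```

Calling the function once with the hyphen and once with the space gives the
full samhita form for these cases, while a single call keeps the other kind
of boundary visible.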

I know that Peter Schreiner has entered a large number of texts.  I would
welcome comments, publicly (or privately to me, as they wish), from others
who have entered texts or who have experience with computational
linguistics.

Let me close by expressing my thanks to Michio Yano, to his colleague Prof.
Muneo Tokunaga and to those who helped them enter these texts.


                Peter M. Scharf
                Department of Classics
                Brown University


More information about the INDOLOGY mailing list