Input of e-texts - some suggestions
gruenendahl
GRUENEN at MAIL.SUB.UNI-GOETTINGEN.DE
Tue Jun 11 09:39:28 UTC 2002
Dear list members,
roughly half a year after the introduction of the Goettingen Register
of Electronic Texts in Indian Languages (GRETIL) I would like to
express my thanks to all contributors of e-texts and, at the same
time, invite further contributions. It should be noted that these
contributions are in no way expected to comply with the suggestions
made below.
Anyway, here are some points that I have come to find useful in my own
work as well as in preparing files from various sources for GRETIL.
Perhaps they can serve as a starting point for a discussion.
- FORMAT: Assuming that the aim of the text input is to provide a
scholarly reference aid for a given text, rather than an exercise
in piety, I consider transliteration in a PLAIN TEXT FILE
preferable to any other format such as PDF, RTF, HTML etc.,
which may turn out practically useless for the said purpose,
especially when combined with non-Latin scripts.
- ENCODING: No matter which encoding is used in transliteration,
it should be
- FREE FROM ANY AMBIGUITY (that may, e.g., arise from employing
"n" for different class nasals)
- and FULLY DOCUMENTED at the beginning of every e-text,
preferably in a chart giving the equivalent of each diacritic in
ASCII or an established reference encoding such as CSX. Casual
references to "ITRANS", "Unicode", "UTF8" or whatever are not
very helpful to those using other encodings - and, odd as it may
seem, "other" encodings are not likely to vanish into thin air,
nor will "global" marketing strategies for long prevent the rise
of new encoding systems, making today's one-size-fits-all
solution just another item of electronic mythology.
- REFERENCE SYSTEM: This is perhaps the most neglected aspect in
the majority of e-texts one comes across. And yet, with the
computer's well-known limitation to one screenful of text at a
time, it is crucial to provide readers with adequate orientation,
citing, as it were, book, chapter and verse in each and every
screenful of text.
- REFERENCES SHOULD BE PLACED AT THE END of the respective
text unit (such as a verse or line) to allow for later
SORTING of lines (or padas) in alphabetical order
(cf. below).
- REFERENCES SHOULD BE GIVEN IN FULL, e.g. "3,13.120",
instead of restricting them to the smallest unit, say, the
verse number (just "120" instead of "3,13.120"). Having
browsed two or three screens up or down from a chapter
heading, one may easily have forgotten where exactly one
happens to be. Orientation can be even more difficult if an
ordinary word search takes you from the beginning of the
file right to a verse with the enigmatic reference "120":
for a start, you will have to scroll 119 verses up to find
out that you're in chapter 13, and it is all too plain that
your expedition through the text - and away from the
passage you were looking for - doesn't end there.
- With next to no additional effort, references can be made
SUITABLE FOR CLASSIFIED SEARCH simply by using distinctive
punctuation, such as COMMA between book and chapter, and
DOT between chapter and verse. This allows you to
distinguish the search for "3,13" (=book 3, chapter 13)
from "3.13" (chapter 3, verse 13).
- Especially when a file contains more than one e-text, the
reference should include an ABBREVIATION FOR THE TEXT in
question, preferably with a connecting underscore to
prevent accidental separation due to line break, e.g.
"MBh_3,13.120". Such an abbreviation is essential in pada /
verse indices that you may later want to merge with indices
of other texts to search for parallels.
- In a file combining a root text and interspersed
commentary, say, the Mahabharata and Nilakantha's
Bharatabhavadipa, distinct abbreviations, e.g.,
"MBh_3,13.120" and "MBhN_3,13.120" respectively, will facilitate
orientation significantly.
- MARKERS FOR METRICAL UNITS (padas) AND SECTIONS OF PROSE
(sentences) are indispensable for generating pada indices.
E.g., the Anustubh pattern could look like this:
For a four-pada verse:
........ $ ........ &
........ £ ........ // XY_n,n.n //
For a six-pada verse:
........ $ ........ &
........ £ ........ ₧
........ ƒ ........ // XY_n,n.n //
Here again, everything is fine as long as it is
UNAMBIGUOUS.
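
To illustrate how pada markers and full references at the end of each
unit work together, here is a minimal Python sketch of a pada index.
It is only one possible reading of the suggestions above, not a
description of how GRETIL files are actually processed. The function
names, the text abbreviation "XY", the pada texts and the third
marker "%" are placeholders chosen for this sketch; any documented,
unambiguous marker set would serve.

```python
import re

# Full reference at the end of a verse: abbreviation, underscore,
# book, COMMA, chapter, DOT, verse, enclosed in double slashes.
REF = re.compile(
    r"//\s*(?P<text>[A-Za-z]+)_(?P<book>\d+),(?P<chapter>\d+)\.(?P<verse>\d+)\s*//"
)


def split_padas(verse, markers="$&%"):
    """Return (padas, reference) for one verse ending in '// XY_n,n.n //'."""
    match = REF.search(verse)
    if match is None:
        raise ValueError("no full reference found at the end of the verse")
    ref = "{text}_{book},{chapter}.{verse}".format(**match.groupdict())
    body = verse[:match.start()]
    # Split on any pada marker; line breaks inside a verse count as spaces.
    parts = re.split("[" + re.escape(markers) + "]", body)
    padas = [" ".join(part.split()) for part in parts if part.strip()]
    return padas, ref


def pada_index(verses, markers="$&%"):
    """One line per pada, reference last, sorted alphabetically by pada."""
    entries = []
    for verse in verses:
        padas, ref = split_padas(verse, markers)
        for pada in padas:
            entries.append(f"{pada} // {ref} //")
    return sorted(entries)


# Placeholder verses (dummy pada texts, hypothetical references).
SAMPLE = [
    "zeta $ beta &\ngamma % alpha // XY_3,13.120 //",
    "epsilon $ delta &\neta % theta // XY_1,3.13 //",
]

if __name__ == "__main__":
    for line in pada_index(SAMPLE):
        print(line)
```

Because every index line carries its full reference at the end, the
sorted lines remain citable, and a plain search for "3,13" (book 3,
chapter 13) is not confused with one for "3.13" (chapter 3, verse 13).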
*******************************************************************
These suggestions have gradually emerged from my own practice. I
would be interested to hear what others have to say about this.
Finally, let me again point out that contributions to GRETIL are in
no way expected to comply with these suggestions!
Best regards
Reinhold Gruenendahl
********************************************************************
Dr. Reinhold Gruenendahl
Niedersaechsische Staats- und Universitaetsbibliothek
Fachreferat sued- und suedostasiatische Philologien
(Dept. of Indology)
37070 Goettingen, Germany
Tel (+49) (0)5 51 / 39 52 83
Fax (+49) (0)5 51 / 39 23 61
gruenen at mail.sub.uni-goettingen.de
FACH-INFORMATIONEN INDOLOGIE, GOETTINGEN:
http://www.sub.uni-goettingen.de/ebene_1/fiindolo/fiindolo.htm
In English:
http://www.sub.uni-goettingen.de/ebene_1/fiindolo/fiindole.htm
GRETIL - Goettingen Register of Electronic Texts in Indian Languages
http://www.sub.uni-goettingen.de/ebene_1/fiindolo/gretil.htm