New program for transliteration
ucgadkw at ucl.ac.uk
ucgadkw at ucl.ac.uk
Fri Mar 19 20:49:28 UTC 1993
A new general-purpose program for character-set
transliteration has appeared. INDOLOGISTs will all have been aware
of Jeroen Hellingman's excellent patc program for some time, of
course :-) (it's part of his Malayalam package, anyway, but is
of much broader use). This new transliteration program appears
to cover some of the same ground. I haven't tried it, but I
append the readme.doc file below, for your information.
WHAT IS TRANSLIT PROGRAM
The TRANSLIT program is used to transliterate character codes.
The ASCII table of characters (containing characters with codes 0 to 127)
is a table for English language. For other languages many different schemes
are used to represent their respective alphabets. Some use codes larger
than 127, some use multicharacter sequences to represent a single letter
in foreign alphabets. There is also UNICODE and other proposed standards
to use units larger than 8-bits(1 byte) to represent foreign alphabets.
For example, UNICODE will use 16-bit(2 byte) codes. At this moment, the
TRANSLIT program supports only 8-bit codes, but will be expanded to
UNICODE if there is enough interest.
It is frequently necessary to convert from one representation to another
representation of the foreign alphabet. E.g., in the Library of Congress
transliteration, the Russian letter sha is transliterated as two Latin
letters "sh" while the popular word processors use a code 232 (decimal),
the RELCOM network uses a code 221, and the KOI7 set uses character "["
for the same letter. So if your screen driver, printer, word processor,
etc. uses different codes than your text, you need to transliterate.
The TRANSLIT program is a powerful tool for such tasks. It converts an input
file in one representation to the output file in another representation using
an appropriate, user defined, transliteration table. Transliteration table
allows for very elaborate transliteration tasks and includes provisions for
plain character sequences, character lists, regular expressions (flexible
matches), SHIFT-OUT/IN sequences and more. The program comes with documentation
and examples of popular transliteration schemes for Russian language. Other
files will be added with your collaboration.
FILES IN THE PROGRAM DISTRIBUTION
The following files are currently in the distribution. They are all ASCII
(text) files (with the exception on translit.tar.Z and translit.zip).
Please note that the copyright notice requires that, if you distribute this
program, you have to distribute the complete set of files.
TRANSLIT is copyrighted: Copyright (c) Jan Labanowski and JKL Enterprises, Inc.
readme.doc This file
translit.ps PostScript version of program documentation and
translit.1 [nt]roff version of the above in the format
of UN*X man page (use -man option with [nt]roff)
translit.txt Plain text version of the above.
order.txt Order form for ordering the executable program (compiled
with installation script and instructions)
TRANSLITERATION TABLES FOR RUSSIAN (read comments in the files)
alt-gos.rus ALT to GOSTCII table
alt-koi8.rus ALT to KOI8 table
gos-alt.rus GOSTCII to ALT table
gos-koi8.rus GOSTCII to KOI8 table
koi7-8.rus KOI7 to KOI8 table
koi7nl-8.rus KOI7 (no Latin) to KOI8 table
koi8-7.rus KOI8 to KOI7 table
koi8-alt.rus KOI8 to ALT table
koi8-gos.rus KOI8 to GOSTCII table
koi8-lc.rus KOI8 to Library of Congress table
koi8-phg.rus KOI8 to GOST transliteration
koi8-php.rus KOI8 to Pokrovsky transliteration
koi8-tex.rus KOI8 to LaTeX conversion
phg-koi8.rus GOST transliteration to KOI8
pho-8sim.rus Simple phonetic to KOI8
pho-koi8.rus Various phonetic to KOI8
php-koi8.rus Pokrovsky transliteration to KOI8
tex-koi8.rus LaTeX to KOI8
example.alt.uu uuencoded example in ALT
example.ko8.uu uuencoded example in KOI8
example.pho phonetic transliteration example
example.tex LaTeX example
TRANSLIT PROGRAM SOURCE in C.
translit.c Main program
paths.h Include file
reg_exp.h Include file
reg_exp.c Modified regular expression package by H. Spencer
reg_sub.c Modified regular expression package by H. Spencer
PACKED FILES CONTAINING THE WHOLE DISTRIBUTION FROM ABOVE
translit.tar.Z --- Compressed tar file with the whole distribution.
ON UN*X use:
zcat translit.tar.Z | tar xvof -
to get all individual files. This file is BINARY, and
you should not attempt to obtain it via email.
This is a best way to get the whole ditribution via
ftp if you are on the UN*X machine.
translit.tar.z.uu --- uuencoded file from the above. It can be transmitted
via e-mail, but it is a large file, and if your mailer
sets limits on your messages, it may not be correctly
transmitted. To recover individual files from the
email message, do:
where the mesage_file is a saved email message.
You will obtain translit.tar.Z file which you can
unpack as described above.
translit.zip --- This is a "zipped" file (i.e., compressed with a ZIP
program. It is binary (i.e., you cannot get it via
e-mail, but you can get it via ftp with binary switch
set) To get individual file do:
unzip translit.zip (in UNIX)
PKUNZIP translit.zip (under MS-DOS)
and you will obtain a full distribution.
translit.zip.uu --- Uuencoded file from above. Can be sent via e-mail but
it is big. To recover all files do:
where message_file is your saved message and then
"unzip" it as shown above.
HOW TO OBTAIN THE FILES:
Via FTP (if you are on Internet):
ftp kekule.osc.edu (or ftp 220.127.116.11)
Password: Your_email_address (Please...)
ftp> ascii (or binary if you retrieve binary files)
ftp> cd pub/russian/translit
ftp> get file_name
..... (for each file)
send translit/file_name from russian
to OSCPOST at osc.edu or OSCPOST at OHSTPY.BITNET. You can retrieve more files
with a single message by placing several lines of the above format.
The file will be forwarded to your mailbox automatically.
The "file_name" in the instructions above means any file from the list
given above. If you do not know or have programs like uudecode, unzip, tar,
zcat or uncompress, get all individual files one by one. If you know how
to use the above programs it may be faster for you to get a tar or zip
archive and unpack it.
Program installation and compilation is described in the translit docs.
Since the program requires that you make small changes to paths.h before
compilation (depending on your system and environment), I cannot realy
distribute generic executables (i.e., compiled programs). You have to modify
paths.h to suit your needs and operationg system and compile the program using
your favorite C compiler.
GETTING THE READY TO RUN PROGRAM
If you do not have time, do not have resources, or for whatever reason
you wish a ready to run executable of TRANSLIT, you can order it for
a very modest fee from JKL ENTERPRISES, INC. as described in the file:
order.txt. It will come with an easy installation script which will ask
you a few simple questions and install the program.
I invite, and will try to answer, bug reports, comments and suggestions.
If there is an interest I will work on optimizing the program, on supporting
the UNICODE, and other enhancements which you suggest. If you use the
program for commercial purposes, and on many computers in your organization,
you might want to buy the program from JKL ENTERPRISES, INC., to aid further
development, though you are not required to do so.
P.O. Box 21821
Columbus, OH 43221-0821, USA
jkl at osc.edu, JKL at OHSTPY.BITNET
More information about the INDOLOGY