New program for transliteration

ucgadkw at ucgadkw at
Fri Mar 19 20:49:28 UTC 1993

A new general-purpose program for character-set 
transliteration has appeared.  INDOLOGISTs will all have been aware
of Jeroen Hellingman's excellent patc program for some time, of 
course :-)  (it's part of his Malayalam package, anyway, but is 
of much broader use).  This new transliteration program appears 
to cover some of the same ground.  I haven't tried it, but I 
append the readme.doc file below, for your information.


The TRANSLIT program is used to transliterate character codes.
The ASCII table of characters (containing characters with codes 0 to 127)
is a table for English language. For other languages many different schemes
are used to represent their respective alphabets. Some use codes larger
than 127, some use multicharacter sequences to represent a single letter
in foreign alphabets. There is also UNICODE and other proposed standards
to use units larger than 8-bits(1 byte) to represent foreign alphabets.
For example, UNICODE will use 16-bit(2 byte) codes. At this moment, the
TRANSLIT program supports only 8-bit codes, but will be expanded to
UNICODE if there is enough interest.

It is frequently necessary to convert from one representation to another
representation of the foreign alphabet. E.g., in the Library of Congress
transliteration, the Russian letter sha is transliterated as two Latin
letters "sh" while the popular word processors use a code 232 (decimal),
the RELCOM network uses a code 221, and the KOI7 set uses character "["
for the same letter. So if your screen driver, printer, word processor,
etc. uses different codes than your text, you need to transliterate.

The TRANSLIT program is a powerful tool for such tasks. It converts an input
file in one representation to the output file in another representation using
an appropriate, user defined, transliteration table. Transliteration table
allows for very elaborate transliteration tasks and includes provisions for
plain character sequences, character lists, regular expressions (flexible
matches), SHIFT-OUT/IN sequences and more. The program comes with documentation
and examples of popular transliteration schemes for Russian language. Other
files will be added with your collaboration.

The following files are currently in the distribution. They are all ASCII
(text) files (with the exception on translit.tar.Z and
Please note that the copyright notice requires that, if you distribute this
program, you have to distribute the complete set of files.
TRANSLIT is copyrighted: Copyright (c) Jan Labanowski and JKL Enterprises, Inc.

  Name                   Description
readme.doc        This file       PostScript version of program documentation and
                  installation procedure
translit.1        [nt]roff version of the above in the format
                     of UN*X man page  (use -man option with [nt]roff)
translit.txt      Plain text version of the above.
order.txt         Order form for ordering the executable program (compiled
                  with installation script and instructions)

      TRANSLITERATION TABLES FOR RUSSIAN (read comments in the files)
alt-gos.rus       ALT to GOSTCII table
alt-koi8.rus      ALT to KOI8 table
gos-alt.rus       GOSTCII to ALT table
gos-koi8.rus      GOSTCII to KOI8 table
koi7-8.rus        KOI7 to KOI8 table
koi7nl-8.rus      KOI7 (no Latin) to KOI8 table
koi8-7.rus        KOI8 to KOI7 table
koi8-alt.rus      KOI8 to ALT table
koi8-gos.rus      KOI8 to GOSTCII table
koi8-lc.rus       KOI8 to Library of Congress table
koi8-phg.rus      KOI8 to GOST transliteration
koi8-php.rus      KOI8 to Pokrovsky transliteration
koi8-tex.rus      KOI8 to LaTeX conversion
phg-koi8.rus      GOST transliteration to KOI8
pho-8sim.rus      Simple phonetic to KOI8
pho-koi8.rus      Various phonetic to KOI8
php-koi8.rus      Pokrovsky transliteration to KOI8
tex-koi8.rus      LaTeX to KOI8

example.alt.uu    uuencoded example in ALT
example.ko8.uu    uuencoded example in KOI8
example.pho       phonetic transliteration example
example.tex       LaTeX example

translit.c        Main program
paths.h           Include file
reg_exp.h         Include file
reg_exp.c         Modified regular expression package by H. Spencer
reg_sub.c         Modified regular expression package by H. Spencer

translit.tar.Z   ---   Compressed tar file with the whole distribution.
                          ON UN*X use:
                             zcat translit.tar.Z | tar xvof -
                          to get all individual files. This file is BINARY, and
                          you should not attempt to obtain it via email.
                          This is a best way to get the whole ditribution via
                          ftp if you are on the UN*X machine.
translit.tar.z.uu ---  uuencoded file from the above. It can be transmitted
                       via e-mail, but it is a large file, and if your mailer
                       sets limits on your messages, it may not be correctly
                       transmitted. To recover individual files from the
                       email message, do:
                         uudecode message_file
                       where the mesage_file is a saved email message.
                       You will obtain translit.tar.Z file which you can
                       unpack as described above.     ---   This is a "zipped" file (i.e., compressed with a ZIP
                       program. It is binary (i.e., you cannot get it via
                       e-mail, but you can get it via ftp with binary switch
                       set) To get individual file do:
                         unzip  (in UNIX)
                         PKUNZIP (under MS-DOS)
                       and you will obtain a full distribution.  ---   Uuencoded file from above. Can be sent via e-mail but
                       it is big. To recover all files do:
                          uudecode message_file
                       where message_file is your saved message and then
                       "unzip" it as shown above.


Via FTP (if you are on Internet):
  ftp           (or ftp
  Login: anonymous
  Password: Your_email_address (Please...)
  ftp> ascii       (or binary if you retrieve binary files)
  ftp> cd pub/russian/translit
  ftp> get file_name
     .....     (for each file)
  ftp> quit

Via E-mail:
  Send message:
     send translit/file_name from russian
  to OSCPOST at or OSCPOST at OHSTPY.BITNET. You can retrieve more files
  with a single message by placing several lines of the above format.
  The file will be forwarded to your mailbox automatically.

The "file_name" in the instructions above means any file from the list
given above. If you do not know or have programs like uudecode, unzip, tar,
zcat or uncompress, get all individual files one by one. If you know how
to use the above programs it may be faster for you to get a tar or zip
archive and unpack it.

Program installation and compilation is described in the translit docs.
Since the program requires that you make small changes to paths.h before
compilation (depending on your system and environment), I cannot realy
distribute generic executables (i.e., compiled programs). You have to modify
paths.h to suit your needs and operationg system and compile the program using
your favorite C compiler.

If you do not have time, do not have resources, or for whatever reason
you wish a ready to run executable of TRANSLIT, you can order it for
a very modest fee from JKL ENTERPRISES, INC. as described in the file:
order.txt. It will come with an easy installation script which will ask
you a few simple questions and install the program.

I invite, and will try to answer, bug reports, comments and suggestions.
If there is an interest I will work on optimizing the program, on supporting
the UNICODE, and other enhancements which you suggest. If you use the
program for commercial purposes, and on many computers in your organization,
you might want to buy the program from JKL ENTERPRISES, INC., to aid further
development, though you are not required to do so.


Author coordinates:
Jan Labanowski
P.O. Box 21821
Columbus, OH 43221-0821, USA


More information about the INDOLOGY mailing list