Fwd: [infitt] OCR

Sat Apr 29 01:49:24 UTC 2000

Because Tamil has relatively few letters and no clustered consonants,
OCR for it is progressing. I got this forwarded today from the list of the
Int. Forum for Information Technology in Tamil [INFITT].

-----------------------------
Dear friends:

Tamil computing is implemented generally in two ways:
i) adapting existing shrink-wrap software written for the larger
English market, using Tamil fonts and suitable keyboard editors;
ii) a fully native environment with dedicated software where
everything, from pull-down menus to commands, is in Tamil.

The former is how >99% of all Tamil material is currently
processed or handled. There have been only a handful of software
packages of the second kind. Padhami of Chennai Kavigal/Manoj Annadurai
was an early one. Windows 2000, based on Unicode with Tamil support,
will allow many new software packages of the second kind.

Recently several people have been working on Tamil OCR, though
no working version is available yet. Three years ago,
I tried the existing OCR package OmniPage Pro to see if it
could be adapted for Tamil OCR. It has a "training" mode
where you can assign any arbitrary scanned image to a given
character (or group of characters). One can use this option to
assign specific letters to various Tamil glyphs and attempt to
derive a working OCR out of this mode. While the concept worked
in principle, the number of slots available in OmniPage Pro was
too limited (60) to assign all distinct Tamil glyphs
individually. So a complete working Tamil OCR could not be achieved.
Since I was working with a commercial copy of the software, I
even wrote emails to the software company, asking if they could
relax this restriction of the training mode. I got no response.
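
To give a rough feel for why 60 slots fall short, here is a minimal
Python sketch that enumerates the basic Tamil letters from their
Unicode code points. Using Unicode here is an assumption made for
illustration (the fonts of the time used their own glyph encodings),
and the count is of letters, not of the glyph shapes in any particular
font, which can be fewer because several vowel signs are separate
marks.

```python
# Sketch only: counts basic Tamil letters via Unicode code points.
# This is NOT the glyph inventory of any specific font; it is an
# illustrative upper-level count to compare against the 60 slots.
VOWELS = "\u0B85\u0B86\u0B87\u0B88\u0B89\u0B8A\u0B8E\u0B8F\u0B90\u0B92\u0B93\u0B94"
CONSONANTS = ("\u0B95\u0B99\u0B9A\u0B9E\u0B9F\u0BA3\u0BA4\u0BA8\u0BAA"
              "\u0BAE\u0BAF\u0BB0\u0BB2\u0BB5\u0BB4\u0BB3\u0BB1\u0BA9")
VOWEL_SIGNS = "\u0BBE\u0BBF\u0BC0\u0BC1\u0BC2\u0BC6\u0BC7\u0BC8\u0BCA\u0BCB\u0BCC"
PULLI = "\u0BCD"    # mark that removes the inherent vowel
AYTHAM = "\u0B83"
TRAINING_SLOTS = 60  # limit reported above for OmniPage Pro


def tamil_letters():
    """Return the 12 vowels, aytham, and every consonant form."""
    letters = list(VOWELS) + [AYTHAM]
    for c in CONSONANTS:
        letters.append(c + PULLI)                   # pure consonant
        letters.append(c)                           # consonant + inherent 'a'
        letters.extend(c + s for s in VOWEL_SIGNS)  # consonant + other vowels
    return letters


letters = tamil_letters()
print(len(letters), "letters vs", TRAINING_SLOTS, "slots")
```

Even if a given font draws many of these from shared component
shapes, the letter count is several times the slot limit.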

Recently I was pleased to hear from the software professional
Mr. Gopalrao Thukaram of USA (a Project Madurai volunteer who is
also associated with the Tamil website <www.thinnai.com>) about his
success in similar Tamil OCR work using existing OCR packages.
I quote below excerpts from his email:

 >>
 > I have been trying to train various commercially available OCR
 > packages for Tamil. The best I have seen is FineReader
 > 4.0 Professional from ABBYY software <www.abbyy.com>.
 >
 > This software has a read/learn facility in which one can read
 > as well as train the software for understanding TAMIL.
 >
 > This is better than keying in the Tamil books. Though it is
 > not perfect, the recognition accuracy is around 90-95%. Once all the
 > characters are trained, the conversion is done in a jiffy. I used
 > the TAMMaduram font for training and conversion. The output text is
 > produced in TAMMaduram. One can use any font mapping to
 > get the data out.
 >
 > I could OCR an entire book of 50 pages in 2 hours
 > (with training included). The time-consuming part was scanning,
 > not the OCR.
<<
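
The "font mapping" step mentioned in the excerpt could look roughly
like the Python sketch below. The glyph codes in the table are
hypothetical placeholders, not the real TAMMaduram layout, and Unicode
is assumed as the target. The one real wrinkle it illustrates is that
glyph-based Tamil fonts store the left-side vowel signs before the
consonant (visual order), whereas Unicode stores them after it.

```python
# Sketch only. The keys below are HYPOTHETICAL glyph codes chosen for
# illustration; they are not the actual TAMMaduram font encoding.
GLYPH_TO_UNICODE = {
    "A": "\u0B95",  # ka
    "B": "\u0BAE",  # ma
    "C": "\u0BB2",  # la
    "x": "\u0BC6",  # vowel sign e  (drawn left of the consonant)
    "y": "\u0BC7",  # vowel sign ee (drawn left of the consonant)
    "z": "\u0BC8",  # vowel sign ai (drawn left of the consonant)
    "p": "\u0BCD",  # pulli
}
# Signs that appear BEFORE the consonant in visual (font) order.
PREFIX_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def convert(glyph_text):
    """Map font glyph codes to Unicode, moving prefix signs after
    the character that follows them."""
    out = []
    pending = None  # prefix sign waiting for its consonant
    for ch in glyph_text:
        uni = GLYPH_TO_UNICODE.get(ch, ch)  # pass unknown codes through
        if uni in PREFIX_SIGNS:
            if pending is not None:
                out.append(pending)  # stray sign: emit as-is
            pending = uni
            continue
        out.append(uni)
        if pending is not None:
            out.append(pending)  # place sign after the consonant
            pending = None
    if pending is not None:
        out.append(pending)
    return "".join(out)


print(convert("yBC"))  # sign 'ee' is moved after 'ma', then 'la'
```

A real table would need one entry per glyph in the font, plus
handling for the two-part vowel signs, which this sketch leaves out.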

ABBYY is a Russian software company, and they provide a
demo version of their OCR software for trials (works on PCs).
If someone wants to try these out, please contact Mr. Thukaram
<gthukaram at hotmail.com> for details. If it can save time, we
can request him to provide us with a copy of the training files
that he has generated.

So, there still appears to be a fast route to Tamil OCR.

[snip]