Arabic/Persian/Urdu OCR

David Magier magier at EDU.COLUMBIA.CC.CUNIXF
Fri Jul 17 15:03:14 EDT 1992


Below is an item from MacWeek (7/13/92, p. 18), which appeared as a
sidebar to a major article on optical character recognition (OCR)
software for the Macintosh. It describes a new product from CTA Inc.
(25 Science Park, New Haven, Connecticut 06511, phone: (203) 786-5828,
FAX: 786-5833). People have long cited Arabic as a prime example of the
weakness of OCR technology, saying it would be YEARS before anyone
managed to come up with OCR software that could recognize it. Now that
such software exists, it opens all sorts of possibilities: text
analysis of Arabic/Urdu/Persian poetry, digital text storage and
retrieval, e-text archives (available by ftp) in each of these
languages, and so on. Of course the software isn't cheap, but it is
certainly cheap enough to make librarians and scholars stop and
consider it for projects that could benefit us all. (By the way, I
have no affiliation with CTA and no direct knowledge of the software
beyond the article I'm passing along below. I hope you find it as
intriguing as I do.)  /David
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
    ____________________________       304 International Affairs
  ///    -- David Magier --    \\\     Columbia University
 |||     Head, AREA STUDIES     |||    New York, N.Y. 10027
 |||  S&SE Asia, Latin America, |||    (212) 854-8046 / FAX: 212 222-0331
  \\\ Mid-East, Slavic, Africa ///
    ---------------------------        magier at cunixf.cc.columbia.edu
 
----------------cut here----------------------------------
CTA makes first foray into Arabic recognition
 
   CTA Inc. has announced a new version of its OCR software that the
company said provides the first omnifont recognition of Arabic text.
   TextPert Arabic, available now for $1,495, recognizes documents set
in type sizes from 10 to 72 points in several typeset Arabic script
styles and fonts, including those used for Persian and Urdu. It can
capture Arabic text at a rate of 2,500 characters per minute, CTA said.
It also handles 32 Indo-European languages and can recognize pages that
contain Arabic and non-Arabic alphabets in separate text blocks.
   Like the Indo-European version, TextPert Arabic lets users specify
text blocks to recognize and supports batch processing of TIFF files.
A version of the program is also available with CTA's TextPert High
Speed RISC board for $5,995.
   The program comes with Arabic- and English-language software and
documentation.
   According to CTA, TextPert Arabic was created at the request of
Apple Europe, which helped fund the project and is offering the
product through European dealers. CTA said its contract with Apple
stipulates that TextPert Arabic will be available exclusively on the
Mac for an undisclosed period of time.




More information about the INDOLOGY mailing list