[INDOLOGY] OCR with diacritics
Dominik Wujastyk
wujastyk at gmail.com
Sat Mar 29 02:16:05 UTC 2025
It's not the only option, Paras, but some people have had good results with OCRmyPDF:
- https://ocrmypdf.readthedocs.io/en/latest/
It supports many languages, and with some ingenuity it might be possible to
make a profile specifically optimized for Indic transliteration. Maybe
someone already has?
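
As a rough starting point, here is a minimal sketch of how one might drive OCRmyPDF from Python for a scanned book. The file names are placeholders, and the helper build_ocr_command is my own illustration, not part of OCRmyPDF. It uses only the plain English model ("eng"); a model tuned for Roman-script Sanskrit with diacritics would be a custom Tesseract language pack that you would have to train or find and install yourself, and I am not assuming that one exists.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(scan_pdf, output_pdf, sidecar_txt, languages=("eng",)):
    """Build an OCRmyPDF command line for a scanned book.

    `languages` are Tesseract language codes joined with '+'.
    A custom transliteration model would be added here under whatever
    name it was installed with (hypothetical; none is assumed to exist).
    """
    return [
        "ocrmypdf",
        "--language", "+".join(languages),
        "--deskew",                # straighten slightly rotated page scans
        "--sidecar", sidecar_txt,  # also write the recognized text to a .txt file
        scan_pdf,
        output_pdf,
    ]


if __name__ == "__main__":
    # Placeholder file names; replace with the real scan.
    cmd = build_ocr_command("scans.pdf", "scans_ocr.pdf", "scans_ocr.txt")
    print(" ".join(cmd))
    # Only run if the tool is installed and the input file is present.
    if shutil.which("ocrmypdf") and Path("scans.pdf").exists():
        subprocess.run(cmd, check=True)
```

The sidecar text file is probably the part of most use for re-typesetting, since it holds the extracted text separately from the searchable PDF. The diacritics will still need careful proofreading.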
Best,
Dominik
--
Dominik Wujastyk, Professor Emeritus,
University of Alberta
"The University of Alberta is committed to the pursuit of truth,
the advancement of learning, and the dissemination of knowledge
through teaching, research and other scholarly and creative activities and
service."
-- Collective Agreement
<https://www.ualberta.ca/human-resources-health-safety-environment/media-library/my-employment/agreements/2020-2024-collective-agreement---working-version.pdf>
3.01
On Tue, 25 Mar 2025 at 23:08, Paras Mehta via INDOLOGY <
indology at list.indology.info> wrote:
> Respected scholars,
> Hello,
>
> An Indology book publisher whom I know has acquired the copyrights of an
> old book on Indology and wants to republish it. The book is in English and
> has many Sanskrit terms in Roman script (i.e. with diacritics). Because the
> printable soft copy of that book is no longer available, the publisher
> wishes to scan the pages of that book and do an OCR on those scans. The
> text obtained by OCR will then be laid out in a file and made ready for
> reprint.
> I would like to know if there is a good OCR resource which can take the
> scans and accurately extract the English text along with the Romanized
> Sanskrit words.
>
> Thank you.
>
> Best wishes,
> Paras Mehta
> Researcher at École française d'Extrême-Orient (Pondicherry)
>