[INDOLOGY] Nagari conversion

Oliver Hellwig hellwig7 at gmx.de
Thu Jun 9 15:14:34 UTC 2022


Thanks a lot, Harry, for explaining the technical background. This
definitely explains the strange encoding. For the time being, re-OCRing
the file seems to be one manageable approach (as Tim Cahill already
pointed out), so I will try the Google Cloud SDK, which works in the
background of SanskritCR.
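
A minimal sketch of such a re-OCR pass, assuming the pages have already
been exported as JPEG files. The choice of the Google Cloud Vision client
library (google-cloud-vision) as the backend is an assumption, since the
thread only says "Google cloud SDK"; the helper names google_vision_ocr
and ocr_pages are hypothetical, and the "sa" language hint is assumed to
be accepted and can be dropped.

```python
from pathlib import Path
from typing import Callable, Iterable


def google_vision_ocr(image_bytes: bytes) -> str:
    """OCR one page image with Google Cloud Vision (assumed backend)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from google.cloud import vision  # pip install google-cloud-vision

    client = vision.ImageAnnotatorClient()  # needs application credentials
    response = client.document_text_detection(
        image=vision.Image(content=image_bytes),
        # Optional hint; "sa" (Sanskrit) is an assumption, remove if rejected.
        image_context={"language_hints": ["sa"]},
    )
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


def ocr_pages(
    image_paths: Iterable[Path],
    recognize: Callable[[bytes], str],
) -> dict[str, str]:
    """Run `recognize` on each page image; return text keyed by file name."""
    results = {}
    for path in sorted(image_paths):
        results[path.name] = recognize(path.read_bytes())
    return results
```

With credentials configured, a call such as
ocr_pages(Path("pages").glob("*.jpg"), google_vision_ocr) would return the
raw text per page (the directory name "pages" is only an example). The
output would still need manual correction.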

Best, Oliver

On 09/06/2022 16:40, Harry Spier wrote:
> Oliver,
> When I open the link you gave in a browser, the file name shown in the
> upper left corner is drahyayana_shrauta_sutra.qxd. The extension .qxd
> is for QuarkXPress files; QuarkXPress is publishing software. When
> I download the file, it downloads as a PDF, but when I look at the
> properties, the fonts in the file are embedded Type 1 PostScript fonts:
>
> MSTT315b9a0609O15504302
>
> MSTT319c623cc2O17006000
>
> MSTT31ab77a7ccO21306200
>
> MSTT31b3f9fa67O15204300
>
> So it looks like QuarkXPress has disguised the names of the fonts it
> used in creating the PDF.
>
>
> So as far as I can see, this is a (probably quite old) PDF file created
> from a QuarkXPress file. Since the fonts aren't Unicode fonts, and the
> names of the fonts are disguised, the only thing I can think of is to
> make a JPEG of each page, enter it into SanskritCR
> (https://ocr.sanskritdictionary.com/), and then manually correct the
> errors.
>
> Quite laborious but less laborious than typing the whole thing by hand
> again.
> Harry Spier
>
>
> On Thu, Jun 9, 2022 at 12:32 AM Oliver Hellwig via INDOLOGY
> <indology at list.indology.info> wrote:
>
>     Dear all,
>
>     I came across this digitized version of the Drahyayana Srauta Sutra:
>
>     http://www.hinduonline.co/vedicreserve/kalpa/shrauta/drahyayana_shrauta_sutra.pdf
>
>     Everything seems fine, but when I try to copy-paste the text, the result
>     for the first line looks like:
>
>     {;Á;y,≈*tsU]m
>
>     (This should be the name of the text.)
>
>     Does anybody know how to obtain readable Devanagari from this kind of
>     custom encoding?
>
>     Best, Oliver
>
>     ---
>     Oliver Hellwig, IVS Zürich/ILI Düsseldorf
>
>     _______________________________________________
>     INDOLOGY mailing list
>     INDOLOGY at list.indology.info
>     https://list.indology.info/mailman/listinfo/indology
>

