Re: [INDOLOGY] Looking for "Śukranīti sāra"

Michael Slouber Michael.Slouber at wwu.edu
Thu Oct 30 18:15:59 UTC 2014


Dear Patrick, et al:

Below is a technique to download whole books from the DLI from a blog posting that is not my own.  The scholar no longer maintains the blog, so may no longer endorse the technique:
DLI to PDF with BBEdit<http://pratibham.blogspot.com/2010/06/dli-to-pdf-with-bbedit.html>
Sat, 06/26/2010 - 01:07

As PDSz has just pointed out, the Sāṃrājyalakṣmīpīṭhikā edition is available as a scan at DLI. Their website does not allow downloads of whole books but rather expects you to read them page by page. You can use wget or curl to fetch the whole set of tiff files. Here is my method, working entirely in BBEdit without ever opening a terminal page.

1) Open a new BBEdit file and call it "getfiles.sh" or something similar. Put this file into a new folder, mine is called "Scans" (or name it after the book you are downloading). I keep this on the desktop for convenience and keep getfiles.sh open in BBEdit, ready for the next steps.

Say you are now browsing DLI with Safari and open the first page of a book, say again the Sāṃrājyalakṣmīpīṭhikā. If you look at the URL you see the following:

http://www.new.dli.ernet.in/scripts/FullindexDefault.htm?path1=/rawdataupload/upload/0098/415&first=1&last=527&barcode=5990010098415

This is not the URL you need to download the images, however. To get the URL you do need, you can control-click (hold down the control key while while clicking) on the image in Safari and select "Copy image address" (penultimate item in the pop-up menu). The URL is now in the global clipboard and can be pasted (cmd-v) anywhere.

2) Switch to BBEdit (cmd-tab), open getfiles.sh and paste the URL into the document (cmd-v). You should see:

http://www.new.dli.ernet.in/rawdataupload/upload/0098/415/PTIFF/00000001.tif

This is the URL for the first tiff image of the the scanned book. We prefix curl -O (the option -O means "save the file with the same name it has on the web site"), so that we have:

curl -O http://www.new.dli.ernet.in/rawdataupload/upload/0098/415/PTIFF/00000001.tif

To get the range of pages we need the last page. It is tedious to navigate to the last page in Safari, but that is not necessary. Instead, look at the address bar, it has a section like this: "&first=1&last=527"
So our last page is 527. The range of numbers is easy to set in curl, it is represented with square brackets as: [00000001-527].tif

Our complete curl command is therefore:

curl -O http://www.new.dli.ernet.in/rawdataupload/upload/0098/415/PTIFF/[00000001-527].tif

To this we prefix on a separate line: "#!/bin/bash".

3) The whole file getfiles.sh should now look like this:

#!/bin/bash
curl -O http://www.new.dli.ernet.in/rawdataupload/upload/0098/415/PTIFF/[00000001-527].tif

Save this.

4) In the BBEdit "shebang" menu (it looks like this: #!, to the right of the "Window" menu) select "Run".

5) the images will download to your folder in a few seconds. A log window will open in BBEdit when you are done, telling you what was downloaded, just delete this, it is of no value. The images will appear in whatever folder the "getfiles.sh" file is in. Be careful, curl will simply overwrite any existing files with the same name, so immediately print them to pdf when done (see next step).

6) When you have the tiff images I would select them all, double click to open them in Preview and print them to a single PDF for convenience. In Preview there are two types of print now (not sure when this was introduced). So select all files in the "drawer" and then choose "print selected", not just "print". I now use an application called "Papers" to store, organize and tag such PDFs. Papers installs a convenient option in the "print selected" dialogue to print the PDFs directly to Papers. Immediately add bibliographic details and notes, you will otherwise have difficulties finding the file later.

You can keep the older curl searches in the file getfiles.sh. Simply put a # at the head of the line they contain, this turns the line into a "comment" that is not read at runtime.

While BBEdit is running curl, a “script running” box appears that has a “cancel” button. If you hit “cancel” it will free up BBEdit’s script environment, but the curl command will keep running invisibly in the background until completion (tiff files will keep appearing in the folder). If you want to stop it you can use activity monitor (in Apps/Utilities). Find the Process called curl and hit the “quit process” button (or if you use the terminal a lot, "kill" the appropriate PID).

––
Michael Slouber
Assistant Professor of South Asia
Department of Liberal Studies
Western Washington University


On २०१४ अक्टोबर ३०, at ११:०९ पूर्व मध्यान्ह, Patrick Olivelle <jpo at UTS.CC.UTEXAS.EDU<mailto:jpo at UTS.CC.UTEXAS.EDU>> wrote:

http://www.new1.dli.ernet.in/cgi-bin/metainfo.cgi?&title1=sukranitisara&author1=gaustava%20opperta&subject1=Religion&year=1882%20&language1=Sanskrit&pages=299&barcode=5010010087212&author2=NULL&identifier1=NULL&publisher1=The%20Government%20Press,%20Madras&contributor1=NULL&vendor1=svi&scanningcentre1=rmsc,%20iiith&scannerno1=0&digitalrepublisher1=NULL&digitalpublicationdate1=0000-00-00&numberedpages1=297&unnumberedpages1=2&rights1=OUT_OF_COPYRIGHT&copyrightowner1=NULL&copyrightexpirydate1=0000-00-00&format1=NULL%20&url=/data6/upload/0158/102



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://list.indology.info/pipermail/indology/attachments/20141030/af63d6ef/attachment.htm>


More information about the INDOLOGY mailing list