[INDOLOGY] Digital Library of India
L.S. Cousins
selwyn at ntlworld.com
Thu Dec 19 10:14:59 UTC 2013
Thanks for this, Michael. It might have been very useful. However, as
often happens, immediately after posting to the list, I managed to get
the downloading to work in dli-downloader. Later on with help from the
author I could get the search function to work too. Now it works fine.
This is a much easier method, I think. Dli-downloader copies the index
of the DLI to one's hard disk, so one can search even without a
connexion. Then a simple click selects a book and downloads it to one's
hard disk, where it is automatically converted to a PDF of the text.
Downloads are queued and resumed as necessary.
Lance Cousins
> One working method, from the now defunct blog of Somadeva Vasudeva:
>
>> Here is my method, working entirely in BBEdit without ever opening a terminal page.
>>
>> 1) Open a new BBEdit file and call it "getfiles.sh" or something similar. Put this file into a new folder, mine is called "Scans" (or name it after the book you are downloading). I keep this on the desktop for convenience and keep getfiles.sh open in BBEdit, ready for the next steps.
>>
>> Say you are now browsing DLI with Safari and open the first page of a book, say again the Sāṃrājyalakṣmīpīṭhikā. If you look at the URL you see the following:
>>
>> http://www.new.dli.ernet.in/scripts/FullindexDefault.htm?path1=/rawdataupload/upload/0098/415&first=1&last=527&barcode=5990010098415
>>
>> This is not the URL you need to download the images, however. To get the URL you do need, you can control-click (hold down the control key while clicking) on the image in Safari and select "Copy image address" (penultimate item in the pop-up menu). The URL is now in the global clipboard and can be pasted (cmd-v) anywhere.
>>
>> 2) Switch to BBEdit (cmd-tab), open getfiles.sh and paste the URL into the document (cmd-v). You should see:
>>
>> http://www.new.dli.ernet.in/rawdataupload/upload/0098/415/PTIFF/00000001.tif
>>
>> This is the URL for the first tiff image of the scanned book. We prefix curl -O (the option -O means "save the file with the same name it has on the web site"), so that we have:
>>
>> curl -O http://www.new.dli.ernet.in/rawdataupload/upload/0098/415/PTIFF/00000001.tif
>>
>> To get the range of pages we need the last page. It is tedious to navigate to the last page in Safari, but that is not necessary. Instead, look at the address bar; it has a section like this: "&first=1&last=527"
>> So our last page is 527. The range of numbers is easy to set in curl: it is written in square brackets as [00000001-527].tif (the leading zeros on the first number tell curl to pad every page number to eight digits).
>>
>> Our complete curl command is therefore:
>>
>> curl -O http://www.new.dli.ernet.in/rawdataupload/upload/0098/415/PTIFF/[00000001-527].tif
>>
>> To this we prefix on a separate line: "#!/bin/bash".
>>
>> 3) The whole file getfiles.sh should now look like this:
>>
>> #!/bin/bash
>> curl -O http://www.new.dli.ernet.in/rawdataupload/upload/0098/415/PTIFF/[00000001-527].tif
>>
>> Save this.
>>
>> 4) In the BBEdit "shebang" menu (it looks like this: #!, to the right of the "Window" menu) select "Run".
>>
>> 5) The images will download to your folder in a few seconds. A log window will open in BBEdit when you are done, telling you what was downloaded; just delete this, as it is of no value. The images will appear in whatever folder the "getfiles.sh" file is in. Be careful: curl will simply overwrite any existing files with the same name, so immediately print them to PDF when done (see next step).
>>
>> 6) When you have the tiff images I would select them all, double click to open them in Preview and print them to a single PDF for convenience. In Preview there are two types of print now (not sure when this was introduced). So select all files in the "drawer" and then choose "print selected", not just "print". I now use an application called "Papers" to store, organize and tag such PDFs. Papers installs a convenient option in the "print selected" dialogue to print the PDFs directly to Papers. Immediately add bibliographic details and notes; otherwise you will have difficulties finding the file later.
>>
>> You can keep the older curl commands in the file getfiles.sh. Simply put a # at the head of each such line; this turns the line into a "comment" that is not read at runtime.
>>
>> While BBEdit is running curl, a “script running” box appears that has a “cancel” button. If you hit “cancel” it will free up BBEdit’s script environment, but the curl command will keep running invisibly in the background until completion (tiff files will keep appearing in the folder). If you want to stop it you can use Activity Monitor (in Applications/Utilities). Find the process called curl and hit the “quit process” button (or, if you use the terminal a lot, "kill" the appropriate PID).
>
> ––
> Michael Slouber
> Visiting Assistant Professor of South Asia
> Department of Liberal Studies
> Western Washington University
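
For anyone who would rather script the quoted steps than edit the URL by hand, here is a rough sketch of the same idea. It takes the viewer URL from the address bar, pulls out path1, first and last, and fetches one page at a time, skipping any file already on disk so that nothing is overwritten. The function names (query_param, page_url, download_pages) are my own, and the /PTIFF/ directory with eight-digit file names is assumed from the single example quoted above; other books may be laid out differently.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, and the PTIFF/<8 digits>.tif
# layout is an assumption based on the one example URL in the quoted post.

# Print the value of one query parameter (e.g. path1, first, last) of a URL.
query_param() {
    local url=$1 name=$2
    printf '%s\n' "$url" | sed -n "s/.*[?&]${name}=\([^&]*\).*/\1/p"
}

# Print the image URL for one page: <base><path>/PTIFF/<zero-padded page>.tif
page_url() {
    local base=$1 path=$2 page=$3
    printf '%s%s/PTIFF/%08d.tif\n' "$base" "$path" "$page"
}

# Download pages first..last, leaving files that already exist untouched.
download_pages() {
    local viewer_url=$1 base=$2
    local path first last page url file
    path=$(query_param "$viewer_url" path1)
    first=$(query_param "$viewer_url" first)
    last=$(query_param "$viewer_url" last)
    page=$first
    while [ "$page" -le "$last" ]; do
        url=$(page_url "$base" "$path" "$page")
        file=${url##*/}              # file name = last component of the URL
        if [ -e "$file" ]; then
            echo "skipping $file (already present)"
        else
            curl --fail --silent --show-error -o "$file" "$url"
        fi
        page=$((page + 1))
    done
}

# Example call (put the viewer URL in single quotes, since it contains "&"):
# download_pages 'PASTE-VIEWER-URL-HERE' 'http://www.new.dli.ernet.in'
```

Run from inside the folder where the scans should land. Fetching page by page is slower than a single curl range, but it means an interrupted run can simply be started again without re-downloading or clobbering what is already there.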