BitComet as (simpler?) alternative to wget

Lubin, Tim lubint at WLU.EDU
Tue Jan 26 15:31:03 UTC 2010


One can use a free batch downloader called BitComet to download DLI books in one go -- no bash scripts necessary.  

Once the program is installed, open it, press Ctrl+B for the batch download dialogue box, and paste the entire URL of the first page in the upper field.  One way to get this URL is to open the book in the DLI website's reader interface, which will produce a URL like this:

http://www.new.dli.ernet.in/scripts/FullindexDefault.htm?path1=/rawdataupload1/upload/0132/045&first=1&last=268&barcode=99999990133383
\
Then remove "FullindexDefault.htm?path1=/" and the whole string beginning with the first &, and append: PTIFF/00000001.tif

to produce:

http://www.new.dli.ernet.in/rawdataupload1/upload/0132/045/PTIFF/00000001.tif

Then in the second field of the batch download dialogue box, insert this same URL, but replace the 00000001 with the number of the last page preceded by enough 0's to make eight digits.  The last page number can be found in the original URL after the string "&last=".  In the example above, this will produce:

http://www.new.dli.ernet.in/rawdataupload1/upload/0132/045/PTIFF/00000268.tif

Then click ADD and DOWNLOAD NOW.

The tifs can then be combined in Acrobat into a single PDF.  

Tim


________________________________________
From: Indology [INDOLOGY at liverpool.ac.uk] On Behalf Of Dominik Wujastyk [ucgadkw at UCL.AC.UK]
Sent: Thursday, January 14, 2010 6:29 AM
To: INDOLOGY at liverpool.ac.uk
Subject: wget (was: Re: Abhandlungen der Königliche n Akademie der  Wissensch aften zu Berli)

Birgit is quite right about the value of wget.  It's an amazing little
tool.  I use it routinely to get books from the Digital Library of India,
where texts are presented only as individual pages.

Until about a year ago, one could use the "-r" recursion setting of wget
to fetch a whole directory-full of files in one go.  Then the DLI disabled
that feature.  So now one has to issue a wget command for each page.
But it's easy to do with a small bash script like this:

---------- cut here -----------
#!/bin/sh

# fetch Kapadia_Desc.Cat.Govt.Colls.MSS.BORI-Jaina
# Literature and Philosophy XIX.1 Svetambara Works_1957

for i in {00000001..397..1}
        do
                wget http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF/$i.tif
        done
---------- cut here -----------

The magic number "371" is the number of pages in the book, which DLI tells
you.  In Firefox, you can find out the directory in which a book's TIFF
files live by loading a page of the book and then hitting Tools/Page Info
and selecting "media".

Bash is the default shell in Linux; it's also available to Windows users
by installing the excellent Cygwin.

Best,
Dominik



On Tue, 12 Jan 2010, Birgit Kellner wrote:

> Jonathan Silk wrote:
> > I have a feeling I may be an idiot (well, I'm sure in other respects, but in
> > this case...): I learn from Indologica that we can find online the
> > *Abhandlungen
> > der Königlichen Akademie der Wissenschaften zu Berlin. *
> >
> > When I go, however, for example to
> > http://bibliothek.bbaw.de/bibliothek-digital/digitalequellen/schriften/anzeige/index_html?band=07-abh/1844&seite:int=711,
> > I can only get one page at a time; is there not a way to download an entire
> > article?
> >
> > Thanks for your advice!  Jonathan
> >
> >
> The interface unfortunately doesn't offer the possibility to download
> several pages at once (as a PDF, for instance), no.
>
> There is a workaround, but it's a bit time-consuming:
>
> 1.) Click on a page image with the desired size ("large", for instance).
> You see the JPG file and get a link like this in the browser's URL line:
>
> http://bibliothek.bbaw.de/thumbnail?band=07-abh/1844&aufloesung:int=2&img=DAS_jpg/07-abh/1844/jpg-1000/00000002.jpg&seite:int=2
>
> This means that the image file is stored in this directory:
>
> http://bibliothek.bbaw.de/DAS_jpg/07-abh/1844/jpg-1000/
>
> 2.) If you have the wonderful little tool wget installed, run
>
> wget -r http://bibliothek.bbaw.de/DAS_jpg/07-abh/1844/jpg-1000/
>
> from the command line.
>
> This gets you all images for the volume in question to a local folder.
> You can then create a PDF file with these images (even run OCR on them
> before), with, for instance Adobe Acrobat Pro (on Windows); I gather on
> a recent Mac OS, there should already be tools available in the
> operating system for the job.
>
> I trust the real geeks on the list can provide further advice :-)
>
> Best,
>
> b
>

--
--
After nearly 25 years, I'm phasing out this UCL email account.
Please switch over to using the email address wujastyk at gmail.com

!SIG:4b4f0048141301638493625!





More information about the INDOLOGY mailing list