BitComet as (simpler?) alternative to wget

Dominik Wujastyk wujastyk at GMAIL.COM
Tue Jan 26 15:44:26 UTC 2010


That's a good wheeze for people who use Windows.

Thanks,
Dominik

2010/1/26 Lubin, Tim <lubint at wlu.edu>

> One can use a free batch downloader called BitComet to download DLI books
> in one go -- no bash scripts necessary.
>
> Once the program is installed, open it, press Ctrl+B for the batch download
> dialogue box, and paste the entire URL of the first page in the upper field.
>  One way to get this URL is to open the book in the DLI website's reader
> interface, which will produce a URL like this:
>
>
> http://www.new.dli.ernet.in/scripts/FullindexDefault.htm?path1=/rawdataupload1/upload/0132/045&first=1&last=268&barcode=99999990133383
> \
> Then remove "FullindexDefault.htm?path1=/" and the whole string beginning
> with the first &, and append: PTIFF/00000001.tif
>
> to produce:
>
>
> http://www.new.dli.ernet.in/rawdataupload1/upload/0132/045/PTIFF/00000001.tif
>
> Then in the second field of the batch download dialogue box, insert this
> same URL, but replace the 00000001 with the number of the last page preceded
> by enough 0's to make eight digits.  The last page number can be found in
> the original URL after the string "&last=".  In the example above, this will
> produce:
>
>
> http://www.new.dli.ernet.in/rawdataupload1/upload/0132/045/PTIFF/00000268.tif
>
> Then click ADD and DOWNLOAD NOW.
>
> The tifs can then be combined in Acrobat into a single PDF.
>
> Tim
>
>
> ________________________________________
> From: Indology [INDOLOGY at liverpool.ac.uk] On Behalf Of Dominik Wujastyk [
> ucgadkw at UCL.AC.UK]
> Sent: Thursday, January 14, 2010 6:29 AM
> To: INDOLOGY at liverpool.ac.uk
> Subject: wget (was: Re: Abhandlungen der Königliche n Akademie der
>  Wissensch aften zu Berli)
>
> Birgit is quite right about the value of wget.  It's an amazing little
> tool.  I use it routinely to get books from the Digital Library of India,
> where texts are presented only as individual pages.
>
> Until about a year ago, one could use the "-r" recursion setting of wget
> to fetch a whole directory-full of files in one go.  Then the DLI disabled
> that feature.  So now one has to issue a wget command for each page.
> But it's easy to do with a small bash script like this:
>
> ---------- cut here -----------
> #!/bin/sh
>
> # fetch Kapadia_Desc.Cat.Govt.Colls.MSS.BORI-Jaina
> # Literature and Philosophy XIX.1 Svetambara Works_1957
>
> for i in {00000001..397..1}
>        do
>                wget
> http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF/$i.tif
>        done
> ---------- cut here -----------
>
> The magic number "371" is the number of pages in the book, which DLI tells
> you.  In Firefox, you can find out the directory in which a book's TIFF
> files live by loading a page of the book and then hitting Tools/Page Info
> and selecting "media".
>
> Bash is the default shell in Linux; it's also available to Windows users
> by installing the excellent Cygwin.
>
> Best,
> Dominik
>
>
>
> On Tue, 12 Jan 2010, Birgit Kellner wrote:
>
> > Jonathan Silk wrote:
> > > I have a feeling I may be an idiot (well, I'm sure in other respects,
> but in
> > > this case...): I learn from Indologica that we can find online the
> > > *Abhandlungen
> > > der Königlichen Akademie der Wissenschaften zu Berlin. *
> > >
> > > When I go, however, for example to
> > >
> http://bibliothek.bbaw.de/bibliothek-digital/digitalequellen/schriften/anzeige/index_html?band=07-abh/1844&seite:int=711
> ,
> > > I can only get one page at a time; is there not a way to download an
> entire
> > > article?
> > >
> > > Thanks for your advice!  Jonathan
> > >
> > >
> > The interface unfortunately doesn't offer the possibility to download
> > several pages at once (as a PDF, for instance), no.
> >
> > There is a workaround, but it's a bit time-consuming:
> >
> > 1.) Click on a page image with the desired size ("large", for instance).
> > You see the JPG file and get a link like this in the browser's URL line:
> >
> >
> http://bibliothek.bbaw.de/thumbnail?band=07-abh/1844&aufloesung:int=2&img=DAS_jpg/07-abh/1844/jpg-1000/00000002.jpg&seite:int=2
> >
> > This means that the image file is stored in this directory:
> >
> > http://bibliothek.bbaw.de/DAS_jpg/07-abh/1844/jpg-1000/
> >
> > 2.) If you have the wonderful little tool wget installed, run
> >
> > wget -r http://bibliothek.bbaw.de/DAS_jpg/07-abh/1844/jpg-1000/
> >
> > from the command line.
> >
> > This gets you all images for the volume in question to a local folder.
> > You can then create a PDF file with these images (even run OCR on them
> > before), with, for instance Adobe Acrobat Pro (on Windows); I gather on
> > a recent Mac OS, there should already be tools available in the
> > operating system for the job.
> >
> > I trust the real geeks on the list can provide further advice :-)
> >
> > Best,
> >
> > b
> >
>
> --
> --
> After nearly 25 years, I'm phasing out this UCL email account.
> Please switch over to using the email address wujastyk at gmail.com
>
> !SIG:4b4f0048141301638493625!





More information about the INDOLOGY mailing list