wget (was: Re: Abhandlungen der Königlichen Akademie der Wissenschaften zu Berlin)

Dominik Wujastyk wujastyk at GMAIL.COM
Wed May 19 11:48:48 UTC 2010


The whole difficulty is getting the right number of leading zeros in the TIFF
filenames (e.g., "00000001.tif" .. "00009999.tif").  Your solution doesn't
do this.
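One way to get the zeros without relying on bash 4's brace padding is to let `printf` do the formatting. This is a minimal sketch that assumes eight-digit names ending in `.tif`, as in the DLI URLs in the script quoted below. The function name `page_name` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: turn a page number into an 8-digit,
# zero-padded TIFF filename, e.g. 7 -> 00000007.tif.
# printf's %08d works in any bash version, including 3.x.
page_name() {
    printf '%08d.tif' "$1"
}

# Print the names of the first three pages.
for (( i = 1; i <= 3; i++ )); do
    echo "$(page_name "$i")"
done
```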

D

2010/5/19 amba kulkarni <ambapradeep at gmail.com>

> With bash version 3.1.17
>
> GNU bash, version 3.1.17(1)-release (i586-suse-linux-gnu)
> Copyright (C) 2005 Free Software Foundation, Inc.
>
> for i in {X..Y} works, incrementing i by 1 each time,
> so you may try it with Cygwin's bash.
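Amba's suggestion and the padding problem can be combined. Here is a sketch: bash 3 expands a plain range such as `{1..397}` (397 being the page count in the script quoted further down), and `printf` supplies the leading zeros. The bounds must be literal numbers, because brace expansion happens before variables are expanded.

```shell
#!/bin/bash
# A plain {X..Y} range already works in bash 3.x: the braces do the
# counting and printf adds the leading zeros.
names=()
for i in {1..397}; do
    # Append to the array without needing the += operator.
    names[${#names[@]}]=$(printf '%08d.tif' "$i")
done

echo "${names[0]} ... ${names[396]}"   # 00000001.tif ... 00000397.tif
```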
>
> -- amba kulkarni
>
> आ नो भद्रा: क्रतवो यन्तु विश्वत: ll
> Let noble thoughts come to us from every side.
> - Rig Veda, I-89-i.
>
> Reader and Head
> Department of Sanskrit Studies
> University of Hyderabad
> 040 23133802(off)
>
> http://sanskrit.uohyd.ernet.in
>
>
> 2010/5/19 Dominik Wujastyk <wujastyk at gmail.com>
>
> I was recently trying to help a colleague with the bash script below for
>> fetching entire books from the Digital Library of India.  He had Windows,
>> so we installed Cygwin in order to get bash and wget.  However, the script
>> wouldn't work.
>>
>> I finally discovered that the essential syntax I'm using below, "for i in
>> {X..Y..Z}", works with bash version 4 but not earlier: both the step and
>> the zero padding of numbers like 00000001 arrived in bash 4.0.  And
>> Cygwin's bash is still at version 3 (as is MinGW's).
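Where a step is really needed, a C-style arithmetic `for` loop does in bash 3 what `{X..Y..Z}` does in bash 4. Unlike brace expansion, it also accepts variables as bounds. This is a sketch, and the values of `first`, `last` and `step` are arbitrary examples.

```shell
#!/bin/bash
# Bash-3-compatible stand-in for {first..last..step}: an arithmetic
# for loop, with printf supplying the zero padding.
first=1
last=9
step=2

picked=()
for (( i = first; i <= last; i += step )); do
    picked[${#picked[@]}]=$(printf '%08d' "$i")
done
echo "${picked[*]}"   # 00000001 00000003 00000005 00000007 00000009
```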
>>
>> When I type "bash --version", I get this:
>>
>> $ bash --version
>> > GNU bash, version 4.1.5(1)-release (i486-pc-linux-gnu)
>> > Copyright (C) 2009 Free Software Foundation, Inc.
>> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> >
>>
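A script can also check whether it is running under a new enough bash before it relies on the bash 4 syntax. This sketch uses bash's `BASH_VERSINFO` array, whose first element is the major version number. The function name `has_bash4_braces` is hypothetical.

```shell
#!/bin/bash
# Report whether the running bash (4.0 or later) supports stepped,
# zero-padded brace expansion such as {00000001..397..1}.
# BASH_VERSINFO[0] holds the major version number.
has_bash4_braces() {
    (( BASH_VERSINFO[0] >= 4 ))
}

if has_bash4_braces; then
    echo "bash $BASH_VERSION: {00000001..397..1} will work"
else
    echo "bash $BASH_VERSION: use printf or seq for the padding"
fi
```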
>> Sorry, Windows users, it looks like you'll have to wait until bash 4 gets
>> into Cygwin or MinGW.
>>
>> Best,
>> Dominik
>>
>> 2010/1/14 Dominik Wujastyk <ucgadkw at ucl.ac.uk>
>>
>> > Birgit is quite right about the value of wget.  It's an amazing little
>> > tool.  I use it routinely to get books from the Digital Library of India,
>> > where texts are presented only as individual pages.
>> >
>> > Until about a year ago, one could use wget's "-r" (recursive) option to
>> > fetch a whole directory of files in one go.  Then the DLI disabled that
>> > feature, so now one has to issue a wget command for each page.  But it's
>> > easy to do with a small bash script like this:
>> >
>> > ---------- cut here -----------
>> > #!/bin/bash
>> >
>> > # fetch Kapadia_Desc.Cat.Govt.Colls.MSS.BORI-Jaina
>> > # Literature and Philosophy XIX.1 Svetambara Works_1957
>> >
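>> > # Needs bash 4 or later: the leading zeros in 00000001 make bash pad
>> > # every number to 8 digits (00000001, 00000002, ... 00000397).
>> > for i in {00000001..397..1}
>> > do
>> >     wget "http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF/$i.tif"
>> > done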
>> > ---------- cut here -----------
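For bash 3, such as the Cygwin bash mentioned above, one option is to take the padded numbers from GNU `seq` instead of from brace expansion. `seq -f` accepts a printf-style floating-point format, so `%08g` yields `00000001` through `00000397`. Because `%g` switches to exponent notation at one million, this only suits counts below that, which is far more pages than any book is likely to have. The function name `dli_fetch` is hypothetical.

```shell
#!/bin/bash
# Hypothetical bash-3-friendly variant of the script above:
# GNU seq produces the zero-padded numbers, so no bash 4 brace syntax.
# Usage: dli_fetch BASE_URL PAGES
dli_fetch() {
    local base_url=$1 pages=$2 n
    for n in $(seq -f '%08g' 1 "$pages"); do
        wget "$base_url/$n.tif"
    done
}

# Example call (commented out so that nothing is downloaded by accident):
# dli_fetch http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF 397
```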
>> >
>> > The magic number "397" is the number of pages in the book, which the DLI
>> > tells you.  In Firefox, you can find out the directory in which a book's
>> > TIFF files live by loading a page of the book, choosing Tools > Page Info,
>> > and selecting the "Media" tab.
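The directory part can also be cut out of a single page's URL with bash's parameter expansion instead of by hand. This is a sketch, and the example URL is assumed to follow the pattern used in the script above.

```shell
#!/bin/bash
# Split one page's TIFF URL (e.g. as copied from Page Info > Media)
# into the book's directory and the page's filename.
page_url='http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF/00000001.tif'

dir_url=${page_url%/*}       # remove the shortest trailing "/..." part
first_file=${page_url##*/}   # remove the longest leading ".../" part

echo "$dir_url"      # http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF
echo "$first_file"   # 00000001.tif
```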
>> >
>> > Bash is the default shell in most Linux distributions; it's also
>> > available to Windows users by installing the excellent Cygwin.
>> >
>> > Best,
>> > Dominik

More information about the INDOLOGY mailing list