wget (was: Re: Abhandlungen der K öni gliche n Ak ademie der Wissensch aften zu Berli)
Dominik Wujastyk
wujastyk at GMAIL.COM
Wed May 19 11:48:48 UTC 2010
The whole difficulty is to get the right number of leading zeros in the tif
filenames (e.g., "00000001.tiff" .. "00009999.tiff"). Your solution doesn't
do this.
D
2010/5/19 amba kulkarni <ambapradeep at gmail.com>
> With bash version 3.1.17
>
> GNU bash, version 3.1.17(1)-release (i586-suse-linux-gnu)
> Copyright (C) 2005 Free Software Foundation, Inc.
>
> for i in {X..Y} works incrementing i by 1 each time.
> So you may try it on CygWin's bash.
>
> -- amba kulkarni
>
> आ नो भद्रा: क्रतवो यन्तु विश्वत: ll
> Let noble thoughts come to us from every side.
> - Rig Veda, I-89-i.
>
> Reader and Head
> Department of Sanskrit Studies
> University of Hyderabad
> 040 23133802(off)
>
> http://sanskrit.uohyd.ernet.in
>
>
> 2010/5/19 Dominik Wujastyk <wujastyk at gmail.com>
>
> I was recently trying to help a colleague with the bash script below for
>> fetching entire books from the Digital Library of India. He had Windows,
>> so
>> we installed CygWin, in order to get bash and wget. However, the script
>> wouldn't work.
>>
>> I finally discovered that the essential syntax I'm using below, "for i in
>> {X..Y..Z}", works with bash version 4, but not earlier. And CygWin's bash
>> is still at version 3 (so in MinGW's).
>>
>> When I type "bash --version", I get this:
>>
>> $ bash --version
>> > GNU bash, version 4.1.5(1)-release (i486-pc-linux-gnu)
>> > Copyright (C) 2009 Free Software Foundation, Inc.
>> > License GPLv3+: GNU GPL version 3 or later <
>> > http://gnu.org/licenses/gpl.html>
>> >
>>
>> Sorry, Windows users, it looks like you'll have to wait until bash 4 gets
>> into CygWin or MinGW.
>>
>> Best,
>> Dominik
>>
>> 2010/1/14 Dominik Wujastyk <ucgadkw at ucl.ac.uk>
>>
>> > Birgit is quite right about the value of wget. It's an amazing little
>> > tool. I use it routinely to get books from the Digital Library of
>> India,
>> > where texts are presented only as individual pages.
>> >
>> > Until about a year ago, one could use the "-r" recursion setting of wget
>> > to fetch a whole directory-full of files in one go. Then the DLI
>> disabled
>> > that feature. So now one has to issue a wget command for each page.
>> > But it's easy to do with a small bash script like this:
>> >
>> > ---------- cut here -----------
>> > #!/bin/bash
>> >
>> > # fetch Kapadia_Desc.Cat.Govt.Colls.MSS.BORI-Jaina
>> > # Literature and Philosophy XIX.1 Svetambara Works_1957
>> >
>> > for i in {00000001..397..1}
>> > do
>> > wget
>> > http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF/$i.tif
>> > done
>> > ---------- cut here -----------
>> >
>> > The magic number "371" is the number of pages in the book, which DLI
>> tells
>> > you. In Firefox, you can find out the directory in which a book's TIFF
>> > files live by loading a page of the book and then hitting Tools/Page
>> Info
>> > and selecting "media".
>> >
>> > Bash is the default shell in Linux; it's also available to Windows users
>> > by installing the excellent Cygwin.
>> >
>> > Best,
>> > Dominik
>> >
>> >
>>
>
>
More information about the INDOLOGY
mailing list