wget (was: Re: Abhandlungen der Königlichen Akademie der Wissenschaften zu Berli)
Kengo Harimoto
kengo.harimoto at UNI-HAMBURG.DE
Wed May 19 11:11:42 UTC 2010
Why not use the C-like syntax:
for ((i = 1; i <= 397; i++)) do
? Bash 3 (the default on OS X) accepts this syntax. It works even when bash is called as sh. ksh has that syntax, too; I wouldn't be surprised if the original Bourne shell had it.
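A minimal sketch of how that loop could replace the brace expansion in the quoted script, assuming bash 3 and an 8-digit zero-padded page name as in the original. The base URL and the page count are taken from the quoted script; the names page_url and fetch_book are made up for illustration.

```shell
#!/bin/bash
# Sketch only: C-style loop plus printf padding, for bash 3.
# base and pages come from the quoted script; the function names are hypothetical.

base='http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF'
pages=397

# Print the URL of page $1, with the page number zero-padded to 8 digits.
page_url() {
    printf '%s/%08d.tif\n' "$base" "$1"
}

# Fetch pages 1..$pages one at a time with wget.
fetch_book() {
    local i
    for ((i = 1; i <= pages; i++)); do
        wget "$(page_url "$i")"
    done
}
```

Adding a line with just fetch_book at the end of the script starts the download. The for ((...)) form is not part of POSIX sh, so under a strict sh a while loop with i=$((i + 1)) is the safer spelling.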
I personally use a perl script calling several instances of curl at the same time.
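The same idea can be approximated without perl. This is only a rough sketch, not my actual script: it assumes an xargs that supports -P (GNU and BSD xargs do), the function names list_urls and fetch_parallel are made up, and the default of 4 simultaneous downloads is an arbitrary choice.

```shell
#!/bin/bash
# Sketch only: run several curl processes at once via xargs -P.
# base comes from the quoted script; the function names are hypothetical.

base='http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF'

# Print one URL per line for pages $1..$2 (8-digit zero-padded names).
list_urls() {
    local i
    for ((i = $1; i <= $2; i++)); do
        printf '%s/%08d.tif\n' "$base" "$i"
    done
}

# Download pages $1..$2, running $3 curl processes at a time (default 4).
# curl -O saves each file under its remote name; -f makes HTTP errors fail.
fetch_parallel() {
    list_urls "$1" "$2" | xargs -n 1 -P "${3:-4}" curl -f -s -S -O
}
```

For the book in the quoted script, fetch_parallel 1 397 would fetch all pages, four at a time.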
All the best,
--
kengo harimoto
On May 19, 2010, at 12:49, Dominik Wujastyk wrote:
> I was recently trying to help a colleague with the bash script below for
> fetching entire books from the Digital Library of India. He had Windows, so
> we installed Cygwin, in order to get bash and wget. However, the script
> wouldn't work.
>
> I finally discovered that the essential syntax I'm using below, "for i in
> {X..Y..Z}", works with bash version 4, but not earlier. And Cygwin's bash
> is still at version 3 (so is MinGW's).
>
> When I type "bash --version", I get this:
>
> $ bash --version
>> GNU bash, version 4.1.5(1)-release (i486-pc-linux-gnu)
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>>
>
> Sorry, Windows users, it looks like you'll have to wait until bash 4 gets
> into Cygwin or MinGW.
>
> Best,
> Dominik
>
> 2010/1/14 Dominik Wujastyk <ucgadkw at ucl.ac.uk>
>
>> Birgit is quite right about the value of wget. It's an amazing little
>> tool. I use it routinely to get books from the Digital Library of India,
>> where texts are presented only as individual pages.
>>
>> Until about a year ago, one could use the "-r" recursion setting of wget
>> to fetch a whole directory-full of files in one go. Then the DLI disabled
>> that feature. So now one has to issue a wget command for each page.
>> But it's easy to do with a small bash script like this:
>>
>> ---------- cut here -----------
>> #!/bin/bash
>>
>> # fetch Kapadia_Desc.Cat.Govt.Colls.MSS.BORI-Jaina
>> # Literature and Philosophy XIX.1 Svetambara Works_1957
>>
>> for i in {00000001..397..1}
>> do
>>     wget http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF/$i.tif
>> done
>> ---------- cut here -----------
>>
>> The magic number "397" is the number of pages in the book, which DLI tells
>> you. In Firefox, you can find out the directory in which a book's TIFF
>> files live by loading a page of the book and then hitting Tools/Page Info
>> and selecting "media".
>>
>> Bash is the default shell in Linux; it's also available to Windows users
>> by installing the excellent Cygwin.
>>
>> Best,
>> Dominik
>>
>>