wget (was: Re: Abhandlungen der Königlichen Akademie der Wissenschaften zu Berli)

Kengo Harimoto kengo.harimoto at UNI-HAMBURG.DE
Wed May 19 11:11:42 UTC 2010


Why not C-like syntax:

for ((i = 1; i <= 397; i++)) do

? bash 3 (the default on OS X) accepts this syntax. It works even when bash is called as sh. ksh has that syntax too; I wouldn't be surprised if the original Bourne shell had it.
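For what it's worth, here is a minimal sketch of the whole loop written that way, so that it should run on bash 3 as well. The URL and the page count are copied from the script quoted below; the function name page_urls is only my placeholder, and I am assuming the files keep the eight-digit zero-padded names that the brace expansion would have produced.

```shell
#!/bin/bash
# Sketch: bash 3-compatible replacement for "for i in {00000001..397..1}".
# base_url and last_page come from the quoted script; page_urls is a
# hypothetical helper name, not part of wget or bash.

base_url="http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF"
last_page=397

page_urls() {
    local i
    for ((i = 1; i <= last_page; i++)); do
        # %08d zero-pads the page number to eight digits: 00000001 ... 00000397
        printf '%s/%08d.tif\n' "$base_url" "$i"
    done
}

# To download, pipe the list into wget, which reads URLs from standard
# input when given "-i -":
#   page_urls | wget -i -
```

Calling page_urls on its own just prints the URLs, which is a cheap way to check the numbering before fetching anything.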

I personally use a perl script calling several instances of curl at the same time.
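My perl script is not reproduced here, but roughly the same effect can be sketched in the shell with xargs, whose -P option runs several commands at once. This is an illustration rather than what I actually run: JOBS, FETCH and fetch_range are made-up names, the URL is the one from the script quoted below, and FETCH defaults to a dry run that only prints the curl commands.

```shell
#!/bin/bash
# Sketch: parallel page fetching with xargs instead of a perl wrapper.
# JOBS, FETCH and fetch_range are hypothetical names for this example.

base_url="http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF"
JOBS=${JOBS:-4}                 # number of simultaneous downloads
FETCH=${FETCH:-echo curl -O}    # dry run by default; set FETCH="curl -O" to download

fetch_range() {
    local first=$1 last=$2 i
    for ((i = first; i <= last; i++)); do
        printf '%s/%08d.tif\n' "$base_url" "$i"
    done | xargs -n 1 -P "$JOBS" $FETCH   # $FETCH is left unquoted so it splits into words
}

# Example: fetch_range 1 397
```

Because the jobs run concurrently, the order in which the commands are printed or completed is not guaranteed.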

All the best,

-- 
kengo harimoto

On May 19, 2010, at 12:49 , Dominik Wujastyk wrote:

> I was recently trying to help a colleague with the bash script below for
> fetching entire books from the Digital Library of India.  He had Windows, so
> we installed CygWin, in order to get bash and wget.  However, the script
> wouldn't work.
> 
> I finally discovered that the essential syntax I'm using below, "for i in
> {X..Y..Z}", works with bash version 4, but not earlier.  And CygWin's bash
> is still at version 3 (as is MinGW's).
> 
> When I type "bash --version", I get this:
> 
> $ bash --version
>> GNU bash, version 4.1.5(1)-release (i486-pc-linux-gnu)
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>> 
> 
> Sorry, Windows users, it looks like you'll have to wait until bash 4 gets
> into CygWin or MinGW.
> 
> Best,
> Dominik
> 
> 2010/1/14 Dominik Wujastyk <ucgadkw at ucl.ac.uk>
> 
>> Birgit is quite right about the value of wget.  It's an amazing little
>> tool.  I use it routinely to get books from the Digital Library of India,
>> where texts are presented only as individual pages.
>> 
>> Until about a year ago, one could use the "-r" recursion setting of wget
>> to fetch a whole directory-full of files in one go.  Then the DLI disabled
>> that feature.  So now one has to issue a wget command for each page.
>> But it's easy to do with a small bash script like this:
>> 
>> ---------- cut here -----------
>> #!/bin/bash
>> 
>> # fetch Kapadia_Desc.Cat.Govt.Colls.MSS.BORI-Jaina
>> # Literature and Philosophy XIX.1 Svetambara Works_1957
>> 
>> for i in {00000001..397..1}
>>       do
>>               wget http://www.new.dli.ernet.in/data/upload/0048/903/PTIFF/$i.tif
>>       done
>> ---------- cut here -----------
>> 
>> The magic number "397" is the number of pages in the book, which DLI tells
>> you.  In Firefox, you can find out the directory in which a book's TIFF
>> files live by loading a page of the book and then hitting Tools/Page Info
>> and selecting "media".
>> 
>> Bash is the default shell in Linux; it's also available to Windows users
>> by installing the excellent Cygwin.
>> 
>> Best,
>> Dominik
>> 
>> 




