On Fri, 12 Mar 2004 19:31:05 +0000, David Rodeback wrote:
>> Download and install the texts. Spider your installation and extract
>> the image references. Convert the filenames to those matching the
>> pictures at the WP site. Download the files on that list using 'wget'.
>> Or something like that could work.
> Since our current process includes all these steps except the last (at
> that point we link to the file rather than fetching it), this is easily
> done.
> Am I to gather that a reasonably well-behaved spider is preferred to
> linking back to Wikipedia's site as we have been doing?
> Can someone define for me what would be the off-peak hours in which such
> a spider should run?
See
http://wikimedia.org/stats/live/org.wikimedia.all.squid.requests-hits.html
for the current request rates; the low periods in that graph are the
off-peak hours.
> Finally, is there a place at Wikipedia (I know of several elsewhere) for
> registering such spiders with descriptions and contact information, in
> case someone observes the spider working and wonders, or in case there
> is some sort of problem?
Set the user agent to something descriptive, like 'worldhistory', and be
sure not to include typical spider UA strings. Also throttle the requests;
wget offers a rate-limiting option for that.
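A rough sketch of such a fetch (untested; 'image-urls.txt' stands in for
whatever file your conversion step writes the image URLs to, and the wait,
rate and contact values are only placeholders to adjust):

  # Fetch the image list politely: descriptive UA with contact info,
  # a pause between requests, and a bandwidth cap.
  wget --user-agent="worldhistory image mirror (contact: you@example.org)" \
       --wait=1 --limit-rate=50k \
       -i image-urls.txt -P images/

--wait pauses between retrievals and --limit-rate caps the transfer speed,
so the extra load on the squids stays small even for a long list of files.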
--
Gabriel Wicke