[Foundation-l] Live mirrors

Robert Horning robert_horning at netzero.net
Thu May 31 00:23:23 UTC 2007

Todd Allen wrote:
> Robert Horning wrote:
>> In regards to "live" mirrors that are constantly sucking bandwidth off 
>> of the Wikimedia server farm, I would have to agree that this is a major 
>> problem and something that should be dealt with, both on a legal front 
>> as well as through technical means.  I would be curious about some 
>> comparisons of the bandwidth need of a *very* active Wikimedia 
>> user/administrator who is on-line nearly 24/7 vs. one of these mirror 
>> sites.  I think it would be easy for an active editor/user to suck at 
>> least 1 GB of data/day, but it would be along this order of magnitude of 
>> bandwidth.  It would be an interesting test to see how much it would 
>> actually come out to in practice.
>> Robert Horning
>   But even so, the Foundation has every right to say "It is totally
> acceptable for a very active administrator (or even a very voracious
> reader) to use 1 GB a day if they want to, but it is not acceptable for
> a live mirror to do so." Legitimate (even if heavy) users of a site are
> one thing, bandwidth leeches are quite another.
In part, I guess I'm trying to propose a technical solution of sorts 
here.  Perhaps bandwidth for a particular IP address could be throttled 
in some way that would allow a very heavy but legitimate user to access 
the 50-100 pages or so a day that they actually read in some depth (just 
to give a figure), but not allow a mirror to suck up every change unless 
they have made some sort of financial arrangement with the WMF to pay 
for this extra bandwidth.  The WMF will certainly not "make a profit" 
doing this, and even if it became a problem with the IRS, solutions 
could still be found to help get these mirrors to pay for the resources 
they are taking up.
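
A minimal sketch of the kind of per-IP throttling described above, 
assuming a simple token bucket in which each page request costs one 
token.  The class name, the 100 pages/day budget, and the injectable 
clock are illustrative assumptions only, not anything the Wikimedia 
servers are known to run:

```python
import time


class PerIPThrottle:
    """Hypothetical per-IP token bucket: one token per page request."""

    def __init__(self, pages_per_day=100, clock=time.monotonic):
        # Assumed budget: roughly the 50-100 pages/day a heavy reader uses.
        self.capacity = float(pages_per_day)
        self.refill_per_sec = pages_per_day / 86400.0
        self.clock = clock  # injectable so the logic can be tested
        # ip -> (tokens_left, time_of_last_update)
        self.buckets = {}

    def allow(self, ip):
        """Return True if this IP may fetch another page right now."""
        now = self.clock()
        # An IP seen for the first time starts with a full bucket.
        tokens, last = self.buckets.get(ip, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at the daily budget.
        tokens = min(self.capacity,
                     tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

A reader who opens a few dozen pages a day never runs dry, while a 
client that tries to pull every change exhausts its bucket and is then 
held to the slow refill rate.  A mirror that has made an arrangement 
with the WMF could simply be given a larger budget.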

The trick is that some of these mirrors can and do use some very sneaky 
methods to keep their sites up to date with current data.  While you can 
cull out some of these sites and segregate them from ordinary users, 
this becomes a technical arms race between whoever is blocking the live 
mirrors and the hackers behind them, who can "fake" requests that look 
like ordinary user requests in order to keep their pages up to date.  
The fact that I can come up with an algorithm right now that would make 
a mirror's requests look like random user page views is enough to make 
me think this could reach the point where detection is nearly 
impossible, except by seeing pages on the mirror show up with very 
recent changes.  Most mirrors aren't that clever, so this may not apply 
in practice.
