Ashar Voultoiz wrote:
The Squid statistics reports show us that some sites are leeching our
bandwidth. How can we tell? They have a huge number of image referrals and
barely any for pages.
One example: in December, channelsurfing.net was seen as a referrer for:
- roughly 1,000 pages
- 1,740,000 images
whatchnewfilms.com is 14,000 / 581,000.
Looking at their pages, they use content from upload.wikimedia.org and glue
some advertisements around it.
Given the cost in bandwidth, hard drives, CPU, architecture ... I do
think we should find a solution to block those sites as much as
possible. Would it be possible at the Squid level?
You're talking about hotlinking, right? Looking at the page source of
channelsurfing.net, they're clearly hotlinking quite a bit. But as David
notes, we generally encourage our content to be spread and used.
Tim did some investigation into the issue of hotlinking in July 2008. His
statistics and some of his findings are here:
http://meta.wikimedia.org/w/index.php?oldid=1104187#Statistics
To quote Tim directly:
[quote]
I'll save my comments on the bulk of the proposal for later, but I'll say
this now: it's certainly not worth my time (or that of any other system
administrator) to deal with these sites on a case-by-case basis. Bandwidth
may be valuable, but staff time is also valuable.
[/quote]
In his view, the costs outweighed any benefit of looking at hotlinking on a
case-by-case basis, particularly when you factor in CPU time to process
regexes at the Squid level and sysadmin time to monitor and update these
records.
That said, it may make sense to make specific exceptions for statistical
outliers in the logs. Of course, you can read his comments directly at the
linked page and make your own cost/benefit analysis. :-)
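
For what it's worth, picking out statistical outliers could be as simple as
comparing image referrals to page referrals per referring host. Below is a
rough Python sketch of that idea, using only the December figures quoted
above. The names (ReferrerStats, find_outliers) and both thresholds are
hypothetical and made up for illustration; they are not taken from Tim's
analysis or from any existing tool.

```python
# Illustrative sketch only: the type, function names and thresholds are
# assumptions, not part of any existing Wikimedia or Squid tooling.
from typing import NamedTuple


class ReferrerStats(NamedTuple):
    host: str
    page_hits: int
    image_hits: int


def image_to_page_ratio(stats: ReferrerStats) -> float:
    """Image referrals per page referral; no page hits counts as infinite."""
    if stats.page_hits == 0:
        return float("inf")
    return stats.image_hits / stats.page_hits


def find_outliers(rows, min_images=100_000, min_ratio=20.0):
    """Return referrers that exceed both (hypothetical) thresholds,
    sorted by image volume, largest first."""
    flagged = [
        r for r in rows
        if r.image_hits >= min_images and image_to_page_ratio(r) >= min_ratio
    ]
    return sorted(flagged, key=lambda r: r.image_hits, reverse=True)


# December figures as quoted in this thread.
december = [
    ReferrerStats("channelsurfing.net", 1_000, 1_740_000),
    ReferrerStats("whatchnewfilms.com", 14_000, 581_000),
]

for row in find_outliers(december):
    ratio = image_to_page_ratio(row)
    print(f"{row.host}: {row.image_hits} images, ratio {ratio:.1f}")
```

With these numbers, channelsurfing.net comes out at 1,740 images per page and
whatchnewfilms.com at 41.5, so both clear the example thresholds. Where to
draw the line is exactly the cost/benefit question above.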
MZMcBride