Ashar Voultoiz wrote:
The Squid statistics report shows that some sites are leeching our bandwidth. How can we tell? They account for a huge number of image referrals and barely any for pages.
One example: in December, channelsurfing.net was seen as a referrer for:
- roughly 1,000 pages
- 1,740,000 images
whatchnewfilms.com is at 14,000 pages / 581,000 images.
Looking at their pages, they embed images directly from upload.wikimedia.org and glue some advertising around them.
Given the cost in bandwidth, hard drives, CPU, architecture ... I do think we should find a way to block those sites as much as possible. Would it be possible at the Squid level?
You're talking about hotlinking, right? Looking at the page source of channelsurfing.net, they're clearly hotlinking quite a bit. But as David notes, we generally encourage our content to be spread and used.
Tim did some investigation into the issue of hotlinking in July 2008. His statistics and some of his findings are here: http://meta.wikimedia.org/w/index.php?oldid=1104187#Statistics
To quote Tim directly: "I'll save my comments on the bulk of the proposal for later, but I'll say this now: it's certainly not worth my time (or that of any other system administrator) to deal with these sites on a case-by-case basis. Bandwidth may be valuable, but staff time is also valuable."
In his view, the costs outweighed any benefit of handling hotlinking on a case-by-case basis, particularly once you factor in the CPU time to process regexes at the Squid level and the sysadmin time to monitor and update such rules.
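For the sake of illustration, a per-site block at the Squid layer would look roughly like the sketch below, using Squid's referer_regex ACL type. The regex and the substitute URL served to blocked requests are placeholders for the example, not anything actually deployed:

    # Match requests whose Referer header points at the offending site
    acl hotlinkers referer_regex -i ^https?://([^/]+\.)?channelsurfing\.net/
    # Send matching requests a small static placeholder instead of the real file
    # (the placeholder URL here is made up for the example)
    deny_info http://upload.wikimedia.org/hotlink-blocked.png hotlinkers
    # Must come before the general allow rules
    http_access deny hotlinkers

Every request to the upload Squids would have to be tested against each such regex, and someone has to keep the list current as leeches come and go, which is exactly the CPU and staff cost Tim is weighing.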
That having been said, it may make sense to make specific exceptions for statistical outliers in the logs. Of course, you can read his comments directly at the linked page and make your own cost/benefit analysis. :-)
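For what it's worth, spotting those outliers doesn't have to be elaborate. A rough tally of referring hosts can be pulled straight from an access log that records the Referer header; the one-liner below is only a sketch, and it assumes the common "combined" log format (where the Referer is the second quoted string on each line) and a log at /var/log/squid/access.log:

    # Count requests per referring host, busiest first
    awk -F'"' '{ split($4, u, "/"); if (u[3] != "") print u[3] }' /var/log/squid/access.log \
        | sort | uniq -c | sort -rn | head -20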
MZMcBride