Hello,
The Squid statistics reports show us that some sites are leeching our bandwidth. How can we tell? They have a huge number of image referrals and barely any for pages.
One example: In December, channelsurfing.net has been seen as a referrer for:
- 1000 pages roughly
- 1 740 000 images
whatchnewfilms.com is 14 000 / 581 000.
By looking at their pages, they use upload.wikimedia.org and glue some advertisement around there.
Given the cost in bandwidth, hard drives, CPU, architecture ... I do think we should find a solution to block those sites as much as possible. Would it be possible at the Squid level?
http://stats.wikimedia.org/archive/squid_reports/2010-12/SquidReportOrigins....
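The heuristic described above (many image referrals, very few page referrals) can be sketched in Python as a simple ratio check. This is only an illustration, not how the Squid reports are actually produced: the `ReferrerStats` class, the `likely_hotlinkers` function and both thresholds are hypothetical, and only the two December figures come from the message.

```python
from dataclasses import dataclass


@dataclass
class ReferrerStats:
    """Referral counts for one referring domain over a reporting period."""
    domain: str
    pages: int
    images: int

    @property
    def images_per_page(self) -> float:
        # Guard against division by zero for referrers with no page hits.
        return self.images / max(self.pages, 1)


def likely_hotlinkers(stats, min_ratio=30.0, min_images=100_000):
    """Return domains whose image referrals dwarf their page referrals.

    Both thresholds are assumptions chosen for illustration only.
    """
    return [
        s.domain
        for s in stats
        if s.images >= min_images and s.images_per_page >= min_ratio
    ]


# December 2010 figures quoted in the message above.
december = [
    ReferrerStats("channelsurfing.net", pages=1_000, images=1_740_000),
    ReferrerStats("whatchnewfilms.com", pages=14_000, images=581_000),
]

for s in december:
    print(f"{s.domain}: {s.images_per_page:.1f} images per page")
print(likely_hotlinkers(december))
```

With these numbers channelsurfing.net comes out at 1740 images per referred page and whatchnewfilms.com at 41.5, so both clear the assumed threshold.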
On 21 January 2011 22:49, Ashar Voultoiz hashar+wmf@free.fr wrote:
Given the cost in bandwidth, hard drives, CPU, architecture ... I do think we should find a solution to block those sites as much as possible. Would it be possible at the Squid level?
Given we actively endorse hotlinking (we merely caution against doing it for thumbnails) -
http://commons.wikimedia.org/wiki/Commons:Reusing_content_outside_Wikimedia#...
- is there a problem to solve here?
- d.
Ashar Voultoiz wrote:
The Squid statistics reports show us that some sites are leeching our bandwidth. How can we tell? They have a huge number of image referrals and barely any for pages.
One example: In December, channelsurfing.net has been seen as a referrer for:
- 1000 pages roughly
- 1 740 000 images
whatchnewfilms.com is 14 000 / 581 000.
By looking at their pages, they use upload.wikimedia.org and glue some advertisement around there.
Given the cost in bandwidth, hard drives, CPU, architecture ... I do think we should find a solution to block those sites as much as possible. Would it be possible at the Squid level?
You're talking about hotlinking, right? Looking at the page source of channelsurfing.net, they're clearly hotlinking quite a bit. But as David notes, we generally encourage our content to be spread and used.
Tim did some investigation into the issue of hotlinking in July 2008. His statistics and some of his findings are here: http://meta.wikimedia.org/w/index.php?oldid=1104187#Statistics
To quote Tim directly: [quote] I'll save my comments on the bulk of the proposal for later, but I'll say this now: it's certainly not worth my time (or that of any other system administrator) to deal with these sites on a case-by-case basis. Bandwidth may be valuable, but staff time is also valuable. [/quote]
In his view, the costs outweighed any benefit to looking at hotlinking on a case-by-case basis, particularly when you factor in CPU time to process regexes at the Squid level and sysadmin time to monitor and update these records.
That having been said, it may make sense to make specific exceptions for statistical outliers in the logs. Of course you can read his comments directly at the linked page and make your own cost/benefit analysis. :-)
MZMcBride
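To make the per-request cost mentioned above concrete, here is a rough Python sketch of the kind of Referer check a blocklist implies. It is not Squid configuration and not anything deployed on the cluster: `BLOCKED_DOMAINS` and `is_blocked_referer` are hypothetical names, and the two domains are simply the outliers named earlier in the thread.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical blocklist: the two outliers named earlier in the thread.
BLOCKED_DOMAINS = ["channelsurfing.net", "whatchnewfilms.com"]

# One compiled pattern for the whole list, matched against the host only.
# It matches the domain itself and any subdomain (e.g. "www.").
_BLOCKED_RE = re.compile(
    r"(?:^|\.)(?:" + "|".join(re.escape(d) for d in BLOCKED_DOMAINS) + r")$",
    re.IGNORECASE,
)


def is_blocked_referer(referer: Optional[str]) -> bool:
    """Return True if the Referer header points at a blocklisted domain."""
    if not referer:
        # Missing or empty Referer headers are let through.
        return False
    try:
        host = urlsplit(referer).hostname or ""
    except ValueError:
        # Malformed URL: treat as not blocked rather than failing the request.
        return False
    return bool(_BLOCKED_RE.search(host))


print(is_blocked_referer("http://www.channelsurfing.net/watch.php"))
print(is_blocked_referer("http://en.wikipedia.org/wiki/Squid"))
```

Every image request would pay for one such match, and someone would have to keep the list current, which is the staff-time cost Tim points to.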
On Fri, Jan 21, 2011 at 5:02 PM, MZMcBride z@mzmcbride.com wrote:
You're talking about hotlinking, right? Looking at the page source of channelsurfing.net, they're clearly hotlinking quite a bit. But as David notes, we generally encourage our content to be spread and used.
I particularly enjoyed the part where their header is hotlinked from, and hosted on, Commons.
http://commons.wikimedia.org/wiki/File:Cslogo.gif
Seems like that's definitely using Commons as their image host.
On Fri, Jan 21, 2011 at 3:05 PM, OQ overlordq@gmail.com wrote:
On Fri, Jan 21, 2011 at 5:02 PM, MZMcBride z@mzmcbride.com wrote:
You're talking about hotlinking, right? Looking at the page source of channelsurfing.net, they're clearly hotlinking quite a bit. But as David notes, we generally encourage our content to be spread and used.
I particularly enjoyed the part where their header is hotlinked from, and hosted on, Commons.
http://commons.wikimedia.org/wiki/File:Cslogo.gif
Seems like that's definitely using Commons as their image host.
I think that's an abuse of Commons, and someone (OQ?) seems to have started a deletion request for that (also, you might want to grab and AFD nominate the other header that the same account uploaded, for another part of the site...).
If they're linking to images we legitimately host and which meet our image guidelines, are used in WP or other WMF projects, etc, then ... Shrug. I didn't realize we were ok with hotlinking like that, but if that's the published policy, that's the published policy.
On 21 January 2011 23:31, George Herbert george.herbert@gmail.com wrote:
If they're linking to images we legitimately host and which meet our image guidelines, are used in WP or other WMF projects, etc, then ... Shrug. I didn't realize we were ok with hotlinking like that, but if that's the published policy, that's the published policy.
It's the published guideline. Presumably a techie could change it at any time if it were no longer true.
- d.
On Fri, Jan 21, 2011 at 6:32 PM, David Gerard dgerard@gmail.com wrote:
On 21 January 2011 23:31, George Herbert george.herbert@gmail.com wrote:
If they're linking to images we legitimately host and which meet our image guidelines, are used in WP or other WMF projects, etc, then ... Shrug. I didn't realize we were ok with hotlinking like that, but if that's the published policy, that's the published policy.
It's the published guideline. Presumably a techie could change it at any time if it were no longer true.
The default InstantCommons configuration (and the suggested manual configuration) includes caching, so sites using it should (mostly) not be hotlinking.
However, you can disable the caching, in which case you'll be hotlinking the thumbs.
-Chad
wikitech-l@lists.wikimedia.org