The user also mentioned that there was no clear place to contact about this type of problem. He now knows to contact Wikitech-l, but we may wish to advertise that more. (But that is a separate discussion entirely.)
I thought the procedure was to report it to http://meta.wikimedia.org/wiki/Live_mirrors
PS In fact, that page explicitly says *not* to report them to this mailing list. We definitely need to get our message straight.
Thomas Dalton <thomas.dalton@...> writes:
I thought the procedure was to report it to http://meta.wikimedia.org/wiki/Live_mirrors
PS In fact, that page explicitly says *not* to report them to this mailing list. We definitely need to get our message straight.
I am the user who initiated this thread; it was kindly posted by Casey B, as I was unsure as to what to do. Apparently we have two distinct issues to deal with: #1: Improve the FAQ and self-help messages so that folks who wish to report or act upon these "live mirrors" know what to do (and do not add noise to this group). #2: Figure out whether some form of IP-based filtering or other deterrent should be used against this particular site, and/or "live mirrors" in general.
To address #2 first:
... and generally the devs tell me that whenever they block one, it will spring up from another IP, and that they don't bother ...
This indicates that WP tech folks are generally discouraged about implementing any IP filtering, as the sites tend to work around such measures. That's a fair position: "Let's not do anything unless it becomes too much of a resource drain". As an occasional contributor, I certainly won't try to tell more dedicated or permanent folks what to do. My only suggestion is to maybe mine the web usage logs/stats with the goal of identifying the worst offenders, and possibly to target those above a particular threshold for action (sending GFDL emails / proposing off-line mirroring to them / filtering to deny service or to return "bogus" pages).
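To make the log-mining suggestion a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a description of how the Wikimedia logs actually look: the log format (client IP as the first whitespace-separated field), the function name worst_offenders and the sample addresses are all assumptions made up for the example.

```python
from collections import Counter

def worst_offenders(log_lines, threshold):
    """Return (ip, hits) pairs with at least `threshold` requests, busiest first."""
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            hits[fields[0]] += 1  # assumption: the client IP is the first field
    return [(ip, n) for ip, n in hits.most_common() if n >= threshold]

# Hypothetical sample lines; the addresses come from documentation-only ranges.
sample = [
    "192.0.2.10 GET /wiki/Foo",
    "192.0.2.10 GET /wiki/Bar",
    "192.0.2.10 GET /wiki/Baz",
    "198.51.100.7 GET /wiki/Foo",
]
print(worst_offenders(sample, threshold=3))  # [('192.0.2.10', 3)]
```

The resulting list would only be a starting point for a human to review, since a busy IP may just as well be a large legitimate proxy.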
Of course, if this type of abuse eventually becomes too much of a nuisance, one could introduce a semi-automated way to red-tag the offending IPs; discussing ideas about how to achieve this is obviously beyond the scope of this thread, and indeed probably a topic for a more private forum, lest we help the would-be offenders by offering too much transparency.
Now, _because_ of this potentially lax enforcement, issue #1 should be dealt with particular caution and with the following goals:
- be clear and easily located in the appropriate help / FAQ / Wizards
- provide a [simple] procedure of sorts that would be satisfactory to WP users who try to report this type of abuse
- include some language that may discourage potential implementers of "live mirrors", or at the least not hint in any way at the fact that WP currently doesn't do anything about this issue.
In the spirit of moving forward, here's a draft for something that may serve the above. [Attention, IANAL and quite the newbie with regards to WP's policies. What follows certainly requires review by more qualified people]
Live "mirror" sites:
====================
Some sites query WP behind the scenes and integrate the content of WP's pages, verbatim or somewhat modified, within their own web pages. This practice is illegal, _even_ if the resulting page includes the proper GFDL notice and WP credit.

One should make sure that such sites are actually live "mirrors" rather than off-line (legal) mirrors. For example, one can check that recently modified pages, such as those listed in http://en.wikipedia.org/wiki/Special:Recentchanges, are in effect provided at the suspected site in their latest version.

Such sites should be reported on http://meta.wikimedia.org/wiki/Live_mirrors so they can be blocked and/or legal action may be undertaken if appropriate.

Site managers who wish to provide a regular (legal) mirror of WP can do so by following the instructions at http://en.wikipedia.org/wiki/Wikipedia:Forking_FAQ.
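The "is it really live?" check in the draft above could be sketched roughly as follows. This is only an illustration under stated assumptions: the page texts are assumed to have been fetched already (by whatever means) into two dictionaries keyed by title, and the function name fraction_up_to_date and the sample data are hypothetical.

```python
def fraction_up_to_date(latest_text, mirror_text):
    """Fraction of sampled titles where the mirror page contains the latest text."""
    # Only compare titles that were retrieved from both sides.
    titles = [t for t in latest_text if t in mirror_text]
    if not titles:
        return 0.0
    fresh = 0
    for t in titles:
        wanted = latest_text[t].strip()
        # Substring test, since a mirror usually wraps the text in its own page.
        if wanted and wanted in mirror_text[t]:
            fresh += 1
    return fresh / len(titles)

# Hypothetical sample: text of two recently changed pages vs. the suspected site.
latest = {"A": "new text", "B": "fresh"}
mirror = {"A": "<div>new text</div>", "B": "stale"}
print(fraction_up_to_date(latest, mirror))  # 0.5
```

A fraction close to 1 over a sample of very recent changes would suggest a live mirror, while an off-line mirror built from dumps should score close to 0.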
The basic fact of the matter is, Wikipedia is a top-ten website. The number of websites that are large enough to cause any noticeable effect on server performance by live mirroring is probably in the hundreds. Google could literally (I once did some quick calculations) hotlink a Wikimedia image on their front page without much slowing down the image servers. The only reasons Wikimedia has to discourage unapproved live mirroring are 1) it can and does get money from commercially-operated sites for that privilege and 2) we don't, in principle, want people using Wikipedia content without proper GFDL compliance. #2 is a pretty weak reason to spend developer-hours on whack-a-mole, and in the case of #1, practically all of the mirrors would either just stop using Wikipedia content or make use of dumps instead, gaining nothing for the Foundation. So if someone wants to make a script that will find and block these things, okay, but it's not a very high priority.
Or at least that's my two cents, as a non-sysadmin. By all means update the docs, scaring people is good. ;) It's a wiki, feel free. Your text looks okay (although I'm not clear on whether it's actually illegal to hotlink content without permission).
Marc Veillet wrote:
Thomas Dalton <thomas.dalton@...> writes:
I thought the procedure was to report it to http://meta.wikimedia.org/wiki/Live_mirrors
PS In fact, that page explicitly says *not* to report them to this mailing list. We definitely need to get our message straight.
I am the user who initiated this thread; it was kindly posted by Casey B, as I was unsure as to what to do. Apparently we have two distinct issues to deal with: #1: Improve the FAQ and self-help messages so that folks who wish to report or act upon these "live mirrors" know what to do (and do not add noise to this group). #2: Figure out whether some form of IP-based filtering or other deterrent should be used against this particular site, and/or "live mirrors" in general.
There IS such filtering, and I've seen live mirrors get blocked this way. My understanding was that we still filtered them.
To address #2 first:
... and generally the devs tell me that whenever they block one, it will spring up from another IP, and that they don't bother ...
This indicates that WP tech folks are generally discouraged about implementing any IP filtering, as the sites tend to work around such measures. That's a fair position: "Let's not do anything unless it becomes too much of a resource drain".
There was a discussion about their workarounds, regarding a site mirroring Wikipedia by proxy. We can deny access to Wikipedia for any proxy they use. The problem is that this also affects proxies used by legitimate readers.
As an occasional contributor, I certainly won't try to tell more dedicated or permanent folks what to do. My only suggestion is to maybe mine the web usage logs/stats with the goal of identifying the worst offenders, and possibly to target those above a particular threshold for action (sending GFDL emails / proposing off-line mirroring to them / filtering to deny service or to return "bogus" pages).
Even if we do not filter them, keeping a list of them so that their usage can be compared could be good.
We might serve them pages with a notice or advertisements (as was proposed some time ago), but the mirrors will simply strip them.
If filtering is that much trouble for sysadmins (is it?), it could be done by stewards/meta-admins, who would add entries to a list that is synchronized every X amount of time.
PS: the image section of http://meta.wikimedia.org/wiki/Live_mirrors should be clearer about whether the live mirrors 'hotlink' or 'proxy' the images.
wikitech-l@lists.wikimedia.org