On Tomk32's request I've blocked www.themensuche.de (217.172.181.98) from access to our servers. It was "hosting" German Wikipedia pages by sending every request on to de.wikipedia.org and slurping the content, but using it solely for the purpose of bringing in search hits. The text is never displayed; instead every hit uses a JavaScript redirect through an intermediary or two to amazon.de.
Pages looked like this:

  <html>
  <head>
  <title>Der Jäger aus Kurpfalz</title>
  <script language="javascript"> what = 'site www themensuche de aus'; </script>
  <script language="javascript" src="/daten/dat.js" type="text/javascript"> </script>
  <link rel="stylesheet" type="text/css" href="/daten/lay.css">
  </head>
  <body bgcolor="white">
  <div id="dat" name="dat">
  Friedrich Wilhelm Utsch Friedrich Wilhelm Utsch wurde [actual content from wiki snipped for brevity]
  <center>
  <a href="index.html">HOME</a> | <a href="De.htm">INDEX</a> | <a href="mailto:webmaster@themensuche.de">MAIL</a>
  <hr>
  <p><a href="http://www.google.de/search?q=Der Jäger aus Kurpfalz">SUCHE BEI GOOGLE</a> | <a href="http://search.msn.com/results.aspx?q=Der Jäger aus Kurpfalz">SUCHE BEI MSN</a></p>
  <p> History:<br>
  Copyright (c) 2004 <D-E.W-I-K-I-P-E-D-I-A.O-R-G></wiki/Der_Jäger_aus_Kurpfalz><br>
  Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation. A copy of the license is included in the section entitled<br>
  <a href="gnu_fdl.txt">"GNU Free Documentation License"</a>. </p>
  </center>
  </div>
  </body>
  </html>
Where the script brought up at the top contains only:

  top.location.replace('http://www.steine.de/partnersys/index.html?swi=' + what)

which if you go there, sends you on to the amazon.de search page. This script is executed automatically before any content is shown (and the markup is invalid and nothing at all shows in some browsers even if JS is disabled). There is no way to read Wikipedia content at that site short of turning off JavaScript and using 'view source'.
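To make the redirect concrete, here is a small Python sketch (not the site's own code) of the URL that the one-line script would send a visitor to for the sample page above. The function and constant names are made up for illustration, and it assumes the spaces in `what` end up percent-encoded, as a browser would typically do when requesting the URL.

```python
from urllib.parse import quote

# Values copied from the page source and the one-line script quoted above.
WHAT = "site www themensuche de aus"
REDIRECT_BASE = "http://www.steine.de/partnersys/index.html?swi="


def redirect_target(what: str) -> str:
    """Return the URL that top.location.replace(...) would navigate to.

    Assumption: spaces are percent-encoded, as a browser would typically
    do when it requests the concatenated URL.
    """
    return REDIRECT_BASE + quote(what)


if __name__ == "__main__":
    print(redirect_target(WHAT))
```

Every proxied page would carry its own `what` value, so each search hit lands on the same intermediary with a different query string.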
This is not even *vaguely* legitimate. I've cut them off in the IP firewall on coronelli and browne, so they're no longer able to steal bandwidth on every page hit just to promote their referrer links.
-- brion vibber (brion @ pobox.com)
On Thu, May 20, 2004 at 11:49:26PM -0700, Brion Vibber wrote:
> On Tomk32's request I've blocked www.themensuche.de (217.172.181.98) from access to our servers. It was "hosting" German Wikipedia pages by sending every request on to de.wikipedia.org and slurping the content, but using it solely for the purpose of bringing in search hits. The text is never displayed; instead every hit uses a JavaScript redirect through an intermediary or two to amazon.de.
I wrote mails to three other sites that are obviously running proxies without a cache (and that have more than 500 MB of traffic per month). There might be a few others, surely not as evil as themensuche.de, but they still cost us a lot of traffic that could be avoided by using the database dumps instead. Is anybody up for hunting them down and contacting them? We would need more stats, because the webalizer stats by kbyte only show the top 10. Also, many hits/files but only a few visits is a good indicator.
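A minimal sketch of that last heuristic in Python, assuming per-host monthly figures (hits, visits, kbytes) have already been extracted from the logs or the webalizer report. The `HostStats` record, the function name and the hits-per-visit threshold are all invented for illustration; only the 500 MB figure comes from the paragraph above, and applying it per host is an assumption.

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    host: str    # client hostname or IP as it appears in the stats
    hits: int    # total requests from this host in the month
    visits: int  # distinct visits from this host in the month
    kbytes: int  # traffic served to this host, in KB


# Assumed thresholds: 500 MB per month (from the mail, applied per host)
# and an arbitrary hits-per-visit ratio that would need tuning on real data.
MIN_KBYTES = 500 * 1024
MIN_HITS_PER_VISIT = 200


def suspected_proxies(stats):
    """Return hosts with heavy traffic and many hits but few visits,
    sorted by traffic, heaviest first."""
    flagged = []
    for s in stats:
        hits_per_visit = s.hits / max(s.visits, 1)  # avoid division by zero
        if s.kbytes >= MIN_KBYTES and hits_per_visit >= MIN_HITS_PER_VISIT:
            flagged.append(s)
    flagged.sort(key=lambda s: s.kbytes, reverse=True)
    return flagged
```

Because it runs over every host rather than a top-10 list, a check like this would also catch the smaller proxies that never appear in the webalizer summary.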
> which if you go there, sends you on to the amazon.de search page. This script is executed automatically before any content is shown (and the markup is invalid and nothing at all shows in some browsers even if JS is disabled). There is no way to read Wikipedia content at that site short of turning off JavaScript and using 'view source'.
There are plugins for Firefox for turning off CSS too.
Brion Vibber wrote:
> This is not even *vaguely* legitimate. I've cut them off in the IP firewall on coronelli and browne, so they're no longer able to steal bandwidth on every page hit just to promote their referrer links.
Bravo!
As is well known, I advocate taking a fairly relaxed approach to "bandwidth thieves" who are framing our content, etc. I don't think it's good, but I think the better approach is usually going to be to contact them first and ask them to stop it, and try to get them on board with a technologically better (and more friendly to our pocketbook) method of reuse!
But as Brion says, this particular case is not even *vaguely* legitimate.
Good call.
--Jimbo
On Fri, May 21, 2004 at 05:56:31AM -0700, Jimmy Wales wrote:
> As is well known, I advocate taking a fairly relaxed approach to "bandwidth thieves" who are framing our content, etc. I don't think it's good, but I think the better approach is usually going to be to contact them first and ask them to stop it, and try to get them on board with a technologically better (and more friendly to our pocketbook) method of reuse!
> But as Brion says, this particular case is not even *vaguely* legitimate.
We haven't done much investigation yet, but it seems the website is somehow connected to de:Benutzer:Herbye, who sometimes adds hidden links (using divs) to pages like steine.de, which all belong to Merkel Internet Service. themensuche.de has a different Admin-C but uses steine.de as an intermediary.
The admin of themensuche.de seems to have reacted and deactivated his evil script, but I haven't received an answer to my mail from them yet.
ciao, tom
wikitech-l@lists.wikimedia.org