On Nov 24, 2004, at 8:24 PM, Brion Vibber wrote:
On Nov 24, 2004, at 8:02 PM, Rich Holton wrote:
The site 2BuyGood.com/InfoPedia (http://www.2buygood.com/wiki/) is grabbing
live content from the English Wikipedia, but gives no link
[snip]
While we work on GFDL issues, could a developer block the live content
grabs?
Done.
I should note that not only were they forwarding every request to our
servers, stripping off all navigation links and identification text,
and stuffing it full of advertising and JavaScript popups, but their
requests to the web server used false referrer and user-agent fields to
hide their tracks. Here are a couple of hits:
66.152.98.14 - - [25/Nov/2004:04:14:19 +0000] "GET http://www.wikipedia.org/wiki/ HTTP/1.1" 301 564 "en.wikipedia.org" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 98; DigExt)"
66.152.98.18 - - [25/Nov/2004:04:14:50 +0000] "GET http://www.wikipedia.org/wiki/Wikipedia:Community_Portal HTTP/1.1" 301 616 "en.wikipedia.org" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 98; DigExt)"
Instead of sending their own site as the referrer URL, their digger sends
our hostname ("en.wikipedia.org"). That's not even a valid referrer,
since it should be a URL! They're also falsely claiming to be Internet
Explorer, so the hits look like they come from a human using a browser.
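For anyone who wants to look for the same pattern in their own logs, here is a rough Python sketch that flags hits whose referrer field is not an absolute URL. It assumes the log lines follow the layout of the two hits quoted above, each joined onto a single line. The names LOG_PATTERN, parse_hit and has_malformed_referer are made up for illustration and are not part of any tool we run.

```python
import re
from urllib.parse import urlsplit

# Assumed layout, matching the quoted hits:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_hit(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def has_malformed_referer(line):
    """True if a referrer is present but is not an absolute http(s) URL."""
    hit = parse_hit(line)
    if hit is None or hit["referer"] in ("", "-"):
        return False  # unparseable line or no referrer sent: nothing to flag
    parts = urlsplit(hit["referer"])
    return not (parts.scheme in ("http", "https") and parts.netloc)


# First hit quoted above, joined onto one line.
SAMPLE = (
    '66.152.98.14 - - [25/Nov/2004:04:14:19 +0000] '
    '"GET http://www.wikipedia.org/wiki/ HTTP/1.1" 301 564 '
    '"en.wikipedia.org" '
    '"Mozilla/4.0 (compatible; MSIE 5.01; Windows 98; DigExt)"'
)

if __name__ == "__main__":
    print(has_malformed_referer(SAMPLE))
```

A bare hostname like "en.wikipedia.org" has no scheme, so urlsplit() treats it as a path and the check flags it. A missing referrer is deliberately not flagged, since plenty of legitimate clients send none.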
2buygood.com's front-end address resolves to 66.152.98.201, and the hits
to our servers come from several IPs on the same /24 network; I've seen
.12 through .20 in the log extracts I looked at. I've blocked the whole
subnet at our squid servers, so they're now receiving 403 (permission
denied) errors.
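To make the scope of that block concrete, here is a small Python sketch of which addresses a /24 around their front end covers. It is only an illustration using the standard ipaddress module, not the squid configuration itself, and the names BLOCKED_NET and is_blocked are invented for the example.

```python
import ipaddress

# Derive the /24 from the front-end address; strict=False lets us pass a
# host address and have the host bits masked off.
BLOCKED_NET = ipaddress.ip_network("66.152.98.201/24", strict=False)


def is_blocked(addr):
    """True if addr falls inside the blocked /24."""
    return ipaddress.ip_address(addr) in BLOCKED_NET


if __name__ == "__main__":
    for addr in ("66.152.98.14", "66.152.98.18", "66.152.99.1"):
        print(addr, is_blocked(addr))
```

The network works out to 66.152.98.0/24, so both the scraper addresses seen in the logs and the front-end address itself fall inside it, while anything outside 66.152.98.x is unaffected.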
-- brion vibber (brion @ pobox.com)