On Nov 24, 2004, at 8:24 PM, Brion Vibber wrote:
On Nov 24, 2004, at 8:02 PM, Rich Holton wrote:
The site 2BuyGood.com/InfoPedia is grabbing live
content from the English Wikipedia, but gives no link
While we work on GFDL issues, could a developer
the live content grabs?
I should note that not only were they forwarding every request to our
servers, stripping off all navigation links and identification text,
but their requests to the web server used false referrer and user-agent
fields to hide their tracks. Here are a couple of hits:
220.127.116.11 - - [25/Nov/2004:04:14:19 +0000] "GET
HTTP/1.1" 301 564 "en.wikipedia.org"
"Mozilla/4.0 (compatible; MSIE 5.01; Windows 98; DigExt)"
18.104.22.168 - - [25/Nov/2004:04:14:50 +0000] "GET
616 "en.wikipedia.org" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 98;
Instead of using their own site as a referer URL, their digger is using
our hostname ("en.wikipedia.org"). That's not even a valid referrer,
since it should be a URL! And, they're falsely claiming to be Internet
Explorer so it looks like the hits are coming from some human browser.
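
For anyone who wants to scan their own logs for this kind of thing, here
is a minimal sketch of the referrer check described above. It assumes that
a plausible referrer is an absolute http or https URL, which is what
browsers send in practice; that is stricter than the HTTP spec, which also
permits relative references. The function name is made up for illustration
and is not part of any existing tool.

```python
from urllib.parse import urlsplit


def is_plausible_referer(value: str) -> bool:
    """Return True if value looks like an absolute http(s) URL.

    Assumption: only absolute URLs count as plausible. A bare hostname
    such as "en.wikipedia.org" parses with no scheme and no network
    location, so it is rejected.
    """
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    # The value seen in the hits above, then a well-formed referrer.
    for candidate in ("en.wikipedia.org", "http://en.wikipedia.org/wiki/Main_Page"):
        print(candidate, is_plausible_referer(candidate))
```

A check like this only flags malformed values; a scraper that sends a
well-formed but false referrer would pass it, so it is a first filter at
best.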
2buygood.com's front-end address resolves to 22.214.171.124; the hits to
our servers come from several IPs on the same /24 network; I've noticed
from .12 through .20 in the log extracts I saw. I've blocked the whole
subnet at our squid servers, so they're receiving 403 (permission
-- brion vibber (brion @ pobox.com)