On 10/28/07, Andrew Garrett <andrew(a)epstone.net> wrote:
On 10/27/07, Anthony <wikimail(a)inbox.org> wrote:
[[Wikipedia:Database download#Please do not use a web crawler]]
Have Google and Yahoo been informed of this policy?
Context: "Please do not use a web crawler to download large numbers of
articles."
As in "Don't use a web crawler to get large amounts of data for your own
personal use" (i.e. for mirroring). And it's quite valid: if lots of
people downloaded the entire site one article at a time, we'd end up
with big problems, especially since the load would be evenly
distributed across many articles, and hence there'd be a lot of extra
parsing happening.
Google and Yahoo have nothing to do with this: search engines
represent a tiny portion of our requests (whereas many users each
making a lot of requests would not), and they use the data obtained
for the public benefit.
The same could be said about Yousef Ourabi, though. He's only one
person, and he's "interested in mirroring this images so people from
the community who need access to them have multiple choices for
download".
I think Simetrical got the right de facto policy. Don't run a live
mirror, and don't slow down or break anything, and no one's going to
care or even notice.
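
For what it's worth, "don't slow down or break anything" could look roughly like the sketch below: fetch one item at a time and pause between requests. This is only an illustration, not anything the policy page prescribes. The names Throttle and fetch_all are made up for this example, the caller supplies its own fetch function, and the interval is whatever the operator considers polite.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive requests (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests; value is the caller's choice
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, None before the first one

    def wait(self):
        """Block until at least min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_all(urls, fetch, throttle):
    """Fetch urls strictly one at a time, pausing between requests."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

The clock and sleep arguments exist only so the pacing can be checked without actually waiting; in normal use the defaults are fine.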