No more image bandwidth used by the following sites:

# image theft acls
acl imgurl urlpath_regex -i \.(jpg|gif|png)$
acl badimgref referer_regex -i wikipedia.t-st.de
acl badimgref referer_regex -i worldhistory.com
acl badimgref referer_regex -i yourencyclopedia.net
acl badimgref referer_regex -i wordiq.com
acl badimgref referer_regex -i artpolitic.org
acl badimgref referer_regex -i ruv.net
# deny access
http_access deny badimgref imgurl
Thanks to Jeronim who compiled the list.
Gabriel-
No more image bandwidth used by the following sites:
We are not currently offering an archive of our images at download.wikipedia.org. It seems unfair to me to block Wikipedia mirrors from referring to images when there's no obvious way for them to get the images to host on their own.
Regards,
Erik
On Wed, 10 Mar 2004 19:24:00 +0100, Erik Moeller wrote:
Gabriel-
No more image bandwidth used by the following sites:
We are not currently offering an archive of our images at download.wikipedia.org. It seems unfair to me to block Wikipedia mirrors from referring to images when there's no obvious way for them to get the images to host on their own.
right click -> save image as...
I don't think the mirrors have a right to get free hosting, and it's really not hard to save the images if somebody wanted to. If there's a consensus (and no copyright problem) to provide a big tar with all images for more convenience, then it's just a matter of adding it at download.wikimedia.org.
Gabriel-
I don't think the mirrors have a right to get free hosting, and it's really not hard to save the images if somebody wanted to.
Well, we don't make it exactly easy either, what with blocking all the spiders ..
If there's a consensus (and no copyright problem) to provide a big tar with all images for more convenience, then it's just a matter of adding it at download.wikimedia.org.
If it's OK for us to host the images in the first place, then it's also OK to put them into a tarball.
Regards,
Erik
On Wed, Mar 10, 2004 at 07:51:00PM +0100, Erik Moeller wrote:
Gabriel-
I don't think the mirrors have a right to get free hosting, and it's really not hard to save the images if somebody wanted to.
Well, we don't make it exactly easy either, what with blocking all the spiders ..
If there's a consensus (and no copyright problem) to provide a big tar with all images for more convenience, then it's just a matter of adding it at download.wikimedia.org.
If it's OK for us to host the images in the first place, then it's also OK to put them into a tarball.
I was under the impression that we can't redistribute the fair use images?
But if we're going to make a tarball, then BitTorrent might be useful.
Arvind
On Wed, 10 Mar 2004 19:42:21 +0100, Gabriel Wicke wrote: <snip>
right click -> save image as...
Hello,
As there are currently 87,000 pictures on Wikipedia, I highly doubt anyone will want to do that. Even a bot grabbing the pages would certainly be blocked quickly.
I don't think the mirrors have a right to get free hosting, and it's really not hard to save the images if somebody wanted to. If there's a consensus (and no copyright problem) to provide a big tar with all images for more convenience, then it's just a matter of adding it at download.wikimedia.org.
That leads to the problem of fair use and redistribution of the Wikipedia data :( That's the main reason I am against fair use: to ease redistribution.
cheers,
Gabriel Wicke <groups@...> writes:
No more image bandwidth used by the following sites:

# image theft acls
acl imgurl urlpath_regex -i \.(jpg|gif|png)$
acl badimgref referer_regex -i wikipedia.t-st.de
acl badimgref referer_regex -i worldhistory.com
acl badimgref referer_regex -i yourencyclopedia.net
acl badimgref referer_regex -i wordiq.com
acl badimgref referer_regex -i artpolitic.org
acl badimgref referer_regex -i ruv.net
# deny access
Mr. Wicke,
I speak only for WorldHistory.com in making the following comments and posing the following questions, but I suspect our situation is common to most or all of the sites you have labeled "bandwidth thieves."
Comments:
First, please note that WorldHistory.com would like to continue mirroring wikipedia content, but that content has considerably less appeal without the images.
Second, please note that we linked directly to images at wikipedia's site not in an effort to steal bandwidth, but out of respect for your bandwidth. It seemed -- and to me still seems, I must confess -- a very respectful alternative to trying to spider all the images. After considerable exploration, we were able to find no available alternative at Wikipedia for obtaining images, and no clearly defined contact for making inquiries.
Questions:
PhatNav.com apparently has its own copy of all the images. By what mechanism might such a copy be obtained and regularly updated by others?
de.freepedia.org links to wikipedia images in the same way WorldHistory.com does. Have they made some special arrangement to avoid being blocked? If so, what arrangements might the rest of us make? Whom should one contact to discuss such arrangements?
Will there be a downloadable archive of images in the near future?
Thank you for your consideration. If you would prefer to discuss any of this privately, please e-mail me at DavidRodeback@att.net, and I will be happy to discuss this by e-mail or to telephone you.
David Rodeback Technical Officer WorldHistory.com
David Rodeback wrote:
Second, please note that we linked directly to images at wikipedia's site not in an effort to steal bandwidth, but out of respect for your bandwidth. It seemed -- and to me still seems, I must confess -- a very respectful alternative to trying to spider all the images. After considerable exploration, we were able to find no available alternative at Wikipedia for obtaining images, and no clearly defined contact for making inquiries.
I'd like to ask that this block on worldhistory be dropped immediately, as David has a point here, and he's being very courteous to come here and chat about it.
Since I am the one paying for bandwidth, not the other Wikimedia Foundation donors, and since I'm not worried about this right now, we can be lax about it. Perhaps we could even unblock the others on a theory of "innocent until proven guilty".
Let's brainstorm about this problem, because it's increasingly common.
I think we should eventually (but sooner rather than later) have a tarball of images. Inside the tarball there should be a README.Licensing that explains in plain English what's going on, and we should include a copy of the GNU FDL and what not. The license is complicated and tricky to apply in a case like this, in my opinion.
The problem is *not* as some might suppose the "fair use" images. Those are o.k. for us to distribute under the doctrine of fair use, although it's only fair to warn re-users that they need to think about how fair use might apply to whatever it is that they are doing.
The problem is possibly with the attribution and history requirements of the license and so on. It's too early in the morning for me to really consider what would be involved exactly, but we all know the drill.
--Jimbo
"JW" == Jimmy Wales jwales@bomis.com writes:
JW> The problem is possibly with the attribution and history
JW> requirements of the license and so on. It's too early in the
JW> morning for me to really consider what would be involved
JW> exactly, but we all know the drill.
So, this is an issue on my personal radar screen, and I think it might be nice to start concentrating on how to redistribute MediaWiki sites as a whole.
As a quick fix, it might be reasonable to include database dumps of all the Image: pages, as well as the image tables, to get history and rights info. It's not particularly usable, but it would be there.
A more automated solution might be to include some kind of metadata files in the image dump. Either plain text files like:
Name: this.jpg
License: GFDL
Author: Evan Prodromou
Contributor: Jimmy Wales
...or RDF files. It's debatable whether to make one file per image, or one big listing file.
~ESP
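A minimal sketch of the plain-text variant Evan describes, in Python. The input rows here are made up for illustration; in practice they would come from the image tables he mentions, via whatever query extracts the name, license, author, and contributor fields:

    import os

    def write_metadata(dump_dir, rows):
        # One plain-text sidecar file per image, using the field
        # names from Evan's example above.
        for row in rows:
            path = os.path.join(dump_dir, row["name"] + ".meta")
            with open(path, "w") as f:
                f.write("Name: %s\n" % row["name"])
                f.write("License: %s\n" % row["license"])
                f.write("Author: %s\n" % row["author"])
                for contributor in row.get("contributors", []):
                    f.write("Contributor: %s\n" % contributor)

    # Made-up example row, mirroring the sample above:
    write_metadata(".", [{"name": "this.jpg", "license": "GFDL",
                          "author": "Evan Prodromou",
                          "contributors": ["Jimmy Wales"]}])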
On Fri, 12 Mar 2004 09:21:08 -0800, Jimmy Wales wrote:
I'd like to ask that this block on worldhistory be dropped immediately, as David has a point here, and he's being very courteous to come here and chat about it.
Worldhistory is not blocked anymore.
In my personal opinion, spidering the images and hosting them locally isn't too much to ask if the sites are commercial. It saves a lot of bandwidth as the images are only downloaded once, and it's really not hard to do with wget or similar. If this is done in off-peak hours, even better.
Previous discussions on this that i found via google: http://mail.wikipedia.org/pipermail/wikilegal-l/2004-January/000220.html http://en.wikipedia.org/wiki/Wikipedia:Copies_of_Wikipedia_content_(low_degr...)
If the consensus is against blocking I'll switch it off, of course.
Cheers
If someone on the WikiMedia project provided code for caching images remotely that could be used by other sites, that would solve the problem to a great extent...
The algorithm could be as simple as: 1) give me a list of all images that were updated in the last N hours; 2) download them.
This might be hard, but it doesn't seem so given the capabilities I think already exist in WikiMedia.
-Kelly
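A minimal sketch of Kelly's two-step algorithm, assuming a hypothetical endpoint that returns one image URL per line for everything updated in the last N hours. No such endpoint existed at the time, so LIST_URL is purely illustrative:

    import time
    import urllib.request

    LIST_URL = "http://example.org/updated-images?hours=%d"  # hypothetical

    def sync_images(hours, delay=2.0):
        # Step 1: fetch the list of recently updated images.
        with urllib.request.urlopen(LIST_URL % hours) as resp:
            urls = resp.read().decode().splitlines()
        # Step 2: download each one, throttled to stay polite.
        for url in urls:
            filename = url.rsplit("/", 1)[-1]
            urllib.request.urlretrieve(url, filename)
            time.sleep(delay)

    sync_images(24)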
David Rodeback DavidRodeback@att.net writes:
Second, please note that we linked directly to images at wikipedia's site not in an effort to steal bandwidth, but out of respect for your bandwidth. It seemed -- and to me still seems, I must confess -- a very respectful alternative to trying to spider all the images.
This depends on the traffic caused by your users.
After considerable exploration, we were able to find no available alternative at Wikipedia for obtaining images, and no clearly defined contact for making inquiries.
Download and install the texts. Spider your installation and extract the image references. Convert the filenames to those matching the pictures at the WP site. Download the files on this list using 'wget'.
Or something like that could work.
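A rough sketch of those steps in Python: walk the locally rendered pages, pull out the image references, and rewrite them as URLs on the live site. The UPLOAD_BASE prefix is an assumption for illustration only; check what your own installation and the WP servers actually serve:

    import os
    import re

    IMG_RE = re.compile(r'<img[^>]+src="([^"]+\.(?:jpg|gif|png))"', re.I)
    UPLOAD_BASE = "http://en.wikipedia.org/upload/"  # assumed prefix

    def image_urls(html_dir):
        urls = set()
        for root, _, files in os.walk(html_dir):
            for name in files:
                if not name.endswith((".html", ".htm")):
                    continue
                with open(os.path.join(root, name), encoding="utf-8",
                          errors="replace") as f:
                    text = f.read()
                for src in IMG_RE.findall(text):
                    urls.add(UPLOAD_BASE + src.rsplit("/", 1)[-1])
        return sorted(urls)

    # The resulting list can then be saved to a file and handed to
    # wget, e.g. 'wget -i urls.txt'.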
Download and install the texts. Spider your installation and extract the image references. Convert the filenames to those matching the pictures at the WP site. Download the files on this list using 'wget'.
Or something like that could work.
Since our current process already includes all these steps except the last (at that point we link to the file instead of fetching it), this is easily done.
Am I to gather that a reasonably well-behaved spider is preferred to linking back to Wikipedia's site as we have been doing?
Can someone define for me what would be the off-peak hours in which such a spider should run?
Finally, is there a place at Wikipedia (I know of several elsewhere) for registering such spiders with descriptions and contact information, in case someone observes the spider working and wonders, or in case there is some sort of problem?
Thanks.
DR
On Fri, 12 Mar 2004 19:31:05 +0000, David Rodeback wrote:
Download and install the texts. Spider your installation and extract the image references. Convert the filenames to those matching the pictures at the WP site. Download the files on this list using 'wget'.
Or something like that could work.
Since our current process already includes all these steps except the last (at that point we link to the file instead of fetching it), this is easily done.
Am I to gather that a reasonably well-behaved spider is preferred to linking back to Wikipedia's site as we have been doing?
Can someone define for me what would be the off-peak hours in which such a spider should run?
See http://wikimedia.org/stats/live/org.wikimedia.all.squid.requests-hits.html
Finally, is there a place at Wikipedia (I know of several elsewhere) for registering such spiders with descriptions and contact information, in case someone observes the spider working and wonders, or in case there is some sort of problem?
Set the user agent to something descriptive, like 'worldhistory'. Be sure not to include typical spider UA strings. And throttle the requests; wget offers a rate-limiting option for that.
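The wget options Gabriel alludes to are --user-agent and --limit-rate (plus --wait for pauses between requests). For a hand-rolled spider, here is a sketch of the same politeness rules in Python; the agent string is only an example:

    import time
    import urllib.request

    UA = "worldhistory-imagebot/0.1 (DavidRodeback@att.net)"  # example

    def fetch_all(urls, delay=2.0):
        for url in urls:
            # Descriptive User-Agent so operators can identify the bot.
            req = urllib.request.Request(url, headers={"User-Agent": UA})
            with urllib.request.urlopen(req) as resp:
                data = resp.read()
            with open(url.rsplit("/", 1)[-1], "wb") as f:
                f.write(data)
            time.sleep(delay)  # throttle, roughly wget's --wait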
Thanks much.
David Rodeback wrote:
Finally, is there a place at Wikipedia (I know of several elsewhere) for registering such spiders with descriptions and contact information, in case someone observes the spider working and wonders, or in case there is some sort of problem?
You should probably register a username specifically for the bot (for example: World History Bot) and then leave your information at [[User:World History Bot]].
Greetings, Timwi