There is a misunderstanding: our IP was blocked today as a "Remote Loader". We DO NOT load Wikipedia's pages live; we only CACHE the pages on our server. We send a monthly request to update our cache for requested pages, so I don't understand why our IP was blocked.
As you can check, we are still providing the cached pages FROM OUR SERVER:
http://www.ebaita.com/enciclopedia.asp
http://www.tiosam.com/enciclopedia/
I also make sure the license and link to wikipedia are in all pages.
Please remove the block so we can update the pages when necessary.
Thanks,
Tony
Webmaster TioSam.com and eBaita.com
Webmaster wrote:
There is a misunderstanding, our IP was blocked today as a "Remote Loader". We DO NOT load wikipedia's pages live, we only CACHE the pages in our server. We send a monthly request to update our cache for requested pages, so I don't understand why our IP was blocked. As you can check, we are still providing the cached pages FROM OUR SERVER: http://www.ebaita.com/enciclopedia.asp
http://www.tiosam.com/enciclopedia/
I also make sure the license and link to wikipedia are in all pages.
Please remove the block so we can update the pages when necessary.
So I suppose when you made 500,000 requests over 3 days with a faked User-Agent string, for a cumulative server processing time of 14 hours, you were updating your cache?
-- Tim Starling
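For scale, the figures quoted above can be turned into average rates. This is only a back-of-the-envelope sketch in Python: it assumes the requests were spread evenly over the three days, which the thread does not state, and the variable names are illustrative.

```python
# Figures as quoted in the message above.
request_count = 500_000          # requests made
window_s = 3 * 24 * 60 * 60      # 3 days, in seconds
processing_s = 14 * 60 * 60      # 14 hours of cumulative server time, in seconds

# Assumption: load spread evenly over the window (real traffic was likely burstier).
avg_rate = request_count / window_s       # requests per second, about 1.93
avg_cost = processing_s / request_count   # server seconds per request, about 0.1

print(f"{avg_rate:.2f} requests/s, {avg_cost:.3f} s of server time per request")
```

Under that even-spread assumption the load is roughly two requests per second, sustained for three days, at about a tenth of a second of server time each.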
Tim, You are probably talking about some other mirror. No faked User-Agent string here. If I wanted to hide something, I'd use a proxy. Ebaita.com and enciclopedia.tiosam.com share the same cache in the same server, with more than 200 thousand cached pages, as you can see when you google "site:enciclopedia.tiosam.com". As pages are cached for 30 days from the date they are requested, you can easily verify that the pages are being served by us. Tony
-----Original Message----- From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Tim Starling Sent: Tuesday, January 30, 2007 12:25 AM To: wikitech-l@lists.wikimedia.org Subject: Re: [Wikitech-l] Our IP was blocked by mistake
_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 30/01/07, Webmaster webmaster@tiosam.com wrote:
You are probably talking about some other mirror. No faked User-Agent string here. If I wanted to hide something, I'd use a proxy. Ebaita.com and enciclopedia.tiosam.com share the same cache in the same server, with more than 200 thousand cached pages, as you can see when you google "site:enciclopedia.tiosam.com". As pages are cached for 30 days from the date they are requested, you can easily verify that the pages are being served by us.
If you're just caching a regular snapshot, what's wrong with the database dumps which are created specifically to make a regular snapshot available?
- d.
Where can we find them? Thanks
-----Original Message----- From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of David Gerard Sent: Tuesday, January 30, 2007 9:07 AM To: Wikimedia developers Subject: Re: [Wikitech-l] Our IP was blocked by mistake
On 1/31/07, Webmaster webmaster@tiosam.com wrote:
Where can we find them?
Not hard to find. See also http://meta.wikimedia.org/wiki/Data_dumps
Thank you.
-----Original Message----- From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Stephen Bain Sent: Tuesday, January 30, 2007 9:19 AM To: Wikimedia developers Subject: Re: [Wikitech-l] Our IP was blocked by mistake
On 30/01/07, Webmaster webmaster@tiosam.com wrote:
Where can we find them? Thanks
In general, we discourage people remote loading content from our sites, in part because it causes extra load on our servers which hasn't come from one of our direct users, and in part because a vast number of such sites end up violating one or more of the terms of the content licence, usually the attribution side of things.
We provide regular database dumps to make the information freely available in a manner that is friendlier to our servers, and that is usually easier to process and import - it's much more convenient for both parties to import all the revision text at once, rather than crawling for it.
Even a "periodic cache update" is not particularly acceptable, as it implies a burst of load, such as the one Tim mentioned. We have the logs, we know what goes on.
Rob Church
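As a rough illustration of importing from a dump instead of crawling, the sketch below streams page titles and text out of a MediaWiki XML export using only the Python standard library. The sample document is made up for the example, the function names are hypothetical, and a real dump is far larger and usually compressed; treat this as a starting point under those assumptions, not as the recommended import procedure.

```python
import io
import xml.etree.ElementTree as ET

# Made-up two-page sample in the general shape of a MediaWiki XML export.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>Exemplo</title><revision><text>Texto de exemplo.</text></revision></page>
  <page><title>Outro</title><revision><text>Mais texto.</text></revision></page>
</mediawiki>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_file):
    """Yield (title, text) per <page>, reading the file as a stream.

    If a page holds several revisions, the text of the last one is kept.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page so memory use stays flat


pages = list(iter_pages(io.BytesIO(SAMPLE)))
for page_title, page_text in pages:
    print(page_title, len(page_text))
```

Because the file is read once, locally, a refresh of the mirror costs the Wikimedia servers a single download rather than one request per page.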
Sorry, I didn't know that a periodic cache update wasn't acceptable. Also, we didn't fake the Agent string - but I'm not sure what string the Microsoft.XMLHTTP object shows when visiting. I will check the link provided and rewrite the code so everybody is happy :) Thanks, Tony
-----Original Message----- From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Rob Church Sent: Tuesday, January 30, 2007 9:20 AM To: Wikimedia developers Subject: Re: [Wikitech-l] Our IP was blocked by mistake
David Gerard wrote:
On 30/01/07, Webmaster webmaster@tiosam.com wrote:
You are probably talking about some other mirror. No faked User-Agent string here. If I wanted to hide something, I'd use a proxy. Ebaita.com and enciclopedia.tiosam.com share the same cache in the same server, with more than 200 thousand cached pages, as you can see when you google "site:enciclopedia.tiosam.com". As pages are cached for 30 days from the date they are requested, you can easily verify that the pages are being served by us.
If you're just caching a regular snapshot, what's wrong with the database dumps which are created specifically to make a regular snapshot available?
the ones for the big wikis (en,de) need REALLY good machines to import (good ram, enough disk...) and even with good machines they need ages to import :-(
regards, marco
On 31/01/07, Marco Schuster CDL-Klever@gmx.net wrote:
the ones for the big wikis (en,de) need REALLY good machines to import (good ram, enough disk...) and even with good machines they need ages to import :-(
Lack of hardware is no excuse. To be honest, attempting to mirror a large web site such as Wikipedia on low-end hardware is a foolish idea.
Rob Church
On 31/01/07, Rob Church robchur@gmail.com wrote:
On 31/01/07, Marco Schuster CDL-Klever@gmx.net wrote:
the ones for the big wikis (en,de) need REALLY good machines to import (good ram, enough disk...) and even with good machines they need ages to import :-(
Lack of hardware is no excuse. To be honest, attempting to mirror a large web site such as Wikipedia on low-end hardware is a foolish idea.
You don't need a lot to serve a flat HTML version, except disk space.
- d.
Marco Schuster wrote:
David Gerard wrote:
If you're just caching a regular snapshot, what's wrong with the database dumps which are created specifically to make a regular snapshot available?
the ones for the big wikis (en,de) need REALLY good machines to import (good ram, enough disk...) and even with good machines they need ages to import :-(
Would the static HTML dumps work for your purposes? They are just HTML files, available in compressed archives per-language: http://static.wikipedia.org/
Note that if you plan to use them for mirroring (as opposed to, say, research), you will need to modify them so that they don't present themselves as being Wikipedia, or otherwise misuse the Wikimedia trademarks.
(Incidentally, the link is to the November 2006 dump, but the log says that the December 2006 dump finished; could somebody update that?)
-Mark
Webmaster wrote:
Tim, You are probably talking about some other mirror. No faked User-Agent string here. If I wanted to hide something, I'd use a proxy. Ebaita.com and enciclopedia.tiosam.com share the same cache in the same server, with more than 200 thousand cached pages, as you can see when you google "site:enciclopedia.tiosam.com". As pages are cached for 30 days from the date they are requested, you can easily verify that the pages are being served by us. Tony
No faked user agent string? So I suppose you were using "save as" in IE?
sq25.wikimedia.org 8575224 1169671516.598 1 65.111.167.50 TCP_MISS/200 13139 GET http://pt.wikipedia.org/wiki/Santa_Maria_delle_Grazieaction=edit SIBLING_HIT/66.230.200.134 text/html - - Mozilla/4.0%20(compatible;%20MSIE%207.0;%20Windows%20NT%205.2;%20.NET%20CLR%201.1.4322;%20.NET%20CLR%202.0.50727)
-- Tim Starling
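The log line above is easier to read once split into fields and URL-decoded. The Python sketch below does that; the field names are guesses based on the values in this one line, not a documented format, and the URL is copied exactly as it appears in the message.

```python
from datetime import datetime, timezone
from urllib.parse import unquote

LINE = (
    "sq25.wikimedia.org 8575224 1169671516.598 1 65.111.167.50 "
    "TCP_MISS/200 13139 GET "
    "http://pt.wikipedia.org/wiki/Santa_Maria_delle_Grazieaction=edit "
    "SIBLING_HIT/66.230.200.134 text/html - - "
    "Mozilla/4.0%20(compatible;%20MSIE%207.0;%20Windows%20NT%205.2;"
    "%20.NET%20CLR%201.1.4322;%20.NET%20CLR%202.0.50727)"
)

# Assumed names for the 14 whitespace-separated fields (not an official schema).
FIELDS = [
    "host", "sequence", "timestamp", "elapsed", "client_ip", "cache_status",
    "size", "method", "url", "hierarchy", "mime_type", "field_12", "field_13",
    "user_agent",
]


def parse_line(line):
    """Split one log line into a dict and decode the timestamp and User-Agent."""
    record = dict(zip(FIELDS, line.split()))
    # The third field looks like Unix epoch seconds with milliseconds.
    record["when"] = datetime.fromtimestamp(float(record["timestamp"]), tz=timezone.utc)
    # The User-Agent is percent-encoded so that it contains no spaces.
    record["user_agent"] = unquote(record["user_agent"])
    return record


record = parse_line(LINE)
print(record["when"].isoformat(), record["client_ip"])
print(record["user_agent"])
```

Decoded, the User-Agent reads as an ordinary Internet Explorer 7 string, which is why the request looked as if it came from a browser.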
Tim Starling wrote:
No faked user agent string? So I suppose you were using "save as" in IE? Mozilla/4.0%20(compatible;%20MSIE%207.0;%20Windows%20NT%205.2;%20.NET%20CLR%201.1.4322;%20.NET%20CLR%202.0.50727)
No, he was using the Microsoft.XMLHTTP object from ASP, as he indicated in a previous message. Said object identifies itself as MSIE and gives the .NET CLR version in the User-Agent.
Thanks. That explains it. Also, the IP shown is neither enciclopedia.tiosam.com nor ebaita.com; it is www.tiosam.com, where I added the English version a couple of weeks ago (http://www.tiosam.com/Ingles/encyclopedia). The load was probably Google indexing the pages in English, not what we already have cached for the Portuguese version. Anyway, I apologize for my ignorance. I'm downloading the dump and will start coding as soon as I find out how it works. Thanks guys.
-----Original Message----- From: wikitech-l-bounces@lists.wikimedia.org [mailto:wikitech-l-bounces@lists.wikimedia.org] On Behalf Of Ivan Krstic Sent: Tuesday, January 30, 2007 11:40 AM To: Wikimedia developers Subject: Re: [Wikitech-l] Our IP was blocked by mistake