Hi,
I use a program called WebStripper, which can download whole sites from the internet. I am a student and I want to download sites about mathematics.
I tried to download http://pl.wikipedia.org with WebStripper.
After the download went wrong, I can no longer see pl.wikipedia.org, because the server shows this text on my screen:
" Forbidden You don't have permission to access / on this server.
Apache/1.3.28 Server at pl.wikipedia.org Port 80 "
I contacted the developer of WebStripper and he said: "I can't help you. You must contact webmaster@wikipedia.org."
Please help me, because I don't have an e-mail address for Wikipedia, and I am not a robber or a pirate, I am a student.
Sorry for my English, I have only been learning it for two weeks.
I want to be able to see pl.wikipedia.org again. If that is not possible, I would like to download Wikipedia with a program such as WebStripper and then read it in Internet Explorer.
Best regards, Andrew
On Sun, 9 Nov 2003, PROJATA wrote:
Hi,
I use a program called WebStripper, which can download whole sites from the internet. I am a student and I want to download sites about mathematics.
I tried to download http://pl.wikipedia.org with WebStripper.
Please do *not* use programs such as Webstripper. Especially on dynamic sites like Wikipedia, they create a huge amount of load on the servers by downloading thousands upon thousands of pages extremely rapidly. One person with one of these programs takes up the equivalent resources of hundreds or even thousands of other users.
Can you imagine if everyone did this? The whole web would collapse, because no server could stand up to the load of everybody in the world requesting thousands and thousands of pages at once.
It's like auto-dialing everyone in the city because you wanted to call your friend and your sister: those two people will pick up the phone (along with everybody else!), and you've saved yourself the trouble of dialing twice.
Webstripper is banned at Wikipedia. Don't use it. If you've been using it, you've probably also hit the request throttle limit and will be briefly blocked from using the site at all, but this is temporary. (Though if you create a particularly unpleasant amount of server load your IP may be permanently blocked.)
Use a web browser like everybody else, and please stop attacking the web sites that you enjoy.
If you really want the entire wiki, you can download database backups at http://download.wikipedia.org/; however, you'll need some knowledge of databases to get any use out of them.
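For illustration only, here is a rough sketch of what loading such a backup into a local MySQL server might look like. The dump URL and the database name below are made-up placeholders, not the real files; check http://download.wikipedia.org/ for what is actually offered.

    # Sketch: fetch a (hypothetical) gzipped SQL dump and load it into a local
    # MySQL database. The URL and database name are placeholders.
    import gzip
    import subprocess
    import urllib.request

    DUMP_URL = "http://download.wikipedia.org/pl/pl_cur.sql.gz"  # placeholder name
    LOCAL_FILE = "pl_cur.sql.gz"
    urllib.request.urlretrieve(DUMP_URL, LOCAL_FILE)

    # Decompress the dump and feed the SQL into a pre-created "wikipedia"
    # database (the mysql client must be installed and on the PATH).
    with gzip.open(LOCAL_FILE, "rb") as dump:
        sql = dump.read()  # fine for a sketch; a real dump may not fit in memory
    subprocess.run(["mysql", "wikipedia"], input=sql, check=True)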
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Please do *not* use programs such as Webstripper. Especially on dynamic sites like Wikipedia, they create a huge amount of load on the servers by downloading thousands upon thousands of pages extremely rapidly. [...] Use a web browser like everybody else, and please stop attacking the web sites that you enjoy.
While I of course agree with Brion's statement, I think we should also try to understand the other side. When I was still on a dial-up line that charged by the minute, I too tried several times to download interesting sites all at once, so that I could read them leisurely off-line. That seems like quite natural behavior. You can't really learn stuff when the clock is ticking.
Don't we have a nice compressed static HTML tree by now that we could offer people under the "Download Wikipedia" heading on the main page?
Our customer here was only interested in math articles, and a download of the whole Wikipedia probably wouldn't even have been attractive for him. So a web site with a static version of Wikipedia that is open to programs like Webstripper would also be a good thing.
Actually, I'm sure that once we offer ready-made HTML trees for download, someone somewhere will set up such a site.
Axel
Dear Sir/Madam,
Brion Vibber wrote:
Please do *not* use programs such as Webstripper. Especially on dynamic sites like Wikipedia, they create a huge amount of load on the servers by downloading thousands upon thousands of pages extremely rapidly. [...] Use a web browser like everybody else, and please stop attacking the web sites that you enjoy.
Don't we have a nice compressed static HTML tree by now that we could offer people under the "Download Wikipedia" heading on the main page?
I haven't looked at the Wikipedia software, but I expect it's two-tier: the wiki markup is stored in a database and converted to HTML on every request. This is not very efficient.
I designed a similar piece of software, but with three tiers. The wiki markup is converted to XML before being stored in the database, and back again for editing. This may be time-consuming, but it doesn't happen very often. It also allows me to change my wiki markup language (though I don't use that term) if necessary.
When a user requests a page, it is retrieved as XML from the database and transformed to HTML by an XSLT program. The important point is that most users have a browser that can do the XSL transformation. So they can download as much as they want and read it offline, with the XSLT program in their browser cache. The download involves a couple of table lookups per page requested from the server and little processing. (If the user doesn't have a modern browser then they can elect to have transformation done by the server, but that's another story.)
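As a rough sketch of the idea (not my actual code; the article XML and the stylesheet name are just made up), the server only has to do a lookup and point the browser at a stylesheet:

    # Minimal sketch of "serve XML, let the browser apply the XSLT". Articles
    # are kept in a dict instead of a database, and /article.xsl is a
    # hypothetical stylesheet the browser would fetch separately.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    ARTICLES = {
        "/Mathematics": "<article><title>Mathematics</title><body>...</body></article>",
    }

    class WikiHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            xml = ARTICLES.get(self.path)
            if xml is None:
                self.send_error(404)
                return
            # The processing instruction asks the browser to run article.xsl
            # itself, so the server does a table lookup and no HTML rendering.
            body = ('<?xml version="1.0"?>\n'
                    '<?xml-stylesheet type="text/xsl" href="/article.xsl"?>\n'
                    + xml).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/xml")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), WikiHandler).serve_forever()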
Just an idea...
Regards, John O'Leary.
Axel Boldt wrote:
Our customer here was only interested in math articles, and a download of the whole Wikipedia probably wouldn't even have been attractive for him. So a web site with a static version of Wikipedia that is open to programs like Webstripper would also be a good thing.
Actually, I'm sure that once we offer ready-made HTML trees for download, someone somewhere will set up such a site.
This brings back the issue of classification and categorization schemes (which I think Magnus was working on). Of course, I get gun-shy when people turn that into a debate about censorship. Certainly the tools that such schemes give us can also be used to effect censorship. Similarly, those who oppose gun control will say that guns have more uses than just shooting people.
I very much support a broadly inclusionist Wikipedia with as few rules as possible about what it allows. At the same time I support multiple classification schemes, and am not troubled at all by the notion that someone might develop a highly censored subset of Wikipedia. If someone wants to set up a subset that excludes all articles containing the word "and", it would seem goofy, but who am I to complain, as long as it does not compromise the inclusionist nature of the overall project?
In the past I've tended to support one classification scheme, but really there's no reason not to have multiple classification schemes. All we would need is a way to distinguish which scheme is being used.
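Just to make that concrete, a toy sketch (all scheme and category names invented) of keeping several schemes side by side, keyed so they never interfere with each other:

    # Each assignment is keyed by (scheme, category), so any number of
    # independent classification schemes can coexist.
    from collections import defaultdict

    classifications = defaultdict(set)  # (scheme, category) -> set of article titles

    def classify(scheme, category, title):
        classifications[(scheme, category)].add(title)

    def articles_in(scheme, category):
        return sorted(classifications[(scheme, category)])

    classify("library-style", "Mathematics", "Group theory")
    classify("school-level", "Undergraduate", "Group theory")

    print(articles_in("library-style", "Mathematics"))   # ['Group theory']
    print(articles_in("school-level", "Undergraduate"))  # ['Group theory']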
Ec
On Fri, Nov 14, 2003 at 07:00:31PM +0100, Axel Boldt wrote:
Brion Vibber wrote:
Please do *not* use programs such as Webstripper. Especially on dynamic sites like Wikipedia, they create a huge amount of load on the servers by
[...]
While I of course agree with Brion's statement, I think we should also try to understand the other side. When I was still on a dial-up line that charged by the minute, I too tried several times to download interesting sites all at once, so that I could read them leisurely off-line. That seems like quite natural behavior. You can't really learn stuff when the clock is ticking.
Don't we have a nice compressed static HTML tree by now that we could offer people under the "Download Wikipedia" heading on the main page?
I see no problem with generating the whole Wikipedia (mirror) site as a static snapshot. It could probably be done with minimal modification of the scripts. Static content can be retrieved by bots without causing any trouble.
Consider it as a special mirror.
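Very roughly, something along these lines; render_article() here is just a stand-in for whatever the wiki scripts actually use to turn markup into HTML, and the title list would come from the database:

    # Sketch: walk a list of titles, render each one once, and write plain
    # HTML files that bots can fetch without loading the live site.
    import os

    def render_article(title):
        # Placeholder for the real wiki rendering code (markup -> HTML).
        return f"<html><head><title>{title}</title></head><body>...</body></html>"

    def build_snapshot(titles, out_dir="static_wiki"):
        os.makedirs(out_dir, exist_ok=True)
        for title in titles:
            filename = title.replace(" ", "_") + ".html"
            with open(os.path.join(out_dir, filename), "w", encoding="utf-8") as f:
                f.write(render_article(title))

    build_snapshot(["Mathematics", "Group theory", "Topology"])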
I'm not volunteering though ;-)
g.