Dear Sir/Madam,
Brion Vibber wrote:
Please do *not* use programs such as Webstripper. Especially on dynamic sites like Wikipedia, they create a huge amount of load on the servers by downloading thousands upon thousands of pages extremely rapidly. [...] Use a web browser like everybody else, and please stop attacking the web sites that you enjoy.
Don't we have a nice compressed static HTML tree by now that we could offer people under the "Download Wikipedia" heading on the main page?
I haven't looked at the Wikipedia software, but I expect it's two-tier: the wiki markup is stored in a database and converted to HTML on every request. That is not very efficient.
I designed a similar piece of software, but with three tiers. The wiki markup is converted to XML before being stored in the database, and back again for editing. This conversion may be time-consuming, but it only happens when a page is saved or opened for editing, which is relatively rare. It also allows me to change my wiki markup language (though I don't use that term) if necessary.
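To make the write path concrete, here is a rough sketch in Python rather than my actual code; the table name and the stand-in parser are invented purely for illustration:

    import sqlite3
    from xml.sax.saxutils import escape

    def markup_to_xml(markup):
        # Stand-in for a real wiki-markup parser; a real one would emit
        # structured elements for headings, links, lists and so on.
        paragraphs = "".join("<p>%s</p>" % escape(p)
                             for p in markup.split("\n\n"))
        return '<?xml version="1.0"?><page>%s</page>' % paragraphs

    def save_page(conn, title, markup):
        # Parse once, at save time; the database only ever sees XML.
        conn.execute("INSERT OR REPLACE INTO pages (title, xml) VALUES (?, ?)",
                     (title, markup_to_xml(markup)))
        conn.commit()

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (title TEXT PRIMARY KEY, xml TEXT)")
    save_page(conn, "Example", "First paragraph.\n\nSecond paragraph.")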
When a user requests a page, it is retrieved as XML from the database and transformed to HTML by an XSLT program. The important point is that most users have a browser that can do the XSL transformation itself. So they can download as much as they want and read it offline, with the XSLT program sitting in their browser cache. Each page requested from the server then involves only a couple of table lookups and very little processing. (If the user doesn't have a modern browser, they can elect to have the transformation done on the server, but that's another story.)
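On the read side, the server's job is little more than a lookup plus a one-line splice that points the browser at the stylesheet. A sketch, again in Python and again only illustrative (the stylesheet URL /page.xsl is made up, and a real server would also send an XML content type):

    PI = '<?xml-stylesheet type="text/xsl" href="/page.xsl"?>'

    def to_response(stored_xml):
        # Splice the stylesheet processing instruction in right after the
        # XML declaration; no markup parsing happens on the server.
        head, rest = stored_xml.split("?>", 1)
        return head + "?>\n" + PI + "\n" + rest

    stored = '<?xml version="1.0"?><page><p>First paragraph.</p></page>'
    print(to_response(stored))

The browser fetches /page.xsl once, keeps it in its cache, and applies it to every page it downloads afterwards, which is why bulk downloading stays cheap for the server.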
Just an idea...
Regards, John O'Leary.