[Toolserver-l] Static dump of German Wikipedia

Marco Schuster marco at harddisk.is-a-geek.org
Thu Sep 23 23:57:55 UTC 2010


Hi all,

Using the Toolserver, I have made a list of all 1.9M articles in NS0 (including
redirects and short pages). Now that I have the list, I'm going to download
every single one of them and then publish a static dump. Tonight I'll run a
trial to see how this works out; if no one objects, I'd like to start
downloading the whole thing in 3 or 4 days. Data collection will happen on the
Toolserver (/mnt/user-store/dewiki-static/articles/); the request rate will be
1 article per second, and I'll copy the new files to my home PC once or twice a
day, so there should be no problem with Toolserver or Wikimedia server load.
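
For the curious, the fetch loop will be something like this (TypeScript
sketch, not the actual script; the title-list file name and the exact URL
shape are placeholders):

// Rough sketch of the planned fetch loop: one action=render request per
// second, written to the Toolserver path mentioned above.
import { readFileSync, writeFileSync, mkdirSync } from "node:fs";
import { setTimeout as sleep } from "node:timers/promises";

const OUT_DIR = "/mnt/user-store/dewiki-static/articles";
// one article title per line, as produced by the Toolserver query
const titles = readFileSync("ns0-titles.txt", "utf8").split("\n").filter(Boolean);

async function run(): Promise<void> {
  mkdirSync(OUT_DIR, { recursive: true });
  for (const title of titles) {
    const url = "https://de.wikipedia.org/w/index.php?action=render&title="
      + encodeURIComponent(title);
    const res = await fetch(url);
    if (res.ok) {
      writeFileSync(`${OUT_DIR}/${encodeURIComponent(title)}.html`, await res.text());
    } else {
      console.error(`skipping ${title}: HTTP ${res.status}`);
    }
    await sleep(1000); // stay at ~1 article per second
  }
}

run();
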
When this is finished in ~21-22 days, I'm going to compress the articles into
a tgz file and upload it to my private server (well, if Wikimedia has an
archive server, that'd be better) so others can play with it.
Furthermore, though I have no idea if I'll succeed, I plan on hacking together
a static Vector skin file that loads the articles using jQuery's excellent
.load() feature, so that everyone with JavaScript enabled can enjoy a truly
offline Wikipedia.
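
The viewer would boil down to roughly this (again only a sketch; #bodyContent
is the Vector content div, while the ?title= parameter and the articles/
directory layout are my own placeholders):

// Sketch of the static viewer: a stripped-down Vector page whose content
// div is filled from the dumped HTML files via jQuery's .load().
declare const $: (selector: string) => { load(url: string): void };

const params = new URLSearchParams(window.location.search);
const title = params.get("title") ?? "Hauptseite";

// inject the pre-rendered article fragment; nothing but static file
// hosting is needed on the other end
$("#bodyContent").load("articles/" + encodeURIComponent(title) + ".html");

Whether browsers allow .load() over file:// is another question; worst case
the dump needs a trivial local web server.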

Marco

PS: When invoking /w/index.php?action=render with an invalid oldid, the server
returns HTTP/1.1 200 OK together with an error message; shouldn't this be a
404 or 500?
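
Quick way to reproduce (host and oldid are arbitrary placeholders for "some
invalid revision"):

const probe = "https://de.wikipedia.org/w/index.php?action=render&oldid=999999999999";

fetch(probe).then(async (res) => {
  const body = await res.text();
  // the status comes back as 200 even though the body is an error page,
  // so a scraper has to inspect the body instead of trusting the status code
  console.log(res.status, body.slice(0, 120));
});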

-- 
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de


