Okay, I think I've managed to semi-automate this now. In the future, updates should appear within a day of new backup dumps going up.
http://www.wikipedia.org/wikistats/EN/Sitemap.htm
This is generated from the backup dumps by Erik Zachte's processing scripts: http://members.chello.nl/epzachte/Wikipedia/Statistics/
(To see the page in another language, change the "EN" in the URL to "NL", etc. Most languages are not translated yet; contact Erik for info. All versions should show the same data set.)
Erik: I made some slight tweaks so the script can read the bzip2-compressed dumps directly, decompressing them through bzip2 on the fly as it reads. I don't know if that will work on Windows, though it ought to. Patch attached.
-- brion vibber (brion @ pobox.com)
51,52c51,54
< $file_in_old = $path_in . $dumpdate . "_old_table.sql" ;
< $file_in_cur = $path_in . $dumpdate . "_cur_table.sql" ;
---
> #$file_in_old = $path_in . $dumpdate . "_old_table.sql" ;
> #$file_in_cur = $path_in . $dumpdate . "_cur_table.sql" ;
> $file_in_old = $path_in . $language . "/old_table.sql.bz2" ;
> $file_in_cur = $path_in . $language . "/cur_table.sql.bz2" ;
160c162,166
< open "FILE_IN", "<", $file_in || abort ("Input file " . $file_in . " could not be opened.") ;
---
> if ($file_in =~ /\.bz2$/) {
>   open "FILE_IN", "-|", "bzip2 -dc \"$file_in\"" or abort ("Input file " . $file_in . " could not be opened.") ;
> } else {
>   open "FILE_IN", "<", $file_in or abort ("Input file " . $file_in . " could not be opened.") ;
> }
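For reference, here is a minimal standalone sketch of the same read-through-bzip2 idea. `open_dump` and `count_lines` are hypothetical helper names and are not part of Erik's scripts. The sketch assumes a `bzip2` binary is on the PATH and that dump paths contain no double quotes or other shell metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from Erik's scripts): open a table dump for
# line-by-line reading. Names ending in ".bz2" are piped through an
# external "bzip2 -dc", which writes the decompressed data to stdout and
# leaves the .bz2 file untouched. Anything else is opened as plain text.
# Assumes bzip2 is on the PATH and the path contains no double quotes.
sub open_dump {
    my ($file_in) = @_;
    my $fh;
    if ($file_in =~ /\.bz2$/) {
        # "-|" runs the command and connects its stdout to $fh.
        open($fh, "-|", "bzip2 -dc \"$file_in\"")
            or die "Input file $file_in could not be opened: $!\n";
    }
    else {
        open($fh, "<", $file_in)
            or die "Input file $file_in could not be opened: $!\n";
    }
    return $fh;
}

# Example consumer: count the lines in a dump. For a piped open, close()
# returns false when bzip2 exits with an error (e.g. a missing or corrupt
# archive), so its result is checked as well.
sub count_lines {
    my ($file_in) = @_;
    my $fh = open_dump($file_in);
    my $n  = 0;
    $n++ while <$fh>;
    close($fh)
        or die "Error reading $file_in: " . ($! ? $! : "exit status $?") . "\n";
    return $n;
}

# Usage (hypothetical path): perl sketch.pl en/cur_table.sql.bz2
print count_lines($ARGV[0]), "\n" if @ARGV;
```

Use the low-precedence `or` rather than `||` here. `open FH, "<", $file || abort(...)` parses as `open(FH, "<", ($file || abort(...)))`, so the error handler runs only if the filename itself is false, never when the open fails. Also, a piped open that goes through the shell usually succeeds even if bzip2 is missing. The failure then typically shows up only when `close` returns false, which is why the sketch checks it.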
wikipedia-l@lists.wikimedia.org