Debian Woody is my mortal enemy. I thought it was dead
finally, but no...
Ah, you see, to a distro-laggard like our company, Woody is still a
very viable option. In fact it's considered quite safe and stable,
whereas Sarge is considered a little too new and risqué to be
trusted yet with mission-critical stuff. That mentality will probably
only change when the updates for Woody stop (which I believe is
scheduled to happen on 1 May 2006, unless Etch is released before that
date, which honestly seems rather unlikely). So only after 1 May
2006 will Woody finally really be dead :-)
On the data import front, today I've tried installing MediaWiki
1.5RC4, and importing via importDump, but I didn't have much luck:
======================================================
# wget http://download.wikimedia.org/wikipedia/en/20050909_pages_public.xml.gz
# gzip --test ~nickj/wikipedia/20050909_pages_public.xml.gz
// no output or error, so presumably the gzip file is not corrupt
# md5sum ~nickj/wikipedia/20050909_pages_public.xml.gz
1de5093f1dd6c5afd4ed080474456d54  /home/nickj/wikipedia/20050909_pages_public.xml.gz
// this matches the sum in http://download.wikimedia.org/wikipedia/en/20050909-md5sums
# gzip -dc ~nickj/wikipedia/20050909_pages_public.xml.gz | php maintenance/importDump.php
100 (58.59332294078 pages/sec 58.59332294078 revs/sec)
200 (54.030806663729 pages/sec 54.030806663729 revs/sec)
300 (51.490838593282 pages/sec 51.490838593282 revs/sec)
400 (50.320459245887 pages/sec 50.320459245887 revs/sec)
500 (49.26519486778 pages/sec 49.26519486778 revs/sec)
600 (48.360482472114 pages/sec 48.360482472114 revs/sec)
700 (48.871787388943 pages/sec 48.871787388943 revs/sec)
800 (49.154366513935 pages/sec 49.154366513935 revs/sec)
900 (49.266177573961 pages/sec 49.266177573961 revs/sec)
1000 (49.018343826441 pages/sec 49.018343826441 revs/sec)
1100 (49.167214599006 pages/sec 49.167214599006 revs/sec)
1200 (49.605583964957 pages/sec 49.605583964957 revs/sec)
1300 (49.425412530694 pages/sec 49.425412530694 revs/sec)
1400 (49.357795012659 pages/sec 49.357795012659 revs/sec)
1500 (49.401458453695 pages/sec 49.401458453695 revs/sec)
1600 (49.248578795592 pages/sec 49.248578795592 revs/sec)
1700 (49.205397806241 pages/sec 49.205397806241 revs/sec)
1800 (49.139689484041 pages/sec 49.139689484041 revs/sec)
1900 (49.369342847918 pages/sec 49.369342847918 revs/sec)
2000 (49.706945229133 pages/sec 49.706945229133 revs/sec)
2100 (49.860871622316 pages/sec 49.860871622316 revs/sec)
2200 (49.935237390351 pages/sec 49.935237390351 revs/sec)
2300 (49.976472942288 pages/sec 49.976472942288 revs/sec)
2400 (49.965834881883 pages/sec 49.965834881883 revs/sec)
2500 (50.095592279086 pages/sec 50.095592279086 revs/sec)
2600 (49.913163596511 pages/sec 49.913163596511 revs/sec)
2700 (50.346513647263 pages/sec 50.346513647263 revs/sec)
2800 (50.554639314109 pages/sec 50.554639314109 revs/sec)
2900 (50.30025952798 pages/sec 50.30025952798 revs/sec)
3000 (50.235683978552 pages/sec 50.235683978552 revs/sec)
3100 (49.935743124336 pages/sec 49.935743124336 revs/sec)
3200 (49.898859597319 pages/sec 49.898859597319 revs/sec)
Content-type: text/html
#
// (i.e. spontaneously aborts after ~3200 pages and ~60 seconds).
======================================================
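The transcript above shows the import stopping without any error message, so one way to narrow it down might be to record the importer's exit status and keep its stderr in a file. The following is only a minimal sketch, assuming a POSIX shell; `run_import` and `import-errors.log` are hypothetical names that are not part of MediaWiki. The stray `Content-type: text/html` line may also suggest that `php` here is the CGI build rather than the command-line one, but that is a guess and the sketch does not depend on it.

```shell
# Sketch only: run_import and import-errors.log are hypothetical names.
# Usage: run_import <dump.xml.gz> <importer command and its arguments...>
run_import() {
    dump="$1"
    shift
    # In a POSIX shell, a pipeline's exit status is that of its last
    # command, which here is the importer. Its stderr goes to a log file.
    gzip -dc "$dump" | "$@" 2> import-errors.log
    rc=$?
    echo "importer exit status: $rc"
    # A status above 128 normally means the process was killed by
    # signal number (status - 128).
    if [ "$rc" -gt 128 ]; then
        echo "killed by signal $((rc - 128))"
    fi
    return "$rc"
}
```

For the run above this would be invoked as `run_import ~nickj/wikipedia/20050909_pages_public.xml.gz php maintenance/importDump.php`. A non-zero status, or anything written to `import-errors.log`, would at least show whether the importer exited by itself or was killed.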
I'm beginning to suspect that some kind of higher being is determined
that under no circumstances will I be able to load this data into a
database ;-)
All the best,
Nick.