Vijay wrote:
> First off, pardon me if this is not the right list for posting this question; I am new to the Wikipedia mailing lists.
You probably want mediawiki-l or wikitech-l for software issues.
> I have just downloaded the Wikipedia en dump
Please tell me *exactly* which file you've got (with URL), and *exactly* how you're extracting it (with program versions and command lines, if applicable).
> and am trying to configure Wikipedia on my local server. I have MySQL and PHP on a Windows 2003 server. I installed MediaWiki and have extracted the Wikipedia en dump file, which gave me an XML file (4 GB).
Is it 4 GB exactly? The file may have been cut off; perhaps you used a faulty decompression utility, or extracted it onto a FAT32 filesystem, which cannot store files of 4 GB or larger.
Have you tried piping the decompressor's output straight into the importer instead of extracting the file first? Have you tried tailing the end of the file to confirm it closes the XML properly (a complete dump ends with a </mediawiki> tag)?
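For example, something along these lines (a rough sketch: the filename below is the standard pages-articles dump name, and ls/tail/bzcat assume a Unix-ish shell, so on Windows 2003 you'd want Cygwin or an equivalent toolset):

  # Check the exact byte size first; a size suspiciously close to 4 GB suggests truncation.
  ls -l enwiki-20060125-pages-articles.xml

  # A complete dump ends with a closing </mediawiki> tag.
  tail -c 200 enwiki-20060125-pages-articles.xml

  # Or skip the on-disk XML entirely: importDump.php can read the XML on stdin,
  # so you can feed the decompressor straight into it.
  bzcat enwiki-20060125-pages-articles.xml.bz2 | php maintenance/importDump.php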
> When I tried importing the dump into MySQL using importDump.php, the process started fine, but when the record count reached 18600 it stopped with the following error. I wonder what could be the problem? Any help in this regard is highly appreciated.
Try using mwdumper instead of importDump.php. Any difference? (mwdumper is faster, too.)
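The usual mwdumper invocation looks roughly like this (same assumed filename as above; the database name and user are placeholders for your own wiki's, and mwdumper reads the .bz2 directly, so the 4 GB XML never has to hit the disk):

  java -jar mwdumper.jar --format=sql:1.5 enwiki-20060125-pages-articles.xml.bz2 \
    | mysql -u wikiuser -p wikidb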
> By the way, the Wikipedia dump file is the current-revisions one and not the complete history. The zip file is around 900 MB in size.
This one, or an older version? http://download.wikimedia.org/enwiki/20060125/enwiki-20060125-pages-articles...
I'm still downloading it to double-check, but I can confirm that the XML in this file is valid at least up to 29120 pages in, some ways past the point where your import broke.
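If you want to check your own copy for truncation, one crude but thorough test, assuming you have xmllint from libxml2 handy (and the same assumed filename as above), is to stream the whole file through it; a cut-off dump will fail with a parse error near the point where it breaks:

  bzcat enwiki-20060125-pages-articles.xml.bz2 | xmllint --stream --noout -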
-- brion vibber (brion @ pobox.com)