Vijay wrote:
First off, pardon me if this is not the right list for
posting this question.
I am new to the Wikipedia mailing lists.
You probably want mediawiki-l or wikitech-l for software issues.
I have just downloaded the wikipedia en dump
Please tell me *exactly* which file you've got (with url), and *exactly* how
you're extracting it (with program versions and command lines, if applicable).
and am trying to configure
Wikipedia on my local server. I have MySQL and PHP on Windows 2003 Server. I
installed MediaWiki and have extracted the Wikipedia en dump file, which gave
me an XML file (4 GB).
Is it exactly 4 GB? The file may have been cut off; perhaps you used a faulty
decompression utility, or you extracted it on a FAT32 filesystem that does not
support large files?
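
A quick way to test the "cut off at a limit" theory is to compare the file's
byte size against the usual hard limits. This is only a rough sketch: the
function name and the `slack` parameter are made up for illustration. FAT32
cannot store a file larger than 4 GiB minus one byte, and some older tools stop
at 2 GiB minus one byte.

```python
# Known hard limits that truncated files tend to land on.
FAT32_MAX_FILE_SIZE = 2**32 - 1   # 4 GiB minus one byte
SIGNED_32BIT_LIMIT = 2**31 - 1    # 2 GiB minus one byte (tools without large-file support)


def suspicious_size(size_in_bytes, slack=1):
    """Return a label if the size sits at a known limit, otherwise None.

    `slack` is a hypothetical tolerance in bytes around each limit.
    """
    limits = (
        (FAT32_MAX_FILE_SIZE, "FAT32 maximum file size"),
        (SIGNED_32BIT_LIMIT, "signed 32-bit offset limit"),
    )
    for limit, label in limits:
        if abs(size_in_bytes - limit) <= slack:
            return label
    return None
```

You would feed it `os.path.getsize(...)` for the extracted XML file. A size
that merely rounds to "4 GB" in Explorer proves nothing either way; only a
match to the byte is a strong hint.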
Have you tried piping the decompressed output straight into the importer instead
of decompressing it to disk first? Have you tried tailing the end of the file to
confirm that it closes the XML properly?
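
If you have no `tail` handy on Windows, something like the following does the
same job. It is a minimal sketch, assuming the dump uses the standard MediaWiki
export format, whose root element is `<mediawiki>`; the function name and the
`window` size are my own choices.

```python
import os


def ends_with_closing_tag(path, closing_tag=b"</mediawiki>", window=4096):
    """Return True if the last non-whitespace bytes of the file are closing_tag."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Read only the last `window` bytes, so this is cheap even on a huge file.
        f.seek(max(0, size - window))
        tail = f.read()
    return tail.rstrip().endswith(closing_tag)
```

If this returns False for your extracted file, the extraction was cut short
and the import will fail no matter which tool you use.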
When I tried importing the dump to MySQL using importDump.php, the process
started fine. But when the record count reached 18600, the process stopped
with the following error. I wonder what could be the problem? Any help in
this regard is highly appreciated.
Try using mwdumper instead of importDump.php. Any difference? (mwdumper is faster, too.)
By the way, the Wikipedia dump file is the current version and not the
complete one. The zip file is around 900 MB in size.
This one, or an older version?
http://download.wikimedia.org/enwiki/20060125/enwiki-20060125-pages-article…
I'm still downloading it to double-check, but I can confirm that the XML in this
file is valid at least up to 29120 pages in, some way past the point where your
import broke.
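
For reference, a check along these lines can be run on the compressed file
directly, without writing the XML to disk first. This is a sketch rather than
the exact check I ran: the function name is made up, and it assumes a
bzip2-compressed dump whose pages are `<page>` elements.

```python
import bz2
import xml.etree.ElementTree as ET


def count_pages(stream):
    """Parse XML from a binary file-like object.

    Returns (pages_seen, error_message); error_message is None if the whole
    document parsed cleanly.
    """
    pages = 0
    try:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Tags may carry an XML namespace prefix like "{...}page".
            if elem.tag.rsplit("}", 1)[-1] == "page":
                pages += 1
                # Drop the page's text and children to keep memory use down.
                # The emptied element stays attached to the root, which is
                # acceptable for a sketch but not ideal for a full dump.
                elem.clear()
    except ET.ParseError as exc:
        return pages, str(exc)
    return pages, None


def count_pages_in_bz2(path):
    """Decompress on the fly and count pages; `path` is whatever file you have."""
    with bz2.open(path, "rb") as f:
        return count_pages(f)
```

If the count stops near your 18600 with an error, the file is damaged; if it
runs well past that point, the problem is more likely in the import step.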
-- brion vibber (brion @ pobox.com)