I figured I would go into the XML and manually remove the offending duplicate page/revision, but couldn't find it.
I have gone from top to bottom of the XML file and find no template information, even though "include templates" was ticked.
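One way to check whether the export really contains any templates, rather than scrolling through the file by eye, is to list the page titles it holds. The following is only a minimal sketch: it assumes the usual Special:Export layout (a <mediawiki> root containing <page> elements, each with a <title> child), and the helper names local and page_titles are made up for this illustration.

```python
import sys
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def page_titles(source):
    """Yield the <title> text of every <page> in a MediaWiki export.

    `source` may be a filename or a binary file object.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        for child in elem:
            if local(child.tag) == "title":
                yield child.text
                break
        elem.clear()  # drop the page's revisions to keep memory use low


if __name__ == "__main__":
    # Usage (assumed): python list_titles.py Wikipedia-20090113203939.xml
    titles = list(page_titles(sys.argv[1]))
    templates = [t for t in titles if t and t.startswith("Template:")]
    print(f"{len(titles)} pages, {len(templates)} in the Template: namespace")
    for title in templates:
        print(title)
```

If this prints no Template: titles, the templates are not in the file at all, which would point at the export step rather than at the import.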
I know it's a lot to ask, but could you take a quick look, Daniel? http://dawson.md/Wikipedia-20090113203939.xml.zip (XML, 1.9 MB zipped)
Basically, I'm working on a wiki project that stores information about diseases and I just want to use Wikipedia's Template:Infobox_Disease. I tried to download it manually, along with all the associated and transcluded templates, but this was too complicated and would have taken forever. Someone on the list suggested I use Special:Export and tick the "include templates" box. This is where I'm now up to.
All suggestions/help welcomed.
Thank you, Dawson
On 15 Jan 2009, at 12:22, Daniel Kinzler wrote:
Dawson wrote:
Hello,
I have used Special:Export at en.wikipedia to export "Diabetes_mellitus" and ticked the box "include templates" (I'm only really after the templates).
The resulting XML file is 40.1 MB, so I decided to go with mwdumper.jar rather than Special:Import.
I'm working on a fresh build of MediaWiki on my local system. When running the command:
java -jar mwdumper.jar --format=sql:1.5 Wikipedia-20090113203939.xml | mysql -u root -p wiki
It is returning the following error:
1 pages (0.102/sec), 1,000 revs (102.062/sec)
ERROR 1062 (23000) at line 99: Duplicate entry '45970' for key 1
This happens when the XML dump contains the same page twice (or was it the same revision, even?). Which shouldn't happen. And if it happens, mwdumper shouldn't crash and burn.
I don't know a good way around this, really, sorry. The question is: *why* does the dump include the same page twice? Is that legal in terms of the dump format? If yes, why can't mwdumper cope with it?
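To see whether the dump really does repeat a page or a revision, the ids can be counted directly. What follows is a rough sketch, not something mwdumper provides: it assumes the standard export layout (each <page> has an <id> child and <revision> children that each carry their own <id>), and the function name duplicate_ids is made up for this example. The MySQL error does not say which table's key was hit, so both kinds of id are checked.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def duplicate_ids(source):
    """Return (duplicate_page_ids, duplicate_revision_ids) as sorted lists.

    `source` may be a filename or a binary file object.
    """
    page_ids, rev_ids = Counter(), Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        for child in elem:
            name = local(child.tag)
            if name == "id":
                page_ids[child.text] += 1
            elif name == "revision":
                # Only direct children: the contributor's <id> is nested deeper.
                for sub in child:
                    if local(sub.tag) == "id":
                        rev_ids[sub.text] += 1
                        break
        elem.clear()  # free the revisions of the page just processed

    def repeated(counter):
        return sorted(key for key, count in counter.items() if count > 1)

    return repeated(page_ids), repeated(rev_ids)


if __name__ == "__main__":
    # Usage (assumed): python find_dupes.py Wikipedia-20090113203939.xml
    pages, revisions = duplicate_ids(sys.argv[1])
    print("duplicate page ids:", pages)
    print("duplicate revision ids:", revisions)
```

If '45970' shows up in either list, that tells us which element is repeated in the XML; if neither list contains it, the duplicate key is probably coming from somewhere other than the dump itself, for example rows already present in the target database.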
-- daniel
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l