On Jul 30, 2007, at 3:49 PM, Platonides wrote:
Jim Hu wrote:
Ran into a weird dump problem yesterday, which has me wondering if there's a problem with my artificially created XML for upload. Here's what happens. I have a script that builds wiki pages from an external source and embeds them in XML for upload via importDump.php. The script can be toggled to generate either a single page or a bunch of them. In bulk mode, the script produced some pages that failed to load, even though the same pages load just fine if you specify them individually. importDump.php does NOT crash during the import.
I understand the failure is in your build script. What does it load? Wiki pages from Special:Export? How does this script handle the XML? Are you using some XML library? Search and replace?
Yes, I'm sure it's my hacky script ; ). It's based on search and replace. I make a template for the <page> container with tags like MW template params, e.g. {{{}}}, and I replace things in that and write to a file. I have a pair of functions that get called each time I make a page.
function xml_page_template() {
    return '<page>
    <title>{{{TITLE}}}</title>
    <id>{{{PAGEID}}}</id>
    <revision>
        <id>1</id>
        <timestamp>{{{TIMESTAMP}}}</timestamp>
        <contributor>
            <username>{{{USERNAME}}}</username>
            <id>{{{UID}}}</id>
        </contributor>
        <comment>Automated import of articles</comment>
        <text xml:space="preserve">{{{TEXT}}}</text>
    </revision>
</page>';
} # end function xml_page_template
function make_page($title, $text) {
    global $xml_user, $uid, $change_count;
    $page = xml_page_template();
    $page = str_replace("{{{TITLE}}}", fix_title($title), $page);
    $page = str_replace("{{{PAGEID}}}", $change_count, $page);
    $page = str_replace("{{{TIMESTAMP}}}", gmdate("Y-m-d").'T'.gmdate("H:i:s")."Z", $page);
    $page = str_replace("{{{USERNAME}}}", $xml_user, $page);
    $page = str_replace("{{{UID}}}", $uid, $page);
    $page = str_replace("{{{TEXT}}}", htmlentities($text), $page);
    return $page;
}
The scripts that call these are responsible for generating the parameters passed to them. From what I could tell with earlier versions of MW, the page id wasn't being used for import, so I just put a counter in to help me debug the XML.
Jim
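One thing worth checking in make_page, offered as a guess rather than a confirmed diagnosis: htmlentities() turns accented and other non-ASCII characters into HTML named entities such as &eacute;, while XML itself predefines only &amp;, &lt;, &gt;, &quot; and &apos;. A page whose text contains such characters could then produce XML that a parser rejects without a DTD. The sketch below compares htmlentities() with htmlspecialchars(); escape_xml_text() is a hypothetical helper name and the sample string is made up for illustration.

```php
<?php
// Sketch only: escape_xml_text() is a hypothetical helper, not part of MediaWiki.
function escape_xml_text($text) {
    // htmlspecialchars() rewrites only &, <, > and the two quote characters,
    // leaving other characters (e.g. UTF-8 accented letters) untouched.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Made-up sample: UTF-8 "Cafe" with an accented e, plus some markup.
$sample = "Caf\xC3\xA9 <b>A & B</b>";

// htmlentities() emits the HTML-only named entity &eacute; for the accented letter.
echo htmlentities($sample, ENT_QUOTES, 'UTF-8'), "\n";

// The helper keeps the accented letter as raw UTF-8 and escapes only the markup characters.
echo escape_xml_text($sample), "\n";
```

If this turns out to be the cause, swapping the htmlentities($text) call for something like the helper above would be one way to test it, assuming the source text is already UTF-8.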
I suspect that there is something wrong with the upstream items, but I can't find it. The Brown Univ XML validator complains about the following:
I guess these have to do with not having a DOCTYPE declaration.
That's what I thought. Is there supposed to be one there? I don't see one from the output of Special:Export, which is my model for building the xml.
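Independent of the DOCTYPE question, each generated <page> fragment can be checked for well-formedness before it is written to the dump, which might help narrow down which pages are the problem. This is a minimal sketch using PHP's DOM extension and libxml error collection; xml_fragment_errors() is a hypothetical helper name, and the two fragments are made-up examples rather than real output from the script.

```php
<?php
// Hypothetical helper: parse one XML fragment and return libxml's error messages.
// An empty array means the fragment parsed as well-formed XML.
function xml_fragment_errors($fragment) {
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($fragment);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message) . ' (line ' . $error->line . ')';
    }

    // Restore the previous error-handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Made-up fragments: the first uses an HTML-only entity, the second is plain UTF-8.
$fragments = array(
    'entity'  => '<page><title>Caf&eacute;</title></page>',
    'literal' => "<page><title>Caf\xC3\xA9 &amp; more</title></page>",
);

foreach ($fragments as $name => $fragment) {
    $errors = xml_fragment_errors($fragment);
    echo $name, ': ', $errors ? implode('; ', $errors) : 'well-formed', "\n";
}
```

Calling something like this on the return value of make_page() would flag a bad page at generation time instead of after the import.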
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054