On Jul 30, 2007, at 3:49 PM, Platonides wrote:
Jim Hu wrote:
Ran into a weird dump problem yesterday, which has me wondering if
there's a problem with my artificially created xml for upload.
Here's what happens. I have a script that builds wiki pages from an
external source and embeds them in xml for upload via importDump.
The script can be toggled to generate either a single page or a bunch
of them. In batch mode, some pages fail to load even though the same
pages load just fine if you specify them individually, and
importDump.php does NOT crash during the import.
I understand the failure is in your build script. What does it load?
Wiki pages from Special:Export?
How does this script handle the XML? Are you using some XML library?
Search and replace?
Yes, I'm sure it's my hacky script ; ). It's based on search and
replace. I make a template for the <page> container with tags like
MW template params, e.g. {{{}}}, and I replace things in that and
write to a file. I have a pair of functions that get called each
time I make a page.
function xml_page_template() {
return '<page>
<title>{{{TITLE}}}</title>
<id>{{{PAGEID}}}</id>
<revision>
<id>1</id>
<timestamp>{{{TIMESTAMP}}}</timestamp>
<contributor>
<username>{{{USERNAME}}}</username>
<id>{{{UID}}}</id>
</contributor>
<comment>Automated import of articles</comment>
<text xml:space="preserve">{{{TEXT}}}</text>
</revision>
</page>';
}#end function xml_page_template
function make_page($title,$text){
global $xml_user, $uid, $change_count;
$page = xml_page_template();
$page = str_replace("{{{TITLE}}}",fix_title($title),$page);
$page = str_replace("{{{PAGEID}}}",$change_count,$page);
$page = str_replace("{{{TIMESTAMP}}}", gmdate("Y-m-d").'T'.gmdate("H:i:s")."Z",$page);
$page = str_replace("{{{USERNAME}}}",$xml_user,$page);
$page = str_replace("{{{UID}}}",$uid,$page);
$page = str_replace("{{{TEXT}}}",htmlentities($text),$page);
return $page;
}
The scripts that call these are responsible for generating the
parameters passed to them. From what I could tell with earlier
versions of MW, the page id wasn't being used for import, so I just
put a counter in to help me debug the XML.
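For comparison with the search-and-replace approach, here is a minimal sketch of the same <page> container built with PHP's XMLWriter, which escapes element text itself. The function name make_page_xmlwriter and its parameter list are hypothetical, and the fix_title() call is left out because its definition isn't shown here; a caller could apply it before passing the title in.

```php
<?php
// Hypothetical alternative to make_page(): XMLWriter does the escaping,
// so no htmlentities()/str_replace() calls are needed.
// Function name and parameters are illustrative, not from the original script.
function make_page_xmlwriter($title, $text, $username, $uid, $page_id) {
    $w = new XMLWriter();
    $w->openMemory();

    $w->startElement('page');
    $w->writeElement('title', $title);
    $w->writeElement('id', (string) $page_id);

    $w->startElement('revision');
    $w->writeElement('id', '1');
    // Same UTC timestamp shape as the template: YYYY-MM-DDTHH:MM:SSZ
    $w->writeElement('timestamp', gmdate('Y-m-d\TH:i:s\Z'));

    $w->startElement('contributor');
    $w->writeElement('username', $username);
    $w->writeElement('id', (string) $uid);
    $w->endElement(); // contributor

    $w->writeElement('comment', 'Automated import of articles');

    $w->startElement('text');
    $w->writeAttribute('xml:space', 'preserve');
    $w->text($text); // &, < and > are escaped here
    $w->endElement(); // text

    $w->endElement(); // revision
    $w->endElement(); // page

    return $w->outputMemory();
}
```

Called as make_page_xmlwriter($title, $text, $xml_user, $uid, $change_count), it returns one <page> element as a string, so it could be written to the same file as before. Non-ASCII characters are passed through as-is rather than turned into named entities, which assumes the input is UTF-8.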
Jim
I suspect that there is something wrong with the upstream items, but
I can't find it. The Brown Univ XML validator complains about the
following:
I guess these have to do with not having a DOCTYPE declaration.
That's what I thought. Is there supposed to be one there? I don't
see one from the output of Special:Export, which is my model for
building the xml.
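The validator's complaints aren't shown here, so the following is a guess rather than a diagnosis. XML predefines only five named entities (&amp;amp; &amp;lt; &amp;gt; &amp;quot; &amp;apos;), while htmlentities() also turns characters such as é into HTML names like &amp;eacute;, which an XML parser rejects unless a DTD declares them. That would fit both the DOCTYPE hint and pages that fail depending on their content. A minimal sketch for checking this locally, where the helper name xml_errors and the sample text are hypothetical:

```php
<?php
// Returns the libxml error messages for an XML string.
// An empty array means the string parsed as well-formed XML.
// The helper name is hypothetical; requires PHP 7+ for the \u{...} escape below.
function xml_errors($xml) {
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $messages;
}

// Sample text with a non-ASCII character plus XML-special characters.
$text = "caf\u{00E9} & <b>bold</b>";

// htmlentities() emits &eacute;, which XML does not define without a DTD.
$with_entities = '<text xml:space="preserve">'
    . htmlentities($text, ENT_QUOTES, 'UTF-8') . '</text>';

// htmlspecialchars() escapes only &, <, > and quotes, all valid in XML.
$with_specialchars = '<text xml:space="preserve">'
    . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</text>';

print_r(xml_errors($with_entities));     // expect at least one error
print_r(xml_errors($with_specialchars)); // expect an empty array
```

If the first call reports an undefined entity for your real page text, swapping htmlentities() for htmlspecialchars() in make_page() would be the thing to try, assuming the wiki text is already UTF-8.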
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054