[Mediawiki-l] Is there a dump XML validator?

Brion Vibber brion at wikimedia.org
Wed Aug 8 13:37:47 UTC 2007


Jim Hu wrote:
> Ran into a weird dump problem yesterday, which has me wondering if  
> there's a problem with my artificially created xml for upload.   
> Here's what happens.  I have a script that builds wiki pages from an  
> external source and embeds them in xml for upload via importDump.   
> The script can be toggled to either generate a single page or a bunch  
> of them.  The same script failed to load some pages that load just  
> fine if you specify them individually, but ImportDump.php does NOT  
> crash during the import.
> 
> I suspect that there is something wrong with the upstream items, but  
> I can't find it.   The Brown Univ XML validator complains about the  
> following:
> 
> line 3, ecoliwiki20070730123135.xml:
> error (1102): tag uses GI for an undeclared element: mediawiki

That sounds like you didn't include a schema declaration (dunno what
your thingy takes, maybe it's doctype only?)

There's an XML Schema description file -- you can use any XML Schema
validator, such as one of the demo scripts packaged with the Apache
Xalan java library, to run over your .xml file.

In theory, anyway. :)

> line 166616, ecoliwiki20070730123135.xml:
> error (1012): reference to undeclared entity:  
> line 166616, ecoliwiki20070730123135.xml:
> error (1003): entity (or its expansion) is invalid:  
> line 166616, ecoliwiki20070730123135.xml:
> error (1012): reference to undeclared entity:  
> line 166616, ecoliwiki20070730123135.xml:
> error (1003): entity (or its expansion) is invalid:  
> line 184234, ecoliwiki20070730123135.xml:

The only predefined named entities in XML are &lt; &gt; &amp; &apos;
and &quot;. HTML-only names such as &nbsp; are not declared.

For any other characters that you really intend to be interpreted *as
the character*, use decimal or hexadecimal codes -- eg &#160; or &#xA0;

For things you want to appear *as the HTML character reference* you need
to escape the & as &amp; -- for instance "&amp;nbsp;" -- to produce
correct XML.
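
A minimal sketch of both cases in Python, in case it helps when fixing
the generating script. It uses only the standard library; the helper
names first_xml_error and wrap_text are made up for illustration and are
not part of MediaWiki or importDump. It checks well-formedness only, not
validity against the schema.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def first_xml_error(xml_text):
    """Return None if xml_text is well-formed, else a (line, message) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, _column = err.position
        return line, str(err)
    return None


def wrap_text(page_text):
    """Embed page text in a <text> element, escaping &, < and >."""
    return "<text>" + escape(page_text) + "</text>"


# A bare HTML entity is not declared in XML, so the parser rejects it.
bad = first_xml_error("<text>a&nbsp;b</text>")

# A numeric reference is read as the character itself (U+00A0).
as_char = ET.fromstring("<text>a&#160;b</text>").text

# Escaping the ampersand keeps the literal string "&nbsp;" in the page text.
as_literal = ET.fromstring(wrap_text("a&nbsp;b")).text
```

Running the generated text through something like wrap_text before
embedding it should keep stray ampersands from tripping the parser, at
the cost of showing "&nbsp;" literally in the wikitext.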

> error (402): EOF encountered; no doctype declaration found: mediawiki
> 
> but I'm pretty sure these are all red herrings.  So...is there a  
> validator out there I should be using?

-- brion vibber (brion @ wikimedia.org)


