Hi all,
I just installed MediaWiki 1.7.1 on our intranet and successfully imported 206 pages. However, after the import each page has had its HTML markup completely removed.
<h2>Some Title</h2> and <p>Some Text</p> are the only markup that was used. I have added FCKeditor as an extension and written a custom LDAP extension after all the others failed; otherwise the MediaWiki install is stock.
I have confirmed this by viewing and exporting the imported pages.
To do the import, I first exported an existing page, copied its format (I removed the namespaces, which is the only difference), and then combined all of the pages into one XML file (separated by <page>, of course).
Everything imported perfectly, but without formatting.
What am I doing wrong?
Below are my XML import file and the subsequent export.
<mediawiki xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en"> <siteinfo> <sitename>Telstrapedia</sitename> <generator>MediaWiki 1.7.1</generator> <case>first-letter</case> </siteinfo> <page> <id/><title>3G</title> <revision> <timestamp>2006-11-08T04:30:26Z</timestamp> <contributor> <username>Root</username> </contributor> <text xml:space="preserve"> <h2>URL</h2><p> http://www.telstra.com.au/video/</p> <h2>Summary</h2> <p>Central repository for all 3G information including video services, mobiles, coverage, demos and useful tools. </p> <comment>Imported from XML Excel document - Telstra.com asset list v2.xls (See DME for original)</comment> <minor>false</minor> </revision> </page> </mediawiki>
After importing, the export produces a stripped file.
<mediawiki xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en"> <siteinfo> <sitename>Telstrapedia</sitename> <base>http://localhost/index.php?title=Main_Page</base> <generator>MediaWiki 1.7.1</generator> <case>first-letter</case> <namespaces> <namespace key="-2">Media</namespace> <namespace key="-1">Special</namespace> <namespace key="0"/> <namespace key="1">Talk</namespace> <namespace key="2">User</namespace> <namespace key="3">User talk</namespace> <namespace key="4">Telstrapedia</namespace> <namespace key="5">Telstrapedia talk</namespace> <namespace key="6">Image</namespace> <namespace key="7">Image talk</namespace> <namespace key="8">MediaWiki</namespace> <namespace key="9">MediaWiki talk</namespace> <namespace key="10">Template</namespace> <namespace key="11">Template talk</namespace> <namespace key="12">Help</namespace> <namespace key="13">Help talk</namespace> <namespace key="14">Category</namespace> <namespace key="15">Category talk</namespace> </namespaces> </siteinfo> <page> <title>3G</title> <id>1410</id> <revision> <id>1427</id> <timestamp>2006-12-18T01:57:46Z</timestamp> <contributor> <username>Root</username> <id>1</id> </contributor> <text xml:space="preserve"> URL http://www.telstra.com.au/video Summary Central repository for all 3G information including video services, mobiles, coverage, demos and useful tools. </text> </revision> </page> </mediawiki>
Connolly, Wayne wrote:
I just installed MediaWiki 1.7.1 on our intranet and successfully imported 206 pages. However, after the import each page has had its HTML markup completely removed.
<h2>Some Title</h2> and <p>Some Text</p> are the only markup that was used. I have added FCKeditor as an extension and written a custom LDAP extension after all the others failed; otherwise the MediaWiki install is stock.
[snip]
<mediawiki xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.3/ http://www.mediawiki.org/xml/export-0.3.xsd" version="0.3" xml:lang="en">
[snip]
<text xml:space="preserve">
<h2>URL</h2><p> http://www.telstra.com.au/video/</p>
[snip]
If you attempt to validate this file, I believe you will find that it violates the .xsd schema.
Special:Import probably _should_ reject it, but probably does not correctly detect all invalid input. The results are undefined, but the result you describe sounds like a likely outcome knowing how the XML parser is used.
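As a rough pre-check before importing (not a substitute for validating against the .xsd), a short script can flag pages whose <text> element contains child elements. This is only a sketch: the function names are made up for illustration, and the sample omits the xsi:schemaLocation attribute so that it parses without a namespace declaration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a tag name."""
    return tag.rsplit("}", 1)[-1]


def pages_with_markup_in_text(xml_source):
    """Return the titles of pages whose <text> element has child elements."""
    root = ET.fromstring(xml_source)
    offenders = []
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        # First <title> found under this page, or None if there is none.
        title = next(
            (el.text for el in page.iter() if local_name(el.tag) == "title"),
            None,
        )
        for text_el in page.iter():
            # len(element) counts direct child elements; character data has none.
            if local_name(text_el.tag) == "text" and len(text_el) > 0:
                offenders.append(title)
                break
    return offenders


# Hypothetical sample modelled on the import file quoted above.
sample = """<mediawiki version="0.3" xml:lang="en">
  <page>
    <title>3G</title>
    <revision>
      <text xml:space="preserve"><h2>URL</h2><p> http://www.telstra.com.au/video/</p></text>
    </revision>
  </page>
</mediawiki>"""

print(pages_with_markup_in_text(sample))  # prints ['3G']
```

An empty list means no <text> element contained child elements; it does not mean the file is schema-valid.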
The contents of the <text> element must be character data; child elements will not be interpreted correctly. Normally it should thus appear like this:
<text xml:space="preserve"> &lt;h2&gt;URL&lt;/h2&gt;&lt;p&gt; http://www.telstra.com.au/video/&lt;/p&gt;
etc.
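For illustration, this is how the page body could be turned into character data with Python's standard library before it is written into the <text> element. The variable names are hypothetical; only the escaping step matters.

```python
from xml.sax.saxutils import escape

# HTML body of one page, as it should appear in the wiki after import.
page_html = "<h2>URL</h2><p> http://www.telstra.com.au/video/</p>"

# escape() replaces &, < and > with entities, so the XML parser sees
# plain character data instead of child elements.
escaped = escape(page_html)
print(escaped)
# prints &lt;h2&gt;URL&lt;/h2&gt;&lt;p&gt; http://www.telstra.com.au/video/&lt;/p&gt;
```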
If you're generating these export files with a custom tool, the tool needs to be fixed. If you're generating them from a patched MediaWiki, the patch is probably faulty, damaging the export code, and needs to be corrected.
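If the custom tool is a script, one way to avoid this class of bug is to build the XML with a library rather than by string concatenation, so that text is escaped automatically. The following is a minimal sketch using Python's xml.etree.ElementTree. The build_page helper is hypothetical, and the element order inside <revision> is an assumption that should be checked against the export-0.3 .xsd.

```python
import xml.etree.ElementTree as ET


def build_page(title, timestamp, username, body, comment=None):
    """Build one <page> element; the body is stored as character data."""
    page = ET.Element("page")
    ET.SubElement(page, "title").text = title
    revision = ET.SubElement(page, "revision")
    ET.SubElement(revision, "timestamp").text = timestamp
    contributor = ET.SubElement(revision, "contributor")
    ET.SubElement(contributor, "username").text = username
    if comment is not None:
        # Assumed position: before <text>; verify against the schema.
        ET.SubElement(revision, "comment").text = comment
    text = ET.SubElement(revision, "text")
    text.set("xml:space", "preserve")
    # Assigning to .text stores a string; the serializer escapes < > &.
    text.text = body
    return page


page = build_page(
    title="3G",
    timestamp="2006-11-08T04:30:26Z",
    username="Root",
    body="<h2>URL</h2><p> http://www.telstra.com.au/video/</p>",
)
serialized = ET.tostring(page, encoding="unicode")
print(serialized)

# Round trip: parsing the output gives back the original markup as text.
reparsed_text = ET.fromstring(serialized).find("revision/text")
print(reparsed_text.text)
```

The serialized <text> element contains &lt;h2&gt; and so on rather than child elements, which is the form described above.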
-- brion vibber (brion @ pobox.com)
mediawiki-l@lists.wikimedia.org