Hi,
I have a large volume of valid XHTML [xhtml1-transitional.dtd] that I want to import into my MediaWiki instance.
Are there any good filters available that can convert my XHTML to MediaWiki's wikitext format?
Thanks,
~mm
In case you don't get an answer that provides an immediate solution, here is a reasonable do-it-yourself approach.
We provide XSLT scripts at: http://www.riehle.org/category/wiki-tech/
There is no ready-made solution for your problem (we only have the other direction, Wiki Creole to XHTML), but you will find some of the pieces you need there.
(Wiki Creole is a subset of MediaWiki syntax; it might be sufficient for what your XHTML files express---you probably don't have templates etc.)
You would need to write an XSLT script that converts XHTML to our XML interchange format for wiki markup. This should be fairly straightforward, as the two formats are quite similar. (We simply felt XHTML is not a good XML interchange format.) Once your content is in the XML interchange format, you can use our XSLT scripts to convert it from XML to wiki markup. Since we provide the source, you can tweak it to your needs.
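For comparison, here is a minimal sketch of a do-it-yourself route that skips the XML interchange format entirely: a direct walk over the XHTML tree using only Python's standard library (`xml.etree.ElementTree`), not the XSLT scripts mentioned above. It assumes well-formed XHTML in the standard namespace and handles only top-level headings, paragraphs, bold/italics, external links, line breaks and horizontal rules; the function names are hypothetical. Named entities such as `&nbsp;`, which are defined in the external DTD, are likely to make the parser fail unless they are replaced (e.g. with numeric references) beforehand.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper names; a sketch, not a complete XHTML-to-wikitext converter.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def local(tag):
    """Strip the XHTML namespace, if present, from an element tag."""
    return tag[len(XHTML_NS):] if tag.startswith(XHTML_NS) else tag


def inline(elem):
    """Render an element's text and inline children as MediaWiki markup."""
    parts = [elem.text or ""]
    for child in elem:
        name, inner = local(child.tag), inline(child)
        if name in ("b", "strong"):
            parts.append("'''" + inner + "'''")
        elif name in ("i", "em"):
            parts.append("''" + inner + "''")
        elif name == "a" and child.get("href"):
            # External link syntax: [url label]
            parts.append("[" + child.get("href") + " " + inner + "]")
        elif name == "br":
            parts.append("<br />")
        else:
            parts.append(inner)  # unknown element: keep only its text
        parts.append(child.tail or "")
    return "".join(parts)


def convert(xhtml):
    """Convert the top-level blocks of an XHTML document's <body> to wikitext."""
    root = ET.fromstring(xhtml)
    body = root.find(XHTML_NS + "body")
    if body is None:
        body = root.find("body")  # document without the XHTML namespace
    if body is None:
        body = root
    blocks = []
    for elem in body:
        name = local(elem.tag)
        if name == "hr":
            blocks.append("----")
        elif len(name) == 2 and name[0] == "h" and name[1] in "123456":
            marks = "=" * int(name[1])  # <h2> -> == ... ==
            blocks.append(f"{marks} {inline(elem).strip()} {marks}")
        else:
            # <p> and anything else: flatten to a paragraph of inline text.
            blocks.append(inline(elem).strip())
    # Blank lines separate paragraphs in wikitext.
    return "\n\n".join(b for b in blocks if b)
```

Anything more structured than these constructs (tables, images, divs holding several paragraphs) is flattened to plain text, so the output would still need a manual review.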
Hope this helps,
Dirk
On Wed, Feb 27, 2008 at 10:05 AM, Michael Monaghan mickmon@gmail.com wrote:
Hi,
I have a large volume of valid xhtml [xhtml1-transitional.dtd] that I want to import into my MediaWiki instance.
Are any good filters available that might convert my xhtml to MediaWiki's wiki-text format?
Thanks,
~mm
On Wed, Feb 27, 2008 at 2:24 PM, Dirk Riehle dirk@riehle.org wrote:
(Wiki Creole is a subset of MediaWiki syntax; it might be sufficient for what your XHTML files express---you probably don't have templates etc.)
WikiCreole is not a subset of MediaWiki syntax, or even close. Translating XHTML to WikiCreole will cause all of the following to completely fail to work in MediaWiki: bold, italics, explicit line breaks, image inclusions, tables, and escaping of all kinds. It will also cause unintended behavior for external links, and possibly some weirdness with headings. In fact the only syntax that *is* the same is headings (mostly), internal wikilinks, free links, interwiki links, paragraphs, lists, and horizontal rules. Of those, only headings, paragraphs, and lists are likely to be common constructs that can easily be translated from XHTML: free links (text the same as target) are rare in manually-written XHTML, horizontal rules are fairly rare generally, and both internal and interwiki links would not be easy to discern from external links for an automated tool. You're probably going to get a mess if you try this.
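To make the differences concrete, here is a small side-by-side table, written as Python data, of how some of the constructs listed above are spelled in WikiCreole 1.0 and in MediaWiki markup. The example strings and the `incompatible` helper are illustrative only, and the table is far from exhaustive (tables, for instance, are omitted).

```python
# Construct -> (WikiCreole 1.0 form, MediaWiki form). Illustrative, not exhaustive.
SYNTAX = {
    "bold": ("**text**", "'''text'''"),
    "italics": ("//text//", "''text''"),
    "line break": (r"line one\\line two", "line one<br />line two"),
    "image": ("{{image.png|alt text}}", "[[Image:image.png|alt text]]"),
    "external link": ("[[http://example.com|label]]", "[http://example.com label]"),
    "escape": ("~[[not a link]]", "<nowiki>[[not a link]]</nowiki>"),
    "heading": ("== Heading ==", "== Heading =="),
    "internal link": ("[[Page name]]", "[[Page name]]"),
    "bulleted list": ("* item", "* item"),
    "numbered list": ("# item", "# item"),
    "horizontal rule": ("----", "----"),
}


def incompatible(syntax=SYNTAX):
    """Names of constructs whose WikiCreole and MediaWiki spellings differ."""
    return sorted(name for name, (creole, mw) in syntax.items() if creole != mw)
```

Calling `incompatible()` lists the six constructs whose spellings differ; the rest of the table is written identically in both dialects.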
I know there are some tools that do a rough job of converting XHTML to MediaWiki markup, although the results are never going to be very reliable or consistent. Unfortunately, I can't find them after a couple of minutes' Googling.
On Wed, Feb 27, 2008 at 10:31 PM, Simetrical Simetrical+wikilist@gmail.com wrote:
I know that there are some tools to make a rough job at converting XHTML to MediaWiki markup, although it's never going to be very reliable or consistent. I can't seem to find them in a couple minutes' Googling, unfortunately.
Can OpenOffice be run with shell parameters? It should be able to read XHTML, and AFAIK it can export to MediaWiki markup.
Magnus
* Magnus Manske magnusmanske@googlemail.com [2008-02-29 11:05]:
Can OpenOffice be run with shell parameters? It should be able to read XHTML, and it exports MediaWiki AFAIK.
It can [1] and I was going to suggest exactly the same, until I looked at the output of the export. In my limited tests, it didn't handle nested lists at all (OpenOffice v2.3 from Ubuntu Gutsy), so it's not a solution right now, but could be a nice thing in the future.
Thomas
[1] Shell parameters alone don't work; you need to script OOo with, e.g., Python, but it's definitely possible.
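Since nested lists were the sticking point in Thomas's test, here is a minimal sketch of what a converter has to emit for them: MediaWiki expresses nesting by repeating the list markers (`**` for a bullet inside a bullet, `*#` for a numbered list inside a bullet). It uses Python's `xml.etree.ElementTree`; the function name is hypothetical, and only the plain text of non-list children is kept (inline formatting is dropped).

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def list_to_wikitext(lst, prefix=""):
    """Convert a (possibly nested) <ul>/<ol> element to MediaWiki list lines."""
    marker = "*" if local(lst.tag) == "ul" else "#"
    lines = []
    for li in lst:
        if local(li.tag) != "li":
            continue
        text, nested = [li.text or ""], []
        for child in li:
            if local(child.tag) in ("ul", "ol"):
                nested.append(child)  # handled after this item's own line
            else:
                text.append("".join(child.itertext()))  # inline markup dropped
            text.append(child.tail or "")
        # Collapse whitespace so each item fits on one wikitext line.
        lines.append(prefix + marker + " " + " ".join("".join(text).split()))
        for sub in nested:
            # Child lists inherit the parent's markers, e.g. "*" + "#" -> "*#".
            lines.extend(list_to_wikitext(sub, prefix + marker))
    return lines
```

Applied to `<ul><li>Fruit<ul><li>Apple</li><li>Pear</li></ul></li><li>Steps<ol><li>Wash</li><li>Eat</li></ol></li></ul>`, this yields the lines `* Fruit`, `** Apple`, `** Pear`, `* Steps`, `*# Wash`, `*# Eat`.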