[Mediawiki-l] What is the current state of content importing/exporting?

Rowan Collins rowan.collins at gmail.com
Sun Aug 14 22:02:35 UTC 2005


On 14/08/05, Matt England <mengland at mengland.net> wrote:

> >1) There is already no need to keep going back to a web page to create
> >the initial import.  [...] create the initial content locally, save a
> >copy in case something goes wrong [which it hasn't yet, but I'm used
> >to Murphy's law], and then copy/paste it into the edit box, and post
> >it.  "Initial revision" is usually my tag for those imports.
> 
> I don't follow this point, unfortunately.  :(

I think this is in response to your comment about preferring to use
another editor for the initial version of content, but then edit in
the wiki later - the idea being that you type Wiki-markup in whatever
editor you prefer, and paste it to the wiki in one big "edit". Of
course, this loses any approximation of WYSIWYG formatting which
either your editor or MediaWiki would otherwise provide, so it
probably misses the point to some degree.

> Meanwhile I see lots of other work/references to create wiki-markup
> editors. Seems like they are trying to solve the same problems.  Why not
> just make an external markup language, possibly in the form of a XML-based
> DTD...like DocBook??

Because this doesn't solve the central problems; it just adds a layer
of complexity. The real problems are:
* In terms of export / editing, parsing wikitext reliably; the actual
MediaWiki code for this is an ugly mess of regexes, which output
"correctly" only because they define what the correct output is.
Whether you're trying to output HTML, DocBook, "an external markup
language" (how is wiki-syntax not "external"?) or just an in-memory
representation for WYSIWYG/automated manipulation, you've still got to
parse the real wikitext.
* In terms of import / saving, creating a "sane" piece of wikitext
without sacrificing richness of format - this is of course essential
if the text is to be meaningfully edited "by hand" within the wiki.
For an inline editor, this includes making tools which closely match
the features available to someone typing in wikitext; for a converter,
or a set of macros for something like Word, it means outputting
something which isn't bloated with meta-information which a human
editor would not have put there (i.e. the opposite of MS Office's HTML
exporters).

That said, one of the sub-discussions of the occasional ponderings of
"standard wiki markup" has been the idea of a "Wiki interchange
format" with importers and exporters available for all markup
variants; this would at least mean someone could write a
wiki-software-agnostic "Word<->wiki converter" or whatever.
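
As a rough illustration of the interchange idea (not an existing
format): a converter would target one small neutral document model,
and each wiki syntax would only need its own exporter. The Heading and
Paragraph types and both function names below are hypothetical, and
the DokuWiki rule assumed here is that a level-1 heading takes six
equals signs, level 2 five, and so on.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Heading:
    level: int  # 1 = top-level heading
    text: str


@dataclass
class Paragraph:
    text: str


# The hypothetical "interchange" document: a flat list of blocks.
Block = Union[Heading, Paragraph]


def to_mediawiki(doc: List[Block]) -> str:
    """Export to MediaWiki-style markup (level n -> n equals signs)."""
    parts = []
    for block in doc:
        if isinstance(block, Heading):
            marks = "=" * block.level
            parts.append(f"{marks} {block.text} {marks}")
        else:
            parts.append(block.text)
    return "\n\n".join(parts)


def to_dokuwiki(doc: List[Block]) -> str:
    """Export to DokuWiki-style markup (level n -> 7 - n equals signs)."""
    parts = []
    for block in doc:
        if isinstance(block, Heading):
            marks = "=" * (7 - block.level)
            parts.append(f"{marks} {block.text} {marks}")
        else:
            parts.append(block.text)
    return "\n\n".join(parts)


doc = [Heading(2, "Intro"), Paragraph("Hello.")]
print(to_mediawiki(doc))
print(to_dokuwiki(doc))
```

A "Word<->wiki converter" written against such a model would not need
to know which wiki it was feeding; the hard part, as above, is still
the importer that parses real wikitext into the model.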


So much for what not to do; what *should* we/you/anyone do?

My own approach would probably be to look at some existing and
relatively straightforward format such as RTF (or, I guess, HTML,
although that probably covers a wider range of actual formattings),
and attempt to write a "simplifying converter" - where markup has
become bloated by export from a feature-rich editor, extract the
general gist (e.g. "this is a heading") and create appropriate markup
for that in Wikitext. While not ideal, a lossy converter like this
would probably be fine for "initial import" conversion - you write the
content in Word, export it mostly intact (via RTF, or maybe just through
macros) to wikitext, and then tidy it up within the wiki.
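
A minimal sketch of such a "simplifying converter", taking the HTML
route and using only Python's standard html.parser module. The class
and function names are made up for illustration, and it keeps just
headings, bold, italics and paragraph breaks; every attribute and every
other tag (spans, styles, Office metadata) is deliberately thrown away,
so it is lossy by design and nowhere near a complete importer.

```python
import re
from html.parser import HTMLParser


class SimplifyingConverter(HTMLParser):
    """Lossy HTML -> wikitext: keep the gist, drop the bloat."""

    HEADINGS = {"h1": "=", "h2": "==", "h3": "===", "h4": "===="}
    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # attrs (class, style, ...) are ignored on purpose.
        if tag in self.HEADINGS:
            self.out.append("\n" + self.HEADINGS[tag] + " ")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "p":
            self.out.append("\n")

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self.out.append(" " + self.HEADINGS[tag] + "\n")
        elif tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "p":
            self.out.append("\n")

    def handle_data(self, data):
        # Collapse runs of whitespace, as a browser would.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_wikitext(html):
    parser = SimplifyingConverter()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    kept = []
    for line in lines:
        # Keep text lines, and at most one blank line between them.
        if line or (kept and kept[-1]):
            kept.append(line)
    return "\n".join(kept).strip()


sample = (
    '<h2 class="MsoHeading" style="mso-x:1">Intro</h2>\n'
    '<p class="MsoNormal"><span style="font-size:12pt">'
    "Some <b>bold</b> and <i>italic</i> text.</span></p>"
)
print(html_to_wikitext(sample))
```

The output still needs tidying by hand within the wiki, which is the
point: the converter only has to get the gist across.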

If you want people to be able to repeatedly "check out" the document
into some other editor, it seems more sensible to me to essentially
make your own editor - even if that takes the form of a customised MS
Word which can only do wiki-like things, so there's no risk of loss on
conversion. Taking that view leads to the question of what you want to
offer with this editor - familiarity? WYSIWYGness? particular
features? - and thus to something which may not resemble
"import/export" after all...

Meanwhile, though, there are already things which people have done,
including various wikitext parsers, a few reasonably operable editors,
and even a few attempts at exactly the kind of importer being
discussed. Pages to explore before heading out on your own:
* http://meta.wikimedia.org/wiki/Category:MediaWiki_tools
* http://en.wikipedia.org/wiki/Wikipedia:Tools
* http://meta.wikimedia.org/wiki/Alternative_parsers

As ever, it seems there is scattered information which it would be
very useful to collect together...

-- 
Rowan Collins BSc
[IMSoP]


