On Jan 25, 2008 6:44 AM, Tim Starling <tstarling(a)wikimedia.org> wrote:
> HTML+CSS is a well-specified format which aims to support output to all
> media types. It separates structure from presentation and provides for
> semantic annotation. There is a lot more content available in HTML+CSS
> than in any of the wikitext markup languages. That's why I wish research
> efforts were focused on analysis and conversion of this common language.
> Why would you want to convert directly from one restricted subset of HTML
> to an even more restricted subset? Why not improve annotation of
> MediaWiki's HTML output to make it more reusable, and produce an HTML to
> MediaWiki wikitext converter?

Because for some purposes (e.g., a WYSIWYG editor), conversion in both
directions needs to be lossless, which it almost certainly won't be. You
could embed very verbose comments in the HTML to preserve the original
wikitext, but those break as soon as the user actually edits something
(in the WYSIWYG case). If MediaWiki had
been designed from the beginning to internally store almost everything
as HTML, it would be at least conceivably doable to have a JavaScript
HTML-to-wikitext converter that would be lossless, given appropriate
annotation of the HTML (for templates, images, extensions, etc.).
With wikitext, that's almost certainly impossible, so you have to just
throw something together and hope it works well enough in most cases.
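
To make the "appropriate annotation" idea concrete, here is a rough sketch
of what I mean, in Python rather than JavaScript for brevity. Everything in
it is an assumption for illustration: MediaWiki emits no data-wikitext
attribute, and the converter handles only bold, italics, and annotated
template expansions. The point is only that an annotated expansion can be
restored exactly, while an unannotated one cannot.

```python
from html.parser import HTMLParser

# Hypothetical annotation: the renderer wraps each template expansion in an
# element carrying data-wikitext="...", holding the exact source wikitext.
INLINE = {"b": "'''", "i": "''"}   # tiny subset of inline markup
VOID = {"br", "hr", "img"}         # elements that never have an end tag


class HtmlToWikitext(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside an annotated expansion

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self.skip_depth:
            self.skip_depth += 1  # nested element inside the expansion
            return
        source = dict(attrs).get("data-wikitext")
        if source is not None:
            # Emit the stored source and ignore the rendered content.
            self.out.append(source)
            self.skip_depth = 1
        elif tag in INLINE:
            self.out.append(INLINE[tag])

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif tag in INLINE:
            self.out.append(INLINE[tag])

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)


def html_to_wikitext(html):
    parser = HtmlToWikitext()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

With the annotation, the template call comes back verbatim; without it, the
converter can only guess from the rendered markup and the call is gone:

```python
annotated = 'See <span data-wikitext="{{cite|id=3}}"><i>Smith</i> 2008</span>.'
plain = 'See <span><i>Smith</i> 2008</span>.'
print(html_to_wikitext(annotated))  # See {{cite|id=3}}.
print(html_to_wikitext(plain))      # See ''Smith'' 2008.
```
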
WYSIWYG is really the main motivation I see behind constructing a
well-defined format of any kind. It would be nice if we could
interoperate a little better with third parties, but I haven't seen
any compelling application that needs this and couldn't just use the
HTML, perhaps with some comments to indicate templates. The compelling
utility of interoperability is not with third parties, but with
clients -- our own users' web browsers.
Unfortunately, this is probably never going to happen, or at least
not for a long, long time.