On Tue, Dec 28, 2010 at 5:28 PM, Rob Lanphier <robla@wikimedia.org> wrote:
> Let me riff on what you're saying here (partly just to confirm that I understand fully what you're saying). It'd be very cool to have the ability to declare a single article, or, probably more helpfully, a single revision of an article, to use a completely different syntax.
Yes, though I'd recommend jettisoning the word "syntax" entirely from the discussion at this stage, as I worry it invites bikeshedding over unimportant details.
Rather, it could be more useful to think primarily of data resources as having "features" or "structure". With images, for instance, we don't make people pay much attention to whether something's in JPEG, PNG, GIF, or SVG format.
At the level of actual people working with the system, the file's *format* is completely unimportant -- only its features and metadata are relevant. Set a size, give a caption, specify a page if it's a paged format, or a time if it's a video format. Is it TIFF or PDF? Ogg Theora or WebM? Don't know, don't care, and any time a user has to worry about it we've let them down.
For text pages, we need to think about concentrating in the same way on document structure rather than markup syntax.
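A minimal sketch of what "structure rather than syntax" could mean in practice, assuming a hypothetical tree of paragraphs built from text runs and template nodes. The `Text`, `Template` and `Paragraph` classes and the `data-template` / `data-args` attribute names are made up for illustration; they are not anything MediaWiki defines. The only point is that a template's name and arguments can survive a trip to HTML and back as attributes on a wrapper element, regardless of what the source markup looked like.

```python
import json
from dataclasses import dataclass, field
from html import escape
from html.parser import HTMLParser


@dataclass
class Text:
    value: str


@dataclass
class Template:
    name: str             # hypothetical template name, e.g. "Birth year"
    args: dict[str, str]  # the arguments it was called with
    rendered: str         # its expanded output, assumed to be plain text


@dataclass
class Paragraph:
    children: list = field(default_factory=list)  # Text and Template nodes


def to_html(paras: list[Paragraph]) -> str:
    """Serialize the tree; each template becomes a <span> that records
    its name and arguments, so its identity is not lost in the HTML."""
    out = []
    for para in paras:
        parts = []
        for node in para.children:
            if isinstance(node, Text):
                parts.append(escape(node.value))
            else:
                parts.append(
                    f'<span data-template="{escape(node.name)}" '
                    f'data-args="{escape(json.dumps(node.args))}">'
                    f"{escape(node.rendered)}</span>"
                )
        out.append("<p>" + "".join(parts) + "</p>")
    return "".join(out)


class _TreeReader(HTMLParser):
    """Rebuilds Paragraph/Text/Template nodes from to_html() output."""

    def __init__(self) -> None:
        # HTMLParser unescapes attribute values; with the default
        # convert_charrefs=True it also unescapes text passed to handle_data.
        super().__init__()
        self.paras = []
        self.template = None  # Template currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "p":
            self.paras.append(Paragraph())
        elif tag == "span" and "data-template" in attrs:
            self.template = Template(
                attrs["data-template"], json.loads(attrs["data-args"]), ""
            )

    def handle_endtag(self, tag):
        if tag == "span" and self.template is not None:
            self.paras[-1].children.append(self.template)
            self.template = None

    def handle_data(self, data):
        if not self.paras:
            return  # ignore stray text outside any paragraph
        if self.template is not None:
            self.template.rendered += data
            return
        kids = self.paras[-1].children
        if kids and isinstance(kids[-1], Text):
            kids[-1].value += data  # merge text delivered in several callbacks
        else:
            kids.append(Text(data))


def from_html(markup: str) -> list[Paragraph]:
    reader = _TreeReader()
    reader.feed(markup)
    reader.close()
    return reader.paras
```

Feeding `to_html()` output back through `from_html()` should give an equal tree; a real system would also need to handle nested and block-level templates, which this sketch ignores.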
I definitely agree that the idea of progressively moving bits and pieces in that direction is a wise one. If we can devise a *document structure* that lets us embed magic templatey _things_ into a paragraph-oriented-text document and maintain their structural identity all the way to browser-ready HTML and back, then we can have a useful migration path:
* identify possibly unsafe uses of templates, extensions, and parserfunctions (machines are great at this!)
* clean them up bit by bit (bots are often good at many common cases)
* once a page can be confirmed as not using Weird Template Magic, but only using templates/images/plugins that fit within the structure, it's golden.
* depending on which flavor of overlords we have, we might have various ways of enforcing that a page will always *remain* well-structured from then on.
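The first and third steps of that path could start out as crudely as the sketch below. It assumes a hypothetical, hand-maintained set `structure_safe` of template names already known to fit the structure, pulls transclusion names out of wikitext with a regular expression (a heuristic, not a real wikitext parser), and calls a page golden only if every name it uses is in that set.

```python
import re

# Name part of a {{...}} transclusion, up to the first "|" or "}}".
# The lookarounds skip triple-brace parameters such as {{{1}}}.
# Heuristic only: nesting, comments and <nowiki> are not understood.
TRANSCLUSION = re.compile(r"(?<!\{)\{\{(?!\{)\s*([^{}|]+?)\s*(?=\||\}\})")


def normalize(name: str) -> str:
    name = name.strip().replace("_", " ")
    if name.startswith("#"):
        # Parser functions: keep only the function name, e.g. "#if".
        return name.split(":", 1)[0]
    # Template names are case-insensitive in their first letter.
    return name[:1].upper() + name[1:]


def transcluded_names(wikitext: str) -> set[str]:
    return {normalize(m.group(1)) for m in TRANSCLUSION.finditer(wikitext)}


def triage(wikitext: str, structure_safe: set[str]) -> tuple[str, list[str]]:
    """Return ("golden", []) or ("needs cleanup", [offending names])."""
    unsafe = sorted(transcluded_names(wikitext) - structure_safe)
    return ("golden" if not unsafe else "needs cleanup", unsafe)
```

Run over a whole wiki, the "needs cleanup" lists would be a natural work queue for the bot step.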
That might not even involve changing syntax per se; we shouldn't care too much about whether italic is <i> or ''. But reliably knowing where a table or a div block starts and ends is extremely important to being able to tell which part of your document is which.
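As a rough illustration of why reliable block boundaries matter, this sketch matches wikitext table delimiters (`{|` and `|}` at the start of a line) and HTML `<div>` tags with a stack and reports each block's character offsets. The token set is deliberately tiny and is an assumption of the sketch; real wikitext has many more block constructs. A page, or a template body, that raises here is one whose structure spills across a boundary.

```python
import re

# Assumed token set: wikitext tables ({| ... |} at line start) and
# HTML <div> blocks only.
TOKENS = re.compile(
    r"(?P<table_open>^\{\|)|(?P<table_close>^\|\})"
    r"|(?P<div_open><div\b[^>]*>)|(?P<div_close></div\s*>)",
    re.MULTILINE | re.IGNORECASE,
)


def block_spans(text: str) -> list[tuple[str, int, int]]:
    """Return (kind, start, end) offsets for each table/div block,
    innermost blocks first.

    Raises ValueError if a block is closed without being opened, closed
    by the wrong kind of delimiter, or left open at the end of the text.
    """
    stack: list[tuple[str, int]] = []
    spans: list[tuple[str, int, int]] = []
    for m in TOKENS.finditer(text):
        kind, action = m.lastgroup.split("_")  # e.g. "table", "open"
        if action == "open":
            stack.append((kind, m.start()))
        elif not stack or stack[-1][0] != kind:
            raise ValueError(f"unexpected closing {kind} at offset {m.start()}")
        else:
            _, start = stack.pop()
            spans.append((kind, start, m.end()))
    if stack:
        kind, start = stack[-1]
        raise ValueError(f"unclosed {kind} opened at offset {start}")
    return spans
```

For example, a template body consisting only of `{|` and a first row raises "unclosed table", which is exactly the kind of template whose output can't be treated as a self-contained piece of the document.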
And heck, even if not everything gets fixed along that kind of path, just being able to *have* pages and other resource types that *are* well-structured mixed into the system is going to be hugely useful for the non-Wikipedia projects.
-- brion vibber (brion @ pobox.com)