On Tue, Jan 4, 2011 at 1:39 PM, Brion Vibber brion@pobox.com wrote:
Exactly my point -- spending time tinkering with sortof-human-readable-but-not-powerful-enough syntax distracts from thinking about what needs to be *described* in the data... which is the important thing needed when devising an actual storage or interchange format.
Below is an outline, which I've also posted to mediawiki.org[1] for further iteration. There are a lot of different moving parts, and I think one thing that's been difficult about this conversation is that different people are interested in different parts. I know a lot of people on this list are already overwhelmed or just sick of this conversation, so maybe if some of us break off into an on-wiki discussion, we might actually be able to make some progress without driving everyone else nuts. In the worst case, we'll at least have documented many of the issues so that we don't have to start from zero the next time the topic comes up.
Here are the pieces of the conversation that I'm seeing:

1. Goals: what are we trying to achieve?
* Tool interoperability
** Alternative parsers
** GUIs
** Real-time editing (a la Etherpad)
* Ease of editing raw text
* Ease of structuring the data
* Template language with fewer squirrelly brackets
* Performance
* Security
* What else?
2. Abstract format: regardless of syntax, what are we trying to express?
* Currently, we don't have an abstract format; markup just maps to a subset of HTML (so perhaps the HTML DOM is our abstract format)
* What subset of HTML do we use?
* What subset of HTML do we need?
* What parts of HTML do we *not* want to allow in any form?
* What parts of HTML do we only want to allow in limited form (e.g. only safely generated from some abstract format)?
* Is the HTML DOM sufficiently abstract, or do we want/need some intermediate conceptual format?
* Is browser support for XML sufficiently useful to try to rely on that?
* Will it be helpful to expose the abstract format in any way?
3. Syntax: what syntax should we store (and expose to users)?
* Should we store some serialization of the abstract format instead of markup?
* Is hand editing of markup a viable long term strategy?
* How important is having something expressible with BNF?
* Is XML viable as an editing format? JSON? YAML?
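To make the "store a serialization of the abstract format" question a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the node shape (`type`, `children`, `value`), the `ALLOWED_TAGS` table, and the `render_html` helper are hypothetical and are not an existing MediaWiki format or API. It round-trips a tiny document tree through JSON and generates HTML only from an allowlist of node types.

```python
import json
from html import escape

# Hypothetical abstract format: each node is a dict with a "type", plus
# either "children" (container nodes) or "value" (text nodes).
doc = {
    "type": "paragraph",
    "children": [
        {"type": "text", "value": "See the "},
        {"type": "bold", "children": [{"type": "text", "value": "format"}]},
        {"type": "text", "value": " discussion <draft>."},
    ],
}

# Assumed allowlist: only these node types may produce HTML elements.
ALLOWED_TAGS = {"paragraph": "p", "bold": "b"}


def render_html(node):
    """Render a node to HTML, escaping text and rejecting unknown types."""
    if node["type"] == "text":
        return escape(node["value"])
    tag = ALLOWED_TAGS.get(node["type"])
    if tag is None:
        raise ValueError(f"node type not allowed: {node['type']}")
    inner = "".join(render_html(child) for child in node["children"])
    return f"<{tag}>{inner}</{tag}>"


# What would be stored instead of markup: a JSON serialization of the tree.
serialized = json.dumps(doc, sort_keys=True)
restored = json.loads(serialized)
html_out = render_html(restored)
print(html_out)
```

The point of the sketch is only that HTML becomes an output generated from the stored tree, so "what do we allow" turns into a question about node types rather than about markup. It says nothing about whether JSON, XML or YAML is pleasant to edit by hand.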
4. Tools (e.g. WYSIWYG)
* Do our tool options get better if we fix up the abstract format and syntax?
* Tools:
** Wikia WYSIWYG editor
** Magnus Manske's new thing
** Line-by-line editing
** ...list goes on...
5. Infrastructure: how would one support mucking around with the data?
* Support for per-wiki data formats?
* Support for per-page data formats?
* Support for per-revision data formats?
* Evolve existing syntax with no infrastructure changes?
[1] http://www.mediawiki.org/wiki/User:RobLa-WMF/2011-01_format_discussion