On Tue, Jan 4, 2011 at 1:39 PM, Brion Vibber <brion(a)pobox.com> wrote:
> Exactly my point -- spending time tinkering with
> sortof-human-readable-but-not-powerful-enough syntax distracts from thinking
> about what needs to be *described* in the data... which is the important
> thing needed when devising an actual storage or interchange format.
Below is an outline, which I've also posted to mediawiki.org for
further iteration. There are a lot of different moving parts, and I
think one thing that's been difficult about this conversation is that
different people are interested in different parts. I know a lot of
people on this list are already overwhelmed or just sick of this
conversation, so maybe if some of us break off in an on-wiki
discussion, we might actually be able to make some progress without
driving everyone else nuts. In the best case, we make real
progress; in the worst case, we'll at least have
documented many of the issues so that we don't have to start from zero
the next time the topic comes up.
Here are the pieces of the conversation that I'm seeing:
1. Goals: what are we trying to achieve?
* Tool interoperability
** Alternative parsers
** Real-time editing (à la Etherpad)
* Ease of editing raw text
* Ease of structuring the data
* Template language with fewer squirrelly brackets
* What else?
2. Abstract format: regardless of syntax, what are we trying to express?
* Currently, we don't have an abstract format; markup just maps to a
subset of HTML (so perhaps the HTML DOM is our abstract format)
* What subset of HTML do we use?
* What subset of HTML do we need?
* What parts of HTML do we *not* want to allow in any form?
* What parts of HTML do we only want to allow in limited form (e.g.
only safely generated from some abstract format)?
* Is the HTML DOM sufficiently abstract, or do we want/need some
intermediate conceptual format?
* Is browser support for XML sufficiently useful to try to rely on that?
* Will it be helpful to expose the abstract format in any way?
3. Syntax: what syntax should we store (and expose to users)?
* Should we store some serialization of the abstract format instead of markup?
* Is hand editing of markup a viable long-term strategy?
* How important is having something expressible with BNF?
* Is XML viable as an editing format? JSON? YAML?
4. Tools (e.g. WYSIWYG)
* Do our tool options get better if we fix up the abstract format and syntax?
** Wikia WYSIWYG editor
** Magnus Manske's new thing
** Line-by-line editing
** ...the list goes on...
5. Infrastructure: how would one support mucking around with the data?
* Support for per-wiki data formats?
* Support for per-page data formats?
* Support for per-revision data formats?
* Evolve existing syntax with no infrastructure changes?
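To make items 2 and 3 a bit more concrete, here is a rough sketch of what "store a serialization of the abstract format" could look like. Everything in it is an assumption made up for illustration: the node shape (a dict with "tag" and "children"), the ALLOWED_TAGS allow-list, and the render() helper are hypothetical and not anything MediaWiki does today. It just shows a tiny document tree stored as JSON and rendered to a restricted subset of HTML.

```python
import json
from html import escape

# Hypothetical allow-list: the subset of HTML the renderer is willing to emit.
ALLOWED_TAGS = {"p", "b", "i", "a"}


def render(node):
    """Render an abstract node (a string or a dict) to HTML.

    Text is always escaped. Elements whose tag is not on the allow-list
    are dropped, but their (escaped) text content is kept.
    """
    if isinstance(node, str):
        return escape(node)
    inner = "".join(render(child) for child in node.get("children", []))
    tag = node["tag"]
    if tag not in ALLOWED_TAGS:
        return inner
    return f"<{tag}>{inner}</{tag}>"


# A small abstract document, independent of any markup syntax.
doc = {
    "tag": "p",
    "children": [
        "Hello ",
        {"tag": "b", "children": ["world"]},
        {"tag": "script", "children": ["alert(1)"]},
    ],
}

stored = json.dumps(doc)           # what would be stored instead of markup
html = render(json.loads(stored))  # what would be sent to the browser
print(html)
```

The point of the sketch is only that the "what HTML do we allow" question gets answered in one place (the renderer), separately from whatever syntax people type or tools emit.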