Not only is the current markup a barrier to participation, it's a barrier to development. As I argued on Wikien-l, starting over with a markup that can be syntactically validated, preferably one that is XML based, would reap huge rewards in the safety and effectiveness of automated tools - authors of tools like AWB have just as much trouble making software handle the corner cases in wikitext markup as new editors have understanding it.
It's no wonder really, as the current markup is the result of bolting on feature after feature, often to handle various corner cases, basically a really undisciplined approach to development. I honestly wouldn't be surprised if there were security issues waiting to be found in the parser, because of the way it's been thrown together.
The switching cost to a new markup and the development cost for an editor for it would probably be comparable to that of writing a WYSIWYG editor that understands the current one. A new markup gives us some huge advantages. A new parser could potentially handle different output formats (print, web, mobile, PDF, etc.) seamlessly, could largely eliminate the accessibility issues currently created by ad-hoc formatting capabilities, and would enable tools that actually understand the markup to start being employed. Making that parser XML based would mean we can reuse the existing ecosystem of libraries and tools for manipulating and parsing XML text, and would have a truly seamless transition into any future markup changes, thanks to XSLT.
XML also gives us the benefit of being able to do validation, possibly at the time an edit is saved, allowing us to stop broken markup rather than having to fix it later. It also allows us to completely remove presentation from the hands of casual editors - the presentation of articles could be controlled at site level, with existing consensus processes similar to those used to change major templates being required to get changes to the site templates (the article template basically becomes part of the interface at this stage).
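A rough sketch of what a save-time check could look like, assuming a hypothetical `validate_on_save` hook (not an existing MediaWiki function) and using Python's standard-library XML parser. Note that this only checks well-formedness; validating against a schema or DTD would need an additional library.

```python
import xml.etree.ElementTree as ET


def validate_on_save(markup: str) -> tuple[bool, str]:
    """Return (ok, message); reject the edit if the markup is not well-formed XML.

    Hypothetical hook name, for illustration only.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        # Broken markup is stopped here instead of being fixed later.
        return False, f"edit rejected: {err}"
    return True, "edit accepted"


good = "<article><title>Foo</title></article>"
bad = "<article><title>Foo</article>"  # <title> is never closed

print(validate_on_save(good))
print(validate_on_save(bad))
```

The point of the design is simply that the check runs before the edit is stored, so broken markup never reaches the article history.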
Transition will be ugly regardless of whether we keep the current parser or replace it. There are a lot of complex features in the current markup that need to go because a WYSIWYG editor wouldn't understand them, and it would also be ideal going forward to "flatten" the formatting capabilities into a subset that assures consistency - ad-hoc HTML and CSS formatting would likely have to go.
I honestly think WYSIWYM is a more realistic target once the more problematic features in the current markup are gone. The editing experience is largely the same, with the key difference being that what you see in the editor doesn't have to look exactly like what you see in the rendered article. By aiming for WYSIWYM, some things would render in the editor in a way that makes them easier to understand and edit. For example, templates could render in the editor as tables or as a block that loads the template parameters into a sidebar when clicked. The same could be done for references. This has a very shallow learning curve, and a tremendous advantage over WYSIWYG in that elements that don't lend themselves well to editing in a WYSIWYG environment are presented in the manner easiest to edit, rather than in the manner in which they appear. It may take one or two "second looks" to figure out what's happening, but after that, it's smooth sailing, and by doing it this way, we avoid the downsides of editing complicated parts of a page at a minimal cost in initial confusion. WYSIWYM editors can be friendly to both experienced and new users alike - take LyX as a good example of such an editor. Being WYSIWYM, things that are naturally complex and unwieldy in WYSIWYG mode become easy, because the interface is built to provide a visual understanding of what is going on rather than 1:1 fidelity with the final document. As a result, you spend more time editing and less time worrying about pixel-perfect formatting, because you can trust that the underlying formatting engine will handle things right when you do go to render the finished document.
As far as current markup goes, a creative solution would be a fork of the parser into three parts, with a corresponding fork in namespaces as well. The resulting parts would be an article parser, a template parser, and a parser for a new layout namespace. Initially, the existing parser is lumped into the article parser, and class inheritance is used so that the template parser inherits all markup from the article parser, and in turn, the layout parser inherits all markup from the template parser. (I'll get more into layouts below.) This gives us a framework to do several things that improve usability and consistency. Once the initial "split" is set up, we begin to move markup features from the article parser into the template and layout parsers to remove ugly markup from general use, and to restrict formatting capabilities to a subset that both allows a WYSIWYG or WYSIWYM editor to work correctly, and allows some level of consistency to be enforced at a site level.
Layouts would be a new form of template, designed to apply as a block-level outline to an article, providing both a framework to build a particular type of article, and defining the formatting for that article in a manner that templates and article markup would no longer be permitted to do. It's likely that layouts would be treated like highly used templates and the interface itself, with the ability to create and change a layout restricted by a permission bit. Layouts would be one to an article, so the interface to select one would probably be just selecting it from a dropdown or typing its name.
Every layout would have at least an article body, and would have one or more additional "blocks" defined - so for basics you could have the default layout (a flat article), an article with infobox layout, and a list layout.
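To make the inheritance idea concrete, here is a minimal sketch. The class names follow the three parsers described above, but the feature names (`inline_html`, `css_positioning`, and so on) are purely hypothetical placeholders, and a real parser would of course do far more than check a whitelist.

```python
class ArticleParser:
    """Markup permitted in ordinary articles (feature names are hypothetical)."""

    allowed = frozenset({"bold", "italic", "link", "heading", "template_call"})

    def permits(self, feature: str) -> bool:
        return feature in self.allowed


class TemplateParser(ArticleParser):
    """Inherits all article markup and adds template-only features."""

    allowed = ArticleParser.allowed | {"parser_function", "inline_html"}


class LayoutParser(TemplateParser):
    """Inherits all template markup and adds layout-only features."""

    allowed = TemplateParser.allowed | {"css_positioning", "block_definition"}


for parser in (ArticleParser(), TemplateParser(), LayoutParser()):
    print(type(parser).__name__, parser.permits("inline_html"))
```

Moving a feature out of article space then amounts to moving its name from one `allowed` set to the next one down the chain.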
The end result of this solution is that ad-hoc formatting using HTML and CSS is gone as far as most editors are concerned, complicated or easily misused markup features that might cause problems are removed from the parser that most users will interact with, and the most problematic of markup is effectively reserved for experienced editors. A new interface can then be built at a fraction of the development cost, because the "hard stuff" is out of article space. This both makes sense for usability now, as well as makes sense as a possible first step if we are ever going to change parsers, because the things that we probably couldn't convert would be "contained" rather than spread across articles.
-Steph
David Gerard writes:
Our current markup is one of our biggest barriers to participation.
Yes.
- Starting from a clear field makes it ridiculously easy.
We could start with solutions for first-time posters, new articles, and new talk-page comments -- any comprehensive solution should be compatible with short-term solutions that solve this 'ridiculously easy' part -- which happens to address what many first-time editors need.
Stephanie Daugherty writes:
Not only is the current markup a barrier to participation, it's a barrier to development. As I argued on Wikien-l, starting over with a markup that can be syntactically validated... would reap huge rewards in the safety and effectiveness of automated tools.
A lack of WYSIWY* is often a barrier to adoption of MediaWiki as opposed to other wiki platforms, independent of whether or not potential editors who visit a MW site feel comfortable editing it. I recall that P2PU for instance wanted to run MW but used pbwiki instead because of its WYSIWYG editor.
By aiming for WYSIWYM, some things would render in the editor in a way that makes them easier to understand and edit. For example, templates could render in the editor as tables or as a block that loads the template parameters into a sidebar when clicked... WYSIWYM editors can be friendly to both experienced and new users alike - take LyX as a good example
Victor writes:
I always viewed wikitext vs. WYSIWYG dilemma as similar to LaTeX vs. Microsoft Word one.
In this context, LyX is a good example; it sees its WYSIWYM implementation as halfway between the two.
Stephanie writes:
Layouts would be a new form of template, designed to apply as a block-level outline to an article, providing both a framework to build a particular type of article, and defining the formatting for that article in a manner that templates and article markup would no longer be permitted to do. It's likely that layouts would be treated like highly used templates and the interface itself... one to an article, so the interface to select one would probably be just selecting it from a dropdown or typing its name.
I really like the idea of separating article text, local templates, and page-wide layout. I don't know if 'three different parsers' are needed, but just being able to define a stylesheet for a named layout would save time and frustration.
Brion writes:
Getting anything done that would work on the huge, well-developed, wildly-popular Wikipedia has always been a non-starter because it has to deal with 10 years of backwards-compatibility from the get-go. I think it's going to be a *lot* easier to get things going on those smaller projects which are now so poorly served
How do we make it easier to implement new things for individual smaller projects?
For the Wikipedia case, we need to incubate the next generation of templating
Is this a problem space we could tackle in tandem with MindTouch and others who care about simple interfaces to edit and view complex information?
Sam.
On Tue, Dec 28, 2010 at 5:17 PM, Samuel Klein meta.sj@gmail.com wrote:
Stephanie writes:
Layouts would be a new form of template, designed to apply as a block-level outline to an article, providing both a framework to build a particular type of article, and defining the formatting for that article in a manner that templates and article markup would no longer be permitted to do. It's likely that layouts would be treated like highly used templates and the interface itself... one to an article, so the interface to select one would probably be just selecting it from a dropdown or typing its name.
I really like the idea of separating article text, local templates, and page-wide layout. I don't know if 'three different parsers' are needed, but just being able to define a stylesheet for a named layout would save time and frustration.
I say 3 different parsers because the effort is as much about restricting what wikitext can be used where as it is about simplifying the templates and layouts, and breaking up the parser in this manner is a straightforward way to keep "ugly" markup out of articlespace. If you make it so the complicated code that would render an editor hopelessly broken can't directly appear in an article, you keep the ability to use that code appropriately by template transclusion. I suggested layouts as an extension of that concept, because with layouts you can make it so HTML/CSS hacks that could negatively affect formatting can be left to a process that is more likely to be tested. There's no reason for every article to have its own custom CSS positioning and such - those decisions are better made at a site level, where they can be made consistently and with consideration for things like screen readers, mobile devices, and print output, something that most editors don't even need to understand.
The point is to "dumb down" both the required interface and the underlying markup by making available only what you need to write articles. Advanced editors could continue to develop templates for structured data, and collaborations between experts would do the really dicey stuff to make the site look pretty.
The only (immediate) changes to templates required would be a little extra "glue" to give a WYSIWYM or WYSIWYG editor "hints" about how a template works - whether the template produces a block or an inline element, and what parameters it takes so that the editor can display a nice easy to edit form to fill those parameters while showing the user what they do.
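As a sketch of what that "glue" might carry, assuming a hypothetical `TemplateHint` record (nothing like this exists in the current parser): the hint says whether the template is a block or an inline element and describes each parameter, and the editor turns it into form rows.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateHint:
    """Hypothetical editor hint attached to a template."""

    name: str
    display: str  # "block" or "inline"
    params: dict[str, str] = field(default_factory=dict)  # parameter -> help text


def form_fields(hint: TemplateHint, values: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (parameter, current value, help text) rows for an editor form."""
    return [(p, values.get(p, ""), help_text) for p, help_text in hint.params.items()]


# Example template and parameters are made up for illustration.
infobox = TemplateHint(
    name="Infobox person",
    display="block",
    params={"name": "Full name", "born": "Date of birth"},
)

for row in form_fields(infobox, {"name": "Ada Lovelace"}):
    print(row)
```

Parameters the article has not filled in simply show up as empty fields, with the help text telling the user what they do.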
Layouts would likely have a lot more bolted on to them than is currently possible, because you actually want to define a structure for an article, with editable regions - they'd likely be a cross between a stylesheet, a DTD, and a template by the time we got done implementing them.
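A very rough sketch of the "structure with editable regions" part, assuming a hypothetical `Layout` type and made-up region names. It covers only the DTD-like side (which blocks an article may fill in); the stylesheet side is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical layout: a named set of editable regions."""

    name: str
    regions: tuple[str, ...]

    def __post_init__(self):
        # Every layout has at least an article body.
        if "body" not in self.regions:
            raise ValueError("every layout needs an article body")


def check_article(layout: Layout, content: dict[str, str]) -> list[str]:
    """Return the regions the article uses that the layout does not define."""
    return sorted(set(content) - set(layout.regions))


default = Layout("default", ("body",))
with_infobox = Layout("article-with-infobox", ("body", "infobox"))

article = {"body": "Article text", "infobox": "Structured data"}
print(check_article(with_infobox, article))
print(check_article(default, article))
```

Since layouts are one to an article, a check like this could run at save time alongside the markup validation.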
On 28 December 2010 16:54, Stephanie Daugherty sdaugherty@gmail.com wrote:
Not only is the current markup a barrier to participation, it's a barrier to development. As I argued on Wikien-l, starting over with a markup that can be syntactically validated, preferably one that is XML based, would reap huge rewards in the safety and effectiveness of automated tools - authors of tools like AWB have just as much trouble making software handle the corner cases in wikitext markup as new editors have understanding it.
In every discussion so far, throwing out wikitext and replacing it with something that isn't a crawling horror has been considered a non-starter, given ten years and terabytes of legacy wikitext.
If you think you can swing throwing out wikitext and barring the actual code from human editing - XML is not safely human editable in any circumstances - then good luck to you, but I don't like your chances.
- d.
On Tue, Dec 28, 2010 at 6:43 PM, David Gerard dgerard@gmail.com wrote:
On 28 December 2010 16:54, Stephanie Daugherty sdaugherty@gmail.com wrote:
Not only is the current markup a barrier to participation, it's a barrier to development. As I argued on Wikien-l, starting over with a markup that can be syntactically validated, preferably one that is XML based, would reap huge rewards in the safety and effectiveness of automated tools - authors of tools like AWB have just as much trouble making software handle the corner cases in wikitext markup as new editors have understanding it.
In every discussion so far, throwing out wikitext and replacing it with something that isn't a crawling horror has been considered a non-starter, given ten years and terabytes of legacy wikitext.
If you think you can swing throwing out wikitext and barring the actual code from human editing - XML is not safely human editable in any circumstances - then good luck to you, but I don't like your chances.
I'm thinking along the mindset that only "advanced" users would prefer to directly edit code regardless of the existence of a good WYSIWY* editor, and that validation would be performed as it's saved. A syntax-highlighting and code-completing editor would make manual edits of XML code palatable. Also, while XML text isn't that much better for manual edits because of its verbosity, it's no different than manually editing HTML code, which people successfully manage to do all the time, and it can actually be easier to grok than wikitext because it's more generous in where it will allow you to use whitespace, allowing you to use indentation to make the code easy to follow:
<article>
  <title>Foo</title>
  <section title="Introduction">
    This is the introduction. Maybe you really want the
    <internal-link article="Sandbox">Sandbox</internal-link>
  </section>
  &boilerplate; <!-- Some boilerplate text to be transcluded via an entity -->
  <category name="Test" />
</article>
Yes, the added verbosity is bad, but if it enables better tools to be used, including an editor that's actually usable by "the rest of us", it's worth it.
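As an illustration of the "better tools" point, here is a minimal sketch of a tool reading that kind of markup with Python's standard XML library. The element names come from the example above; the `&boilerplate;` entity is left out because the standard parser rejects entities that have not been declared, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

ARTICLE = """<article>
  <title>Foo</title>
  <section title="Introduction">
    This is the introduction. Maybe you really want the
    <internal-link article="Sandbox">Sandbox</internal-link>
  </section>
  <category name="Test" />
</article>"""


def link_targets(markup: str) -> list[str]:
    """Return the target article of every internal link."""
    root = ET.fromstring(markup)
    return [el.get("article") for el in root.iter("internal-link")]


def categories(markup: str) -> list[str]:
    """Return the name of every category the article belongs to."""
    root = ET.fromstring(markup)
    return [el.get("name") for el in root.iter("category")]


print(link_targets(ARTICLE))
print(categories(ARTICLE))
```

No corner-case handling is needed here: the parser either yields a tree or rejects the document outright.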
On Tue, Dec 28, 2010 at 3:43 PM, David Gerard dgerard@gmail.com wrote:
On 28 December 2010 16:54, Stephanie Daugherty sdaugherty@gmail.com wrote:
Not only is the current markup a barrier to participation, it's a barrier to development. As I argued on Wikien-l, starting over with a markup that can be syntactically validated, preferably one that is XML based, would reap huge rewards in the safety and effectiveness of automated tools - authors of tools like AWB have just as much trouble making software handle the corner cases in wikitext markup as new editors have understanding it.
In every discussion so far, throwing out wikitext and replacing it with something that isn't a crawling horror has been considered a non-starter, given ten years and terabytes of legacy wikitext.
If you think you can swing throwing out wikitext and barring the actual code from human editing - XML is not safely human editable in any circumstances - then good luck to you, but I don't like your chances.
That is true - "We can't do away with Wikitext" has always been the intermediate conclusion (in between "My god, we need to do something about this problem" and "This is hopeless, we give up again").
Perhaps it's time to start some exercises in non-Euclidean Wiki development, and just assume the opposite and see what happens.
On Tue, Dec 28, 2010 at 7:12 PM, George Herbert george.herbert@gmail.com wrote:
That is true - "We can't do away with Wikitext" has always been the intermediate conclusion (in between "My god, we need to do something about this problem" and "This is hopeless, we give up again").
Perhaps it's time to start some exercises in non-Euclidean Wiki development, and just assume the opposite and see what happens.
We've got some serious thought going along three lines (with some crossover between them.)
1. Bolt WYSIWY* functionality on top of what we have that will just cordon off what it can't grok and let the user edit the rest.
2. Strip out or restrict the use of some of the current wikitext elements to simplify the markup.
3. Make a fresh start.
All of these are at least remotely feasible with a sustained effort and commitment to implement a finished product. None of these results in an optimal solution, but all of them result in a situation that is way ahead of where we are now. Accepting a bad situation as "the way things have to be" because the alternatives aren't perfect is foolish. Any measurable improvement from where we are now would be just that - an improvement. That's reason enough to go forward with something!!!!
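For the first line of thought above (cordon off what the editor can't grok), a minimal sketch of the idea. The big assumption here is that only simple, non-nested {{...}} template calls count as "can't grok"; real wikitext has nested templates, parser functions, and plenty more that this regex does not handle, and `split_segments` is a hypothetical name.

```python
import re

# Assumption: only non-nested {{...}} template calls are treated as opaque.
OPAQUE = re.compile(r"\{\{[^{}]*\}\}")


def split_segments(wikitext: str) -> list[tuple[str, str]]:
    """Split wikitext into ("editable", text) and ("protected", text) runs."""
    segments = []
    pos = 0
    for match in OPAQUE.finditer(wikitext):
        if match.start() > pos:
            segments.append(("editable", wikitext[pos:match.start()]))
        # The editor would show this as a locked block and pass it through untouched.
        segments.append(("protected", match.group()))
        pos = match.end()
    if pos < len(wikitext):
        segments.append(("editable", wikitext[pos:]))
    return segments


for kind, text in split_segments("Intro {{cite|a=b}} more text"):
    print(kind, repr(text))
```

Joining the segment texts back together reproduces the original wikitext exactly, so nothing the editor can't understand is ever rewritten.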
wikimedia-l@lists.wikimedia.org