On 29 December 2010 02:07, Happy-melon happy-melon@live.com wrote:
There are some things that we know:
- as Brion says, MediaWiki currently only presents content in one way: as
wikitext run through the parser. He may well be right that there is a bigger fish to catch than WYSIWYG editing, namely that MW should present data in other new and exciting ways, but that's actually a separate question. *If* you wish to solve WYSIWYG editing, your baseline is wikitext and the parser.
Specifically, it only presents content as HTML. It's not really a parser, because it doesn't build an AST (Abstract Syntax Tree); it's a wikitext-to-HTML converter. The flavour of the HTML can be somewhat modulated by the skin, but it could never output directly to something totally different like RTF or PDF.
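To make the distinction concrete, here is a purely hypothetical sketch (nothing like these classes exists in MediaWiki today) of what AST nodes might look like. With a tree like this, HTML becomes just one of several possible renderers:

    <?php
    // Hypothetical sketch only: these classes do not exist in MediaWiki.
    class TextNode {
        public $text;
        function __construct( $text ) { $this->text = $text; }
        function toHtml() { return htmlspecialchars( $this->text ); }
        // a toRtf() or toPdf() method could walk the same data
    }
    class BoldNode {
        public $children; // array of child nodes
        function __construct( $children ) { $this->children = $children; }
        function toHtml() {
            $out = '';
            foreach ( $this->children as $child ) { $out .= $child->toHtml(); }
            return "<b>$out</b>";
        }
    }
    // '''bold''' in wikitext might parse to:
    $ast = new BoldNode( array( new TextNode( 'bold' ) ) );
    echo $ast->toHtml(); // prints <b>bold</b>

The current code never builds such an intermediate structure; it goes straight from wikitext strings to HTML strings.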
- "guacamole" is one of the more unusual descriptors I've heard for the
parser, but it's far from the worst. We all agree that it's horribly messy and most developers treat it like either a sleeping dragon or a *very* grumpy neighbour. I'd say that the two biggest problems with it are that a) it's buried so deep in the codebase that literally the only way to get your wikitext parsed is to fire up the whole of the rest of MediaWiki around it to give it somewhere comfy to live in,
I have started to advocate isolating the parser from the rest of the innards of MediaWiki for just this reason: https://bugzilla.wikimedia.org/show_bug.cgi?id=25984
Free it up so that anybody can embed it in their code and get exactly the same rendering that Wikipedia et al get, guaranteed.
We have to find all the edges where the parser calls other parts of MediaWiki, and all the edges where other parts of MediaWiki call the parser. We then define these edges as interfaces, so that we can drop an alternative parser into MediaWiki, and drop the current parser into, say, an offline viewer or whatever.
With a freed-up parser, more people will hack on it, more people will come to grok it, and more people will come up with strategies to address some of its problems. It should also be a boon for unit testing.
(I have a very rough prototype working, by the way, with lots of stub classes.)
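To give a flavour of what I mean, here is a sketch of the two kinds of edges. The interface and method names are invented for illustration; the real edges would have to be discovered by auditing the code:

    <?php
    // Hypothetical names, for illustration only.
    // The edge where MediaWiki (or anything else) calls the parser:
    interface WikitextParser {
        /** Convert wikitext to HTML. */
        function parse( $wikitext, ParserEnvironment $env );
    }
    // The edge where the parser calls back into its host. MediaWiki
    // would implement this with its usual database lookups; an offline
    // viewer could implement it against a dump file instead.
    interface ParserEnvironment {
        function getTemplateSource( $title );  // wikitext of {{Foo}}
        function getMagicWordValue( $name );   // e.g. {{SITENAME}}
        function linkExists( $title );         // red link or blue link?
    }

The point is that anything satisfying ParserEnvironment could host the parser, and anything satisfying WikitextParser could replace it.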
and b) there is, as David says, no way of explaining what it's supposed to be doing except by saying "follow the code; whatever it does is what it's supposed to do". It seems to be generally accepted that it is *impossible* to represent everything the parser does in any standard grammar.
I've thought a lot about this too. It certainly is not any type of standard grammar. But on the other hand it is a pretty common kind of nonstandard grammar. I call it a "recursive text replacement grammar".
Perhaps this type of grammar has some useful characteristics we can discover and document. It may be possible to follow the code flow and document each text replacement in sequence as a kind of parser spec rather than trying and failing again to shoehorn it into a standard LALR grammar.
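As a toy illustration of the idea (the passes and regexes here are simplified inventions, not the parser's actual replacement sequence):

    <?php
    // Toy example only: the real spec would list the parser's actual
    // passes, in order, extracted by following the code.
    function toyParse( $text ) {
        // Pass 1: bold. Must run before italics, since ''' contains ''.
        $text = preg_replace( "/'''(.*?)'''/", '<b>$1</b>', $text );
        // Pass 2: italics. Operates on the *output* of pass 1.
        $text = preg_replace( "/''(.*?)''/", '<i>$1</i>', $text );
        // Pass 3: internal links.
        $text = preg_replace( '/\[\[(.*?)\]\]/', '<a href="/wiki/$1">$1</a>', $text );
        return $text;
    }
    echo toyParse( "'''bold''', ''italic'', [[Page]]" );
    // prints: <b>bold</b>, <i>italic</i>, <a href="/wiki/Page">Page</a>
    // The "recursive" part: template expansion would substitute text and
    // then re-enter this whole pipeline on the result.

A spec of this form would simply be the ordered list of such replacements, with the exact patterns and the points of recursion documented.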
If it is possible to extract such a spec it would then be possible to implement it in other languages.
Some research may even find that it is possible to transform such a grammar deterministically into an LALR grammar...
But even if not, I'm certain it would demystify what happens in the parser, so that problems and edge cases would be easier to locate.
Andrew Dunbar (hippietrail)
Those are all standard gripes, and nothing new or exciting. There are also, to quote a much-abused former world leader, some known unknowns:
- we don't know how to explain What You See when you parse wikitext except
by prodding an exceedingly grumpy hundred thousand lines of PHP and *asking What it thinks* You Get.
- We don't know how to create a WYSIWYG editor for wikitext.
Now, I'd say we have some unknown unknowns.
- *is* it because of wikitext's idiosyncrasies that WYSIWYG is so
difficult? Is wikitext *by its nature* not amenable to WYSIWYG editing?
- would a wikitext which *was* representable in a standard grammar be
amenable to WYSIWYG editing?
- would a wikitext which had an alternative parser, one that was not buried
in the depths of MW (perhaps a full JS library that could be called in real-time on the client), be amenable to WYSIWYG editing?
- are questions 2 and 3 synonymous?
--HM
"David Gerard" dgerard@gmail.com wrote in message news:AANLkTimthUx-UndO1CTnexcRqbPP89t2M-PVhA6FkFp8@mail.gmail.com...
[crossposted to foundation-l and wikitech-l]
"There has to be a vision though, of something better. Maybe something that is an actual wiki, quick and easy, rather than the template coding hell Wikipedia's turned into." - something Fred Bauder just said on wikien-l.
Our current markup is one of our biggest barriers to participation.
AIUI, edit rates are about half what they were in 2005, even as our fame has gone from "popular" through "famous" to "part of the structure of the world." I submit that this is not a good or healthy thing in any way and needs fixing.
People who can handle wikitext really just do not understand how off-putting the computer guacamole is to people who can cope with text they can see.
We know this is a problem; WYSIWYG that works is something that's been wanted here forever. There are various hideous technical nightmares in its way, that make this a big and hairy problem, of the sort where the hair has hair.
However, I submit that it's important enough we need to attack it with actual resources anyway.
This is just one data point, where a Canadian government office got *EIGHT TIMES* the participation in their intranet wiki by putting in a (heavily locally patched) copy of FCKeditor:
http://lists.wikimedia.org/pipermail/mediawiki-l/2010-May/034062.html
"I have to disagree with you given my experience. In one government department where MediaWiki was installed we saw the active user base spike from about 1000 users to about 8000 users within a month of having enabled FCKeditor. FCKeditor definitely has it's warts, but it very closely matches the experience non-technical people have gotten used to while using Word or WordPerfect. Leveraging skills people already have cuts down on training costs and allows them to be productive almost immediately."
http://lists.wikimedia.org/pipermail/mediawiki-l/2010-May/034071.html
"Since a plethora of intelligent people with no desire to learn WikiCode can now add content, the quality of posts has been in line with the adoption of wiki use by these people. Thus one would say it has gone up.
"In the beginning there were some hard core users that learned WikiCode, for the most part they have indicated that when the WYSIWYG fails, they are able to switch to WikiCode mode to address the problem. This usually occurs with complex table nesting which is something that few of the users do anyways. Most document layouts are kept simple. Additionally, we have a multilingual english/french wiki. As a result the browser spell-check is insufficient for the most part (not to mention it has issues with WikiCode). To address this a second spellcheck button was added to the interface so that both english and french spellcheck could be available within the same interface (via aspell backend)."
So, the payoffs could be ridiculously huge: eight times the number of smart and knowledgeable people even being able to *fix typos* on material they care about.
Here are some problems. (Off the top of my head; please do add more, all you can think of.)
- The problem:
- Fidelity with the existing body of wikitext. No conversion flag day.
The current body exploits every possible edge case in the regular expression guacamole we call a "parser". Tim said a few years ago that any solution has to account for the existing body of text.
- Two-way fidelity. Those who know wikitext will demand to keep it and
will bitterly resist any attempt to take it away from them.
- FCKeditor (now CKEditor) in MediaWiki is all but unmaintained.
- There is no specification for wikitext. Well, there almost is -
compiled to C, it runs a bit slower than the existing PHP parser. But it's a start! http://lists.wikimedia.org/pipermail/wikitext-l/2010-August/000318.html
- Attempting to solve it:
- The best brains around Wikipedia, MediaWiki and WMF have dashed
their foreheads against this problem for at least the past five years and have got *nowhere*. Tim has a whole section in the SVN repository for "new parser attempts". Sheer brilliance isn't going to solve this one.
- Tim doesn't scale. Most of our other technical people don't scale.
*We have no resources and still run on almost nothing*.
($14m might sound like enough money to run a popular website, but for comparison: I work as a sysadmin at a tiny, tiny publishing company with more money and staff in our department alone, and we do *almost nothing* compared to what WMF achieves. WMF is an INCREDIBLY efficient organisation.)
- Other attempts:
- Starting from a clear field makes it ridiculously easy. The
government example quoted above is one. Wikia wrote a good WYSIWYG that works really nicely on new wikis (I'm speaking here as an experienced wikitext user who happily fixes random typos on Wikia). Of course, I noted that we can't start from a clear field - we have an existing body of wikitext.
So, specification of the problem:
- We need good WYSIWYG. The government example suggests that a simple
word-processor-like interface would be enough to give tremendous results.
- It needs two-way fidelity with almost all existing wikitext.
- We can't throw away existing wikitext, much as we'd love to.
- It's going to cost money in programming the WYSIWYG.
- It's going to cost money in rationalising existing wikitext so that
the most unfeasible formations can be shunted off to legacy for chewing on.
- It's going to cost money in usability testing and so on.
- It's going to cost money for all sorts of things I haven't even
thought of yet.
This is a problem that would pay off hugely to solve, and that will take actual money thrown at it.
How would you attack this problem, given actual resources for grunt work?
- d.