Don't worry Mingli,
My concern is that the MediaWiki dev team should have some plan for whether the parser parses the text in one pass or many.
It is almost certainly impossible to parse wikitext "one time" - it's too beautifully complex for that.
Someone should keep pushing things forward gradually.
In another 8 to 10 months, someone will try again; there will be a big flare-up of activity regarding a standardized, formalized, perfectly context-free MediaWiki grammar and a subsequent language-agnostic parser. At the end of that struggle and strife, we'll be back here where we started.
I'm not being cynical here (nor am I trying to prematurely instigate another flamewar) - it's just the nature of the problem. A lot of really bright minds have attempted to fit wikitext into a traditional grammar mold. The problem is that it's not a traditional grammar.
My recommendation is to address the actual reason why someone might want a context-free grammar in the first place. Considering how much time and creative energy has been spent on trying to create the one-true-parser, I wonder whether it would be easier to simply port the existing Parser to other languages directly (regular expressions and all). I bet it would be.
-- Jim R. Wilson (jimbojw)
On Mon, Jul 14, 2008 at 9:10 AM, mingli yuan mingli.yuan@gmail.com wrote:
Thanks, Tomaž and David.
My concern is that the MediaWiki dev team should have some plan for whether the parser parses the text in one pass or many. Someone should keep pushing things forward gradually.
Wikimedia projects have accumulated a huge repository of knowledge, and that knowledge should be usable in much wider circumstances. Could you imagine Wikipedia articles being forever bound to a PHP regexp parser? So any formal description of the wikitext is welcome. We should free the knowledge from its format.
Thanks again.
Regards, Mingli
On Mon, Jul 14, 2008 at 9:01 PM, David Gerard dgerard@gmail.com wrote:
2008/7/14 Tomaž Šolc tomaz.solc@zemanta.com:
From my observations, I believe the only possible way that any formal grammar will replace the current PHP parser is if the MediaWiki team is prepared to change the current philosophy of desperately trying to make sense of any kind of broken string of characters the user provides, i.e. if MediaWiki could throw up a syntax error on invalid input and/or significantly reduce the number of valid constructs (all the horrible combinations of bold/italics markup come to mind). Given my understanding of the project, I find this extremely unlikely. But then I'm not a MediaWiki developer, so I might be completely wrong here.
I suspect it's highly unlikely that we'll ever have a situation where any wikitext will come up with "SYNTAX ERROR" or equivalent. (Some templates on en:wp do something like this for bad parameters, but they try to make the problem reasonably obvious to fix.) Basically, the stuff's gotta work for someone who can't work a computer or think in terms of this actually being a computer language rather than text with markup. I would *guess* that an acceptable failure mode would be just to render the text unprocessed.
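To make that failure mode concrete, here is a rough sketch in Python. It's purely hypothetical - the parser, the error class, and render() are placeholders I made up for illustration, not anything that exists in MediaWiki:

    import html

    class WikitextSyntaxError(Exception):
        """Hypothetical error a stricter parser might raise on invalid markup."""

    def parse_wikitext(text):
        # Stand-in for a real, stricter parser; here it always rejects its input.
        raise WikitextSyntaxError("unbalanced markup")

    def render(text):
        try:
            return parse_wikitext(text)
        except WikitextSyntaxError:
            # Never show the editor "SYNTAX ERROR"; just emit the raw wikitext,
            # HTML-escaped, so the page still displays something readable.
            return "<pre>" + html.escape(text) + "</pre>"

    print(render("'''''broken markup''"))

That would keep the never-fail-the-reader property while still letting the grammar itself stay strict.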
The thing to do with particularly problematic "bad" constructs would be to go through the wikitext corpus and see how often they're actually used and how fixable they are.
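As a purely illustrative sketch (again Python; the dump filename and the pattern are placeholders, not a real survey), such a scan could be as simple as:

    import bz2
    import re

    # Count one "bad" construct across a pages-articles dump -- here, runs of
    # six or more apostrophes, which have no single obvious interpretation.
    PATTERN = re.compile(r"'{6,}")

    count = 0
    with bz2.open("pages-articles.xml.bz2", "rt", encoding="utf-8") as dump:
        for line in dump:
            count += len(PATTERN.findall(line))

    print("occurrences of 6+ apostrophe runs in the dump:", count)

Numbers like that would at least tell us whether a given construct is worth preserving or safe to deprecate.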
Remember also third-party users of MediaWiki, who may expect a given bug effect to work as a feature.
d.
Wikitext-l mailing list
Wikitext-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitext-l