Don't worry Mingli,
My concern is that the MediaWiki dev team should have some plan for whether the parser parses the text in one pass or many.
It is almost certainly impossible to parse wikitext "one time" - it's too beautifully complex for that.
Someone should keep pushing things forward gradually.
In another 8 to 10 months, someone will try again; there will be a big flare-up of activity regarding a standardized, formalized, perfectly context-free MediaWiki grammar and a subsequent language-agnostic parser. At the end of that struggle and strife, we'll be back here where we started.
I'm not being cynical here (nor am I trying to prematurely instigate another flamewar) - it's just the nature of the problem. A lot of really bright minds have attempted to fit wikitext into a traditional grammar mold. The problem is that it's not a traditional grammar.
My recommendation is to address the actual reason why someone might want a context-free grammar in the first place. Considering how much time and creative energy has been spent on trying to create the one-true-parser, I wonder whether it would be easier to simply port the existing Parser to other languages directly (regular expressions and all). I bet it would be.
-- Jim R. Wilson (jimbojw)
On Mon, Jul 14, 2008 at 9:10 AM, mingli yuan mingli.yuan@gmail.com wrote:
Thanks, Tomaž and David.
My concern is that the MediaWiki dev team should have some plan for whether the parser parses the text in one pass or many. Someone should keep pushing things forward gradually.
Wikimedia projects have accumulated a huge repository of knowledge, and that knowledge should be usable in much wider circumstances. Could you imagine Wikipedia articles being forever bound to a PHP regexp parser? So any formal description of the wikitext is welcome. We should free the knowledge from its format.
Thanks again.
Regards, Mingli
On Mon, Jul 14, 2008 at 9:01 PM, David Gerard dgerard@gmail.com wrote:
2008/7/14 Tomaž Šolc tomaz.solc@zemanta.com:
From my observations, I believe the only possible way that any formal grammar will replace the current PHP parser is if the MediaWiki team is prepared to change the current philosophy of desperately trying to make sense of any kind of broken string of characters the user provides, i.e. if MediaWiki could throw up a syntax error on invalid input and/or significantly reduce the number of valid constructs (all the horrible combinations of bold/italics markup come to mind). Given my understanding of the project, I find this extremely unlikely. But then I'm not a MediaWiki developer, so I might be completely wrong here.
I suspect it's highly unlikely that we'll ever have a situation where any wikitext will come up with "SYNTAX ERROR" or equivalent. (Some templates on en:wp do something like this for bad parameters, but they try to make the problem reasonably obvious to fix.) Basically, the stuff's gotta work for someone who can't work a computer or think in terms of this actually being a computer language rather than text with markup. I would *guess* that an acceptable failure mode would be just to render the text unprocessed.
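To make that failure mode concrete, here is a rough sketch in Python. It's purely hypothetical - the parser, the error class, and render() are placeholders I made up for illustration, not anything that exists in MediaWiki:

    import html

    class WikitextSyntaxError(Exception):
        """Hypothetical error a stricter parser might raise on invalid markup."""

    def parse_wikitext(text):
        # Stand-in for a real, stricter parser; here it always rejects its input.
        raise WikitextSyntaxError("unbalanced markup")

    def render(text):
        try:
            return parse_wikitext(text)
        except WikitextSyntaxError:
            # Never show the editor "SYNTAX ERROR"; just emit the raw wikitext,
            # HTML-escaped, so the page still displays something readable.
            return "<pre>" + html.escape(text) + "</pre>"

    print(render("'''''broken markup''"))

That would keep the never-fail-the-reader property while still letting the grammar itself stay strict.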
The thing to do with particularly problematic "bad" constructs would be to go through the wikitext corpus and see how often they're actually used and how fixable they are.
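As a purely illustrative sketch (again Python; the dump filename and the pattern are placeholders, not a real survey), such a scan could be as simple as:

    import bz2
    import re

    # Count one "bad" construct across a pages-articles dump -- here, runs of
    # six or more apostrophes, which have no single obvious interpretation.
    PATTERN = re.compile(r"'{6,}")

    count = 0
    with bz2.open("pages-articles.xml.bz2", "rt", encoding="utf-8") as dump:
        for line in dump:
            count += len(PATTERN.findall(line))

    print("occurrences of 6+ apostrophe runs in the dump:", count)

Numbers like that would at least tell us whether a given construct is worth preserving or safe to deprecate.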
Remember also third-party users of MediaWiki, who may expect a given bug effect to work as a feature.
d.
Wikitext-l mailing list
Wikitext-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitext-l