On 23 Sep 2010, at 14:47, Andreas Jonsson wrote:
On 2010-09-23 14:17, Krinkle wrote:
On 23 Sep 2010, at 14:14, Andreas Jonsson wrote:
On 2010-09-23 11:34, Bryan Tong Minh wrote:
Hi,
Pretty awesome work you've done!
On Thu, Sep 23, 2010 at 11:27 AM, Andreas Jonsson andreas.jonsson@kreablo.se wrote:
I think that this demonstrates the feasibility of replacing the MediaWiki parser. There is still a lot of work to do in order to turn it into a full replacement, however.
Have you already tried to run the parsertests that come with MediaWiki? Do they produce (roughly) the same output as with the PHP parser?
No, I haven't. I have produced my own set of unit tests that are based on the original parser. For the features that I have implemented, the output should be roughly the same under "normal" circumstances.
But the original parser has tons of border cases where the behavior is not very well defined. For instance, the table on the test page will render very differently with the original parser (it will actually turn into two separate tables).
I am employing a consistent and easily understood strategy for handling html intermixed with wikitext markup; it is easy to explain that the |} token is disabled in the context of an html-table. There is no such simple explanation for the behavior of the original parser, even though in this particular example the produced html code happens to be valid (which isn't always the case).
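To illustrate the html-table rule (a minimal sketch of my own, not taken from the test page):

<table>
<tr><td> left |} right </td></tr>
</table>

Here the |} is just literal cell text to my parser, because inside an html-table context it is not recognized as a table-closing token.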
So, what I'm trying to say is that for the border cases where my implementation differs from the original, the behavior of my parser should be considered the correct one. :-)
/Andreas
Hm... That depends on how 'edge' those edge cases are, and on how well known they are. Doing that may render it unusable for established wikis, and it would never become the default anytime soon, right?
We are talking about the edge cases that arise when intermixing wikitext and html code in "creative" ways. This is for instance ok with the original parser:
* item 1 <li> item 2
* item 3
That may seem harmless and easy to handle, but surprise: explicitly adding the </li> token doesn't work as expected:
* item 1 <li> item 2 </li>
* item 3
And what happens when you add a new html list inside a wikitext list item without closing it?
* item 1 <ul><li> item 2
* item 3
Which list should item 3 belong to? You can come up with thousands of situations like this, and without a consistent plan for how to handle them, you will need to add thousands of border cases to the code to handle them all.
I have avoided this by simply disabling all html block tokens inside wikitext list items. Of course, it may be that someone is actually relying on being able to mix in this way, but it doesn't seem likely as the result tends to be strange.
/Andreas
I agree that making it consistent is important and will only lead to good things (such as people getting used to the behaviour and being able to predict what something would logically do).
About the html-in-wikitext mixup: although it is not often done directly, it is most certainly done indirectly.
Imagine a template which consists of a table in wikitext. A certain parameter's value is output in a table cell. On some page that template is called and the parameter is filled in with the help of a parser function (like #if or #expr). To avoid a mess of escape templates, the table inside this table cell is in many cases built in HTML instead of wikitext (the pipe problem, think {{!}}).
The result is an html table inside a wikitext table.
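A sketch of the pattern (the template and parameter names here are made up, just to illustrate). The template, say Template:Databox, contains a wikitext table:

{|
| label
| {{#if: {{{extra|}}} | {{{extra}}} | (none) }}
|}

and a page calls it with an html table as the parameter value, because a wikitext table there would need every pipe escaped as {{!}}:

{{Databox|extra=<table><tr><td>A</td><td>B</td></tr></table>}}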
Or, for example, the issue with whitespace and parser functions / template parameters. Starting something like a table or list requires the block-level hack (like <br /> or <div></div> after the pipe, and then the {| table |} or * list on the next line). To avoid those, complex templates often use HTML instead, as sketched below. If such a template is called on a page with an already existing wikitext list in place, there would be an html list inside a wikitext list.
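For example (again with a made-up template name), the hack version looks like:

{{Notebox|text=<div></div>
* item 1
* item 2
}}

while the HTML version that avoids the hack is:

{{Notebox|text=<ul><li>item 1</li><li>item 2</li></ul>}}

Call the second one from within an existing wikitext list item and you end up with an html list nested inside a wikitext list.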
I don't know in which order the parser works, but I think that if that behaviour changes, lots of complicated templates will break, and not just on Wikimedia projects.
-- Krinkle