2010-09-23 14:56, Krinkle wrote:
On 23 Sep 2010, at 14:47, Andreas Jonsson wrote:
2010-09-23 14:17, Krinkle wrote:
On 23 Sep 2010, at 14:14, Andreas Jonsson wrote:
2010-09-23 11:34, Bryan Tong Minh wrote:
Hi,
Pretty awesome work you've done!
On Thu, Sep 23, 2010 at 11:27 AM, Andreas Jonsson andreas.jonsson@kreablo.se wrote:
I think that this demonstrates the feasibility of replacing the MediaWiki parser. There is still a lot of work to do in order to turn it into a full replacement, however.
Have you already tried to run the parsertests that come with MediaWiki? Do they produce (roughly) the same output as with the PHP parser?
No, I haven't. I have produced my own set of unit tests that are based on the original parser. For the features that I have implemented, the output should be roughly the same under "normal" circumstances.
But the original parser has tons of border cases where the behavior is not very well defined. For instance, the table on the test page will render very differently with the original parser (it will actually turn into two separate tables).
I am employing a consistent and easily understood strategy for handling html intermixed with wikitext markup; it is easy to explain that the |} token is disabled in the context of an html-table. There is no such simple explanation for the behavior of the original parser, even though in this particular example the produced html code happens to be valid (which isn't always the case).
So, what I'm trying to say is that for the border cases where my implementation differs from the original, the behavior of my parser should be considered the correct one. :-)
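The rule described above can be illustrated with a minimal Python sketch (hypothetical, not the actual parser code): while an html table context is open, a |} line is simply not recognised as a table-close token.

```python
# Hypothetical sketch: decide, line by line, whether a "|}" token would be
# live, given the rule that it is disabled inside an html table context.

def table_close_token_live(text):
    """Return one boolean per '|}' line: True if the token is active."""
    in_html_table = False
    result = []
    for line in text.splitlines():
        if "<table>" in line:
            in_html_table = True
        if line.strip().startswith("|}"):
            # "|}" only closes a wikitext table outside an html table.
            result.append(not in_html_table)
        if "</table>" in line:
            in_html_table = False
    return result
```

Under this rule, the reader can predict the outcome of any mix of the two table syntaxes from the single sentence above, rather than from a list of special cases.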
/Andreas
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hm... depending on how 'edge' those edge cases are, and on how well known they are. Doing that may render it unusable for established wikis, and it would never become the default any time soon, right?
We are talking about the edge cases that arise when intermixing wikitext and html code in "creative" ways. This is for instance ok with the original parser:
* item 1<li> item 2
* item 3
That may seem harmless and easy to handle, but surprise! Explicitly adding the </li> token doesn't work as expected:
* item 1<li> item 2</li>
* item 3
And what happens when you add a new html list inside a wikitext list item without closing it?
* item 1<ul><li> item 2
* item 3
Which list should item 3 belong to? You can come up with thousands of situations like this, and without a consistent plan for how to handle them, you will need to add thousands of border cases to the code to handle them all.
I have avoided this by simply disabling all html block tokens inside wikitext list items. Of course, it may be that someone is actually relying on being able to mix in this way, but it doesn't seem likely as the result tends to be strange.
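A minimal sketch of that rule (hypothetical Python, not the parser's own code): inside a wikitext list item, HTML block tokens such as <ul> and <li> are treated as inert text rather than tags, so every * line unambiguously continues the wikitext list.

```python
# Hypothetical sketch of the rule above: html block tokens inside a
# wikitext list item are escaped to literal text instead of being parsed,
# so "* item 3" always belongs to the same wikitext list.

import re

HTML_BLOCK_TOKENS = re.compile(r"</?(ul|ol|li|table|tr|td)>")

def parse_list_lines(lines):
    """Return the text of each wikitext list item, block tokens escaped."""
    items = []
    for line in lines:
        if line.startswith("*"):
            # Escape block tokens rather than opening a nested html list.
            text = HTML_BLOCK_TOKENS.sub(
                lambda m: "&lt;" + m.group(0)[1:], line[1:].strip())
            items.append(text)
    return items
```

With this rule the ambiguous example above has exactly one answer: both items end up in the one wikitext list, and the stray <ul><li> is rendered as visible text, which makes the mistake easy for the author to spot.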
/Andreas
I agree that making it consistent is important and will only lead to good things (such as people getting used to the behaviour and being able to predict what something would logically do).
About the html in wikitext mixup: Although not directly, it is most certainly done indirectly.
Imagine a template which consists of a table in wikitext. A certain parameter's value is output in a table cell. On some page that template is called and the parameter is filled with the help of a parser function (like #if or #expr). To avoid mess and escaped templates, the table inside this table cell is in a lot of cases built in HTML instead of wikitext (the pipe problem, think {{!}}).
The result is an HTML table inside a wikitext table.
Yes, but that is supported by the parser. What isn't supported is mixing tokens from html tables with tokens from a wikitext table. So you have:
<table><td> this is a column inside an html table, and as such, the | token and |- token are disabled. However, {| | opens up a wikitext table, which changes the context so that now the <td>, <tr> and </table> tokens are disabled. But it is still possible to once again open up an html table with <table><td>, and thus the context is switched so that the |} token is disabled. </table> |} </table>
And here we're back to an ordinary paragraph.
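The context switching walked through above can be sketched as a stack (a hypothetical Python illustration, not the actual implementation): each table that opens pushes its kind, and at any point only the innermost open table's close token is live, while the other syntax's close tokens are disabled.

```python
# Hypothetical sketch of the nesting above: a stack of open table
# contexts; only the innermost table's close token is active.

def live_close_token(stack):
    """Return the one close token that is currently live, or None."""
    if not stack:
        return None
    return "</table>" if stack[-1] == "html" else "|}"

contexts = []
contexts.append("html")   # <table> opens an html table: |} is disabled
contexts.append("wiki")   # {| opens a wikitext table: </table> is disabled
contexts.append("html")   # <table> again: |} is disabled once more
# Closing in reverse order (</table>, then |}, then </table>) pops the
# stack back to empty, i.e. back to an ordinary paragraph.
```

The stack discipline is what makes the behaviour explainable in one sentence, however deep the nesting goes.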
Or for example the thing with whitespace and parser functions / template parameters. Starting something like a table or list requires the block-level hack (like <br /> or <div></div> after the pipe, and then the {| table |} or * list on the next line). To avoid those, complex templates often use HTML instead. If such a template were called on a page with an already existing wikitext list in place, there would be an HTML list inside a wikitext list.
A feasible alternative is to parse these as inline block elements inside wikitext list items, which I'm already doing for image links with captions. But I think that it is preferable to just disable them.
I don't know in which order the parser works, but I think that if that behaviour changes, lots of complicated templates will break, and not just on Wikimedia projects.
That's possible, but I believe that the set of broken templates can be limited to a great extent. To deploy a new parser on an existing site, one would need a tool that walks the existing pages and warns about suspected problems.
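Such a tool could start as something as simple as this hypothetical Python sketch: scan each page's text for the mixes the new parser handles differently and emit a warning per suspected problem. The patterns here are illustrative, not an exhaustive compatibility check.

```python
# Hypothetical sketch of a migration-check tool: flag pages whose wikitext
# mixes html and wikitext markup in ways a stricter parser treats
# differently. The pattern list is illustrative only.

import re

SUSPECT_PATTERNS = [
    (re.compile(r"^\*.*<(ul|ol|li|table)\b", re.M),
     "HTML block tag inside a wikitext list item"),
    (re.compile(r"<table\b[^>]*>(?:(?!</table>).)*\{\|", re.S),
     "wikitext table opened inside an HTML table"),
]

def lint_page(title, text):
    """Return (title, message) warnings for suspected incompatibilities."""
    return [(title, msg) for pat, msg in SUSPECT_PATTERNS if pat.search(text)]
```

Running something like this over a database dump before switching parsers would give editors a worklist of pages to review, instead of discovering breakage after deployment.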
/Andreas