Since we're on that topic again :-) I'd like to announce that I've added a script to my wiki2xml package (svn: wiki2xml/php) that runs the MediaWiki parser tests on it. At first glance, there are many errors, but on closer inspection, the XML is actually pretty good in most cases; it's just that my XML-to-XHTML script is not entirely up to the task yet.
Also, the "expected results" in the parser tests are sometimes rather MediaWiki-specific. Does it matter if there's a space after <li>? It's not rendered anyway. Or, "X\nY" vs. "X Y" in HTML - no difference, AFAIK (except in <pre>). These "non-errors" make up quite a few "wrong" results in my tests.
Cheers, Magnus
On 15/11/2007, Magnus Manske magnusmanske@googlemail.com wrote:
Also, the "expected results" in the parser tests are sometimes rather MediaWiki-specific. Does it matter if there's a space after <li>? It's not rendered anyway. Or, "X\nY" vs. "X Y" in HTML - no difference, AFAIK (except in <pre>). These "non-errors" make up quite a few "wrong" results in my tests.
Is there any reason not to fix the test in a case such as this?
- d.
Is there any reason not to fix the test in a case such as this?
Fixing the test means updating the Parser to match the new "correct" test. Other than that, I don't see a reason not to...
-- Jim R. Wilson (jimbojw)
On Nov 15, 2007 9:29 AM, David Gerard dgerard@gmail.com wrote:
On 15/11/2007, Magnus Manske magnusmanske@googlemail.com wrote:
Also, the "expected results" in the parser tests are sometimes rather MediaWiki-specific. Does it matter if there's a space after <li>? It's not rendered anyway. Or, "X\nY" vs. "X Y" in HTML - no difference, AFAIK (except in <pre>). These "non-errors" make up quite a few "wrong" results in my tests.
Is there any reason not to fix the test in a case such as this?
- d.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 11/16/07, David Gerard dgerard@gmail.com wrote:
Is there any reason not to fix the test in a case such as this?
Sounds like what needs to happen is the output of MediaWiki needs to go through some XHTML normalising program, and *that* become the test case. If two outputs are equivalent, it doesn't make sense to treat one of them as a failure.
Steve
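Steve's idea could be sketched roughly like this. This is a toy normaliser in Python (not the actual PHP test harness, and `normalize_xhtml` is a hypothetical name); it only collapses the two kinds of "non-error" Magnus mentioned, and deliberately ignores <pre> and other whitespace-significant contexts:

```python
import re

def normalize_xhtml(fragment: str) -> str:
    """Collapse rendering-equivalent whitespace differences.

    A minimal sketch only: a real normaliser would also need to skip
    <pre> and friends, normalise entities, sort attributes, etc.
    """
    # Collapse runs of whitespace (including newlines) to a single space,
    # since HTML renders "X\nY" and "X Y" identically outside <pre>.
    out = re.sub(r"\s+", " ", fragment)
    # Drop insignificant whitespace just inside tags, e.g. "<li> x" vs "<li>x".
    out = re.sub(r">\s+", ">", out)
    out = re.sub(r"\s+<", "<", out)
    return out.strip()

# The two "non-error" cases from the thread compare equal after normalization:
assert normalize_xhtml("<li> Foo</li>") == normalize_xhtml("<li>Foo</li>")
assert normalize_xhtml("<p>X\nY</p>") == normalize_xhtml("<p>X Y</p>")
```

Running both the expected output and the actual output through something like this before comparing would make the space-after-<li> and newline-vs-space cases pass instead of counting as failures.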
On 11/15/07, Steve Bennett stevagewp@gmail.com wrote:
Sounds like what needs to happen is the output of MediaWiki needs to go through some XHTML normalising program, and *that* become the test case. If two outputs are equivalent, it doesn't make sense to treat one of them as a failure.
Yep. You need to be careful, though, with things like <pre>.
On 11/17/07, Simetrical Simetrical+wikilist@gmail.com wrote:
Yep. You need to be careful, though, with things like <pre>.
Yep. And <nowiki> and <html>. I don't see why an "XHTML normalising program" would touch the contents of <pre> though.
Steve
On 11/16/07, Steve Bennett stevagewp@gmail.com wrote:
Yep. And <nowiki> and <html>. I don't see why an "XHTML normalising program" would touch the contents of <pre> though.
Right, so what happens if someone has a snippet of code on a line with an initial space on a wiki page, and the new parser changes the whitespace there? Different rendering. I'm not saying we have to fret about that too much, just keep in mind that XML whitespace compression isn't totally consistent.
On 11/18/07, Simetrical Simetrical+wikilist@gmail.com wrote:
Right, so what happens if someone has a snippet of code on a line with an initial space on a wiki page, and the new parser changes the whitespace there? Different rendering. I'm not saying we have to fret about that too much, just keep in mind that XML whitespace compression isn't totally consistent.
This:
<space>some<space><space>code
is converted to:
<pre> some<space><space>code </pre>
No normaliser is going to mess with the formatting inside a <pre> tag. Of course, I don't even know if such a thing exists; we might have to write one. I don't think it's hard: basically parse every tag, and write it back out in some defined way, with controlled whitespace.
Steve
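For what it's worth, a <pre>-aware version of such a normaliser might look like this minimal sketch (Python, hypothetical names; a crude regex split that assumes unnested, attribute-free <pre> tags, which a real implementation should not assume):

```python
import re

# Split the document on <pre>...</pre> spans; the capturing group keeps
# the <pre> blocks in the result list so we can pass them through verbatim.
PRE_RE = re.compile(r"(<pre>.*?</pre>)", re.DOTALL)

def normalize_outside_pre(html: str) -> str:
    out = []
    for part in PRE_RE.split(html):
        if part.startswith("<pre>"):
            out.append(part)  # whitespace inside <pre> is significant: keep as-is
        else:
            out.append(re.sub(r"\s+", " ", part))  # collapse everywhere else
    return "".join(out)

html = "<p>a\nb</p><pre> some  code </pre><p>c\nd</p>"
assert normalize_outside_pre(html) == "<p>a b</p><pre> some  code </pre><p>c d</p>"
```

The same skip-list would need to cover <nowiki> (pre-parse) and anything else where the spec says whitespace is significant.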
On 11/17/07, Steve Bennett stevagewp@gmail.com wrote:
This:
<space>some<space><space>code
is converted to:
<pre> some<space><space>code </pre>
No normaliser is going to mess with the formatting inside a <pre> tag.
I realize, and that's not the issue I was thinking of. But what I *was* thinking of is no issue if there are parser tests checking preservation of whitespace inside <pre> constructs, so it's not really an issue after all.
Why don't you parse the HTML into a DOM and compare that?
On 11/17/07, Edward Z. Yang edwardzyang@thewritingpot.com wrote:
Why don't you parse the HTML into a DOM and compare that?
You still have to normalize the text nodes, so you don't gain much of anything.
Steve Bennett wrote:
On 11/18/07, Simetrical wrote:
Right, so what happens if someone has a snippet of code on a line with an initial space on a wiki page, and the new parser changes the whitespace there? Different rendering. I'm not saying we have to fret about that too much, just keep in mind that XML whitespace compression isn't totally consistent.
This:
<space>some<space><space>code
is converted to:
<pre> some<space><space>code </pre>
No normaliser is going to mess with the formatting inside a <pre> tag. Of course, I don't even know if such a thing exists; we might have to write one. I don't think it's hard: basically parse every tag, and write it back out in some defined way, with controlled whitespace.
Steve
Defining the tag as xml:space="preserve" should do it.
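That is, a normaliser honouring the standard xml:space attribute (which inherits down the tree per the XML spec) would skip such elements automatically. A minimal sketch in Python, with `collapse` as a hypothetical name:

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under its namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def collapse(elem, preserve=False):
    """Collapse whitespace in text nodes unless xml:space="preserve"
    is in effect; xml:space="default" switches collapsing back on."""
    if elem.get(XML_SPACE) == "preserve":
        preserve = True
    elif elem.get(XML_SPACE) == "default":
        preserve = False
    if not preserve and elem.text:
        elem.text = re.sub(r"\s+", " ", elem.text)
    for child in elem:
        collapse(child, preserve)
        if not preserve and child.tail:
            child.tail = re.sub(r"\s+", " ", child.tail)

doc = ET.fromstring(
    '<body><p>X\nY</p><pre xml:space="preserve">  a  b\n</pre></body>')
collapse(doc)
assert doc.find("p").text == "X Y"        # collapsed
assert doc.find("pre").text == "  a  b\n"  # preserved verbatim
```

The catch is that the parser output would actually have to emit xml:space="preserve" on <pre> (and anything else whitespace-significant) for this to work.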