Hi All,
The source code to a MediaWiki Parser fuzz-tester is now available online at: http://files.nickj.org/MediaWiki/wiki-mangleme.phps
Some of the problems it has found are listed at: http://nickj.org/MediaWiki
With MediaWiki 1.6.5, the breakdown of stuff I'm currently seeing that messes up the flow of tags (as opposed to just failing HTML validation) is roughly:
* 50% Table-Of-Contents insertion (see Parser14 test at above URL)
* 30% double links (see Parser22 test at above URL)
* 20% <nowiki> or <pre> insertion + multi-line URLs (see Parser20 and Parser23 tests respectively).
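For anyone who wants the basic idea without reading the full script, here's a minimal illustrative sketch of the approach. This is NOT the actual wiki-mangleme.phps code (which is far more thorough) - the fragment pool and file names below are just assumptions for illustration:

<?php
// Minimal illustrative sketch only -- NOT the actual wiki-mangleme.phps code.
// Basic idea: glue together random wiki/HTML fragments so that tags and markup
// end up unbalanced and interleaved, then feed each test case to the parser.

$fragments = array(
    "== heading ==", "__TOC__", "[[Some link|", "[http://example.com",
    "<nowiki>", "</nowiki>", "<pre>", "</pre>", "{|", "|-", "|}",
    "'''", "''", "----", "* item", "# item", "<table>", "<td>",
);

function makeFuzzedWikiText( array $fragments, $maxLines = 20 ) {
    $lines = array();
    $numLines = mt_rand( 1, $maxLines );
    for ( $i = 0; $i < $numLines; $i++ ) {
        // Concatenate a few random fragments, sometimes with no separator,
        // to produce malformed and overlapping markup.
        $line = '';
        $numFragments = mt_rand( 1, 5 );
        for ( $j = 0; $j < $numFragments; $j++ ) {
            $line .= $fragments[ mt_rand( 0, count( $fragments ) - 1 ) ];
            if ( mt_rand( 0, 1 ) ) {
                $line .= ' ';
            }
        }
        $lines[] = $line;
    }
    return implode( "\n", $lines );
}

// Write a batch of test cases to disk; each file can then be pasted into a
// wiki page (or fed to the parser directly) to see what HTML comes out.
for ( $test = 0; $test < 100; $test++ ) {
    file_put_contents( "fuzz-test-$test.txt", makeFuzzedWikiText( $fragments ) );
}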
All the best, Nick.
Nick Jenkins wrote:
Hi All,
The source code to a MediaWiki Parser fuzz-tester is now available online at: http://files.nickj.org/MediaWiki/wiki-mangleme.phps
Some of the problems it has found are listed at: http://nickj.org/MediaWiki
Nice! After hearing about fuzzing at the last CCC in Berlin, I did wonder whether it could be used to test the parser. Glad someone finally implemented it, and the results show it was worth the effort.
Just for fun (and to boast, of course ;-) I ran some of the examples through my wiki-to-XML parser [1], and it seems impervious to them (the ones I've tested, anyway), as it renders invalid XML and wiki markup as plain text. Maybe I should adapt the fuzzer to do the testing automatically, just to be sure (and maybe find some bugs).
Magnus
Just for fun (and to boast, of course ;-) I ran some of the examples through my wiki-to-XML parser [1], and it seems impervious to them (the ones I've tested, anyway), as it renders invalid XML and wiki markup as plain text. Maybe I should adapt the fuzzer to do the testing automatically, just to be sure (and maybe find some bugs).
Here's a small, simple test script which, when dropped into the w2x/php directory and run, passes fuzzed wiki text to w2x and gets it to generate XHTML output: http://files.nickj.org/MediaWiki/w2x-test.phps
If desired, the "$_POST['text'] =" bit could be updated to read the contents of the files generated by the fuzzer, so that w2x runs sequentially over all the generated files.
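A rough sketch of that change might look like the following. Note that convert_wiki_text_to_xhtml() is just a hypothetical stand-in for whatever w2x entry point w2x-test.phps actually calls, and the fuzz-test-*.txt file names assume the fuzzer's output was saved under those names:

<?php
// Rough sketch only. convert_wiki_text_to_xhtml() is a hypothetical stand-in
// for the real w2x entry point used by w2x-test.phps, and the fuzz-test-*.txt
// names are an assumption about how the fuzzer's output was saved.

foreach ( glob( 'fuzz-test-*.txt' ) as $file ) {
    // Same mechanism as the test script, but reading from the fuzzer's files
    // instead of using a hard-coded string.
    $_POST['text'] = file_get_contents( $file );

    $xhtml = convert_wiki_text_to_xhtml( $_POST['text'] ); // hypothetical w2x call
    file_put_contents( "$file.xhtml", $xhtml );
    echo "Converted $file -> $file.xhtml\n";
}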
W2x takes a while to generate the XHTML output - the above example takes exactly 120 seconds on a 2 GHz 64-bit Linux box with 2 GB RAM, for 18 lines / 973 characters of wiki text. (Perhaps the run-time is quadratic or some higher-order polynomial?)
It gives this XHTML output: http://files.nickj.org/MediaWiki/w2x-test-output.html, which when validated (http://validator.w3.org/check?uri=http://files.nickj.org/MediaWiki/w2x-test-...) gives 2 errors and 1 warning :-(
Of course, I should point out that the current parser does not generate valid XHTML on fuzzed wiki text either, and doing so is a hard problem to solve. Writing software to find problems is much easier ;-)
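One cheap way to automate at least part of that checking might be PHP's tidy extension - weaker than full W3C validation, but easy to run over a whole batch of output files. A hedged sketch, assuming the converted output was saved as *.xhtml files as above:

<?php
// Hedged sketch: use PHP's tidy extension (requires ext/tidy) to report
// markup problems in each generated file. This is weaker than full W3C
// validation, but is trivial to run over a whole batch of fuzzer output.
// The *.xhtml file names assume the converted output was saved as above.

foreach ( glob( '*.xhtml' ) as $file ) {
    $tidy = new tidy();
    $tidy->parseFile( $file, array( 'output-xhtml' => true ) );
    $tidy->diagnose();

    $messages = trim( $tidy->errorBuffer );
    echo "== $file ==\n";
    echo ( $messages === '' ? "No problems reported.\n" : $messages . "\n" );
}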
All the best, Nick.
Nick Jenkins wrote:
Hi All,
The source code to a MediaWiki Parser fuzz-tester is now available online at: http://files.nickj.org/MediaWiki/wiki-mangleme.phps
Some of the problems it has found are listed at: http://nickj.org/MediaWiki
With MediaWiki 1.6.5, the breakdown of stuff I'm currently seeing that messes up the flow of tags (as opposed to just failing HTML validation) is roughly:
- 50% Table-Of-Contents insertion (see Parser14 test at above URL)
- 30% double links (see Parser22 test at above URL)
- 20% <nowiki> or <pre> insertion + multi-line URLs (see Parser20 and Parser23 tests respectively).
Hello Nick,
I reviewed your code, made some changes, and committed it to the MediaWiki trunk:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/maintenance/
cheers,