Chad wrote:
To elaborate on the final point: sometimes the parser is changed and
it breaks output on purpose. A case in point was when Tim rewrote the
preprocessor; some parts of the syntax were intentionally changed. You'd
have to establish a new baseline for the new behavior at that point.
This also comes down to the fact that we don't have a formal grammar
for wikisyntax (basically it's whatever the Parser says it is at any given
time). This makes testing the parser hard--we can only give it input
and expected output; there's no standard to check against.
Finally, I don't think we need to dump all of enwiki. It can't require that
much content to describe the various combinations of wiki syntax...
In principle, I rather like the idea of using the entire English
Wikipedia (or why limit to that? we have plenty of other projects too)
as a parser test, or at least of having the ability to do that if we want.
You see, the flip side to not having a formal grammar for wikimarkup is
that we also don't have a spec sheet for it: the best description of how
people actually expect the parser to behave and what features they
expect it to support is what they're actually using it for on their
wikis. And en.wikipedia is the biggest and ugliest of the bunch.
There's no way we can ever write a test suite comprehensive enough to
cover every single feature, bug, quirk and coincidence that actual wiki
pages and templates may have come to rely on. That's simply because for
every MediaWiki coder there are dozens or hundreds of template writers
and thousands of other editors.
In a way, all those editors form the biggest, most thorough fuzz tester
there can be. The only problem is that it's also a rather inefficient
one, even for a fuzz tester: most wiki pages exercise only a fairly
small and boring set of parser features. But at least, if one were to,
say, run a random sample of a few thousand Wikipedia pages through the
parser and observe no unexpected changes in the output, one could start
to make some statistical predictions about how many of the remaining
pages one could at worst expect to break.
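To make that last step concrete, here is a small sketch (in Python, purely for illustration) of the kind of statistical prediction a clean sample run would allow. It uses the "rule of three": if all n sampled pages pass, an approximate 95% upper confidence bound on the true failure rate is about 3/n. The sample size and total page count below are made-up illustration values, not real enwiki figures.

```python
# Sketch: bound how many unsampled pages might break, given that a
# random sample of pages all rendered identically under the new parser.
# "Rule of three": with n passes and 0 failures, the 95% upper
# confidence bound on the failure rate is roughly -ln(0.05)/n ~= 3/n.

import math

def max_expected_breakage(sample_size: int, total_pages: int,
                          failures: int = 0,
                          confidence: float = 0.95) -> float:
    """Upper confidence bound on the number of breaking pages."""
    if failures == 0:
        # Rule of three (exact form): p_upper = -ln(1 - confidence) / n
        p_upper = -math.log(1.0 - confidence) / sample_size
    else:
        # Crude normal approximation once some failures are observed.
        p = failures / sample_size
        se = math.sqrt(p * (1 - p) / sample_size)
        p_upper = p + 1.96 * se
    return p_upper * total_pages

# Hypothetical numbers: 5000 sampled pages, none changed, ~6M pages total.
print(round(max_expected_breakage(5000, 6_000_000)))
```

So a clean run over a few thousand pages doesn't prove nothing will break, but it does put a usable ceiling on the damage.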
The real problem, as noted elsewhere in the thread, is of course
filtering the unexpected changes from any expected ones. A partial
solution could be having the test implementation extract the changes --
we conveniently have a word-level diff implementation available already
-- and combining any duplicates.
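A rough sketch of that filtering step, using Python's difflib as a stand-in for MediaWiki's own word-level diff engine (the sample outputs are invented): extract the changed word runs from each old/new rendering pair, then count identical changes so a human only reviews each distinct kind of change once.

```python
# Sketch: extract word-level changes between old and new parser output
# and collapse duplicates across many pages.  difflib here stands in
# for MediaWiki's real word-level diff implementation.

import difflib
from collections import Counter

def word_changes(old: str, new: str):
    """Yield (removed_words, added_words) for each changed run."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != 'equal':
            yield (tuple(old_words[i1:i2]), tuple(new_words[j1:j2]))

def summarize(pairs_of_outputs):
    """Count identical changes across many (old, new) renderings."""
    counts = Counter()
    for old, new in pairs_of_outputs:
        for change in word_changes(old, new):
            counts[change] += 1
    return counts

# Two pages exhibiting the same hypothetical change (<i> became <em>):
outputs = [
    ("<p>Hello <i>world</i></p>", "<p>Hello <em>world</em></p>"),
    ("<p>Bye <i>world</i></p>", "<p>Bye <em>world</em></p>"),
]
for change, n in summarize(outputs).most_common():
    print(n, change)
```

Two pages with the same underlying change collapse into one line of review, which is the whole point.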
Another, complementary approach would be to allow the person running the
tests to postprocess the two outputs before they're compared, so as to
try and eliminate any expected differences. Of course, this would
require some significant extra effort on the part of that person, beyond
just typing "php runSomeTests.php" and hitting enter, but then again,
thoroughly analyzing the effects of a major parser change is a nontrivial
exercise anyway, no matter what. And for things that _shouldn't_ cause
any changes to the parser output, it really could be just as easy, in
principle at least, as running parserTests currently is.
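The postprocessing idea could look something like this sketch (Python, invented normalization rules): the person running the tests supplies rules that erase the differences they already expect, so that only surprises survive the comparison.

```python
# Sketch: normalize both outputs before comparing, so expected
# differences (here, a hypothetical <i> -> <em> parser change, and
# trailing-whitespace noise) are filtered out and only unexpected
# changes remain.  The rules are illustrative, not real parser changes.

import re

NORMALIZERS = [
    # Map <em>/</em> back to <i>/</i>, an expected change in this example:
    (re.compile(r'</?em>'), lambda m: m.group(0).replace('em', 'i')),
    # Ignore trailing whitespace before newlines:
    (re.compile(r'[ \t]+\n'), '\n'),
]

def normalize(html: str) -> str:
    for pattern, repl in NORMALIZERS:
        html = pattern.sub(repl, html)
    return html

def same_after_normalizing(old: str, new: str) -> bool:
    return normalize(old) == normalize(new)

print(same_after_normalizing("<p><i>hi</i></p>", "<p><em>hi</em></p>"))
```

Writing good normalization rules is exactly the "significant extra effort" mentioned above; but for changes that shouldn't affect output at all, the rule list is simply empty and the run is as cheap as parserTests.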
--
Ilmari Karonen