MediaWiki automated test run failure 2007-04-29


brion@pobox.com

29 Apr 2007

12:45 p.m.

An automated run of parserTests.php showed the following failures:

This is MediaWiki version 1.10alpha (r21695).

Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...

18 still FAILING test(s) :(
  * URL-encoding in URL functions (single parameter) [Has never passed]
  * URL-encoding in URL functions (multiple parameters) [Has never passed]
  * Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html) [Has never passed]
  * Link containing double-single-quotes '' (bug 4598) [Has never passed]
  * message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
  * message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
  * BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
  * HTML bullet list, unclosed tags (bug 5497) [Has never passed]
  * HTML ordered list, unclosed tags (bug 5497) [Has never passed]
  * HTML nested bullet list, open tags (bug 5497) [Has never passed]
  * HTML nested ordered list, open tags (bug 5497) [Has never passed]
  * Fuzz testing: image with bogus manual thumbnail [Introduced between 08-Apr-2007 07:15:22, 1.10alpha (r21099) and 25-Apr-2007 07:15:46, 1.10alpha (r21547)]
  * Inline HTML vs wiki block nesting [Has never passed]
  * Mixing markup for italics and bold [Has never passed]
  * dt/dd/dl test [Has never passed]
  * Images with the "|" character in the comment [Has never passed]
  * Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
  * Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]

Passed 493 of 511 tests (96.48%)... 18 tests failed!


Tim Starling

29 Apr

1:45 p.m.

brion@pobox.com wrote:

...

An automated run of parserTests.php showed the following failures:

This is MediaWiki version 1.10alpha (r21695).

Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...

18 still FAILING test(s) :(
  * URL-encoding in URL functions (single parameter) [Has never passed]
  * URL-encoding in URL functions (multiple parameters) [Has never passed]
  * Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html) [Has never passed]
  * Link containing double-single-quotes '' (bug 4598) [Has never passed]
  * message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
  * message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
  * BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
  * HTML bullet list, unclosed tags (bug 5497) [Has never passed]
  * HTML ordered list, unclosed tags (bug 5497) [Has never passed]
  * HTML nested bullet list, open tags (bug 5497) [Has never passed]
  * HTML nested ordered list, open tags (bug 5497) [Has never passed]
  * Fuzz testing: image with bogus manual thumbnail [Introduced between 08-Apr-2007 07:15:22, 1.10alpha (r21099) and 25-Apr-2007 07:15:46, 1.10alpha (r21547)]
  * Inline HTML vs wiki block nesting [Has never passed]
  * Mixing markup for italics and bold [Has never passed]
  * dt/dd/dl test [Has never passed]
  * Images with the "|" character in the comment [Has never passed]
  * Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
  * Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]

Passed 493 of 511 tests (96.48%)... 18 tests failed!

Is there any way to obtain regular updates of the database of test results? Since Brion removed the "TODO" tags, it appears to be very difficult to distinguish expected fail tests from newly failing tests without it. The idea of parserTests.php was to detect issues before commit, wasn't it? Not to wait a day for the email?

-- Tim Starling

Rob Church

9:41 p.m.

On 29/04/07, Tim Starling tstarling@wikimedia.org wrote:

...

without it. The idea of parserTests.php was to detect issues before commit, wasn't it? Not to wait a day for the email?

I would hope so. It appears the recent additions were aimed more at the audience of people who *do* wait for the emails. However, this should not prevent people who are working on the software from continuing to run regression tests when altering the parser.

You could, of course, run the parser tests in --record mode, and have a local table of past results, but of course, that means setting the damn things up.

Rob Church

Nick Jenkins

30 Apr

9:32 a.m.

...

...
without it. The idea of parserTests.php was to detect issues before commit, wasn't it? Not to wait a day for the email?

I would hope so. It appears the recent additions were aimed more at the audience of people who *do* wait for the emails. However, this should not prevent people who are working on the software from continuing to run regression tests when altering the parser.

Related to this, what might be useful is an IRC bot that sits on #mediawiki, and every time there was a commit, it would:

  * Run parserTests --record, and inform the channel of any tests whose status changes.
  * Run a check for Native EOL style, and inform the channel of any new files that needed native EOL style set.
  * Run a "php -l" lint-check over the code, and report back any failures.

Finding the above stuff before commit is best, although finding it immediately after a commit is probably the next best option.

-- All the best, Nick.

Brion Vibber

10:17 p.m.


Tim Starling wrote:

...

Is there any way to obtain regular updates of the database of test results? Since Brion removed the "TODO" tags, it appears to be very difficult to distinguish expected fail tests from newly failing tests without it.

The big old "[Has never passed]" seems like a clue to me. :)

...

The idea of parserTests.php was to detect issues before commit, wasn't it? Not to wait a day for the email?

Ideally people run them before they commit to make sure they didn't break things. When they don't, an automated daily check helps the cleanup.

In theory we could run the tests as a checkin script of some kind, but that would be pretty slow on the server.

-- brion vibber (brion @ wikimedia.org)


Tim Starling

1 May

12:07 a.m.

Brion Vibber wrote:

...

Tim Starling wrote:

...
Is there any way to obtain regular updates of the database of test results? Since Brion removed the "TODO" tags, it appears to be very difficult to distinguish expected fail tests from newly failing tests without it.

The big old "[Has never passed]" seems like a clue to me. :)

Here is the output from my working copy:

This is MediaWiki version 1.10alpha (r21715).

Reading tests from "maintenance\parserTests.txt"...
Running test URL-encoding in URL functions (single parameter)... FAILED!
Running test URL-encoding in URL functions (multiple parameters)... FAILED!
Running test Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html)... FAILED!
Running test Link containing double-single-quotes '' (bug 4598)... FAILED!
Running test message transform: <noinclude> in transcluded template (bug 4926)... FAILED!
Running test message transform: <onlyinclude> in transcluded template (bug 4926)... FAILED!
Running test BUG 1887, part 2: A <math> with a thumbnail- math enabled... FAILED!
Running test HTML bullet list, unclosed tags (bug 5497)... FAILED!
Running test HTML ordered list, unclosed tags (bug 5497)... FAILED!
Running test HTML nested bullet list, open tags (bug 5497)... FAILED!
Running test HTML nested ordered list, open tags (bug 5497)... FAILED!
Running test Fuzz testing: image with bogus manual thumbnail... FAILED!
Running test Inline HTML vs wiki block nesting... FAILED!
Running test Mixing markup for italics and bold... FAILED!
Running test dt/dd/dl test... FAILED!
Running test Images with the "|" character in the comment... FAILED!
Running test Parents of subpages, two levels up, without trailing slash or name.... FAILED!
Running test Parents of subpages, two levels up, with lots of extra trailing slashes.... FAILED!
Reading tests from "......\C:\htdocs\w2\extensions\Cite\citeParserTests.txt"...
Running test Simple <ref>, with <references/>... FAILED!
Running test <ref> with a simple template... FAILED!
Running test <ref> with a <nowiki>... FAILED!
Running test <ref> in a <nowiki>... FAILED!
Running test <ref> in a ... FAILED!
Running test  in a <ref> (bug 5384)... FAILED!
Running test <references> after <gallery> (bug 6164)... FAILED!
Running test {{REVISIONID}} on page with <ref> (bug 6299)... FAILED!
Running test Blank ref followed by ref with content... FAILED!
Running test Regression: non-blank ref "0" followed by ref with content... FAILED!
Running test Regression sanity check: non-blank ref "1" followed by ref with content... FAILED!

Passed 476 of 505 tests (94.26%)... 29 tests failed!

OK, 29 tests failed, 18 failed in the email, which ones are different? I don't see any [Has never passed] notes. I could put this test output side by side with the email and manually mark off the differences, but avoiding that tedious task was the reason I put the TODO annotations in the test names in the first place. I could make a copy of my working directory, revert any relevant changes, run parserTests.php on each and do a diff, but that's hardly easier either. I could set up a script to run parserTests.php say once a day and save the results to the database, but then new checked-in expected failures would be difficult to distinguish from working copy changes. I could run it every time I do an svn up, but half a dozen complications with that approach immediately come to mind.
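The manual side-by-side comparison described above can be scripted. The following is only a rough sketch, not part of parserTests.php: it assumes one test per line, with the working-copy output in the "Running test NAME... FAILED!" form and the emailed report in the "* NAME [status]" form seen in this thread. The function names are made up for illustration.

```python
import re

# Line shapes as they appear in this thread (assumed, not taken from the runner's source).
RUN_LINE = re.compile(r"Running test (.+)\.\.\. FAILED!")
EMAIL_LINE = re.compile(r"\* (.+) \[(?:Has never passed|Introduced between .+)\]")


def failed_from_run(output: str) -> set[str]:
    """Names of tests reported as FAILED in local runner output."""
    names = set()
    for line in output.splitlines():
        match = RUN_LINE.fullmatch(line.strip())
        if match:
            names.add(match.group(1))
    return names


def failed_from_email(report: str) -> set[str]:
    """Names of tests listed as failing in the emailed report."""
    names = set()
    for line in report.splitlines():
        match = EMAIL_LINE.fullmatch(line.strip())
        if match:
            names.add(match.group(1))
    return names


def only_in_run(output: str, report: str) -> set[str]:
    """Failures present locally but absent from the emailed report."""
    return failed_from_run(output) - failed_from_email(report)


run_output = """\
Running test Mixing markup for italics and bold... FAILED!
Running test Parents of subpages, two levels up, without trailing slash or name.... FAILED!
Running test Blank ref followed by ref with content... FAILED!
"""
email_report = """\
  * Mixing markup for italics and bold [Has never passed]
  * Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
"""

for name in sorted(only_in_run(run_output, email_report)):
    print("fails locally, not in email:", name)
```

Matching on exact titles is fragile: any edit to a test's description makes it look like a different test, which is the tracking problem discussed in the rest of this message.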

Your reason for removing the TODO notes was "The test runner can now indicate items which have never passed, and twiddling with the title constantly means they can't be tracked, so it's better not to have such markings."

They could be tracked if we added a numeric identifier to each test, as has been suggested several times. Then we could tweak the description in all sorts of ways and still be able to track them. And we wouldn't have to try to think of a unique regex to refer to a test when we want to run it.
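To illustrate the point about identifiers, here is a small sketch. The ID value 101 and the record layout are hypothetical; parserTests.txt has no such field. It shows that history keyed by a numeric ID survives a change to the description, while history keyed by title splits into two unrelated entries.

```python
from collections import namedtuple

# Hypothetical record: a stable numeric ID, a free-form description, and the outcome.
Result = namedtuple("Result", ["test_id", "description", "passed"])


def history(runs, key):
    """Group pass/fail outcomes across runs by the chosen field name."""
    grouped = {}
    for run in runs:
        for result in run:
            grouped.setdefault(getattr(result, key), []).append(result.passed)
    return grouped


# Two runs of the same test; the description was edited between them.
runs = [
    [Result(101, "Mixing markup for italics and bold", False)],
    [Result(101, "TODO: Mixing markup for italics and bold", False)],
]

by_id = history(runs, "test_id")
by_title = history(runs, "description")

print(by_id)          # one entry with both outcomes
print(len(by_title))  # two entries, each with a single outcome
```

With the ID as the key, "has never passed" can still be computed after a rename, and a test can be selected by number instead of by a regex on its title.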

The only person who knows whether a new test is an expected pass or an expected fail is the person who adds the test to parserTests.txt. If someone commits a new failing test, you have to ask them whether it was intended. It makes perfect sense to me to annotate parserTests.txt in some way (not necessarily in the test title), to indicate expected fails.

Not that I understand why we have expected failing tests at all. But that's a crusade for another day.

-- Tim Starling

Brion Vibber

1:31 a.m.


Tim Starling wrote: [snip]

...

OK, 29 tests failed, 18 failed in the email, which ones are different?

Use the --record and --compare options so you know which ones changed while you were working with the code -- that's why they were added.

No test *should* be failing. All the failing tests need to be fixed, sooner or later: either the code is buggy and should be fixed or the test is wrong and should be fixed or both. IMHO there's no such thing as a "TODO" here -- *every* failing test is a "TODO" because it needs to be fixed. So why would anyone mark one as "TODO"? Failing *means* it's a "TODO". Passing *means* it's not a "TODO".

-- brion vibber (brion @ wikimedia.org)


Nick Jenkins

8:04 a.m.

...

...
OK, 29 tests failed, 18 failed in the email, which ones are different?

Use the --record and --compare options so you know which ones changed while you were working with the code -- that's why they were added.

To quickly expand on this: in MySQL do a "source maintenance/testRunner.sql", which will create the two tables needed for storing test results, then do a:

    php maintenance/parserTests.php --quiet --quick --record --color=no

(You can even skip sourcing the table definitions if you want, as Ashar added sourcing this automatically if you use the --record option and the tables don't exist; the extra command-line options shown above are just the ones I use.)

Then you'd make your changes (so it's probably not going to help you right now if you've already modified stuff), and then do a:

    php maintenance/parserTests.php --quiet --quick --compare --color=no

... and any items which are different should appear under one of these sections:

  * "previously passing test(s) now FAILING!"
  * "previously failing test(s) now PASSING!"
  * "previously PASSING test(s) removed"
  * "previously FAILING test(s) removed"
  * "new PASSING test(s)"
  * "new FAILING test(s)"

Things that haven't changed should be under "still FAILING test(s)". When you want, you'd update your baseline again with the --record option (e.g. svn checkout time can sometimes be a good time to do this).
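For readers who want to see how a recorded baseline maps onto those sections, here is a rough sketch of the classification. It is not the code in parserTests.php; it assumes each run is simply a mapping from test name to pass/fail, and reuses only the section labels quoted above.

```python
def compare(baseline: dict[str, bool], current: dict[str, bool]) -> dict[str, list[str]]:
    """Sort tests into report sections by comparing a recorded run with a new one."""
    sections: dict[str, list[str]] = {}
    for name in sorted(baseline.keys() | current.keys()):
        before = baseline.get(name)  # None means the test was not in the baseline
        after = current.get(name)    # None means the test no longer exists
        if before is None:
            label = "new PASSING test(s)" if after else "new FAILING test(s)"
        elif after is None:
            label = ("previously PASSING test(s) removed" if before
                     else "previously FAILING test(s) removed")
        elif before and not after:
            label = "previously passing test(s) now FAILING!"
        elif after and not before:
            label = "previously failing test(s) now PASSING!"
        elif not after:
            label = "still FAILING test(s)"
        else:
            continue  # passed before and passes now: nothing to report
        sections.setdefault(label, []).append(name)
    return sections


# Illustrative data only; the outcomes here are invented for the example.
baseline = {"dt/dd/dl test": False, "Simple table": True, "Old test": True}
current = {"dt/dd/dl test": False, "Simple table": False, "Brand new test": False}

for label, names in compare(baseline, current).items():
    print(label, names)
```

A test that fails in both runs lands under "still FAILING test(s)", so only the other sections need attention after a change.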

...

Not that I understand why we have expected failing tests at all.

Think of a never-passed test as the oppression of hope by reality :-)

-- All the best, Nick.

wikitech-l@lists.wikimedia.org