Sure. The main problem I see with the proposed setup [1] is that you have no way of ensuring the MediaWiki you are hitting is in a consistent state. Having tests fail because of edit conflicts, modified pages, or users that already exist or have been blocked, etc., as a result of other tests is very annoying. (Tests can't be relied on to clean up after themselves, and one failure should not cause the rest of the suite to fail until it is manually fixed.)
The diagram was created with manual testing against the usability prototypes in mind. I'll update it to include automated testing with a test runner and code review when we work out the plan.
To some extent this can be worked around by using carefully selected random parameters for many things, but that is a horrible hack and requires extra work when writing test scripts; though as I assume/hope they will be written in PHP, that is not a huge difficulty, provided we teach everyone how.
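For illustration, the random-parameter hack amounts to something like the following (Python for brevity, though the real scripts would presumably be PHP; the helper name is made up):

```python
import uuid

def unique_name(prefix: str) -> str:
    # Append a random suffix so concurrent or re-run tests don't
    # collide on page titles, user names, etc.
    return prefix + "_" + uuid.uuid4().hex[:8]

page_title = unique_name("SeleniumTestPage")
user_name = unique_name("SeleniumTestUser")
```

Every run then edits a fresh page and registers a fresh user, at the cost of sprinkling this through every script.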
Much cleaner is a setup like the current parser tests, where each test can specify which articles it expects to exist and with what content (Selenium tests may also wish to specify which users exist, with what privileges/preferences), in addition to being able to tweak configuration settings (otherwise we're going to need a fair few MediaWikis even to test the configurations that are live at Wikimedia). This is quite readily doable if you run a MediaWiki instance on the same machine as the test runner; I imagine it would also be possible by building a communication protocol between the two, though that seems like a waste of effort.
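For reference, the parser tests' fixture format looks roughly like this (from memory, so take the details loosely; a Selenium equivalent could extend it with user and configuration sections):

```
!! article
Test page
!! text
Some initial content.
!! endarticle

!! test
Italics
!! wikitext
''italic''
!! html
<p><i>italic</i></p>
!! endtest
```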
This is what I was hoping for. I think the test runner should reconfigure the wiki for each test. If we want to be able to run multiple tests in parallel, we should have multiple wikis that can be reconfigured, and tested against independently.
Do the parser tests only test core parser functionality, or do they also test extensions, like ParserFunctions and SyntaxHighlight GeSHi? It is likely we'll have tests that will need to include extensions dynamically, and configure them dynamically as well.
This won't be a problem for local developers: the test runner, the browser, and MediaWiki are all on localhost; the tests can be written in PHP (or exported from the Selenium IDE into PHP) and run with a wrapper script (maintenance/seleniumTests.php, or whatever) that handles the configuration; output is handled by PHPUnit. All happy.
For a Selenium Grid setup it's not so obvious how to do this. I'd suggest that, instead of having developers run scripts against the grid themselves, they simply request a run on a server designed for this task, which runs the test through the grid using a hostname that resolves back to the runner. This allows easy local control over the MediaWiki instance, and makes it reasonably easy to write an interface so that normal developers who won't/can't run Selenium can still run tests against MediaWiki.
For the grid setup, we were exploring the possibility of a test runner that automatically tests commits and reports the results to Code Review, like the parser tests do now. For the most part, people shouldn't be hitting the grid, only bots should, unless we have a QA team doing something special.
I don't think reconfiguring MediaWiki per test script, or per set of test scripts, is an outrageous overhead; Selenium is a very "enterprise" tool, and booting virtual machines with browsers in them is likely much more costly than that. The advantage is obvious: tests should not fail because of faults in the testing environment; that just wastes time.
Yeah, depending on the browser, OS, and hardware specs of the machine, browsers can take 10-70 seconds to run even simple scripts. The overhead of re-configuring the wiki is nothing in comparison.
Cleaning the state of the browsers is probably not so critical here, but it's another "gotcha": if one test leaves the user logged in and the next test tries to click the "Login" link, it explodes, and vice versa. Selenium can help somewhat here (if you persuade it to, and to varying extents across browser versions), but it's likely easier to cleanse the database.
When Selenium launches a browser, it does so using a clean profile. It launches a fresh browser from a new profile for every test it runs, so this shouldn't be an issue.
Respectfully,
Ryan Lane