On 07/23/2013 06:13 PM, John Vandenberg wrote:
> On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry <ssastry@wikimedia.org> wrote:
>> On 07/23/2013 05:28 PM, John Vandenberg wrote:
>>> On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry <ssastry@wikimedia.org> wrote:
>>>> Hi John and Risker,
>>>>
>>>> First off, I do want to once again clarify that my intention in the previous post was not to claim that VE/Parsoid is perfect. It was rather that we've fixed enough bugs at this point that the most significant remaining "bugs" (bugs, not missing features) that still need fixing -- and are being fixed -- are usability tweaks.
>>> How do you know that? Have you performed automated tests on all Wikipedia content? Or are you waiting for users to find these bugs?
>> http://parsoid.wmflabs.org:8001/stats
>>
>> This is the URL for our round-trip testing on 160K pages (20K each from 8 Wikipedias).
> Fantastic! How frequently are those tests re-run? Could you add a last-run-date on that page?
The tests are re-run after a batch of commits that we think should be regression-tested -- usually one or more times a day (when a lot of patches are being merged) or every few days (during periods of low activity). The last code update was Thursday.
http://parsoid.wmflabs.org:8001/commits gives you the list of commits (and the date when the code was updated).
http://parsoid.wmflabs.org:8001/topfails gives you individual test results for every tested page, for more detail.
Currently we are updating our rt-testing infrastructure to gather performance numbers as well (this has been on the cards for a long time, but never got the attention it needed). Marco is working on that part of our codebase as we speak; see https://bugzilla.wikimedia.org/show_bug.cgi?id=46659 and other related bugs.
We do not deploy to production before we have run tests on a subset of pages in rt-testing. Given how the tests are run, it is usually sufficient to run them on about 1000 pages to detect serious regressions; sometimes we run them on a larger subset.
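To make the mechanism concrete, here is a minimal sketch of what a round-trip regression check does; parse_to_html and serialize_to_wikitext are placeholder stand-ins for Parsoid's wt2html and html2wt converters, not its actual API:

    import difflib

    def parse_to_html(wikitext):
        # Placeholder: a real harness would invoke Parsoid's wt2html.
        return "<body>" + wikitext + "</body>"

    def serialize_to_wikitext(html):
        # Placeholder: a real harness would invoke Parsoid's html2wt.
        return html[len("<body>"):-len("</body>")]

    def round_trip_diffs(original_wikitext):
        # wt -> html -> wt; any surviving diff is a potential regression.
        round_tripped = serialize_to_wikitext(parse_to_html(original_wikitext))
        return list(difflib.unified_diff(original_wikitext.splitlines(),
                                         round_tripped.splitlines(),
                                         lineterm=""))

    def ok_to_deploy(pages, max_failing_pages=0):
        # Gate a deploy on a sample of (title, wikitext) pairs.
        failing = [title for title, wt in pages if round_trip_diffs(wt)]
        return len(failing) <= max_failing_pages, failing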
> Was a regression testsuite built using the issues encountered during the last parser rewrite?
We also continually update a parser tests file (in the code repository) with minimized test cases based on regressions and odd wikitext usage: about 1100 tests so far, each run in 4 modes (wt2html, wt2wt, html2wt, html2html), plus 14000 randomly generated edits applied to the tests to mimic user edits and exercise our selective serializer. This is our first guard against bad code.
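As a rough, self-contained illustration of what the four modes check (wt2html and html2wt below are the same kind of placeholder converters as in the earlier sketch, not Parsoid's real entry points):

    # Placeholder converters; a real harness would call Parsoid itself.
    def wt2html(wt):
        return "<body>" + wt + "</body>"

    def html2wt(html):
        return html[len("<body>"):-len("</body>")]

    def run_test_case(wt, expected_html):
        # A minimized test case pairs a wikitext snippet with its
        # expected HTML; each mode exercises one conversion path.
        return {
            "wt2html":   wt2html(wt) == expected_html,
            "wt2wt":     html2wt(wt2html(wt)) == wt,
            "html2wt":   html2wt(expected_html) == wt,
            "html2html": wt2html(html2wt(expected_html)) == expected_html,
        }

    # Example: a consistent case passes all four modes.
    print(run_test_case("''italic''", "<body>''italic''</body>"))

The randomly generated edits then exercise the selective serializer, which is meant to re-serialize only the edited parts of the page rather than the whole document.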
Subbu.