On 07/23/2013 05:28 PM, John Vandenberg wrote:
On Wed, Jul 24, 2013 at 2:06 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
Hi John and Risker,
First off, I do want to once again clarify that my intention in the previous post was not to claim that VE/Parsoid is perfect. Rather, we have fixed enough bugs at this point that the most significant remaining "bugs" (bugs, not missing features) that need fixing -- and are being fixed -- are usability tweaks.
How do you know that? Have you performed automated tests on all Wikipedia content? Or are you waiting for users to find these bugs?
http://parsoid.wmflabs.org:8001/stats
This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias).
Until late March, we ran round-trip testing on 100K enwp pages. We then moved to a mix of pages from different WPs to catch language- and wiki-specific issues and fix them.
So, this is our methodology for catching parse and roundtrip errors on real WP pages and regressions.
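The round-trip methodology described above can be sketched roughly as follows. This is a minimal illustration, not Parsoid's actual code: `parse_to_html` and `serialize_to_wikitext` here are trivial stand-ins for the real wikitext-to-HTML and HTML-to-wikitext pipelines, and the harness simply flags any lines that change after a full round trip:

```python
import difflib

def parse_to_html(wikitext):
    # Hypothetical stand-in for Parsoid's wikitext -> HTML parse;
    # here each line just becomes a paragraph so the sketch runs.
    return ["<p>%s</p>" % line for line in wikitext.splitlines()]

def serialize_to_wikitext(html):
    # Hypothetical stand-in for the HTML -> wikitext serializer.
    return "\n".join(p[3:-4] for p in html)

def round_trip_diff(wikitext):
    """Return the diff lines that survive a full parse/serialize cycle.

    In a real harness, '+'/'-' lines would be classified further,
    e.g. into semantic corruptions vs. syntactic dirty diffs.
    """
    html = parse_to_html(wikitext)
    result = serialize_to_wikitext(html)
    return [d for d in difflib.ndiff(wikitext.splitlines(),
                                     result.splitlines())
            if d.startswith(('+', '-'))]

# A clean round trip produces no '+'/'-' lines; this toy pipeline
# is lossless, so the diff is empty.
print(round_trip_diff("Hello ''world''\n== Heading =="))
```

A production harness runs this over many thousands of real pages and aggregates the per-page results into the pass/dirty-diff percentages reported on the stats page.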
I won't go into great detail about what the three numbers mean and their nuances.
But, 99.6% means that 0.4% of pages still had corruptions, and that 15% of pages had syntactic dirty diffs.
However, note that this is because the test serialization behaves as if the entire document was edited (which lets us stress-test our serialization system); that is not the real behavior in production. In production, our HTML-to-wikitext pipeline is smarter: it attempts to serialize only the modified segments and reuses the original wikitext for unmodified segments of the DOM (called selective serialization). So, in reality, the corruption percentage should be much smaller than even that 0.4%, and the dirty diffs should be far fewer as well (though you are still finding 1 in 200 or more) -- and this is separate from the nowiki issues.
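Selective serialization, as described here, can be illustrated with a toy sketch. The node layout and the `serialize` helper are invented for illustration; the point is only that unmodified DOM segments are emitted byte-for-byte from the original wikitext, so any serializer-induced dirty diff is confined to segments the editor actually touched:

```python
def serialize(node):
    # Hypothetical serializer for a modified node.
    return node["text"]

def selective_serialize(nodes):
    """Emit original source for unmodified nodes; re-serialize only
    the nodes an editor actually changed (toy model)."""
    out = []
    for node in nodes:
        if node.get("modified"):
            out.append(serialize(node))   # dirty-diff risk confined here
        else:
            out.append(node["source"])    # byte-identical original wikitext
    return "".join(out)

doc = [
    {"source": "== Heading ==\n", "modified": False},
    {"source": "old text\n", "modified": True, "text": "new text\n"},
    {"source": "{{SomeTemplate}}\n", "modified": False},
]
print(selective_serialize(doc))
```

In this sketch the heading and the template call come out untouched, which is why in-production corruption rates should be lower than what full-document round-trip testing reports.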
So no, we are not solely dependent on users to find bugs for us, but if corruptions do show up in production, it helps if we are alerted to them.
Does that clarify?
VE and Parsoid devs have put in a great deal of effort to recognize broken wikitext source, fix it or isolate it,
My point was that you don't appear to be analyzing how much of all Wikipedia content is broken; at least I don't see a public document listing which templates and pages are causing the parser problems, so the communities on each Wikipedia could fix them ahead of deployment.
Unfortunately, this is much harder to do. What we can consider is periodically swapping out our test pages for a fresh batch, so that new kinds of problems show up in automated testing. In some cases, being able to detect problems automatically is equivalent to being able to fix them automatically as well.
Gabriel is currently on a (well-deserved) vacation; once he is back, we'll discuss this issue and see what can be done. In the meantime, whenever we find problems, we fix the templates involved (about 3 or 4 fixed so far) or the broken wikitext itself.
We also have this desirable enhancement/tool that we could build: https://bugzilla.wikimedia.org/show_bug.cgi?id=46705
I believe there is a bug about automated testing of the parser against existing pages, which would identify problems.
I scanned the 50 most recent changes with the 'visualeditor' tag on the Spanish Wikipedia earlier and found a dirty diff, which I believe hasn't been raised in Bugzilla yet.
https://bugzilla.wikimedia.org/show_bug.cgi?id=51909
50 VE edits on eswp is more than one day of recentchanges. Most of the top 10 wikis have roughly the same level of testing going on. That should be a concern. The number of VE edits is about to increase on another nine Wikipedias, with very little real impact analysis having been done. That is a shame, because the enwp deployment has provided us with a list of problems which will impact those wikis if they are using the same syntax, be it weird or broken or otherwise troublesome.
As indicated earlier, we have done automated round-trip testing on 20K pages each from several WPs and fixed various problems, but yes, this will not catch all problematic scenarios.
and protect it across edits, and round-trip it back in its original form to prevent corruption. I think we have been largely successful, but there are still more cases to go -- cases that are being exposed here and that we will fix. Occasionally these kinds of errors do show up, and we ask for your patience as we fix them. Once again, this is not a claim of perfection, but a claim that this is not a significant source of corrupt edits. That said, even a 0.1% error rate means a big number in absolute terms when thousands of pages are being edited -- and we will continue to pare it down.
Is 0.1% a real data point, or a stab in the dark? Because I found two in 100 on enwp; Robert found at least one in 200 on enwp; and I found 1 in 50 on eswp.
Sorry -- I should have phrased that better. I just picked 0.1% as an arbitrary number to make the observation that even when it is as low as 0.1%, in absolute numbers, it can still be noticeable.
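To make that observation concrete, a back-of-the-envelope calculation (the edit volume below is a made-up round number, not an actual VE edit count):

```python
# Hypothetical, illustrative numbers -- not measured VE edit volumes.
edits_per_day = 10_000
error_rate = 0.001           # i.e. 0.1%

corrupted_per_day = edits_per_day * error_rate
print(corrupted_per_day)     # 10 corrupted edits/day even at 0.1%
```

So even a rate that looks tiny as a percentage still produces a steady stream of visible corruptions at scale, which is why the absolute numbers matter.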
Subbu.