On Wed, Jul 24, 2013 at 9:02 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
http://parsoid.wmflabs.org:8001/stats
This is the url for our round trip testing on 160K pages (20K each from 8 wikipedias).
Very minor point: there are ~400 missing pages on the list; is that intentional? ;-)
One is 'Mos:time', which is in NS 0 and does actually exist as a redirect to the WP: manual of style: https://en.wikipedia.org/wiki/Mos:time
... But, 99.6% means that 0.4% of pages still had corruptions, and that 15% of pages had syntactic dirty diffs.
So 15% is 24000 pages which can bust, but may not if the edit doesn't touch the bustable part.
Does /topfails cycle through all 24000, 40 pages at a time?
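For reference, here is a rough sketch of the arithmetic behind those numbers. The 160K total and the 15% and 0.4% figures come from the message above; the assumption that /topfails shows exactly 40 entries per listing page is mine, and the variable names are just for illustration.

```python
import math

# Figures quoted in this thread.
TOTAL_PAGES = 8 * 20_000        # 20K pages from each of 8 wikipedias
DIRTY_DIFF_PERCENT = 15         # pages with syntactic dirty diffs
CORRUPTION_PER_MILLE = 4        # 0.4% of pages still had corruptions

# Assumption: /topfails lists 40 entries per listing page.
PAGE_SIZE = 40

# Integer arithmetic avoids floating-point rounding in the counts.
dirty = TOTAL_PAGES * DIRTY_DIFF_PERCENT // 100
corrupt = TOTAL_PAGES * CORRUPTION_PER_MILLE // 1000
listing_pages = math.ceil(dirty / PAGE_SIZE)

print(dirty, corrupt, listing_pages)
```

If that page size holds, walking the whole list through /topfails would take several hundred requests, which is why a dump would be handier.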
Could you provide a dump of the list of 24000 bustable pages? Split by project? Each community could then investigate those pages for broken tables and, more critically, templates which emit broken wikisyntax that is causing your team grief.
Do you have stats on each of those eight wikipedias? i.e. are there noticeable differences in the percentages on different wikipedias? If so, can you report those percentages for each project? I'm guessing Chinese is an example where there are higher percentages..?
-- John Vandenberg