For Parsoid, we run tests [1] against a set of 160K articles that
we randomly picked a couple of years back .. about 10K articles from
each of 16 wikis. For Parsoid's purposes, we run roundtrip tests (wikitext
-> html -> wikitext) and compare diffs, and also run
trivial edit tests (wikitext -> html -> add a comment at the end
of the page -> wikitext) to check how clean our roundtripping is.
This testing has been extremely good at telling us when something
is broken vs. when something is good to deploy. Checking
these results is part of our deployment process. We also collect
performance statistics in each testing run; however, our testing
database schema is not sufficiently tuned to let us
actually track performance regressions well .. so, that data has
just sat in the db without being used for anything.
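The roundtrip check described above can be sketched roughly as follows. This is a
minimal illustration, not Parsoid's actual code: `wt2html` and `html2wt` here are
hypothetical stand-ins for the real conversion entry points (a toy paragraph
wrapper), and the diff step is just difflib:

```python
import difflib

def wt2html(wikitext):
    # Hypothetical stand-in for Parsoid's wikitext -> HTML conversion;
    # wraps each source line in a <p> tag for illustration only.
    return "".join(f"<p>{line}</p>\n" for line in wikitext.splitlines())

def html2wt(html):
    # Hypothetical stand-in for Parsoid's HTML -> wikitext serialization;
    # unwraps the <p> tags produced by the toy wt2html above.
    lines = []
    for line in html.splitlines():
        if line.startswith("<p>") and line.endswith("</p>"):
            lines.append(line[len("<p>"):-len("</p>")])
    return "\n".join(lines) + "\n"

def roundtrip_diff(wikitext):
    """wikitext -> html -> wikitext; return a unified diff of any changes."""
    restored = html2wt(wt2html(wikitext))
    return list(difflib.unified_diff(
        wikitext.splitlines(), restored.splitlines(),
        fromfile="original", tofile="roundtripped", lineterm=""))

original = "== Heading ==\nSome text.\n"
diff = roundtrip_diff(original)
print("clean roundtrip" if not diff else "\n".join(diff))
```

A clean run produces an empty diff; any non-empty diff flags a page whose
serialization changed across the roundtrip.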
But we've also recently been talking about:
* refreshing this to pick a more proportional set of articles from
different wikis (more from enwiki, fewer from others, etc.) .. we
haven't done this yet.
* throwing in a different (non-randomly selected) set of pages that
are particularly important (featured articles, etc.). So, we would
be interested in any set of articles that is considered important
enough to be regularly tested against.
This map-reduce style testing code is general enough that
it could be repurposed for other kinds of testing. For example, we
have repurposed this same rt-testing code to run visual
diffs (comparing phantomjs renderings of PHP parser output and
Parsoid output on the same title) on a set of about 800 randomly
selected enwiki articles [2].
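At its core, the visual-diff idea is: render the same title through both
parsers, capture screenshots, and measure how much the two images differ. A
minimal sketch of just the comparison step, assuming the two renderings have
already been captured as equal-sized raw RGB byte buffers (the real pipeline
uses phantomjs screenshots and a proper image-diff tool, not this):

```python
def pixel_diff_ratio(img_a, img_b, width, height):
    """Fraction of pixels that differ between two raw RGB buffers."""
    assert len(img_a) == len(img_b) == width * height * 3
    differing = 0
    # Walk the buffers three bytes (one RGB pixel) at a time.
    for i in range(0, len(img_a), 3):
        if img_a[i:i + 3] != img_b[i:i + 3]:
            differing += 1
    return differing / (width * height)

# Two tiny 2x1 "renderings": same first pixel, differing second pixel.
a = bytes([255, 255, 255,   0, 0, 0])
b = bytes([255, 255, 255, 255, 0, 0])
ratio = pixel_diff_ratio(a, b, width=2, height=1)
print(f"{ratio:.0%} of pixels differ")
```

A run would then flag titles whose diff ratio exceeds some threshold for
human review.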
This kind of testing is essential for our deployments. I'm not
sure if it is appropriate for other teams .. but sharing just in
case.
Subbu.
[1] See
http://parsoid-tests.wikimedia.org/topfails and
http://parsoid-tests.wikimedia.org/commits .. The main page is
http://parsoid-tests.wikimedia.org, but it can sometimes
time out when the db is clogged and old test results need
clearing out.
[2]
http://parsoid-tests.wikimedia.org/visualdiff/ with code @
https://github.com/subbuss/parsoid_visual_diffs
On 05/20/2015 01:48 AM, Elena Tonkovidova wrote:
On
https://docs.google.com/spreadsheets/d/14Ei-KWYbZcmvT70irx6NGIJCi17tF2o1szXnQsZ2h-A/edit#gid=0
there are articles that I usually check when I do regression
testing.

One group is a set of articles that used to have some sort
of performance/display issues:
- Barack Obama, Cat, India, Richard Nixon, Europe, English language

Another group of articles is where images or the Image Gallery
are tested (gif, svg, image map, charts, timeline, large number
of images in the Image Gallery):
- Claude Monet - extensive Image Gallery (different image sizes)
- List of go games - many svg images
- Lilac chaser, Caridoid escape reaction - animated (gif) images
- The Club (dining club), Image map - for image map images
- Tel Aviv (Hebrew) - for the timeline image template
- several specific articles with problems in their lead image

And yes, it'd be really great if we can 1) define more precisely
which article properties we are interested in testing (visiting
statistics, size, structures, special layouts, images, etc.)
and 2) create a process (system) to find such articles.

thanks
Elena
_______________________________________________
QA mailing list
QA@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/qa