<div dir="ltr">Subbu, that's exactly the kind of thing I have in mind, and it further reinforces my sense that using an API (server- or client-side) built on top of Parsoid could be a huge step forward in reliability. If you're testing that the API works and the DOM spec is correct across that many articles, the things we build on top should "Just Work (TM)." I'll be sure to look into this in my hacking. Thanks for replying!</div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, May 20, 2015 at 10:52 AM, Subramanya Sastry <span dir="ltr"><<a href="mailto:ssastry@wikimedia.org" target="_blank">ssastry@wikimedia.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div><br>
For Parsoid, we run tests [1] against a set of 160K articles that
we randomly picked a couple of years back: about 10K articles from
each of 16 wikis. We run roundtrip tests (wikitext
-> html -> wikitext) and compare diffs, and we also run
trivial edit tests (wikitext -> html -> add comment at end
of page -> wikitext) to check how clean our roundtripping is.<br>
<br>
This testing has been extremely good at telling us when something
is broken vs. when something is good to be deployed. Checking
these results is part of our deployment process. We also collect
performance statistics in each testing run. However, our testing
database and its schema are not sufficiently tuned to let us
actually track performance regressions well, so that data has
just sat in the db without being used for anything.<br>
<br>
But we've also recently been talking about:<br>
* refreshing this set to pick a more proportional sample of articles
from different wikis (more from enwiki, fewer from others, etc.);
we have not done this yet.<br>
* throwing in a different (non-random) set of pages that
are particularly important (featured articles, etc.). So we would
be interested in any set of articles that is considered important
enough to be regularly tested against.<br>
<br>
This map-reduce style testing code is general enough that
it could be repurposed for other kinds of testing. For example, we
have also repurposed this same rt-testing code for running visual
diffs (comparing PhantomJS renderings of PHP parser output and
Parsoid output for the same title) on a set of about 800 enwiki
articles (random selection) [2].<br>
<br>
This kind of testing is essential for our deployments. I'm not
sure whether it is appropriate for other teams, but I'm sharing it
just in case.<br>
<br>
Subbu.<br>
<br>
[1] See <a href="http://parsoid-tests.wikimedia.org/topfails" target="_blank">http://parsoid-tests.wikimedia.org/topfails</a> and
<a href="http://parsoid-tests.wikimedia.org/commits" target="_blank">http://parsoid-tests.wikimedia.org/commits</a>. The main page is
<a href="http://parsoid-tests.wikimedia.org" target="_blank">http://parsoid-tests.wikimedia.org</a>, but it can sometimes
time out when the db is clogged and old test results need
clearing out.<br>
<br>
[2] <a href="http://parsoid-tests.wikimedia.org/visualdiff/" target="_blank">http://parsoid-tests.wikimedia.org/visualdiff/</a> with code @
<a href="https://github.com/subbuss/parsoid_visual_diffs" target="_blank">https://github.com/subbuss/parsoid_visual_diffs</a><div><div class="h5"><br>
<br>
<br>
On 05/20/2015 01:48 AM, Elena Tonkovidova wrote:<br>
</div></div></div>
<blockquote type="cite"><div><div class="h5">
<div dir="ltr">The spreadsheet at <a href="https://docs.google.com/spreadsheets/d/14Ei-KWYbZcmvT70irx6NGIJCi17tF2o1szXnQsZ2h-A/edit#gid=0" target="_blank">https://docs.google.com/spreadsheets/d/14Ei-KWYbZcmvT70irx6NGIJCi17tF2o1szXnQsZ2h-A/edit#gid=0</a>
lists the articles that I usually check when I do regression
testing.
<div><br>
</div>
<div>One group is a set of articles that used to have some sort
of performance/display issues<br>
<div>- <span style="font-family:arial,sans,sans-serif;font-size:13px;font-weight:bold">Barack
Obama, Cat, India, Richard Nixon, </span></div>
<div><span style="font-family:arial,sans,sans-serif;font-size:13px;font-weight:bold">Europe,
English language</span></div>
<div><br>
</div>
</div>
<div>Another group of articles is used to test images or the Image
Gallery (gif, svg, image map, charts, timeline, a large number
of imgs in the Image Gallery)</div>
<div><br>
</div>
<div>- <b>Claude Monet </b>- extensive Image Gallery(different
img sizes)</div>
<div>- <b>List of go games</b> - many svg images</div>
<div>- <span style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;font-weight:bold;white-space:pre-wrap">Lilac
chaser, </span><span style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;font-weight:bold;white-space:pre-wrap">Caridoid
escape reaction </span><span style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">-
animated(gif) images</span></div>
<div><span style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">- </span><span style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap"><b>The
Club(dining club), Image map</b></span><span style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">
- for image map img</span></div>
<div><span style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">-
<b>Tel Aviv (Hebrew)</b> for timeline img template</span></div>
<div><span style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">-
several specific articles with problems in their lead img</span></div>
<div><span style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap"><br>
</span></div>
<div><span style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">And,
yes, it'd be really great if we could 1) define more precisely
which article properties we are interested in testing (visit
statistics, size, structure, special layouts, imgs, etc.)
and 2) create a process (system) to find such articles.</span></div>
<div><span style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap"><br>
</span></div>
<div>Also, there is still an open task, <a href="https://phabricator.wikimedia.org/T97151" target="_blank">https://phabricator.wikimedia.org/T97151</a>
("Testing Page issues and disambiguation templates (T90250)"). Going
through the lists at <a href="http://en.wikipedia.org/wiki/Category:Wikipedia_articles_with_content_issues" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Category:Wikipedia_articles_with_content_issues</a> and
<a href="http://en.wikipedia.org/wiki/Wikipedia:Template_messages/General#Disambiguation_and_redirection" rel="noreferrer" target="_blank">http://en.wikipedia.org/wiki/Wikipedia:Template_messages/General#Disambiguation_and_redirection</a> should
help to catch some issues.</div>
<div><br>
</div>
<div>thanks</div>
<div>Elena</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, May 19, 2015 at 9:23 PM,
Brian Gerstle <span dir="ltr"><<a href="mailto:bgerstle@wikimedia.org" target="_blank">bgerstle@wikimedia.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr">+search</div>
<div>
<div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, May 19, 2015 at
3:14 PM, Brian Gerstle <span dir="ltr"><<a href="mailto:bgerstle@wikimedia.org" target="_blank">bgerstle@wikimedia.org</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div dir="ltr">The subject hints at a question
that's been nagging me for a while, and now
that I'm going to be hacking on testing in
Lyon I wanted to ask:
<div><br>
</div>
<div>Do we have a list of articles we usually
run tests against?</div>
<div><br>
</div>
<div>If not, do we have any processes for
curating such a list? Would anyone be
interested in a brainstorming session at
Lyon to discuss this further?</div>
<div><br>
</div>
<div>Basically, as a developer, I would love
to have more confidence that some code I
wrote doesn't break on our most popular
articles. Or, if we can get more
sophisticated, that <b>certain properties
of my code hold true for certain kinds of
generated pages</b>.*</div>
<div><br>
</div>
<div>Please respond with your thoughts and
whether you think I should create a phab
task for the hackathon about this. In
either case, ping me anytime or grab me at
Lyon to discuss further!</div>
<div><br>
</div>
<div>Regards,</div>
<div><br>
</div>
<div>Brian</div>
<div><br>
</div>
<div>* Yes, I'm talking about using
property-based testing generators to create
random, shrinkable MW pages that we can run
tests on. Not sure if it's practical, but
could be an interesting experiment.</div>
<span><font color="#888888">
<div>
<div><br>
</div>
-- <br>
<div>
<div dir="ltr">
<div>
<div dir="ltr">EN Wikipedia user
page: <a href="https://en.wikipedia.org/wiki/User:Brian.gerstle" target="_blank">https://en.wikipedia.org/wiki/User:Brian.gerstle</a><br>
IRC: bgerstle</div>
</div>
</div>
</div>
</div>
</font></span></div>
</blockquote>
</div>
</div>
</div>
</div>
<br>
_______________________________________________<br>
reading-wmf mailing list<br>
<a href="mailto:reading-wmf@lists.wikimedia.org" target="_blank">reading-wmf@lists.wikimedia.org</a><br>
<a href="https://lists.wikimedia.org/mailman/listinfo/reading-wmf" target="_blank">https://lists.wikimedia.org/mailman/listinfo/reading-wmf</a><br>
<br>
</blockquote>
</div>
<br>
</div>
</div>
<br>
<br>
</div></div><pre>_______________________________________________
QA mailing list
<a href="mailto:QA@lists.wikimedia.org" target="_blank">QA@lists.wikimedia.org</a>
<a href="https://lists.wikimedia.org/mailman/listinfo/qa" target="_blank">https://lists.wikimedia.org/mailman/listinfo/qa</a>
</pre>
</blockquote>
<br>
</div>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div><div dir="ltr">EN Wikipedia user page: <a href="https://en.wikipedia.org/wiki/User:Brian.gerstle" target="_blank">https://en.wikipedia.org/wiki/User:Brian.gerstle</a><br>IRC: bgerstle</div></div></div></div>
</div>