<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix"><br>
       For Parsoid, we run tests [1] against a set of 160K articles that
      we randomly picked a couple of years back: about 10K articles from
      each of 16 wikis. For Parsoid's purposes, we run roundtrip tests (wikitext
      -> html -> wikitext) and compare diffs. We also run
      trivial edit tests (wikitext -> html -> add comment at end
      of page -> wikitext) to check how clean our roundtripping is.<br>
      <br>
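      As a rough illustration of the roundtrip check described above, here is a minimal Python sketch. It is not Parsoid's actual test code: to_html and to_wikitext are hypothetical converter callables that the caller supplies, and the diff is a plain line diff from the standard difflib module.<br>
<pre>
```python
import difflib


def roundtrip_diff(wikitext, to_html, to_wikitext):
    """Return unified-diff lines between wikitext and its roundtripped form.

    to_html and to_wikitext are hypothetical converter callables supplied
    by the caller; an empty list means the roundtrip was clean.
    """
    html = to_html(wikitext)
    roundtripped = to_wikitext(html)
    return list(difflib.unified_diff(
        wikitext.splitlines(),
        roundtripped.splitlines(),
        fromfile="original",
        tofile="roundtripped",
        lineterm="",
    ))


def summarize(pages, to_html, to_wikitext):
    """Map each title to True (clean roundtrip) or False (diffs found)."""
    return {
        title: not roundtrip_diff(text, to_html, to_wikitext)
        for title, text in pages.items()
    }
```
</pre>
      An empty diff counts as a clean roundtrip, and summarize() gives a per-title pass/fail view that could be aggregated across a test run.<br>
      <br>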
      This testing has been extremely good at telling us when something
      is broken vs. when something is good to be deployed. Checking
      these results is part of our deployment process. We also collect
      performance statistics in each testing run. However, our testing
      database / database schema is not sufficiently tuned to let us
      actually track performance regressions well, so that data has
      just sat in the db without being used for anything.<br>
      <br>
      But we've also recently been talking about:<br>
      * refreshing this set to pick a more proportional sample of articles from
      different wikis (more from enwiki, fewer from others, etc.), though
      we have not done this yet.<br>
      * throwing in a different (non-randomly selected) set of pages that
      are particularly important (featured articles, etc.). So, we would
      be interested in any set of articles that is considered important
      enough to be regularly tested against.<br>
      <br>
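      A rough sketch of how a more proportional selection could be drawn, in Python. The helper names and inputs are hypothetical (this is not existing Parsoid test code): the caller supplies per-wiki article counts and title lists, and quotas are rounded with the largest-remainder method so they add up to the requested total.<br>
<pre>
```python
import random


def proportional_quota(article_counts, total):
    """Split a total sample size across wikis in proportion to their size.

    article_counts maps a wiki name to its article count (illustrative
    numbers supplied by the caller). Uses largest-remainder rounding so
    the quotas add up to exactly total.
    """
    grand_total = sum(article_counts.values())
    exact = {w: total * n / grand_total for w, n in article_counts.items()}
    quota = {w: int(x) for w, x in exact.items()}
    leftover = total - sum(quota.values())
    # Hand the remaining slots to the wikis with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda w: exact[w] - quota[w], reverse=True)
    for w in by_fraction[:leftover]:
        quota[w] += 1
    return quota


def pick_titles(titles_by_wiki, quota, seed=0):
    """Randomly pick quota[wiki] titles per wiki (capped at what is available).

    titles_by_wiki maps a wiki name to a list of titles; the seed keeps
    the selection reproducible between runs.
    """
    rng = random.Random(seed)
    return {
        wiki: rng.sample(titles, min(quota.get(wiki, 0), len(titles)))
        for wiki, titles in titles_by_wiki.items()
    }
```
</pre>
      A fixed list of important pages (featured articles, etc.) could simply be appended to whatever pick_titles() returns.<br>
      <br>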
      This map-reduce style testing code is general enough that
      it could be repurposed for other kinds of testing. For example, we
      have also repurposed this same rt-testing code for running visual
      diffs (comparing PhantomJS renderings of PHP parser output and
      Parsoid output on the same title) on a set of about 800 enwiki
      articles (random selection) [2].<br>
      <br>
      This kind of testing is essential for our deployments. I am not
      sure if it is appropriate for other teams, but I am sharing it just in
      case.<br>
      <br>
      Subbu.<br>
      <br>
      [1] See <a class="moz-txt-link-freetext" href="http://parsoid-tests.wikimedia.org/topfails">http://parsoid-tests.wikimedia.org/topfails</a> and
      <a class="moz-txt-link-freetext" href="http://parsoid-tests.wikimedia.org/commits">http://parsoid-tests.wikimedia.org/commits</a>. The main page is
      <a class="moz-txt-link-freetext" href="http://parsoid-tests.wikimedia.org">http://parsoid-tests.wikimedia.org</a>, but this page can sometimes
      time out when the db is clogged and old test results need
      clearing out.<br>
      <br>
      [2] <a class="moz-txt-link-freetext" href="http://parsoid-tests.wikimedia.org/visualdiff/">http://parsoid-tests.wikimedia.org/visualdiff/</a> with code at
      <a class="moz-txt-link-freetext" href="https://github.com/subbuss/parsoid_visual_diffs">https://github.com/subbuss/parsoid_visual_diffs</a><br>
      <br>
      <br>
      On 05/20/2015 01:48 AM, Elena Tonkovidova wrote:<br>
    </div>
    <blockquote
cite="mid:CAMHBB_1NndTf56foa6cPgjR+p8scAcG7WG5bTtgsCCQLrHuhcg@mail.gmail.com"
      type="cite">
      <div dir="ltr">On <a moz-do-not-send="true"
href="https://docs.google.com/spreadsheets/d/14Ei-KWYbZcmvT70irx6NGIJCi17tF2o1szXnQsZ2h-A/edit#gid=0"
          target="_blank">https://docs.google.com/spreadsheets/d/14Ei-KWYbZcmvT70irx6NGIJCi17tF2o1szXnQsZ2h-A/edit#gid=0</a>
        there are articles that I usually check when I do regression
        testing. 
        <div><br>
        </div>
        <div>One group is a set of articles that used to have some sort
          of performance/display issues<br>
          <div>-  <span
style="font-family:arial,sans,sans-serif;font-size:13px;font-weight:bold">Barack
              Obama, Cat, India, Richard Nixon, </span></div>
          <div><span
style="font-family:arial,sans,sans-serif;font-size:13px;font-weight:bold">Europe,
              English language</span></div>
          <div><br>
          </div>
        </div>
        <div>Another group of articles is where images or the Image Gallery
          are tested (gif, svg, image map, charts, timeline, a large number
          of imgs in the Image Gallery)</div>
        <div><br>
        </div>
        <div>- <b>Claude Monet </b>- extensive Image Gallery (different
          img sizes)</div>
        <div>- <b>List of go games</b> - many svg images</div>
        <div>- <span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;font-weight:bold;white-space:pre-wrap">Lilac
            chaser, </span><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;font-weight:bold;white-space:pre-wrap">Caridoid
            escape reaction </span><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">-
            animated(gif) images</span></div>
        <div><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">- </span><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap"><b>The
              Club(dining club), Image map</b></span><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">
            - for image map img</span></div>
        <div><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">-
            <b>Tel Aviv(Hebrew</b>) for timeline img template</span></div>
        <div><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">-
            several specific articles with problems in their lead img</span></div>
        <div><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap"><br>
          </span></div>
        <div><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">And,
            yes, it'd be really great if we can 1) define more precisely
            what article properties we are interested in testing (visiting
            statistics, </span><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">size,
            structures, special layouts, imgs etc.) </span><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap">
            and 2) create a process(system) to find such articles </span></div>
        <div><span
style="color:rgb(0,0,0);font-family:arial,sans,sans-serif;font-size:13px;white-space:pre-wrap"><br>
          </span></div>
        <div>
          <h1 style="margin:0px;padding:8px 0px;border:0px"><span
              style="color:rgb(70,76,92);font-family:'Segoe UI','Segoe
              UI Web Regular','Segoe UI Symbol','Helvetica
              Neue',Helvetica,Arial,sans-serif;font-size:15px;font-weight:normal">Also,
              there is still an open task - </span><font color="#464c5c"
              face="Segoe UI, Segoe UI Web Regular, Segoe UI Symbol,
              Helvetica Neue, Helvetica, Arial, sans-serif"><span
                style="font-size:15px;font-weight:normal"><a
                  moz-do-not-send="true"
                  href="https://phabricator.wikimedia.org/T97151"
                  target="_blank">https://phabricator.wikimedia.org/T97151</a>
                - </span></font><span
              style="font-weight:normal;color:rgb(70,76,92);font-family:'Segoe
              UI','Segoe UI Web Regular','Segoe UI Symbol','Helvetica
              Neue',Helvetica,Arial,sans-serif;font-size:15px">Testing
              Page issues and disambiguation templates(T90250). Going
              through the list of </span><a moz-do-not-send="true"
href="http://en.wikipedia.org/wiki/Category:Wikipedia_articles_with_content_issues"
              class="" target="_blank" rel="noreferrer"
              style="font-size:13px;font-weight:normal;text-decoration:none;color:rgb(24,85,157);font-family:'Segoe
              UI','Segoe UI Web Regular','Segoe UI Symbol','Helvetica
              Neue',Helvetica,Arial,sans-serif;line-height:18.8500003814697px">http://en.wikipedia.org/wiki/Category:Wikipedia_articles_with_content_issues</a></h1>
          <a moz-do-not-send="true"
href="http://en.wikipedia.org/wiki/Wikipedia:Template_messages/General#Disambiguation_and_redirection"
            class="" target="_blank" rel="noreferrer"
            style="text-decoration:none;color:rgb(24,85,157);font-family:'Segoe
            UI','Segoe UI Web Regular','Segoe UI Symbol','Helvetica
Neue',Helvetica,Arial,sans-serif;font-size:13px;line-height:18.8500003814697px">http://en.wikipedia.org/wiki/Wikipedia:Template_messages/General#Disambiguation_and_redirection</a> should
          help to catch some issues.</div>
        <div><br>
        </div>
        <div>thanks</div>
        <div>Elena</div>
        <div class="gmail_extra"><br>
          <div class="gmail_quote">On Tue, May 19, 2015 at 9:23 PM,
            Brian Gerstle <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:bgerstle@wikimedia.org" target="_blank">bgerstle@wikimedia.org</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
              <div dir="ltr">+search</div>
              <div>
                <div>
                  <div class="gmail_extra"><br>
                    <div class="gmail_quote">On Tue, May 19, 2015 at
                      3:14 PM, Brian Gerstle <span dir="ltr"><<a
                          moz-do-not-send="true"
                          href="mailto:bgerstle@wikimedia.org"
                          target="_blank">bgerstle@wikimedia.org</a>></span>
                      wrote:<br>
                      <blockquote class="gmail_quote" style="margin:0px
                        0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                        <div dir="ltr">The subject hints at a question
                          that's been nagging me for a while, and now
                          that I'm going to be hacking on testing in
                          Lyon I wanted to ask:
                          <div><br>
                          </div>
                          <div>Do we have a list of articles we usually
                            run tests against?</div>
                          <div><br>
                          </div>
                          <div>If not, do we have any processes for
                            curating such a list?  Would anyone be
                            interested in a brainstorming session at
                            Lyon to discuss this further?</div>
                          <div><br>
                          </div>
                          <div>Basically, as a developer, I would love
                            to have more confidence that some code I
                            wrote doesn't break on our most popular
                            articles.  Or, if we can get more
                            sophisticated, that <b>certain properties
                              of my code hold true for certain kinds of
                              generated pages</b>.*</div>
                          <div><br>
                          </div>
                          <div>Please respond with your thoughts and
                            whether you think I should create a phab
                            task for the hackathon about this.  In
                            either case, ping me anytime or grab me at
                            Lyon to discuss further!</div>
                          <div><br>
                          </div>
                          <div>Regards,</div>
                          <div><br>
                          </div>
                          <div>Brian</div>
                          <div><br>
                          </div>
                          <div>* Yes, I'm talking about using
                            property-based testing generators to create
                            random, shrinkable MW pages that we can run
                            tests on. Not sure if it's practical, but
                            could be an interesting experiment.</div>
                          <span><font color="#888888">
                              <div>
                                <div><br>
                                </div>
                                -- <br>
                                <div>
                                  <div dir="ltr">
                                    <div>
                                      <div dir="ltr">EN Wikipedia user
                                        page: <a moz-do-not-send="true"
href="https://en.wikipedia.org/wiki/User:Brian.gerstle" target="_blank">https://en.wikipedia.org/wiki/User:Brian.gerstle</a><br>
                                        IRC: bgerstle</div>
                                    </div>
                                  </div>
                                </div>
                              </div>
                            </font></span></div>
                      </blockquote>
                    </div>
                    <br>
                    <br clear="all">
                    <div><br>
                    </div>
                    -- <br>
                    <div>
                      <div dir="ltr">
                        <div>
                          <div dir="ltr">EN Wikipedia user page: <a
                              moz-do-not-send="true"
                              href="https://en.wikipedia.org/wiki/User:Brian.gerstle"
                              target="_blank">https://en.wikipedia.org/wiki/User:Brian.gerstle</a><br>
                            IRC: bgerstle</div>
                        </div>
                      </div>
                    </div>
                  </div>
                </div>
              </div>
              <br>
              _______________________________________________<br>
              reading-wmf mailing list<br>
              <a moz-do-not-send="true"
                href="mailto:reading-wmf@lists.wikimedia.org"
                target="_blank">reading-wmf@lists.wikimedia.org</a><br>
              <a moz-do-not-send="true"
                href="https://lists.wikimedia.org/mailman/listinfo/reading-wmf"
                target="_blank">https://lists.wikimedia.org/mailman/listinfo/reading-wmf</a><br>
              <br>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
      <br>
      <br>
      <pre wrap="">_______________________________________________
QA mailing list
<a class="moz-txt-link-abbreviated" href="mailto:QA@lists.wikimedia.org">QA@lists.wikimedia.org</a>
<a class="moz-txt-link-freetext" href="https://lists.wikimedia.org/mailman/listinfo/qa">https://lists.wikimedia.org/mailman/listinfo/qa</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>