+search
On Tue, May 19, 2015 at 3:14 PM, Brian Gerstle bgerstle@wikimedia.org wrote:
The subject hints at a question that's been nagging me for a while, and now that I'm going to be hacking on testing in Lyon I wanted to ask:
Do we have a list of articles we usually run tests against?
If not, do we have any processes for curating such a list? Would anyone be interested in a brainstorming session at Lyon to discuss this further?
Basically, as a developer, I would love to have more confidence that some code I wrote doesn't break on our most popular articles. Or, if we can get more sophisticated, that *certain properties of my code hold true for certain kinds of generated pages*.*
Please respond with your thoughts and whether you think I should create a phab task for the hackathon about this. In either case, ping me anytime or grab me at Lyon to discuss further!
Regards,
Brian
* Yes, I'm talking about using property-based testing generators to create random, shrinkable MW pages that we can run tests on. Not sure if it's practical, but it could be an interesting experiment.
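For illustration, here is a minimal sketch of what such a generator might look like, using Python's Hypothesis library. The wikitext fragments are a toy grammar, and render_page is a hypothetical stand-in for whatever rendering path is under test:

    # Toy property-based test: generate small random wikitext pages and assert
    # an invariant over the rendered output. Requires the `hypothesis` package.
    from hypothesis import given, strategies as st

    plain_text = st.text(
        alphabet=st.characters(blacklist_characters="[]{}|='<>\n"),
        min_size=1, max_size=40,
    )

    # Wikitext fragments: plain runs, bold/italic spans, internal links, headings.
    fragment = st.one_of(
        plain_text,
        plain_text.map(lambda s: f"'''{s}'''"),   # bold
        plain_text.map(lambda s: f"''{s}''"),     # italics
        plain_text.map(lambda s: f"[[{s}]]"),     # internal link
        plain_text.map(lambda s: f"== {s} =="),   # heading
    )

    # A "page" is a few fragments joined by blank lines. On failure, Hypothesis
    # shrinks toward the smallest page that still triggers the bug.
    wikitext_page = st.lists(fragment, min_size=1, max_size=10).map("\n\n".join)

    @given(wikitext_page)
    def test_render_never_crashes(wikitext):
        html = render_page(wikitext)   # hypothetical entry point under test
        assert html is not None        # property: rendering always produces output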
-- EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle IRC: bgerstle
+analytics
On 05/20/2015 01:48 AM, Elena Tonkovidova wrote:
In the spreadsheet at https://docs.google.com/spreadsheets/d/14Ei-KWYbZcmvT70irx6NGIJCi17tF2o1szXn... there is a list of articles that I usually check when I do regression testing.
One group is a set of articles that used to have some sort of performance or display issue: Barack Obama, Cat, India, Richard Nixon, Europe, English language.
Another group is where images and the Image Gallery get tested (GIF, SVG, image maps, charts, timelines, a large number of images in the Image Gallery):
- *Claude Monet* - extensive Image Gallery (different image sizes)
- *List of go games* - many SVG images
- *Lilac chaser*, *Caridoid escape reaction* - animated (GIF) images
- *The Club (dining club)*, *Image map* - for image map images
- *Tel Aviv* (Hebrew) - for the timeline image template
- several specific articles with problems in their lead image
And, yes, it'd be really great if we could 1) define more precisely which article properties we are interested in testing (visit statistics, size, structure, special layouts, images, etc.) and 2) create a process (or system) for finding such articles.
Also, there is still an open task - https://phabricator.wikimedia.org/T97151 - Testing Page issues and disambiguation templates (T90250). Going through the lists at http://en.wikipedia.org/wiki/Category:Wikipedia_articles_with_content_issues and http://en.wikipedia.org/wiki/Wikipedia:Template_messages/General#Disambiguat... should help to catch some issues.
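As a starting point for such a process, candidate articles can be pulled out of a maintenance category with the MediaWiki API's standard list=categorymembers query. A minimal sketch in Python (the category is just the one linked above):

    # Enumerate articles in a maintenance category via the MediaWiki API,
    # as raw material for a curated test-article list. Requires `requests`.
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def category_members(category, limit=500):
        """Yield main-namespace page titles from the given category."""
        params = {
            "action": "query",
            "list": "categorymembers",
            "cmtitle": category,
            "cmnamespace": 0,      # articles only
            "cmlimit": limit,
            "format": "json",
        }
        while True:
            data = requests.get(API, params=params).json()
            for member in data["query"]["categorymembers"]:
                yield member["title"]
            if "continue" not in data:
                break
            params.update(data["continue"])   # standard API continuation

    for title in category_members("Category:Wikipedia articles with content issues"):
        print(title)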
Thanks,
Elena
On Wed, May 20, 2015 at 10:52 AM, Subramanya Sastry ssastry@wikimedia.org wrote:
For Parsoid, we run tests [1] against a set of 160K articles that we randomly picked a couple of years back: about 10K articles from each of 16 wikis. For Parsoid's purposes, we run roundtrip tests (wikitext -> html -> wikitext) and compare diffs, as well as trivial edit tests (wikitext -> html -> add a comment at the end of the page -> wikitext), and check how clean our roundtripping is.
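For illustration, a minimal sketch of the shape of those two checks. The converter callables stand in for the real parser entry points, which aren't shown here:

    # Roundtrip check: wikitext -> HTML -> wikitext, reported as a unified
    # diff. `to_html` and `to_wikitext` are whatever converter callables you
    # wire in; they are not specified here.
    import difflib

    def roundtrip_diff(wikitext, to_html, to_wikitext):
        """Return a unified diff between the source and its roundtripped form."""
        roundtripped = to_wikitext(to_html(wikitext))
        return list(difflib.unified_diff(
            wikitext.splitlines(), roundtripped.splitlines(),
            fromfile="original", tofile="roundtripped", lineterm="",
        ))

    def trivial_edit_is_clean(wikitext, to_html, to_wikitext):
        """Append a comment on the HTML side and serialize back; the edit is
        clean if removing the comment recovers the original exactly (this
        assumes the comment serializes back verbatim)."""
        edited = to_wikitext(to_html(wikitext) + "<!-- trivial edit -->")
        return edited.replace("<!-- trivial edit -->", "") == wikitext

A page roundtrips cleanly when roundtrip_diff returns an empty list.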
This testing has been extremely good at telling us when something is broken and when something is good to deploy; checking these results is part of our deployment process. We also collect performance statistics in each testing run. However, our testing database schema is not sufficiently tuned to let us actually track performance regressions well, so that data has just sat in the db without being used for anything.
But we've also recently been talking about:
- refreshing this to pick a more proportional set of articles from the different wikis (more from enwiki, less from others, etc.), though we haven't done this yet (see the sketch after this list)
- throwing in a different, non-randomly selected set of pages that are particularly important (featured articles, etc.)
So we would be interested in any set of articles that is considered important enough to be regularly tested against.
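A minimal sketch of the proportional allocation mentioned in the first point, with purely illustrative article counts (actually drawing the random titles per wiki is a separate step):

    # Spread a fixed testing budget across wikis in proportion to article
    # count, rather than a flat 10K per wiki. Counts are illustrative only.
    def proportional_allocation(article_counts, budget):
        """Return {wiki: number of test articles}, proportional to size."""
        total = sum(article_counts.values())
        return {
            wiki: max(1, round(budget * count / total))   # at least 1 per wiki
            for wiki, count in article_counts.items()
        }

    counts = {"enwiki": 4_800_000, "dewiki": 1_800_000, "frwiki": 1_600_000}
    print(proportional_allocation(counts, budget=160_000))
    # -> roughly 94K for enwiki, 35K for dewiki, 31K for frwiki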
This map-reduce-style testing code is general enough that it could be repurposed for other kinds of testing. For example, we have also repurposed the same rt-testing code to run visual diffs (comparing phantomjs renderings of PHP parser output and Parsoid output for the same title) on a set of about 800 randomly selected enwiki articles [2].
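The pixel comparison at the core of such a visual diff can be sketched in a few lines with Pillow; the screenshot file names here are hypothetical, and the actual tooling in [2] does considerably more:

    # Minimal pixel-level comparison of two renderings of the same title.
    # Assumes both screenshots already exist on disk. Requires Pillow.
    from PIL import Image, ImageChops

    def visual_diff_bbox(path_a, path_b):
        """Return the bounding box of differing pixels, or None if identical."""
        a = Image.open(path_a).convert("RGB")
        b = Image.open(path_b).convert("RGB")
        if a.size != b.size:
            return (0, 0) + a.size    # size mismatch counts as a full diff
        return ImageChops.difference(a, b).getbbox()

    # None means the two renderings match pixel-for-pixel:
    # visual_diff_bbox("Cat.php-parser.png", "Cat.parsoid.png")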
This kind of testing is essential for our deployments; I'm not sure whether it is appropriate for other teams, but I'm sharing it just in case.
Subbu.
[1] See http://parsoid-tests.wikimedia.org/topfails and http://parsoid-tests.wikimedia.org/commits. The main page is http://parsoid-tests.wikimedia.org, but it can sometimes time out when the db is clogged and old test results need clearing out.
[2] http://parsoid-tests.wikimedia.org/visualdiff/ with code @ https://github.com/subbuss/parsoid_visual_diffs
Subbu, that's exactly the kind of thing I have in mind, and it further reinforces my sense that an API (server- or client-side) built on top of Parsoid could be a huge step forward in reliability. If you're testing that the API works and the DOM spec is correct across that many articles, the things we build on top should "Just Work (TM)". I'll be sure to look into this in my hacking. Thanks for replying!