If you're like me, you've probably been breathlessly awaiting the results of the first WikiGrok stable A/B test to see if the responses we're getting are good, bad, or ugly :) Well, good news! I did some hand-coding of the results (a sample of about 300 responses from the ~1,200 we got during the test) and have some interesting preliminary findings to share. Caveat: this is not science, just a quick check of WikiGrok's pulse. Leila from Analytics is helping us analyze this and other WikiGrok test data and will have a more thorough write-up of the results soon :)
As a reminder, this test ran for a week in December in stable for logged-in users only on English Wikipedia. We tested two versions of the UX (a simple "yes/no/maybe" interface and a slightly more complex tagging one), and we asked questions about biographies (actors and writers) and music albums (live or studio albums). The responses were not yet sent to Wikidata; the infrastructure to do that is currently in development.
* The tl;dr is that the quality of the responses is pretty high! *The overall rate of correct responses for the sample I looked at was 80%*.
* Also, *users with no edits and users with 1 or more edits had similar quality responses* (in fact, the 0 edit count users gave slightly higher quality responses). So even total newbs are capable of grokking :)
* Lastly, while we didn't see any differences in engagement or conversion (the rate at which users started and finished the WikiGrok process) between the two versions, there was a difference in quality – *Version B (tagging) produced a noticeably higher quality response rate (95%)*.
A more detailed breakdown of quality is below, including by individual answer (fun fact that is sure to make Sam Smith sad: nobody seems to have any clue what a live album is!). Now let's see if these trends hold for logged-out users, too :) Our first test for all users (logged in and logged out) is slated for later this month.
* * *
*User classes*
Users with 0 edits – 85%
Users with 1 or more edits – 80%
*Versions*
Version A – 68%
Version B – 95%
*Question types*
"Is this person an author?" – 72% "Is this a film actor?" – 90% "Is this a television actor?" – 85% "Is this a live album?" – 50% :( "Is this a studio album?" – 64%
Wow, this is pretty awesome. Looking forward to detailed analysis!
On Mon, Jan 5, 2015 at 10:48 AM, Maryana Pinchuk <mpinchuk@wikimedia.org> wrote:
Awesome! Can't wait for it to be "always-on" :)
From: mobile-l-bounces@lists.wikimedia.org [mailto:mobile-l-bounces@lists.wikimedia.org] On behalf of Maryana Pinchuk
Sent: Monday, January 5, 2015 19:48
To: Leila Zia; Dario Taraborelli; mobile-l
Subject: [WikimediaMobile] Preliminary WikiGrok response quality in stable
Very cool results. Seems like showing the same question to a bunch of users and grabbing the most popular answer as the correct one will work in most cases; a rough sketch of that kind of majority-vote aggregation is below.
On Jan 5, 2015 8:08 PM, "Florian Schmidt" <florian.schmidt.welzow@t-online.de> wrote:
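A minimal sketch of that kind of majority-vote aggregation, assuming each response comes in as a (page, claim, answer) tuple – the field names and the min_votes threshold here are hypothetical, not the actual WikiGrok data model:

```python
# Rough sketch: accept the most popular answer per (page, claim) pair,
# but only once it has a minimum number of votes. The tuple layout and
# the threshold are illustrative assumptions, not WikiGrok's real schema.
from collections import Counter, defaultdict

def aggregate(responses, min_votes=3):
    """responses: iterable of (page, claim, answer) tuples."""
    votes = defaultdict(Counter)
    for page, claim, answer in responses:
        votes[(page, claim)][answer] += 1

    accepted = {}
    for key, counts in votes.items():
        answer, count = counts.most_common(1)[0]
        if count >= min_votes:
            accepted[key] = answer
    return accepted

# Example: three of four users say "no", so "no" wins.
sample = [
    ("Some Album", "live album?", "no"),
    ("Some Album", "live album?", "no"),
    ("Some Album", "live album?", "yes"),
    ("Some Album", "live album?", "no"),
]
print(aggregate(sample))  # {('Some Album', 'live album?'): 'no'}
```

In practice you'd also want a tie-breaking rule (or to simply drop ties), which would matter for questions like the live-album one above, where responses split close to 50/50.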
So cool. It's always pleasing to see positive results from tests like these. Seems like WikiGrok Version B wins round 1.
On Mon, Jan 5, 2015 at 11:47 AM, Joaquin Oltra Hernandez <jhernandez@wikimedia.org> wrote: