Forwarding questions from Research-l with permission, with the hope that these will spark discussion here on Wikimedia-l.
RJensen: "Comments: I have not seen any editor make actual use of the Article Feedback tool -- are there examples? Yes Wikipedians are very proud of their vast half-billion-person audience. However they do not ask 'what features are most useful for a high school student or teacher / a university student / etc.'"
Pine: This is a very interesting question. What have been the benefits of AFT5? I have seen complaints about spam and suppressible material being written in AFT5. What benefits has it had?
Thanks, Pine
(cross-posting my reply from wiki-research-l)
The complete reports on WMF research on AFT5 can be found here: http://meta.wikimedia.org/wiki/Research:Article_feedback
The tool is currently deployed on a random 10% sample of English Wikipedia articles so it's not surprising most readers/editors don't see it very often. We are currently collecting about 4K unique feedback messages per day: http://toolserver.org/~dartar/aft5
As for the quality of feedback – as judged by community members and readers – we have some preliminary usage data coming from the FeedbackPage: http://toolserver.org/~dartar/fp/ as well as results based on blind assessment by Wikipedians that we ran during the early stages of AFT5 research (see the "Quality assessment" sections in the research reports above).
We will shortly be publishing an update on FeedbackPage data, but as the feature is not rolled out on the entire project and not many editors or readers know how to find the FeedbackPage (i.e. the only place where comments can be filtered, flagged and moderated), these results should not be taken as conclusive.
A full roll out of AFT5 on the entire English Wikipedia is scheduled for Q4 2012.
Dario
Dario Taraborelli wrote:
A full roll out of AFT5 on the entire English Wikipedia is scheduled for Q4 2012.
Hi.
Do you have a link to on-wiki consensus for this idea? I was under the impression that the ArticleFeedback tool was developed as an experiment. It needs a discussion and on-wiki consensus before being widely deployed, right? Has that happened?
MZMcBride
MZMcBride wrote:
Dario Taraborelli wrote:
A full roll out of AFT5 on the entire English Wikipedia is scheduled for Q4 2012.
Do you have a link to on-wiki consensus for this idea? I was under the impression that the ArticleFeedback tool was developed as an experiment. It needs a discussion and on-wiki consensus before being widely deployed, right? Has that happened?
I asked these questions in September. As far as I can tell, there was no response to them.
A number of long-time users have concerns about this tool, particularly as it is quite often used to libel subjects of articles.
Before adding to volunteers' workloads (to moderate and respond to these comments), I would think it unquestionably requires the consent of the volunteers, right?
Dario?
MZMcBride
Hey – apologies for the late response (I've just returned from my annual leave).
The AFT team is currently reviewing a number of options to address the moderation workload issue, which is definitely an important one. So to answer your question: you can safely consider the feature "experimental"; further decisions will be deferred until we've tested/discussed different approaches to moderation.
As far as data is concerned, we're going to publish additional analyses/datasets on feedback volume and moderation activity for the current (10%) sample, on top of those already available via the dashboards; they'll be announced on the lists and on meta:R:AFT.
Dario
Dario Taraborelli, 06/09/2012 23:47:
The complete reports on WMF research on AFT5 can be found here: http://meta.wikimedia.org/wiki/Research:Article_feedback
The tool is currently deployed on a random 10% sample of English Wikipedia articles so it's not surprising most readers/editors don't see it very often. We are currently collecting about 4K unique feedback messages per day: http://toolserver.org/~dartar/aft5
As for the quality of feedback – as judged by community members and readers – we have some preliminary usage data coming from the FeedbackPage: http://toolserver.org/~dartar/fp/ as well as results based on blind assessment by Wikipedians that we ran during the early stages of AFT5 research (see the "Quality assessment" sections in the research reports above).
Graphs are empty for me there, is it just me?
We will shortly be publishing an update on FeedbackPage data, but as the feature is not rolled out on the entire project and not many editors or readers know how to find the FeedbackPage (i.e. the only place where comments can be filtered, flagged and moderated), these results should not be taken as conclusive.
A full roll out of AFT5 on the entire English Wikipedia is scheduled for Q4 2012.
Nemo
2012/10/12 Federico Leva (Nemo) nemowiki@gmail.com:
Dario Taraborelli, 06/09/2012 23:47:
The complete reports on WMF research on AFT5 can be found here: http://meta.wikimedia.org/wiki/Research:Article_feedback
The tool is currently deployed on a random 10% sample of English Wikipedia articles so it's not surprising most readers/editors don't see it very often. We are currently collecting about 4K unique feedback messages per day: http://toolserver.org/~dartar/aft5
As for the quality of feedback – as judged by community members and readers – we have some preliminary usage data coming from the FeedbackPage: http://toolserver.org/~dartar/fp/ as well as results based on blind assessment by Wikipedians that we ran during the early stages of AFT5 research (see the "Quality assessment" sections in the research reports above).
Graphs are empty for me there, is it just me?
Not just you; they're empty for me too. They only start being populated at "Daily feedback volume (option 1)". And the graphs on the FeedbackPage usage dashboard are completely empty.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
On Oct 12, 2012, at 2:11, "Federico Leva (Nemo)" nemowiki@gmail.com wrote:
we have some preliminary usage data coming from the FeedbackPage: http://toolserver.org/~dartar/fp/
Graphs are empty for me there, is it just me?
We have a temporary hardware issue affecting the slave DB from which this data is pulled. Ops is on it and I hope to have it back soon.
Dario
Dario Taraborelli, 12/10/2012 15:41:
On Oct 12, 2012, at 2:11, "Federico Leva (Nemo)" nemowiki@gmail.com wrote:
we have some preliminary usage data coming from the FeedbackPage: http://toolserver.org/~dartar/fp/
Graphs are empty for me there, is it just me?
We have a temporary hardware issue affecting the slave DB from which this data is pulled. Ops is on it and I hope to have it back soon.
Thank you for enabling it again. I had read about the blind tests in https://meta.wikimedia.org/wiki/Research:Article_feedback/Quality_assessment before, but I see some major changes in the graphs, which are a bit hard to understand.
1) In "Daily moderation actions (percentage)" there's a huge spike of helpful/unhelpful after C (July); did those flags even exist before? Or did helpfulness increase after wider usage, in line with the finding «the average page receives higher quality feedback than pages picked for their popularity/controversial topic»? (There's no change between 5 and 10% though.)
2) "Unique daily articles with feedback moderated" shows a spike and then a stabilization, but I don't know what the graph is actually about. For instance, can feedback be moderated per article ("feedback semi/full protection" or so) or only per item? Do you know if moderation happens on the same articles, and if stricter moderation increases the helpfulness of feedback on non-moderated articles too?
Nemo
Thank you for enabling it again. I had read about the blind tests in < https://meta.wikimedia.org/wiki/Research:Article_feedback/Quality_assessment... before but I see some major changes in the graphs, which are a bit hard to understand.
- In "Daily moderation actions (percentage)" there's a huge spike of helpful/unhelpful after C (July), did those flags even exist before? Or did helpfulness increase after wider usage according to the finding «the average page receives higher quality feedback than pages picked for their popularity/controversial topic»? (There's no change between 5 and 10 % though.)
They did; the spike is most probably caused by a deployment from 0.6 percent of articles to 5 percent of articles, with a resulting "ooh, shiny! Let's take a look" reaction.
2) "Unique daily articles with feedback moderated" shows a spike and then a stabilization, but I don't know what the graph is actually about. For instance, can feedback be moderated per article ("feedback semi/full protection" or so) or only per item, etc. Do you know if moderation happens on the same articles and if stricter moderation increases helpfulness of feedback also on non-moderated articles?
So, I *believe* it means "the number of distinct articles which have had feedback moderated that day", regardless of whether people use the article-specific page or the centralised page, but I'm not sure - some clarification from Dario would be awesome :). Ditto your other questions, particularly on the distribution of articles.
On Sun, Oct 14, 2012 at 4:33 AM, Oliver Keyes okeyes@wikimedia.org wrote:
Thank you for enabling it again. I had read about the blind tests in < https://meta.wikimedia.org/wiki/Research:Article_feedback/Quality_assessment... before but I see some major changes in the graphs, which are a bit hard to understand.
- In "Daily moderation actions (percentage)" there's a huge spike of helpful/unhelpful after C (July), did those flags even exist before? Or did helpfulness increase after wider usage according to the finding «the average page receives higher quality feedback than pages picked for their popularity/controversial topic»? (There's no change between 5 and 10 % though.)
They did; the spike is most probably caused by a deployment from 0.6 percent of articles to 5 percent of articles, with a resulting "ooh, shiny! Let's take a look" reaction.
Indeed; I remember some (internal) announcements around this, which caused me and no doubt others to while away an evening just after deployment clicking helpful/unhelpful :)
Also, not to state the obvious, but 'helpful' feedback in and of itself doesn't mean the article changed for the better; I've marked plenty of feedback 'helpful' without doing anything further about it. Is there any data about rate of change of the articles since AFT was enabled? (probably pretty hard to measure since articles are individually fluid at much different rates, depending on topic, and you'd have to control for the baseline likeliness of random bursts of editing somehow).
-- phoebe
On 14 October 2012 20:19, phoebe ayers phoebe.wiki@gmail.com wrote:
Indeed; I remember some (internal) announcements around this, which caused me and no doubt others to while away an evening just after deployment clicking helpful/unhelpful :)
I didn't spend an entire evening on it, but I can certainly say those announcements prompted me to go and moderate feedback, something I then didn't keep up. If lots of people did the same as us, that would certainly give a spike in the graphs.
Also, not to state the obvious, but 'helpful' feedback in and of itself doesn't mean the article changed for the better; I've marked plenty of feedback 'helpful' without doing anything further about it. Is there any data about rate of change of the articles since AFT was enabled? (probably pretty hard to measure since articles are individually fluid at much different rates, depending on topic, and you'd have to control for the baseline likeliness of random bursts of editing somehow).
That is a very important point. The goal of the AFT is not to collect feedback, but to improve articles (either by people acting on the feedback or, perhaps more interestingly, by people giving feedback and then being prompted to edit themselves).
Collecting statistics on the feedback itself is a good first stage in the experimentation process, but it does need to be followed up by statistics on whether the ultimate goal is being achieved or not (based on anecdotal evidence, I suspect it isn't at this point, but it is early days).
I found it mostly useless. Not only could I mark the feedback resolved, which should not be possible for a banned user (!), but the feedback was either gibberish/abuse or unhelpful in the sense of (1) the material requested was already in the article, or a linked article, or (2) the complaint was too unspecific to be actionable. Since I have about 4700 articles watchlisted, I feel this is a representative sample, and the result is only to be expected from "an encyclopedia that anyone can edit". Does this feature justify its cost? No.
----- Original Message -----
From: "phoebe ayers" phoebe.wiki@gmail.com
To: "Wikimedia Mailing List" wikimedia-l@lists.wikimedia.org
Sent: Sunday, October 14, 2012 8:19 PM
Subject: Re: [Wikimedia-l] AFT5: what practical benefits has it had?
phoebe ayers, 14/10/2012 21:19:
Also, not to state the obvious, but 'helpful' feedback in and of itself doesn't mean the article changed for the better; I've marked plenty of feedback 'helpful' without doing anything further about it. Is there any data about rate of change of the articles since AFT was enabled? (probably pretty hard to measure since articles are individually fluid at much different rates, depending on topic, and you'd have to control for the baseline likeliness of random bursts of editing somehow).
This was the original aim of AFT, to monitor the Public Policy initiative effects, so it's definitely possible, but I think they're mostly doing research about the tool itself now?
Nemo
Thank you for enabling it again. I had read about the blind tests in https://meta.wikimedia.org/wiki/Research:Article_feedback/Quality_assessment before but I see some major changes in the graphs, which are a bit hard to understand.
- In "Daily moderation actions (percentage)" there's a huge spike of helpful/unhelpful after C (July), did those flags even exist before? Or did helpfulness increase after wider usage according to the finding «the average page receives higher quality feedback than pages picked for their popularity/controversial topic»? (There's no change between 5 and 10 % though.)
They did; the spike is most probably caused by a deployment from 0.6 percent of articles to 5 percent of articles, with a resulting "ooh, shiny! Let's take a look" reaction.
The spike actually follows the 5% deployment combined with the CentralNotice announcement (see annotation D in the first plot); the latter is almost certainly what caused the spike.
- "Unique daily articles with feedback moderated" shows a spike and then a stabilization, but I don't know what the graph is actually about. For instance, can feedback be moderated per article ("feedback semi/full protection" or so) or only per item, etc. Do you know if moderation happens on the same articles and if stricter moderation increases helpfulness of feedback also on non-moderated articles?
So, I believe it means "the number of distinct articles which have had feedback moderated that day", regardless of whether people use the article-specific page or the centralised page, but I'm not sure - some clarification from Dario would be awesome :). Ditto your other questions, particularly on the distribution of articles.
What Oliver said: we are not keeping track of the source of moderation activity at the moment, but I agree it would be a very important piece of data to analyze. After consulting with Fabrice, I've opened this ticket on Bugzilla so we can assess the effort needed to implement it via the logging table: https://bugzilla.wikimedia.org/show_bug.cgi?id=41061
Dario