Hi all,
after rolling out MediaViewer to the English and German Wikipedias, we have gotten quite a few complaints; to understand how representative they are, I have looked at the number of users who have opted out (there is a user preference for that; it is linked from the MediaViewer interface, although one of the recurring complaints is that it is still not trivial to find). I would appreciate opinions on whether this is a good approach and whether I did it the right way.
The queries I have run look like this:
select up_value, count(*)
from user
left join user_properties
  on user_id = up_user and up_property = 'multimediaviewer-enable'
where user_touched > '20140604000000' and user_editcount > 10000
group by up_value;
for various edit count limits (the timestamp is the time of deployment on enwiki plus a few hours).
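To make it easier to check what the query counts, here is a minimal sketch that runs it against toy data in an in-memory SQLite database. The rows are invented for illustration, SQLite stands in for the production MariaDB replica, and storing an opt-out as an empty-string up_value is an assumption, not something confirmed in this thread.

```python
import sqlite3

# The query from the message above, unchanged apart from line breaks.
QUERY = """
select up_value, count(*)
from user
left join user_properties
  on user_id = up_user and up_property = 'multimediaviewer-enable'
where user_touched > '20140604000000' and user_editcount > 10000
group by up_value
"""

# Hypothetical rows: (user_id, user_touched, user_editcount).
USERS = [
    (1, "20140605120000", 25000),  # active, above the limit, opted out
    (2, "20140606090000", 12000),  # active, above the limit, no preference row
    (3, "20140601000000", 50000),  # opted out, but not touched since rollout
    (4, "20140607000000", 500),    # active, but below the edit count limit
    (5, "20140608000000", 15000),  # active, only has an unrelated preference
]

# Hypothetical rows: (up_user, up_property, up_value).
# Assumption: an opt-out is stored as an empty-string value.
PROPERTIES = [
    (1, "multimediaviewer-enable", ""),
    (3, "multimediaviewer-enable", ""),
    (5, "skin", "vector"),
]


def grouped_counts(users, properties):
    """Run QUERY on toy tables and return {up_value: count}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table user (user_id integer, user_touched text, user_editcount integer)"
    )
    conn.execute(
        "create table user_properties (up_user integer, up_property text, up_value text)"
    )
    conn.executemany("insert into user values (?, ?, ?)", users)
    conn.executemany("insert into user_properties values (?, ?, ?)", properties)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(grouped_counts(USERS, PROPERTIES))
```

Because the property filter sits in the join condition rather than the where clause, users without the preference row (or with only unrelated preferences) stay in the result and land in the NULL group, which is what makes that group usable as part of a denominator.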
Hmm... It's hard to evaluate your strategy without more context. Why are you limiting your query to users with more than 10k lifetime edits? Are you trying to generate a proportion of a subset of users? If so, what's the denominator?
Also, opt-out rates tend to be low no matter how obvious and desired they are. If the goal of this analysis is to find out if opt-out rates are high (or low), then I'd recommend comparing them with opt-out rates for another feature.
-Aaron
On Mon, Jun 9, 2014 at 2:52 AM, Gergo Tisza gtisza@wikimedia.org wrote:
Hi all,
after rolling out MediaViewer to the English and German Wikipedias, we have gotten quite a few complaints; to understand how representative they are, I have looked at the number of users who have opted out (there is a user preference for that; it is linked from the MediaViewer interface, although one of the recurring complaints is that it is still not trivial to find). I would appreciate opinions on whether this is a good approach and whether I did it the right way.
The queries I have run look like this:
select up_value, count(*)
from user
left join user_properties
  on user_id = up_user and up_property = 'multimediaviewer-enable'
where user_touched > '20140604000000' and user_editcount > 10000
group by up_value;
for various edit count limits (the timestamp is the time of deployment on enwiki plus a few hours).
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
On Mon, Jun 9, 2014 at 11:20 AM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
Hmm... It's hard to evaluate your strategy without more context. Why are you limiting your query to users with more than 10k lifetime edits? Are you trying to generate a proportion of a subset of users? If so, what's the denominator?
I ran the query with various editcount limits (0, 1K, 10K, 100K, 1M); the goal was to see whether power users are affected differently. (They are: users with 100K+ edits have the highest opt-out rates.) I am basically taking the ratio of
( <users with more than X total edits who have been active since the rollout and disabled MediaViewer> / <users with more than X total edits who have been active since the rollout> )
which in this case should be
( <count(*) for up_value = ' ' group> / ( <count(*) for up_value = ' ' group> + <count(*) for up_value is NULL group> ) )
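The ratio described above can be sketched as a small helper that takes the grouped counts from one run of the query. The function name, the example numbers, and the empty-string key for the opt-out group are all assumptions made for illustration; they are not results from the actual queries.

```python
def opt_out_rate(counts, opted_out_value=""):
    """Share of active users in one edit count band who disabled MediaViewer.

    counts maps up_value to count(*); None stands for the NULL group,
    i.e. users who never touched the preference.
    Assumption: opt-outs are stored under opted_out_value.
    """
    opted_out = counts.get(opted_out_value, 0)
    untouched = counts.get(None, 0)
    total = opted_out + untouched
    if total == 0:
        raise ValueError("no active users in this band")
    return opted_out / total


if __name__ == "__main__":
    # Hypothetical counts for one edit count limit, not real data.
    example = {"": 30, None: 270}
    print(f"{opt_out_rate(example):.1%}")
```

Note that this denominator ignores any other up_value groups the query might return (for example users who explicitly re-enabled the feature), so it only matches the formula above if those two groups are the only ones present.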
Also, opt-out rates tend to be low no matter how obvious and desired they are. If the goal of this analysis is to find out if opt-out rates are high (or low), then I'd recommend comparing them with opt-out rates for another feature.
Any tip for what that feature should be? Are there features for which opt-out rates have been recorded a few days after they have been deployed? (I imagine one problem is that opt-outs happen slowly; that said, in the past two days our opt-out count grew by about 20%, while the ratio to active users has remained remarkably stable.)
On Mon, Jun 9, 2014 at 11:20 AM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
Also, opt-out rates tend to be low no matter how obvious and desired they are. If the goal of this analysis is to find out if opt-out rates are high (or low), then I'd recommend comparing them with opt-out rates for another feature.
One thing I did was to compare opt-out rates with other wikis where we have received fewer complaints (fr, es), and enwiki opt-outs seem to be in the same range. Do you think that is a useful indicator, or is comparing opt-out rates for wikis with a different userbase size not particularly useful?
Re user counts; we have, I think, 1 editor who has 1M+ edits. I imagine we don't have many with 100K edits. How big are those user groups? It's useful to know that power users are more likely to opt out, great, but if you only have 30 users in your definition of 'power users' it's going to be thrown off very easily.
My big worry would be that finding this out only tells you that either (1) only power users have a problem or (2) only power users can find the off-switch. Comparing with other features that also feature an off-switch would allow you to eliminate this as an independent variable.
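To illustrate how easily a small group gets thrown off, here is a rough sketch using a Wilson score interval for a proportion. The choice of the Wilson interval and the example counts are my own assumptions for illustration; nothing in this thread says which method, if any, was used.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence.
    Returns (lower, upper).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical groups with the same observed 20% opt-out rate.
    for opted_out, group_size in [(6, 30), (1200, 6000)]:
        low, high = wilson_interval(opted_out, group_size)
        print(f"{opted_out}/{group_size}: {low:.3f} to {high:.3f}")
```

With the same observed rate, the 30-user group gets a far wider interval than the 6000-user group, so a difference between edit count bands only means something if the intervals are reasonably separated.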
On 9 June 2014 11:55, Gergo Tisza gtisza@wikimedia.org wrote:
On Mon, Jun 9, 2014 at 11:20 AM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
Also, opt-out rates tend to be low no matter how obvious and desired they are. If the goal of this analysis is to find out if opt-out rates are high (or low), then I'd recommend comparing them with opt-out rates for another feature.
One thing I did was to compare opt-out rates with other wikis where we have received fewer complaints (fr, es), and enwiki opt-outs seem to be in the same range. Do you think that is a useful indicator, or is comparing opt-out rates for wikis with a different userbase size not particularly useful?
Hmm, I thought that was a really interesting question (how our user base breaks down into orders of magnitude of edits).
We have 15 users (including bots) with over a million edits:
MariaDB [enwiki_p]> select count(*), floor( log10( user_editcount ) ) from user group by floor( log10( user_editcount ) );
+----------+----------------------------------+
| count(*) | floor( log10( user_editcount ) ) |
+----------+----------------------------------+
| 14392731 |                             NULL |
|  5843417 |                                0 |
|  1132181 |                                1 |
|   157056 |                                2 |
|    29970 |                                3 |
|     6400 |                                4 |
|      401 |                                5 |
|       15 |                                6 |
+----------+----------------------------------+
8 rows in set (12.14 sec)
If you limit it to non-bots, that number goes down to 2 (Koavf and Waacstats):
MariaDB [enwiki_p]> select count(*), floor( log10( user_editcount ) ) from user left outer join user_groups on ug_user = user_id and ug_group = 'bot' where ug_user is null group by floor( log10( user_editcount ) );
+----------+----------------------------------+
| count(*) | floor( log10( user_editcount ) ) |
+----------+----------------------------------+
| 14392718 |                             NULL |
|  5843411 |                                0 |
|  1132140 |                                1 |
|   156951 |                                2 |
|    29769 |                                3 |
|     6160 |                                4 |
|      290 |                                5 |
|        2 |                                6 |
+----------+----------------------------------+
8 rows in set (1 min 5.53 sec)
--bawolff
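As a rough follow-up on the group sizes, the two result tables above can be combined to get the number of bot accounts per order of magnitude and the cumulative size of each "at least 10^k edits" group. The counts are copied from the tables; the function names are made up for this sketch. These are all registered accounts, not only those active since the rollout, so the groups used in the opt-out ratio will be smaller.

```python
# Row counts copied from the two result tables above, keyed by
# floor(log10(user_editcount)); None stands for the NULL row
# (LOG10 yields NULL when the edit count is zero or unset).
ALL_USERS = {None: 14392731, 0: 5843417, 1: 1132181, 2: 157056,
             3: 29970, 4: 6400, 5: 401, 6: 15}
NON_BOTS = {None: 14392718, 0: 5843411, 1: 1132140, 2: 156951,
            3: 29769, 4: 6160, 5: 290, 6: 2}


def bots_per_magnitude(all_users, non_bots):
    """Bot accounts per bucket, as the difference of the two tables."""
    return {m: all_users[m] - non_bots[m] for m in all_users}


def at_least(counts, magnitude):
    """Accounts with at least 10**magnitude edits (NULL bucket excluded)."""
    return sum(c for m, c in counts.items() if m is not None and m >= magnitude)


if __name__ == "__main__":
    print("bots per magnitude:", bots_per_magnitude(ALL_USERS, NON_BOTS))
    for k in (3, 4, 5, 6):
        print(f"non-bot accounts with 10^{k}+ edits:", at_least(NON_BOTS, k))
```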
On 6/9/14, Oliver Keyes okeyes@wikimedia.org wrote:
Re user counts; we have, I think, 1 editor who has 1M+ edits. I imagine we don't have many with 100K edits. How big are those user groups? It's useful to know that power users are more likely to opt out, great, but if you only have 30 users in your definition of 'power users' it's going to be thrown off very easily.
My big worry would be that finding this out only tells you that either (1) only power users have a problem or (2) only power users can find the off-switch. Comparing with other features that also feature an off-switch would allow you to eliminate this as an independent variable.
-- Oliver Keyes Research Analyst Wikimedia Foundation
multimedia@lists.wikimedia.org