On 31 January 2013 03:55, Matthew Flaschen mflaschen@wikimedia.org wrote:
On 01/30/2013 06:43 PM, Oliver Keyes wrote:
So: attached, data - everyone with >5 actions in the recentchanges table. Now, the result-set is only ~7,000 entries long, which I'm /preeetty/ sure is unreliable somehow, but I've applied a decade and a half of collected comp sci studies and around 3 decades of practical experience to the problem and they've all gone 'er. no idea. It should work'. If anyone else can spot what's going wrong, most appreciated :).
I asked Ryan Faulkner to take a look, and he did indeed get higher numbers of users:
User ids that had at least 5 edits in the last 30 days
select count(*)
from (
    select rc_user, count(*) as revs
    from enwiki.recentchanges
    where rc_timestamp >= '20130101000000'
      and rc_timestamp < '20130131000000'
    group by 1
    having revs >= 5
) as t
He said it was 32511 for ns0 (main namespace).
So far, he hasn't checked the skin info in the query.
Tried a LEFT OUTER JOIN and it produced all of 5k more results :/. I'll look at it with fresh eyes in the morning, unless anyone wants to get there first (sorry for taking so much of your time with what *should* be a pretty simple issue).
I think the issue is that there is no row in user_properties if they did not change their skin. From https://www.mediawiki.org/wiki/Manual:User_properties_table:
"Only non-default settings are stored, so changes to the defaults are now reflected for everybody that hasn't saved an alternative preference, not only new accounts."
So the query seems to miss people who just always left the default skin. If I'm understanding this correctly, it has to be an outer join, defaulting to vector (the default on enwiki).
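To make the outer-join idea concrete, here is a minimal sketch run against an in-memory SQLite database rather than the real enwiki replica. The column names (rc_user, rc_timestamp, up_user, up_property, up_value) follow the MediaWiki schema, but the toy rows, the `skin_counts` helper, and the `active` alias are made up for illustration, and the query has not been run against production data.

```python
import sqlite3

# Toy stand-ins for the two enwiki tables (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recentchanges (rc_user INTEGER, rc_timestamp TEXT);
CREATE TABLE user_properties (up_user INTEGER, up_property TEXT, up_value TEXT);
""")

edits = (
    [(1, "20130110000000")] * 5    # active, never touched the skin preference
    + [(2, "20130112000000")] * 6  # active, switched to monobook
    + [(3, "20130115000000")] * 2  # too few edits to count
    + [(4, "20130120000000")] * 5  # active, no user_properties rows at all
)
conn.executemany("INSERT INTO recentchanges VALUES (?, ?)", edits)
conn.executemany(
    "INSERT INTO user_properties VALUES (?, ?, ?)",
    [
        (1, "gender", "female"),  # a non-skin property must not count as a skin
        (2, "skin", "monobook"),
        (3, "skin", "monobook"),
    ],
)

QUERY = """
SELECT COALESCE(up.up_value, 'vector') AS skin, COUNT(*) AS users
FROM (
    SELECT rc_user
    FROM recentchanges
    WHERE rc_timestamp >= '20130101000000'
      AND rc_timestamp <  '20130131000000'
    GROUP BY rc_user
    HAVING COUNT(*) >= 5
) AS active
LEFT OUTER JOIN user_properties AS up
    ON up.up_user = active.rc_user
   AND up.up_property = 'skin'   -- filter in ON, so users without a skin row survive
GROUP BY COALESCE(up.up_value, 'vector')
ORDER BY skin
"""


def skin_counts(connection):
    """Return (skin, number_of_active_users) pairs, treating a missing row as vector."""
    return connection.execute(QUERY).fetchall()


print(skin_counts(conn))
```

The `up_property = 'skin'` condition sits in the ON clause on purpose. If it were moved to a WHERE clause, the rows where up_property is NULL (users with no stored skin) would be filtered out again and the join would behave like an inner join.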
Matt Flaschen