On 31 January 2013 03:55, Matthew Flaschen <mflaschen@wikimedia.org> wrote:
On 01/30/2013 06:43 PM, Oliver Keyes wrote:
> So: attached, data - everyone with >5 actions in the recentchanges
> table. Now, the result-set is only ~7,000 entries long, which I'm
> /preeetty/ sure is unreliable somehow, but I've applied a decade and a
> half of collected comp sci studies and around 3 decades of practical
> experience to the problem and they've all gone 'er. no idea. It should
> work'. If anyone else can spot what's going wrong, most appreciated :).

I asked Ryan Faulkner to take a look, and he did indeed get higher
numbers of users:

User ids that had at least 5 edits in the last 30 days

select count(*) from (select rc_user, count(*) as revs from
enwiki.recentchanges where rc_timestamp >= '20130101000000' and
rc_timestamp < '20130131000000' group by 1 having revs >= 5) as t

He said it was 32511 for ns0 (main namespace).

So far, he hasn't tried to check the skin info in the query.

I tried a LEFT OUTER JOIN and it produced only about 5k more results :/.
I'll look at it with fresh eyes in the morning, unless anyone wants to
get there first. (Sorry for taking so much of your time with what should
be a pretty simple issue.)
 
I think the issue is that there is no row in user_properties if they did
not change their skin.  From
https://www.mediawiki.org/wiki/Manual:User_properties_table:

"Only non-default settings are stored, so changes to the defaults are
now reflected for everybody that hasn't saved an alternative preference,
not only new accounts."

So the query seems to miss people who just always left the default skin.
If I'm understanding this correctly, it has to be an outer join,
defaulting to vector (the default on enwiki).
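
For illustration, here is a rough sketch of what such an outer join with
a default could look like. It is not the query anyone in this thread
actually ran. It uses Python's sqlite3 with a tiny made-up dataset so it
can be run anywhere; the column names (rc_user, rc_timestamp, up_user,
up_property, up_value) follow the MediaWiki schema, while the helper
names, the toy rows, and the rc_user > 0 filter for anonymous edits are
assumptions added here. Note that the up_property = 'skin' test sits in
the ON clause: in general, putting it in WHERE would discard the
NULL rows and turn the outer join back into an inner join.

```python
import sqlite3

# Sketch only: count active users per skin, treating "no row in
# user_properties" as the default skin.
SKIN_QUERY = """
SELECT COALESCE(up.up_value, :default_skin) AS skin,
       COUNT(*) AS users
FROM (
    SELECT rc_user
    FROM recentchanges
    WHERE rc_timestamp >= :start
      AND rc_timestamp <  :end
      AND rc_user > 0              -- assumption: skip anonymous edits
    GROUP BY rc_user
    HAVING COUNT(*) >= :min_actions
) AS active
LEFT JOIN user_properties AS up
       ON up.up_user = active.rc_user
      AND up.up_property = 'skin'  -- kept in ON so unmatched users survive
GROUP BY COALESCE(up.up_value, :default_skin)
ORDER BY users DESC, skin
"""


def build_demo_db():
    """Create an in-memory database with made-up rows (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE recentchanges (rc_user INTEGER, rc_timestamp TEXT);
        CREATE TABLE user_properties (
            up_user INTEGER, up_property TEXT, up_value TEXT
        );
        """
    )
    # user id -> number of actions in January 2013 (user 0 = anonymous)
    actions = {0: 10, 1: 5, 2: 6, 3: 2, 4: 5}
    rows = [
        (user, "201301%02d000000" % (day + 1))
        for user, count in actions.items()
        for day in range(count)
    ]
    conn.executemany("INSERT INTO recentchanges VALUES (?, ?)", rows)
    conn.executemany(
        "INSERT INTO user_properties VALUES (?, ?, ?)",
        [
            (2, "skin", "monobook"),  # active, changed skin
            (3, "skin", "monobook"),  # changed skin, but too few actions
            (4, "language", "de"),    # active, has a row but not for skin
        ],
    )
    return conn


def skin_counts(conn, start, end, min_actions=5, default_skin="vector"):
    """Return [(skin, user_count), ...] for users with >= min_actions."""
    params = {
        "start": start,
        "end": end,
        "min_actions": min_actions,
        "default_skin": default_skin,
    }
    return conn.execute(SKIN_QUERY, params).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(skin_counts(demo, "20130101000000", "20130131000000"))
```

In the toy data, users 1 and 4 never saved a skin preference, so they
are counted under vector rather than dropped; user 3 is excluded for
having fewer than 5 actions. The real MySQL query on enwiki may need
other adjustments (namespace filter, bots) that are not covered here.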

Matt Flaschen

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics



--
Oliver Keyes
Community Liaison, Product Development
Wikimedia Foundation