On 01/30/2013 06:43 PM, Oliver Keyes wrote:
> So: attached, data - everyone with >5 actions in the recentchanges
> table. Now, the result-set is only ~7,000 entries long, which I'm
> /preeetty/ sure is unreliable somehow, but I've applied a decade and a
> half of collected comp sci studies and around 3 decades of practical
> experience to the problem and they've all gone 'er. no idea. It should
> work'. If anyone else can spot what's going wrong, most appreciated :).

I asked Ryan Faulkner to take a look, and he did indeed get higher
numbers of users:

User ids that had at least 5 edits in the last 30 days:

  select count(*) from (
    select rc_user, count(*) as revs from enwiki.recentchanges
    where rc_timestamp >= '20130101000000'
      and rc_timestamp < '20130131000000'
    group by 1 having revs >= 5
  ) as t

He said it was 32511 for ns0 (main namespace).
He has not yet tried checking the skin info in the query.
I think the issue is that user_properties has no row for users who never
changed their skin. From
https://www.mediawiki.org/wiki/Manual:User_properties_table:
"Only non-default settings are stored, so changes to the defaults are
now reflected for everybody that hasn't saved an alternative preference,
not only new accounts."
So the query seems to miss everyone who simply kept the default skin.
If I'm understanding this correctly, it has to be an outer join from the
active users to user_properties, with a missing skin row defaulting to
vector (the default on enwiki).
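
A rough sketch of what I mean, not tested against the real database. It
uses toy in-memory SQLite tables instead of enwiki, the sample rows are
made up, and I'm assuming the skin preference is stored under
up_property = 'skin' with the column names up_user, up_property, and
up_value from the manual page above:

```python
import sqlite3

# Toy stand-ins for enwiki.recentchanges and enwiki.user_properties.
# Only the columns needed for this illustration are modelled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recentchanges (rc_user INTEGER, rc_timestamp TEXT);
CREATE TABLE user_properties (up_user INTEGER, up_property TEXT, up_value TEXT);
""")

# Made-up sample data:
#   user 1: 5 edits, never changed skin (no user_properties row)
#   user 2: 5 edits, explicitly chose monobook
#   user 3: 2 edits, below the threshold
edits = (
    [(1, "20130105000000")] * 5
    + [(2, "20130110000000")] * 5
    + [(3, "20130115000000")] * 2
)
conn.executemany("INSERT INTO recentchanges VALUES (?, ?)", edits)
conn.execute("INSERT INTO user_properties VALUES (2, 'skin', 'monobook')")

ACTIVE_USERS = """
    SELECT rc_user
    FROM recentchanges
    WHERE rc_timestamp >= '20130101000000'
      AND rc_timestamp <  '20130131000000'
    GROUP BY rc_user
    HAVING COUNT(*) >= 5
"""

# Inner join: silently drops users with no stored skin preference.
inner_users = conn.execute(f"""
    SELECT COUNT(*)
    FROM ({ACTIVE_USERS}) AS active
    JOIN user_properties AS up
      ON up.up_user = active.rc_user AND up.up_property = 'skin'
""").fetchone()[0]

# Left outer join: keeps every active user; a missing row counts as vector.
skin_counts = dict(conn.execute(f"""
    SELECT COALESCE(up.up_value, 'vector') AS skin, COUNT(*) AS users
    FROM ({ACTIVE_USERS}) AS active
    LEFT JOIN user_properties AS up
      ON up.up_user = active.rc_user AND up.up_property = 'skin'
    GROUP BY COALESCE(up.up_value, 'vector')
""").fetchall())

print(inner_users)
print(skin_counts)
```

Note that the up_property = 'skin' condition sits in the ON clause rather
than in WHERE. Putting it in WHERE would filter out the NULL rows again
and turn the outer join back into an inner one.
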
Matt Flaschen