That overreaches :/. So, for example, it pulls out prefs for http://en.wikipedia.org/wiki/Special:Contributions/Kpjas - who's made 2-3 contributions at most in that timeframe. This is going to bug me, I can tell.

On 31 January 2013 23:11, Dan Andreescu <dandreescu@wikimedia.org> wrote:
Hey guys,

I took a look and I'm not familiar with the schema, but I believe you need this clause as part of the LEFT JOIN instead of the WHERE clause:

user_properties.up_property = 'skin'

Because otherwise it would eliminate anyone who doesn't have a 'skin' property.  If I understand correctly, those people would default to 'vector'.  I've attached what I think is the right query, though I'd love a mirror to test these types of things on.

Hope that helps

Dan


On Thu, Jan 31, 2013 at 11:46 AM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Alright, having screwed with it I've got it up to...21k (and I replicated Ryan's work, with a couple of tweaks to explicitly exclude log actions and namespace !0 - same results. So something is hooky here :/). At this stage I'm genuinely not sure what could be going wrong; working attached if anyone wants to point out my blunder.


On 31 January 2013 06:23, Oliver Keyes <okeyes@wikimedia.org> wrote:


On 31 January 2013 03:55, Matthew Flaschen <mflaschen@wikimedia.org> wrote:
On 01/30/2013 06:43 PM, Oliver Keyes wrote:
> So: attached, data - everyone with >5 actions in the recentchanges
> table. Now, the result-set is only ~7,000 entries long, which I'm
> /preeetty/ sure is unreliable somehow, but I've applied a decade and a
> half of collected comp sci studies and around 3 decades of practical
> experience to the problem and they've all gone 'er. no idea. It should
> work'. If anyone else can spot what's going wrong, most appreciated :).

I asked Ryan Faulkner to take a look, and he did indeed get higher
numbers of users:

User ids that had at least 5 edits in the last 30 days

select count(*) from (select rc_user, count(*) as revs from
enwiki.recentchanges where rc_timestamp >= '20130101000000' and
rc_timestamp < '20130131000000' group by 1 having revs >= 5) as t

He said it was 32511 for ns0 (main namespace).

He didn't try to check the skin info in the query, so far.

Tried a LEFT OUTER JOIN and it produced all of 5k more results :/. I'll look at it with fresh eyes in the morning, unless anyone wants to get there first (sorry for taking so much of your time with what should be a pretty simple issue)
 
I think the issue is that there is no row in user_properties if they did
not change their skin.  From
https://www.mediawiki.org/wiki/Manual:User_properties_table:

"Only non-default settings are stored, so changes to the defaults are
now reflected for everybody that hasn't saved an alternative preference,
not only new accounts."

So the query seems to miss people who just always left the default skin.
 If I'm understanding this correctly, it has to be an outer join,
defaulting to vector (the default on enwiki).

Matt Flaschen

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics



--
Oliver Keyes
Community Liaison, Product Development
Wikimedia Foundation



--
Oliver Keyes
Community Liaison, Product Development
Wikimedia Foundation

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics



_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics




--
Oliver Keyes
Community Liaison, Product Development
Wikimedia Foundation