On Sat, Mar 22, 2008 at 5:56 AM, Lars Aronsson lars@aronsson.se wrote:
According to [[sv:Special:Statistics]] there are 58,087 user accounts, but <contributor><username> has 28,416 distinct values. Is it realistic that half of all registered usernames have never contributed a single edit (to non-deleted pages)?
Yes, this is very common on websites. People sign up and then never use the account for some reason. Half is a figure I'd expect. On enwiki,
mysql> SELECT COUNT(*) FROM user WHERE user_editcount=0;
+----------+
| COUNT(*) |
+----------+
|  4424031 |
+----------+
1 row in set (6 min 14.05 sec)
versus
mysql> SELECT ss_users FROM site_stats;
+----------+
| ss_users |
+----------+
|  6721545 |
+----------+
1 row in set (0.11 sec)
an even worse ratio. I just now notice that you actually used svwiki, so here are the same queries for that.
mysql> SELECT COUNT(*) FROM user WHERE user_editcount=0;
+----------+
| COUNT(*) |
+----------+
|    26838 |
+----------+
1 row in set (3.60 sec)

mysql> SELECT ss_users FROM site_stats;
+----------+
| ss_users |
+----------+
|    58125 |
+----------+
1 row in set (0.01 sec)
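For reference, the zero-edit fractions implied by those four numbers can be worked out directly. A quick sketch in Python, using only the counts from the query output above; the dictionary layout and the function name are made up for illustration:

```python
# Counts copied from the query output above:
# (accounts with user_editcount = 0, total accounts from site_stats.ss_users).
counts = {
    "enwiki": (4424031, 6721545),
    "svwiki": (26838, 58125),
}

def zero_edit_fraction(zero_edit, total):
    """Fraction of registered accounts that have no recorded edits."""
    return zero_edit / total

for wiki, (zero_edit, total) in counts.items():
    share = zero_edit_fraction(zero_edit, total)
    print(f"{wiki}: {share:.1%} of accounts have no edits")
```

That comes to roughly 66% for enwiki and 46% for svwiki, so svwiki is close to the "half" figure and enwiki is noticeably above it.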
Can we find out what happened to them? Did they write spam that was deleted and the username permanently blocked? Did they just register their name to stop others from doing so? Or did something go wrong during the registration?
I expect most weren't really sure what they were doing, and thought they'd edit, only to find out they couldn't or didn't want to; or they registered in case they wanted to edit later, but then forgot the account password; or something in that vein. Some percentage will have been blocked for WP:USERNAME violations, of course, but I don't think it's going to be very high, since I've seen identical things on many Internet forums. In those cases you basically never have people patrolling new usernames (and for objectionable names, a forced name change is more common than a block), or any very high level of spammers. On forums you might have a failed e-mail confirmation, but that's not going to matter on Wikimedia. When registering, you get immediately logged in, right? So typing a password and then forgetting it five minutes later isn't going to be a problem?
Of those who did contribute something, of course most usernames only made very few contributions. This is a long tail. So how do we separate the regular/serious/active contributors from the occasional ones? In [[m:board elections]] to the WMF, a limit of 400 edits is used, and this threshold is as good as any.
That's okay for established contributors. A probably more interesting general-purpose statistic is the number of currently active contributors, namely the number who have made edits in the past week, two weeks, month, or whatever.
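To pin down what "currently active" means here: the number of distinct users with at least one edit inside a trailing window. A minimal sketch against a toy in-memory SQLite table follows. The `edits` table, its `user_name` and `ts` columns, and the `active_contributors` function are all hypothetical and only stand in for whatever table actually holds recent edits; this is not MediaWiki's schema.

```python
import sqlite3

DAY = 86400  # seconds

def active_contributors(conn, now, window_seconds):
    """Count distinct users with at least one edit in the last window_seconds."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT user_name) FROM edits WHERE ts > ? AND ts <= ?",
        (now - window_seconds, now),
    ).fetchone()
    return row[0]

# Toy data: timestamps are seconds since an arbitrary starting point.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edits (user_name TEXT, ts INTEGER)")
conn.executemany(
    "INSERT INTO edits VALUES (?, ?)",
    [
        ("Alice", 1 * DAY),
        ("Alice", 9 * DAY),
        ("Bob", 2 * DAY),
        ("Carol", 8 * DAY),
        ("Carol", 9 * DAY),
    ],
)

now = 10 * DAY
print("past week: ", active_contributors(conn, now, 7 * DAY))
print("past month:", active_contributors(conn, now, 30 * DAY))
```

With this toy data Bob drops out of the one-week figure because his only edit is eight days old, while Alice stays in because of her later edit.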
On Sat, Mar 22, 2008 at 6:52 PM, Alex mrzmanwiki@gmail.com wrote:
Some sort of statistic that gives the number of active accounts would be ideal, say any account that has made an edit in the past week. Not sure how computationally expensive that would be though. For a large site like enwiki, it would probably have to be cached and updated on a regular basis.
Caching it is somewhat tricky, since you have to be able to decrement it when any revision hits the one-week mark, *but* only if no intervening edit was made by the same user. That makes maintenance in O(dN/dT) time (with retrieval in O(1) time) not quite so simple as with most counters. Scanning a bunch of recentchanges rows every hour or every day and caching that might be okay, although it's not quite as nice as most counters (needs to be recomputed, can't be updated in real time).
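To make the "decrement only if there was no intervening edit" part concrete, here is a rough sketch of such a counter in Python. It is not anything MediaWiki does; the class and method names are invented, and it assumes edits are recorded in timestamp order. Each edit is queued once and expired once, so the upkeep is proportional to the edit rate, and reading the count is a dictionary-size lookup once expired edits have been dropped.

```python
from collections import deque

class ActiveUserCounter:
    """Distinct users with at least one edit in a trailing time window.

    Hypothetical illustration; assumes edits arrive in timestamp order.
    """

    def __init__(self, window):
        self.window = window
        self.edits = deque()   # (timestamp, user), oldest first
        self.in_window = {}    # user -> number of that user's edits still in the window

    def expire(self, now):
        """Drop edits that have aged out of the window."""
        while self.edits and self.edits[0][0] <= now - self.window:
            _, user = self.edits.popleft()
            remaining = self.in_window[user] - 1
            if remaining:
                # The user has a later edit in the window: stay counted.
                self.in_window[user] = remaining
            else:
                # That was the user's last edit in the window: decrement.
                del self.in_window[user]

    def record_edit(self, user, ts):
        self.expire(ts)
        self.edits.append((ts, user))
        self.in_window[user] = self.in_window.get(user, 0) + 1

    def active(self, now):
        self.expire(now)
        return len(self.in_window)

DAY = 86400
counter = ActiveUserCounter(window=7 * DAY)
counter.record_edit("Alice", 0)
counter.record_edit("Bob", 1 * DAY)
counter.record_edit("Alice", 5 * DAY)
for day in (6, 7, 9, 13):
    print(f"day {day}: {counter.active(day * DAY)} active")
```

The price is the per-user bookkeeping in `in_window`, which is exactly the extra state an ordinary increment-only counter doesn't need. The periodic rescan avoids that state at the cost of being stale between runs.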