When I grep for "<contributor>" or "<revision>" in svwiki-20080310-pages-meta-history.xml I find 5,822,491 occurrences. But [[sv:Special:Statistics]] says there have been 6,246,812 edits. What are the 424,321 edits in between? Deleted pages?
According to [[sv:Special:Statistics]] there are 58,087 user accounts, but <contributor><username> has 28,416 distinct values. Is it realistic that half of all registered usernames have never contributed a single edit (to non-deleted pages)? Can we find out what happened to them? Did they write spam that was deleted and the username permanently blocked? Did they just register their name to stop others from doing so? Or did something go wrong during the registration?
Of those who did contribute something, of course most usernames only made very few contributions. This is a long tail. So how do we separate the regular/serious/active contributors from the occasional ones? In [[m:board elections]] to the WMF, a limit of 400 edits is used, and this threshold is as good as any.
In <contributor><username> of the sv.wp dump there are 900 names (and 104 addresses in <contributor><ip>) that have contributed 400 revisions or more (to non-deleted pages). Of these 900, some 80 have names containing "bot" and some are sock puppets, but I guess that 800 could be eligible to vote. There are 81 admins on sv.wp. Is one admin per ten eligible voter volunteers a "normal" quotient? It also means we have one eligible voter per 12,500 speakers of the Swedish language (800 out of 10 million).
I think 800 is the number of volunteers that should be mentioned rather than the 58,087 mostly inactive usernames.
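The counts above came from grepping the dump. A minimal Python sketch of the same tally follows, assuming the dump uses the usual page/revision/contributor nesting with either a username or an ip child. The function names (`count_revisions`, `at_least`), the sample data, and the namespace URL in the sample are illustrative assumptions, not taken from the dump itself.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip any XML namespace: '{ns}revision' -> 'revision'."""
    return tag.rsplit("}", 1)[-1]


def count_revisions(stream):
    """Return (total revisions, Counter per username, Counter per IP)."""
    total = 0
    by_user, by_ip = Counter(), Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "contributor":
            for child in elem:
                kind = local(child.tag)
                if kind == "username" and child.text:
                    by_user[child.text] += 1
                elif kind == "ip" and child.text:
                    by_ip[child.text] += 1
        elif name == "revision":
            total += 1
        elif name == "page":
            elem.clear()  # drop the page's children; history dumps are large
    return total, by_user, by_ip


def at_least(counter, threshold=400):
    """Names with at least `threshold` revisions (400 as in the board elections)."""
    return sorted(name for name, n in counter.items() if n >= threshold)


# Tiny made-up sample; the namespace URL is an assumption for illustration.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>A</title>
    <revision><contributor><username>Alice</username><id>1</id></contributor></revision>
    <revision><contributor><username>Alice</username><id>1</id></contributor></revision>
    <revision><contributor><ip>192.0.2.1</ip></contributor></revision>
  </page>
  <page><title>B</title>
    <revision><contributor><username>Bob</username><id>2</id></contributor></revision>
  </page>
</mediawiki>"""

total, by_user, by_ip = count_revisions(io.BytesIO(SAMPLE.encode("utf-8")))
print(total, len(by_user), len(by_ip), at_least(by_user, threshold=2))
```

For the real dump one would pass an open file (for example one opened with `bz2.open` in binary mode) instead of the in-memory sample. Like the grep, this only sees revisions of non-deleted pages.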
On 22/03/2008, Lars Aronsson lars@aronsson.se wrote:
According to [[sv:Special:Statistics]] there are 58,087 user accounts, but <contributor><username> has 28,416 distinct values. Is it realistic that half of all registered usernames have never contributed a single edit (to non-deleted pages)? Can we find out what happened to them? Did they write spam that was deleted and the username permanently blocked? Did they just register their name to stop others from doing so? Or did something go wrong during the registration? Of those who did contribute something, of course most usernames only made very few contributions. This is a long tail. So how do we separate the regular/serious/active contributors from the occasional ones? In [[m:board elections]] to the WMF, a limit of 400 edits is used, and this threshold is as good as any. I think 800 is the number of volunteers that should be mentioned rather than the 58,087 mostly inactive usernames.
*Yes please*. Or something like it. The bogus number of users on Special:Statistics is widely quoted in the press even though almost all of them are spammer, vandal or troll accounts. "xx users with over 400 edits" would be ideal. Or over some number. Something meaningful.
- d.
On 3/22/08, David Gerard dgerard@gmail.com wrote:
*Yes please*. Or something like it. The bogus number of users on Special:Statistics is widely quoted in the press even though almost all of them are spammer, vandal or troll accounts. "xx users with over 400 edits" would be ideal. Or over some number. Something meaningful.
Don't the Perl-generated site-wide statistics include all-time active users? Maybe the press could be pointed to those numbers? (Also, they could exercise some common sense with their usage of statistics).
On 22/03/2008, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
On 3/22/08, David Gerard dgerard@gmail.com wrote:
*Yes please*. Or something like it. The bogus number of users on Special:Statistics is widely quoted in the press even though almost all of them are spammer, vandal or troll accounts. "xx users with over 400 edits" would be ideal. Or over some number. Something meaningful.
Don't the Perl-generated site-wide statistics include all-time active users? Maybe the press could be pointed to those numbers?
I do point them to them, when they bother asking.
(Also, they could exercise some common sense with their usage of statistics).
This will never happen. The problem is them believing what [[Special:Statistics]] tells them.
- d.
David Gerard wrote:
On 22/03/2008, Ævar Arnfjörð Bjarmason avarab@gmail.com wrote:
On 3/22/08, David Gerard dgerard@gmail.com wrote:
*Yes please*. Or something like it. The bogus number of users on Special:Statistics is widely quoted in the press even though almost all of them are spammer, vandal or troll accounts. "xx users with over 400 edits" would be ideal. Or over some number. Something meaningful.
Don't the Perl-generated site-wide statistics include all-time active users? Maybe the press could be pointed to those numbers?
I do point them to them, when they bother asking.
(Also, they could exercise some common sense with their usage of statistics).
This will never happen. The problem is them believing what [[Special:Statistics]] tells them.
- d.
Some sort of statistic that gives the number of active accounts would be ideal, say any account that has made an edit in the past week. Not sure how computationally expensive that would be though. For a large site like enwiki, it would probably have to be cached and updated on a regular basis.
What bugs me the most is that when one sets $wgDisableCounters=true; Special:Statistics still insists on reporting them: https://bugzilla.wikimedia.org/show_bug.cgi?id=5619
On Sat, Mar 22, 2008 at 5:56 AM, Lars Aronsson lars@aronsson.se wrote:
According to [[sv:Special:Statistics]] there are 58,087 user accounts, but <contributor><username> has 28,416 distinct values. Is it realistic that half of all registered usernames have never contributed a single edit (to non-deleted pages)?
Yes, this is very common on websites. People sign up and then never use the account for some reason. Half is a figure I'd expect. On enwiki,
mysql> SELECT COUNT(*) FROM user WHERE user_editcount=0;
+----------+
| COUNT(*) |
+----------+
|  4424031 |
+----------+
1 row in set (6 min 14.05 sec)
versus
mysql> SELECT ss_users FROM site_stats;
+----------+
| ss_users |
+----------+
|  6721545 |
+----------+
1 row in set (0.11 sec)
an even worse ratio. I just now notice that you actually used svwiki, so here are the same queries for that.
mysql> SELECT COUNT(*) FROM user WHERE user_editcount=0;
+----------+
| COUNT(*) |
+----------+
|    26838 |
+----------+
1 row in set (3.60 sec)

mysql> SELECT ss_users FROM site_stats;
+----------+
| ss_users |
+----------+
|    58125 |
+----------+
1 row in set (0.01 sec)
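To put the four query results side by side, here is a small Python sketch that turns them into shares. The numbers are copied from the queries above; the dictionary layout and the helper name `zero_edit_share` are only illustrative.

```python
# Figures copied from the user_editcount=0 and ss_users queries above.
figures = {
    "enwiki": {"zero_edit": 4_424_031, "registered": 6_721_545},
    "svwiki": {"zero_edit": 26_838, "registered": 58_125},
}


def zero_edit_share(zero_edit, registered):
    """Fraction of registered accounts whose user_editcount is 0."""
    return zero_edit / registered


shares = {wiki: zero_edit_share(**f) for wiki, f in figures.items()}
edited = {wiki: f["registered"] - f["zero_edit"] for wiki, f in figures.items()}

for wiki in figures:
    print(f"{wiki}: {shares[wiki]:.1%} never edited, {edited[wiki]:,} have edited")
```

That works out to roughly two thirds of accounts on enwiki and a little under half on svwiki.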
Can we find out what happened to them? Did they write spam that was deleted and the username permanently blocked? Did they just register their name to stop others from doing so? Or did something go wrong during the registration?
I expect most weren't really sure what they were doing, and thought they'd edit, only to find out they couldn't or didn't want to; or they registered in case they wanted to edit later, but then forgot the account password; or something in that vein. Some percentage will have been blocked for WP:USERNAME violations, of course, but I don't think it's going to be very high, since I've seen identical things on many Internet forums. In those cases you basically never have people patrolling new usernames (and for objectionable names, a forced name change is more common than a block), or any very high level of spammers. On forums you might have a failed e-mail confirmation, but that's not going to matter on Wikimedia. When registering, you get immediately logged in, right? So typing a password and then forgetting it five minutes later isn't going to be a problem?
Of those who did contribute something, of course most usernames only made very few contributions. This is a long tail. So how do we separate the regular/serious/active contributors from the occasional ones? In [[m:board elections]] to the WMF, a limit of 400 edits is used, and this threshold is as good as any.
That's okay for established contributors. A probably more interesting general-purpose statistic is the number of currently active contributors, namely the number who have made edits in the past week, two weeks, month, or whatever.
On Sat, Mar 22, 2008 at 6:52 PM, Alex mrzmanwiki@gmail.com wrote:
Some sort of statistic that gives the number of active accounts would be ideal, say any account that has made an edit in the past week. Not sure how computationally expensive that would be though. For a large site like enwiki, it would probably have to be cached and updated on a regular basis.
Caching it is somewhat tricky, since you have to be able to decrement it when any revision hits the one-week mark, *but* only if no intervening edit was made by the same user. That makes maintenance in O(dN/dT) time (with retrieval in O(1) time) not quite so simple as with most counters. Scanning a bunch of recentchanges rows every hour or every day and caching that might be okay, although it's not quite as nice as most counters (needs to be recomputed, can't be updated in real time).
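The decrement rule described above can be sketched in Python as follows. This is a toy in-memory model, not MediaWiki code: the class name `ActiveUserCounter`, its methods, and the assumption that edits arrive in non-decreasing timestamp order are all illustrative.

```python
from collections import deque

WEEK = 7 * 24 * 3600  # window length in seconds


class ActiveUserCounter:
    """Count users with at least one edit inside a sliding time window."""

    def __init__(self, window=WEEK):
        self.window = window
        self.edits = deque()   # (timestamp, user), oldest first
        self.last_edit = {}    # user -> timestamp of that user's latest edit

    def record_edit(self, user, timestamp):
        # Assumes timestamps never go backwards.
        self.expire(timestamp)
        self.last_edit[user] = timestamp
        self.edits.append((timestamp, user))

    def expire(self, now):
        """Drop edits that have hit the window boundary."""
        cutoff = now - self.window
        while self.edits and self.edits[0][0] <= cutoff:
            ts, user = self.edits.popleft()
            # Decrement only if the user made no intervening edit.
            if self.last_edit.get(user) == ts:
                del self.last_edit[user]

    def active(self, now):
        self.expire(now)
        return len(self.last_edit)


# Usage with a 7-unit window and day-numbered timestamps for readability.
c = ActiveUserCounter(window=7)
c.record_edit("alice", 0)
c.record_edit("bob", 1)
c.record_edit("alice", 5)
print(c.active(7), c.active(8), c.active(12))
```

Each edit is pushed and popped once, so the upkeep is proportional to the edit rate, but it needs the queue of recent edits to be kept somewhere, which is the part that makes this less simple than an ordinary counter.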
On 3/23/08, Simetrical Simetrical+wikilist@gmail.com wrote:
I expect most weren't really sure what they were doing, and thought they'd edit, only to find out they couldn't or didn't want to;
This seems to me like a question worth finding the answer to. We're talking about potentially tens or hundreds of thousands of users who wanted to edit, and couldn't figure out how to do it. Eep.
Steve
On Wed, Mar 26, 2008 at 9:13 AM, Steve Bennett stevagewp@gmail.com wrote:
This seems to me like a question worth finding the answer to. We're talking about potentially tens or hundreds of thousands of users who wanted to edit, and couldn't figure out how to do it. Eep.
Clearly we need to put some <blink> on the edit tab.
Steve Bennett wrote:
This seems to me like a question worth finding the answer to. We're talking about potentially tens or hundreds of thousands of users who wanted to edit, and couldn't figure out how to do it.
"Wikipedia, the free encyclopedia, that every third person can edit."
Out of the 6.7 million registered user names on the English Wikipedia, some 4.4 million had never completed a single edit.
Hoi, The one reason why you create a user is to be able to set the user preferences. I have created MANY profiles for exactly this reason.
When you want to have fewer registered users without edits, you should consider providing user preferences to anonymous users. Thanks, GerardM
On Wed, Mar 26, 2008 at 4:41 AM, Lars Aronsson lars@aronsson.se wrote:
Steve Bennett wrote:
This seems to me like a question worth finding the answer to. We're talking about potentially tens or hundreds of thousands of users who wanted to edit, and couldn't figure out how to do it.
"Wikipedia, the free encyclopedia, that every third person can edit."
Out of the 6.7 million registered user names on the English Wikipedia, some 4.4 million had never completed a single edit.
-- Lars Aronsson (lars@aronsson.se) Aronsson Datateknik - http://aronsson.se
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Lars Aronsson schrieb:
Steve Bennett wrote:
This seems to me like a question worth finding the answer to. We're talking about potentially tens or hundreds of thousands of users who wanted to edit, and couldn't figure out how to do it.
"Wikipedia, the free encyclopedia, that every third person can edit."
Out of the 6.7 million registered user names on the English Wikipedia, some 4.4 million had never completed a single edit.
IMO Wikipedia could do the same as many boards do: purge accounts after a time of inactivity. I would go for 1-year-old accounts which have not made any edits; these shouldn't be a problem. I would also kill accounts with no edits which are blocked indefinitely; they are of no use anyway.
Marco
On Tue, Apr 1, 2008 at 10:08 PM, Marco Schuster marco@harddisk.is-a-geek.org wrote:
Lars Aronsson schrieb:
Steve Bennett wrote:
This seems to me like a question worth finding the answer to. We're talking about potentially tens or hundreds of thousands of users who wanted to edit, and couldn't figure out how to do it.
"Wikipedia, the free encyclopedia, that every third person can edit."
Out of the 6.7 million registered user names on the English Wikipedia, some 4.4 million had never completed a single edit.
IMO Wikipedia could do the same as many boards do: purge accounts after a time of inactivity. I would go for 1-year-old accounts which have not made any edits; these shouldn't be a problem. I would also kill accounts with no edits which are blocked indefinitely; they are of no use anyway.
Incidentally, what would be the *benefit* of "killing" accounts that have never edited?
Michael
On Tue, Apr 1, 2008 at 4:08 PM, Marco Schuster marco@harddisk.is-a-geek.org wrote:
IMO Wikipedia could do the same as many boards do: purge accounts after a time of inactivity. I would go for 1-year-old accounts which have not made any edits; these shouldn't be a problem. I would also kill accounts with no edits which are blocked indefinitely; they are of no use anyway.
I hate this kind of policy. It has much more annoyance value than benefit, as far as I can see.
Marco Schuster marco@harddisk.is-a-geek.org wrote:
IMO Wikipedia could do the same as many boards do: purge accounts after a time of inactivity. I would go for 1-year-old accounts which have not made any edits; these shouldn't be a problem. I would also kill accounts with no edits which are blocked indefinitely; they are of no use anyway.
Some of these may have made edits that were deleted and that may subsequently need to be restored, so accounts that have made deleted edits should probably be preserved. Unfortunately I expect that'll include most of them.
On 01/04/2008, Marco Schuster marco@harddisk.is-a-geek.org wrote:
IMO Wikipedia could do the same as many boards do: purge accounts after a time of inactivity. I would go for 1-year-old accounts which have not made any edits; these shouldn't be a problem. I would also kill accounts with no edits which are blocked indefinitely; they are of no use anyway.
I've always found that really obnoxious myself.
There's no actual gain to Wikipedia from doing this - if someone wants such an account name, they can ask for a rename so it can be usurped.
The problem is that we announce to the world, right there on [[Special:Statistics]], a number we know is bogus, and people then take us at our word. The solution is to put a non-bogus number there.
- d.
David Gerard wrote:
The problem is that we announce to the world, right there on [[Special:Statistics]], a number we know is bogus, and people then take us at our word. The solution is to put a non-bogus number there.
It's not bogus, it's just not the number most people want. :)
The best solution would be to add a more useful number, aka the number of "active" accounts.
That means we have to define what an "active user" is, and add a way to track the number in a way that stays relatively accurate and doesn't harm site performance.
-- brion vibber (brion @ wikimedia.org)
On 01/04/2008, Brion Vibber brion@wikimedia.org wrote:
The best solution would be to add a more useful number, aka the number of "active" accounts. That means we have to define what an "active user" is, and add a way to track the number in a way that stays relatively accurate and doesn't harm site performance.
Erik Zachte's wikistats gives two numbers: >5 edits in the past month and >100 edits in the past month.
Off the top of my head: I suppose the count could be all usernames with >100 edits. This can be approximated, e.g. not fixing it if a username goes from 101 edits to 99 edits (or 0 edits) by page deletion. Just a ticker going up.
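A minimal Python sketch of such a one-way ticker, under the assumption that per-user edit counts are available at edit time. The class `ThresholdTicker` and its method names are hypothetical; the set of already-counted users is one way to keep a user who drops below the threshold and climbs back from being counted twice.

```python
class ThresholdTicker:
    """Count usernames that have ever had more than `threshold` edits."""

    def __init__(self, threshold=100):
        self.threshold = threshold
        self.edit_counts = {}   # user -> current edit count
        self.counted = set()    # users already added to the ticker

    def record_edit(self, user):
        n = self.edit_counts.get(user, 0) + 1
        self.edit_counts[user] = n
        if n > self.threshold:
            self.counted.add(user)  # no effect if already counted

    def record_deletion(self, user, n_deleted):
        # The per-user count drops, but the ticker is deliberately
        # left alone: it only ever goes up.
        current = self.edit_counts.get(user, 0)
        self.edit_counts[user] = max(0, current - n_deleted)

    @property
    def ticker(self):
        return len(self.counted)


# Usage with a small threshold so the behaviour is easy to follow.
t = ThresholdTicker(threshold=2)
for _ in range(3):
    t.record_edit("alice")       # alice crosses the threshold
t.record_edit("bob")             # bob stays below it
t.record_deletion("alice", 3)    # page deletion wipes alice's edits
print(t.ticker)
```

The result overstates the true figure by however many users have since fallen back under the threshold, which is the approximation being proposed.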
- d.
David Gerard wrote:
On 01/04/2008, Brion Vibber brion@wikimedia.org wrote:
The best solution would be to add a more useful number, aka the number of "active" accounts. That means we have to define what an "active user" is, and add a way to track the number in a way that stays relatively accurate and doesn't harm site performance.
Erik Zachte's wikistats gives two numbers: >5 edits in the past month and >100 edits in the past month.
Off the top of my head: I suppose the count could be all usernames with >100 edits. This can be approximated, e.g. not fixing it if a username goes from 101 edits to 99 edits (or 0 edits) by page deletion. Just a ticker going up.
I've added a bug to this effect for implementation tracking: https://bugzilla.wikimedia.org/show_bug.cgi?id=13585
-- brion vibber (brion @ wikimedia.org)
On 02/04/2008, Brion Vibber brion@wikimedia.org wrote:
David Gerard wrote:
Erik Zachte's wikistats gives two numbers: >5 edits in the past month and >100 edits in the past month.
I've added a bug to this effect for implementation tracking: https://bugzilla.wikimedia.org/show_bug.cgi?id=13585
cool :-)
I should point out: I mention Erik's thresholds, because they seem meaningful when I quote them to people who want to know "so how many editors are there, anyway?"
- d.
Marco Schuster wrote:
Lars Aronsson schrieb:
Out of the 6.7 million registered user names on the English Wikipedia, some 4.4 million had never completed a single edit.
IMO Wikipedia could do the same as many boards do: purge accounts after a time of inactivity.
I don't agree. First, edits are not the only kind of activity. You can register an account to change personal settings that you use for reading. For example, I might want to read (look at) the Vietnamese Wikipedia even though I don't edit it. Second, such a passive but useful account can be inactive (no edit, no login, nothing) for several years.
My point, however, is that we shouldn't brag that en.wp has 6.7 million contributors, because only 2.3 million have edited. Far fewer have made more than a handful of useful edits in the article namespace that weren't reverted. The size of our "volunteer community" is probably a lot smaller.
On 3/22/08, Lars Aronsson lars@aronsson.se wrote:
Can we find out what happened to them? Did they write spam that was deleted and the username permanently blocked? Did they just register their name to stop others from doing so? Or did something go wrong during the registration?
Speaking anecdotally, the number of user accounts spiked after the introduction of semi-protection. Some will be accounts that have been created in case the owner wants to edit a semi-protected page one day, some will be accounts created with the intention of so editing, but forgotten before becoming autoconfirmed.
-- Stephen Bain stephen.bain@gmail.com