Hi Analytics,
On ENWP, does the number of 26,163,773 users include IPs who have made edits? Does it include editors on all Wikimedia projects or just those who have registered and/or edited on ENWP?
Thanks, Pine
On 11 September 2015 at 11:13, Pine W wiki.pine@gmail.com wrote:
Hi Analytics,
On ENWP, does the number of 26,163,773 users
You mean https://en.wikipedia.org/wiki/Special:Statistics "Registered users"? Assuming yes…
include IPs who have made edits?
No.
Does it include editors on all Wikimedia projects
No.
or just those who have registered and/or edited on ENWP?
Registered, regardless of having edited.
J.
James D. Forrester
Lead Product Manager, Editing
Wikimedia Foundation, Inc.
jforrester@wikimedia.org | @jdforrester
Thanks!
Pine
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Next question: https://en.wikipedia.org/wiki/Special:Statistics shows that ENWP alone has had 123,512 active editors (5 or more actions) in the last 30 days. But https://reportcard.wmflabs.org/ shows that for June 2015 (the latest data available there), there were only 31k active editors on ENWP and 77k active editors for all projects combined. https://stats.wikimedia.org/EN/TablesWikipediaEN.htm seems consistent with the latter, showing that for August 2015 there were 30,789 active editors. Is there an explanation for the large difference between the 123,512 active editors shown on https://en.wikipedia.org/wiki/Special:Statistics, and the 30,789 active editors shown on https://stats.wikimedia.org/EN/TablesWikipediaEN.htm?
Thanks,
Pine
Aha, I just figured it out. The two pages are using very different definitions for "active editors". https://en.wikipedia.org/wiki/Special:Statistics refers to anyone who has made a *single* edit in the last 30 days as an "active editor", while https://stats.wikimedia.org/EN/TablesWikipediaEN.htm refers to editors who have made *5 or more* edits in the past month as active. This mix of terminology is confusing. I think that the definition on Special:Statistics makes more sense for "active editors" than the >=5 definition that is commonly used in discussions on mailing lists. Can anyone suggest a better set of terminology to distinguish the >=1 "active editors" from the >=5 "active editors"?
Pine
Hi Pine,
I think that the definition on Special:Statistics makes more sense for "active editors" than the >=5 definition that is commonly used in discussions on mailing lists.
tl;dr 'active editor' is a term with a long history. If we redefine that term and keep telling the public how many active editors we counted, we will make our public stats more vain and empty.
Long version:
This is a recurring discussion, with minor variations.
In my personal opinion, our movement already has a tendency to publish the most extreme numbers available, however bloated, as if our more substantial achievements weren't awe-inspiring enough.
(Examples are 'Wikipedias in 280 languages', '800 wikis', not to mention our extreme 'article' counts.)
As long as we kept these extreme counts with little substance to ourselves, I wouldn't care much about terminology, but we tend not to keep them to ourselves.
Can I illustrate my point by reductio ad absurdum (sort of)?
Would you call a person who jots his name on a paycheck once a month and writes nothing else a writer?
Would you call a person who climbs three steps to enter a bus a climber?
Are you a reader if you glance at a glossy's cover once at your local barber?
To me, a person with one edit in one particular month, and maybe none in the rest of the year, is not much of an editor really.
It's one more person who knows of Wikipedia (we have 500+ million of those), found the edit and submit buttons, and tried them to see what happens.
Now if that person likes what happened and wants to do it again, we are on to something.
The threshold of edits a person should reach before we can infer intention and motivation is of course arbitrary, but clearly more than one in my view.
I'm not saying we shouldn't count one-offs. If people get deterred by one problematic edit, that is hugely relevant. And the enormous gap between 1+ and 3+ edits is of course a major concern.
I would just prefer a different term for them than 'active editor', which is the term you suggest adopting.
Cheers,
Erik
From: analytics-bounces@lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Pine W Sent: Saturday, September 12, 2015 23:29 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] User statistics for video marking ENWP 5m article milestone
Hi Erik,
How about dropping the terms "active editor" and "very active editor" entirely, and instead using C1, C5, C100, etc with C(X) being the number of contribs in a given period of time (last 30 days or last month, most likely)?
Another alternative is to change the terminology or measures on Special:Statistics to align with stats.wikimedia.org.
Thoughts?
Pine
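One way to read the C(X) proposal above is as a family of threshold counts over per-account edit totals for a given period. The sketch below is a minimal illustration in Python; the function name `threshold_counts`, the account names, and the edit totals are all hypothetical and are not taken from Special:Statistics or Wikistats.

```python
from typing import Dict, Iterable


def threshold_counts(edits_per_account: Dict[str, int],
                     thresholds: Iterable[int] = (1, 5, 100)) -> Dict[str, int]:
    """Return {"C1": n1, "C5": n5, ...}: accounts with at least X edits in the period."""
    return {
        f"C{x}": sum(1 for n in edits_per_account.values() if n >= x)
        for x in thresholds
    }


# Hypothetical edit totals for one 30-day window (made-up data).
sample = {"Alice": 1, "Bob": 4, "Carol": 5, "Dave": 120, "Eve": 0}

counts = threshold_counts(sample)
print(counts)  # {'C1': 4, 'C5': 2, 'C100': 1}
```

With labels like these, the two figures discussed in this thread would be reported as something like C1 and C5 rather than both as "active editors", provided they are computed over the same period and the same kinds of actions.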
And to add, we are not alone in using a definition that requires repeated actions within a month for a user to be counted as active - just happened to see this:
http://blog.stackoverflow.com/2015/09/were-changing-our-name-back-to-stack-o... "The [Stack Exchange] network as a whole has more monthly 5-time posters than English Wikipedia has 5-time monthly editors." (ouch!)
At least https://en.wikipedia.org/wiki/Special:Statistics calls it "active registered users", not "active editors". Still confusing of course.
BTW, this difference is also explained at https://www.mediawiki.org/wiki/Analytics/Metric_definitions#Active_editor (this page is linked in the intro of the table you were looking at, https://stats.wikimedia.org/EN/TablesWikipediaEN.htm ).
And there's more background at https://meta.wikimedia.org/wiki/Research:Refining_the_definition_of_monthly_... and https://blog.wikimedia.org/2012/08/31/improving-the-accuracy-of-the-active-e... .
On 11 September 2015 at 19:19, James Forrester jforrester@wikimedia.org wrote:
Does it include editors on all Wikimedia projects
No.
or just those who have registered and/or edited on ENWP?
Registered, regardless of having edited.
James is of course correct, but one small caveat worth adding: because of SUL (single-user login), a substantial proportion of these will be "autocreated" accounts from other projects - so even 'registration' may not mean what it seems.
--
- Andrew Gray andrew.gray@dunelm.org.uk
Aha, that is important for me to know. Thanks Andrew.
Pine
Is there a way to get counts for the number of accounts, including or excluding IPs, that have ever edited English Wikipedia? It would be preferable to know the number of unique people, but of course that's impossible.
Thanks, Pine
As a very crude approximation, there are about 8.2 million accounts which have at least one edit on English Wikipedia - at least assuming my SQL query is correct! http://quarry.wmflabs.org/query/1911
This is all user accounts with one or more edits in the contributions record; it does not contain IPs, and it does not contain any accounts whose sole contributions have since been deleted (which is probably quite a substantial number). Conversely, it includes a vast panoply of single-use vandalism accounts, sockpuppets, etc etc etc. And bots, of course.
Andrew.
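To illustrate what a count of this kind does and does not include, here is a minimal sketch in Python using an in-memory SQLite database. The table `contribs` and its columns are hypothetical stand-ins, not the MediaWiki schema, and this is not the query behind the Quarry link above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy table: one row per surviving (non-deleted) edit.
# user_id is NULL for edits made from an IP address.
conn.execute("CREATE TABLE contribs (edit_id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany(
    "INSERT INTO contribs (user_id) VALUES (?)",
    [(1,), (1,), (2,), (None,), (3,), (None,), (3,)],
)

# Count distinct registered accounts with at least one edit.
# COUNT(DISTINCT ...) ignores NULLs, so IP edits are excluded.
(accounts_with_edits,) = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM contribs"
).fetchone()

print(accounts_with_edits)  # 3
conn.close()
```

An account whose only edits were deleted would have no rows left in a table like this, which is why such accounts drop out of the total, while vandalism accounts, sockpuppets, and bots with surviving edits are all counted.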
According to Wikistats, 5,644,681 registered accounts had edited at least once up to Oct 1, 2015, and 2,181,006 three or more times. It used to publish that at [1][2], but I just removed it.
I have been campaigning against us publishing overly inflated counts for about two years (Wikimania London).
Since this thread is going on and on, I'll repost my (reworded) reservations on this particular metric, for newcomers.
Even if we state explicitly that this is not a count of unique people, any audience will assume it is close and that we are just being overly precise by adding the caveat. It may not be so close. For that reason, imo, such a metric would be of questionable value, to put it mildly.
Pine:
Is there a way to get counts for the number of accounts, including or excluding IPs, that have ever edited English Wikipedia?
First, the anon contributors: if we counted every IP address that shows up in the dumps, we'd count *very* many people who were just vandalizing willfully, or just pressing edit for fun, or who forgot to log in once, and also people who moved from one IP address to another over the years. On top of that, many people get a new IP address (from a pool) on every session, depending on provider policy.
As for registered editors, the number Wikistats used to publish may be a rather empty metric for several reasons:
- How many casual editors will have forgotten their password and just created a new user id? Only veteran editors know about sockpuppeting and how one is supposed not to do that.
- How many people will have registered in good faith just out of habit, or to tweak presentation preferences, and then played with the edit button just to see what happens? Note that roughly 2 out of 3 accounts don't even reach 3 edits.
Cheers, Erik Zachte
[1] https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution
[2] BTW I use the term wikipedians overly inclusively in that report. A person who edited once or twice isn't a wikipedian in my book, just like a person who writes two post-it notes per month and nothing else isn't called a writer. Some terms only apply above some threshold.
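For readers who want to check the "roughly 2 out of 3" remark against the two Wikistats figures quoted above, a short Python calculation is sketched here; the variable names are illustrative only.

```python
# Figures quoted above from Wikistats (registered accounts, up to Oct 1, 2015).
accounts_with_1_plus_edits = 5_644_681
accounts_with_3_plus_edits = 2_181_006

# Accounts that edited at least once but never reached three edits.
below_three = accounts_with_1_plus_edits - accounts_with_3_plus_edits
share_below_three = below_three / accounts_with_1_plus_edits

print(below_three)                 # 3463675
print(f"{share_below_three:.1%}")  # 61.4%
```

That is about 61% of the accounts with at least one edit, a little under the two-thirds that "roughly 2 out of 3" suggests.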
Interesting - wonder why my query's giving a higher number?
I agree entirely that we should be very careful with quoting these figures. I think you'd probably be safe to say that more than a million people have edited... but even then I'd be cautious.
Andrew.
On 27 October 2015 at 11:11, Erik Zachte ezachte@wikimedia.org wrote:
Wikistats has it that 5,644,681 registered accounts published at least once up to Oct 1, 2015, and 2,181,006 three or more times. It used to publish that on [1][2], but I just removed it.
I have been campaigning against us publishing overly inflated counts for about two years (Wikimania London).
Since this thread is going on and on, I'll repost my (reworded) reservations on this particular metric, for newcomers.
Even if we state explicitly that this is not unique people, any audience will think the number may be close and that we are just being overly correct by adding the caveat. It may not be so close. For that reason, imo, such a metric would be of questionable value, to put it mildly.
Pine:
Is there a way to get counts for the number of accounts, including or excluding IPs, that have ever edited English Wikipedia?
First the anon contributors: if we counted every IP address that shows up in the dumps, we'd count *very* many people who were just vandalizing willfully, or just pressing edit for fun, or forgot to log in once, and also people who moved from one IP address to another over the years. On top of that, many people get a new IP address (from a pool) on every session, depending on provider policy.
As for registered editors the number Wikistats used to publish may be a rather empty metric for several reasons:
- How many casual editors will have forgotten their password and just created a new user id? Only veteran editors know about sockpuppeting and how one is supposed not to do that.
- How many people will have registered in good faith just out of habit, or to tweak presentation preferences, and then played with the edit button just to see what happens? Note that roughly 2 out of 3 accounts don't even reach 3 edits.
Cheers, Erik Zachte
[1] https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution
[2] BTW I use the term wikipedians overly inclusive in that report. A person who edited once or twice isn't a wikipedian in my book, just like a person who writes two post-it notes per month and nothing else isn't called a writer. Some terms only apply above some threshold.
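For anyone who wants to check the "roughly 2 out of 3" remark against the two Wikistats figures quoted above, here is a minimal sketch of the arithmetic. The variable names are made up for illustration; only the two account totals come from the message.

```python
# Figures quoted above from Wikistats (registered accounts, up to Oct 1, 2015).
accounts_with_1_plus_edits = 5_644_681
accounts_with_3_plus_edits = 2_181_006

# Accounts that edited once or twice but never reached a third edit.
accounts_below_3_edits = accounts_with_1_plus_edits - accounts_with_3_plus_edits
share_below_3_edits = accounts_below_3_edits / accounts_with_1_plus_edits

print(f"{accounts_below_3_edits:,} accounts ({share_below_3_edits:.1%}) never reached 3 edits")
```

On these two numbers the share comes out a little above 60%, which is in the neighbourhood of the "roughly 2 out of 3" quoted above.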
-----Original Message-----
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Andrew Gray
Sent: Tuesday, October 27, 2015 11:06
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] User statistics for video marking ENWP 5m article milestone
To a very crude approximation, there are 8.2 million accounts which have at least one edit on English Wikipedia - at least assuming my SQL query is correct! http://quarry.wmflabs.org/query/1911
I also wonder about this discrepancy. I ran a more explicit version of Andrew's query, trying to eliminate some possible edge cases, and came up with the same number.
Now I'm curious. Are there junk rows in our user table, retained for legacy reasons maybe? Is user_editcount inaccurate? Erik, can you describe the processing you perform to winnow down from 8.2 million?
J
user_editcount includes edits to deleted pages and revdeleted edits. Erik's perl scripts use the XML dumps that do not include edits to deleted pages.
Strictly speaking, user_editcount is a better proxy for the number of people who have "ever edited". Erik's is the number of people whose edits appear in the history of a page at the time of an XML dump.
-Aaron
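To make the distinction above concrete, here is a toy sketch of how a counter-based count and a dump-style count can diverge. It assumes a tiny in-memory SQLite database whose tables are only loosely modelled on MediaWiki's user, revision and archive tables (with the older rev_user / ar_user column names). The rows are invented, and this is neither Andrew's Quarry query nor Erik's Perl processing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy tables loosely modelled on MediaWiki; all rows are invented.
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT, user_editcount INTEGER);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_user INTEGER);  -- edits in live page histories
    CREATE TABLE archive (ar_id INTEGER PRIMARY KEY, ar_user INTEGER);     -- edits to deleted pages

    INSERT INTO user VALUES
        (1, 'Veteran', 3),      -- two live edits, one edit to a deleted page
        (2, 'DraftOnly', 1),    -- only edit was to a page that was later deleted
        (3, 'NeverEdited', 0);  -- registered, never edited
    INSERT INTO revision (rev_user) VALUES (1), (1);
    INSERT INTO archive (ar_user) VALUES (1), (2);
""")

# Counter-style count: every account whose edit counter is above zero.
(by_editcount,) = conn.execute(
    "SELECT COUNT(*) FROM user WHERE user_editcount > 0"
).fetchone()

# Dump-style count: only accounts that still have a revision in a live page history.
(by_live_history,) = conn.execute(
    "SELECT COUNT(DISTINCT rev_user) FROM revision"
).fetchone()

print(by_editcount, by_live_history)
```

In this toy data the 'DraftOnly' account is counted by the first query but not by the second, which is the kind of gap described above.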
As Aaron says. I'd like to add that if almost 3 million accounts disappeared from the dumps altogether (vandals? school kids?) that makes the case for not using such a count even more convincing.
Erik
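As a rough check on the size of that gap, here is the subtraction using the two figures quoted earlier in the thread. This is a sketch only: the 8.2 million figure is itself an approximation, and the variable names are made up.

```python
# Andrew's approximate count of accounts with at least one edit ("8.2 million").
accounts_by_query = 8_200_000
# Erik's Wikistats count of accounts that published at least once, per the dumps.
accounts_in_dumps = 5_644_681

# Accounts counted by the query but absent from the dump-based count.
gap = accounts_by_query - accounts_in_dumps
gap_share = gap / accounts_by_query

print(f"gap: {gap:,} accounts ({gap_share:.0%} of the query-based count)")
```

On these inputs the gap is closer to 2.6 million than 3 million, though the rounding in the 8.2 million figure means it should not be read too precisely.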
I don't agree. There are a lot of good-faith page creations that get deleted every day. There are also many edits that get reverted. Arguably, those edits aren't productive either, but they don't disappear from the dumps like article drafts do. This is a messy filter at best.
I do agree that we reject good contributions. I also agree this is a messy filter.
The main point, however, is: do we want to communicate to the general public using such messy, fuzzy, (partially) inflated, easily misunderstood numbers?
We have a history of using vanity metrics (800+ wikis, 280+ Wikipedias). Not untrue in some very formal sense, but totally misleading in that they play on expectations which are totally false.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron Halfaker Sent: Tuesday, October 27, 2015 16:41 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article milestone
I don't agree. There are a lot of good-faith page creations that get deleted every day. There are also many edits that get reverted. Arguably, those edits aren't productive either, but they don't disappear from the dumps like article drafts do. This is a messy filter at best.
On Tue, Oct 27, 2015 at 10:28 AM, Erik Zachte ezachte@wikimedia.org wrote:
As Aaron says. I'd like to add that if almost 3 million accounts disappeared from the dumps alltogether (vandals? school kids?) that makes the case for not using such a count even more convincing.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron Halfaker Sent: Tuesday, October 27, 2015 15:48 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article milestone
user_editcount includes edits to deleted pages and revdeleted edits. Erik's perl scripts use the XML dumps that do not include edits to deleted pages.
Strictly speaking, user_editcount is a better proxy for the number of people who have "ever edited". Erik's is the number of people whose edits appear in the history of a page at the time of an XML dump.
-Aaron
On Tue, Oct 27, 2015 at 9:34 AM, Jonathan Morgan jmorgan@wikimedia.org wrote:
I also wonder about this discrepancy. I ran a more explicit version of Andrew query, trying to eliminate some possible edge cases, and came up with the same number.
Now I'm curious. Are there junk rows in our user table, retained for legacy reasons maybe? Is user_editcount inaccurate? Erik, can you describe the processing you perform to winnow down from 8.2 million?
J
On Tue, Oct 27, 2015 at 7:06 AM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
Interesting - wonder why my query's giving a higher number?
I agree entirely that we should be very careful with quoting these figures. I think you'd probably be safe to say that more than a million people have edited... but even then I'd be cautious.
Andrew.
On 27 October 2015 at 11:11, Erik Zachte ezachte@wikimedia.org wrote:
Wikistats has it that 5,644,681 registered accounts published at least once till Oct 1, 2015, and 2,181,006 three or more times. It used to publish that on [1][2] but I just removed it.
I'm campaigning against us publishing overly inflated counts since about two years (Wikimania London).
Since this thread is going on and on, I'll repost my (reworded) reservations on this particular metric, for newcomers.
Even if we state explicitly that this is not a count of unique people, any audience will assume it is close and that we are merely being overly careful by adding the caveat. It may not be so close. For that reason, in my opinion, such a metric would be of questionable value, to put it mildly.
Pine:
Is there a way to get counts for the number of accounts, including or excluding IPs, that have ever edited English Wikipedia?
First the anon contributors: if we counted every IP address that shows up in the dumps, we would count *very* many people who were just vandalizing willfully, or just pressing edit for fun, or who forgot to log in once, or who moved from one IP address to another over the years. On top of that, many people get a new IP address (from a pool) on every session, depending on provider policy.
As for registered editors the number Wikistats used to publish may be a rather empty metric for several reasons:
- How many casual editors will have forgotten their password and just created a new user id? Only veteran editors know about sockpuppeting and how one is supposed not to do that.
- How many people will have registered in good faith just out of habit, or to tweak presentation preferences, and then played with the edit button just to see what happens? Note that roughly 2 out of 3 accounts do not even reach 3 edits.
Cheers, Erik Zachte
[1] https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution
[2] BTW, I use the term "wikipedians" overly inclusively in that report. A person who edited once or twice isn't a wikipedian in my book, just like a person who writes two post-it notes per month and nothing else isn't called a writer. Some terms only apply above some threshold.
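As a quick check on the figures quoted in this message, the arithmetic can be written out as below. The two Wikistats counts are taken from the text; the 8.2 million figure is the rounded estimate from Andrew's query, so the gap computed from it is only approximate. The variable names are made up for this sketch.

```python
# Figures quoted in the thread.
at_least_one_edit = 5_644_681    # Wikistats: accounts with 1+ edits up to Oct 1, 2015
three_or_more_edits = 2_181_006  # Wikistats: accounts with 3+ edits
quarry_estimate = 8_200_000      # Andrew's rounded estimate ("8.2 million")

# Accounts that edited at least once but never reached 3 edits.
fewer_than_three = at_least_one_edit - three_or_more_edits
share_fewer_than_three = fewer_than_three / at_least_one_edit

# Approximate gap between the query-based and dump-based counts.
gap = quarry_estimate - at_least_one_edit

print(f"{fewer_than_three:,} accounts ({share_fewer_than_three:.1%}) stayed below 3 edits")
print(f"gap between the two counts: about {gap:,}")
```

On these numbers the share below 3 edits is a little over 61%, which is what "roughly 2 out of 3" refers to, and the gap between the two counts is about 2.6 million.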
-----Original Message----- From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Andrew Gray Sent: Tuesday, October 27, 2015 11:06 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] User statistics for video marking ENWP 5m article milestone
To a very crude approximation, there are 8.2 million accounts which have at least one edit on English Wikipedia - at least assuming my SQL query is correct! http://quarry.wmflabs.org/query/1911
This is all user accounts with one or more edits in the contributions record; it does not contain IPs, and it does not contain any accounts whose sole contributions have since been deleted (which is probably quite a substantial number). Conversely, it includes a vast panoply of single-use vandalism accounts, sockpuppets, etc etc etc. And bots, of course.
Andrew.
On 27 October 2015 at 05:50, Pine W wiki.pine@gmail.com wrote:
Is there a way to get counts for the number of accounts, including or excluding IPs, that have ever edited English Wikipedia? It would be preferable to know the number of unique people, but of course that's impossible.
Thanks, Pine
Aha, that is important for me to know. Thanks Andrew.
Pine
On Thu, Sep 17, 2015 at 11:07 AM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
On 11 September 2015 at 19:19, James Forrester jforrester@wikimedia.org wrote:
Does it include editors on all Wikimedia projects
No.
or just those who have registered and/or edited on ENWP?
Registered, regardless of having edited.
James is of course correct, but one small caveat worth adding: because of SUL, a substantial proportion of these will be "autocreated" accounts from other projects - so even 'registration' may not mean what it seems.
--
- Andrew Gray andrew.gray@dunelm.org.uk
Given the large margins of error I won't quote a specific number. Is it safe to say that the total number is in the millions or would it be safer, although regrettably vague, to use a word like "countless" or "multitude"?
Pine
Wikipedia isn't special in that people's participation is a long tail (few people do quite a bit and many people do almost nothing). Basically any community (online or offline) has this dynamic[1]. If we say, "# accounts have registered and saved an edit", we're not inflating and the number isn't "fuzzy" in meaning.
Let's say that we set the threshold for inclusion higher -- e.g. to 5+ edits per month. The funny thing about power laws (this variant of long tail) is that they are self-similar. Mathematically, the distribution looks exactly the same when you cut off everyone who made fewer than 5 edits. There's no clear "truth" to setting the threshold higher and we've shown that it doesn't affect the overall trends that we observe.
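The self-similarity claim can be checked on an idealized case. The sketch below assumes a continuous Pareto distribution, which is only an approximation of real (discrete) edit counts, and the index value 1.2 is a hypothetical number chosen for illustration, not a fitted value for Wikipedia. It shows that after cutting off everyone below a threshold, the remaining tail has the same shape, just rescaled to start at the threshold.

```python
import math


def pareto_survival(x, x_min, alpha):
    """P(X >= x) for a Pareto distribution with minimum x_min and index alpha."""
    if x < x_min:
        return 1.0
    return (x_min / x) ** alpha


def truncated_survival(x, threshold, x_min, alpha):
    """P(X >= x | X >= threshold), for x >= threshold >= x_min."""
    return pareto_survival(x, x_min, alpha) / pareto_survival(threshold, x_min, alpha)


ALPHA = 1.2    # hypothetical Pareto index
THRESHOLD = 5  # e.g. "5+ edits"

for multiple in (2, 10, 100):
    # Share of all editors with at least `multiple` times the minimum ...
    full = pareto_survival(multiple * 1, 1, ALPHA)
    # ... versus share of 5+ editors with at least `multiple` times the threshold.
    cut = truncated_survival(multiple * THRESHOLD, THRESHOLD, 1, ALPHA)
    print(multiple, round(full, 6), round(cut, 6), math.isclose(full, cut))
```

Each row prints the same proportion twice: the fraction of the full population that reaches a given multiple of the minimum equals the fraction of the 5+ group that reaches the same multiple of 5.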
If we want to critique how we communicate about something, we can't do it in such general terms as "use 5+ edits". We need to know what meaning is intended to be expressed. Only within the context of "meaning" can we talk about "deception" and "misunderstanding". As an empiricist, I'd like to challenge the speculation about the low competencies of our audience.
So, if we're going to communicate how people contribute to Wikipedia and not "mislead", we're going to need to give people a primer on power laws of participation and discuss the implications of the best fit Pareto index https://en.wikipedia.org/wiki/Pareto_index for Wikipedia edits. I don't think that such a discussion of the limitations of simple metrics is tractable in such communications. Further, I don't think our audience wants it. Sometimes you just want a quick stat to get a sense of the scale. How many countries are there on Earth? 196. I'm sure some of those countries are much larger than others. Some are really just cities with interesting political situations. I still enjoy knowing that the answer is 196 and I don't feel like I have been misled.
1. http://www.hpl.hp.com/research/scl/papers/regularities/regularities.pdf
-Aaron
On Tue, Oct 27, 2015 at 11:03 AM, Erik Zachte ezachte@wikimedia.org wrote:
I do agree that we reject good contributions. I also agree this is a messy filter.
The main point, however, is: do we want to communicate to the general public using such messy, fuzzy, (partially) inflated, easy to misunderstand numbers?
We have a history of using vanity metrics (800+ wikis, 280+ Wikipedias). Not untrue in some very formal sense, but totally misleading in that they play on expectations which are totally false.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron Halfaker Sent: Tuesday, October 27, 2015 16:41 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article milestone
I don't agree. There are a lot of good-faith page creations that get deleted every day. There are also many edits that get reverted. Arguably, those edits aren't productive either, but they don't disappear from the dumps like article drafts do. This is a messy filter at best.
On Tue, Oct 27, 2015 at 9:56 AM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
If we want to critique how we communicate about something, we can't do it in such general terms as "use 5+ edits". We need to know what meaning is intended to be expressed. Only within the context of "meaning" can we talk about "deception" and "misunderstanding". As an empiricist, I'd like to challenge the speculation about the low competencies of our audience.
For the purpose of the 5M report, our audience is a very large one, coming from very different walks of life; the report will be translated into many languages and will be read worldwide. We are not challenging the competency of our audience; instead, we are trying to find a way to help more of our audience hear a story closer to the real story.
So, if we're going to communicate how people contribute to Wikipedia and not "mislead", we're going to need to give people a primer on powerlaws of participation and discuss the implications of the best fit pareto index https://en.wikipedia.org/wiki/Pareto_index for Wikipedia edits.
That's one option, but it is so hard as to be impossible. The suggestion is that we do better, with the understanding that many people will still not get the full picture, but many more will know a story that is closer to the reality of Wikipedia.
Leila
Aaron, the example of countries doesn't seem fitting to me. In many bodies like the UN all countries have one vote, and small countries are disproportionately powerful. That's part of why 196 has meaning.
Let me put it this way, if you think our 800+ wikis and 280+ Wikipedias story is not misleading, then we're pretty far apart on what constitutes meaningful communication.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Leila Zia Sent: Tuesday, October 27, 2015 18:11 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article milestone
On Tue, Oct 27, 2015 at 9:56 AM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
If we want to critique how we communicate about something, we can't do it in such general terms as "use 5+ edits". We need to know what meaning is intended to be expressed. Only within the context of "meaning" can we talk about "deception" and "misunderstanding". As an empiricist, I'd like to challenge the speculation about the low competencies of our audience.
For the purpose of the 5M report, our audience is a very large audience coming from very different walks of life, the report will be translated in many languages and will be read worldwide. We are not challenging the competency of our audience, instead we are trying to find a way to assist more of our audience to hear a story closer to the real story.
So, if we're going to communicate how people contribute to Wikipedia and not "mislead", we're going to need to give people a primer on powerlaws of participation and discuss the implications of the best fit pareto index https://en.wikipedia.org/wiki/Pareto_index for Wikipedia edits.
That's one option, but that's too hard to the extent that is impossible. The suggestion is that we do better with the understanding that many people will still not get the full picture, but many more will know a story that is closer to the reality of Wikipedia.
Leila
-Aaron
On Tue, Oct 27, 2015 at 11:03 AM, Erik Zachte ezachte@wikimedia.org wrote:
I do agree that we reject good contributions. I also agree this is a messy filter.
The main point however is do we want to communicate to the general public using such messy, fuzzy, inflated (partially), hard to not misunderstand numbers?
We have a history of using vanity metrics (800+ wikis, 280+ Wikipedias). Not untrue in some very formal sense, but totally misleading in that they play on expectations which are totally false.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron Halfaker Sent: Tuesday, October 27, 2015 16:41
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article milestone
I don't agree. There are a lot of good-faith page creations that get deleted every day. There are also many edits that get reverted. Arguably, those edits aren't productive either, but they don't disappear from the dumps like article drafts do. This is a messy filter at best.
On Tue, Oct 27, 2015 at 10:28 AM, Erik Zachte ezachte@wikimedia.org wrote:
As Aaron says. I'd like to add that if almost 3 million accounts disappeared from the dumps alltogether (vandals? school kids?) that makes the case for not using such a count even more convincing.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron Halfaker Sent: Tuesday, October 27, 2015 15:48 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article milestone
user_editcount includes edits to deleted pages and revdeleted edits. Erik's perl scripts use the XML dumps that do not include edits to deleted pages.
Strictly speaking, user_editcount is a better proxy for the number of people who have "ever edited". Erik's is the number of people whose edits appear in the history of a page at the time of an XML dump.
-Aaron
On Tue, Oct 27, 2015 at 9:34 AM, Jonathan Morgan jmorgan@wikimedia.org wrote:
I also wonder about this discrepancy. I ran a more explicit version of Andrew query, trying to eliminate some possible edge cases, and came up with the same number.
Now I'm curious. Are there junk rows in our user table, retained for legacy reasons maybe? Is user_editcount inaccurate? Erik, can you describe the processing you perform to winnow down from 8.2 million?
J
On Tue, Oct 27, 2015 at 7:06 AM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
Interesting - wonder why my query's giving a higher number?
I agree entirely that we should be very careful with quoting these figures. I think you'd probably be safe to say that more than a million people have edited... but even then I'd be cautious.
Andrew.
On 27 October 2015 at 11:11, Erik Zachte ezachte@wikimedia.org wrote:
Wikistats has it that 5,644,681 registered accounts published at least once till Oct 1, 2015, and 2,181,006 three or more times. It used to publish that on [1][2] but I just removed it.
I'm campaigning against us publishing overly inflated counts since about two years (Wikimania London).
Since this thread is going on and on, I'll repost my (reworded) reservations on this particular metric, for newcomers.
Even if we state explicitly that this is not unique people, any audience will think it may be close and we are overly correct by adding the caveat. It may not be so close. For that reason imo such a metric would be of questionable value, to put it mildly.
Pine:
Is there a way to get counts for the number of accounts, including or excluding IPs, that have ever edited English Wikipedia, ?
First the anon contributors: when we'd count every ip address that shows up in the dumps, we'd count *very* many people who were just vandalizing willfully, or just pressing edit for fun, or forgot to login once, and also moved from one ip address to another over the years. On top of that many people get a new ip address (from a pool) on every session, depends on provider policy.
As for registered editors the number Wikistats used to publish may be a rather empty metric for several reasons:
- How many casual editors will have forgotten their password and just created a new user id? Only veteran editors know about sockpuppeting and how one is supposed not to do that.
- How many people will have registered in good faith just out of habit, or to tweak presentation preferences, and then played with the edit button just to see what happens? Note that roughly 2 out of 3 accounts doesn't even reach 3 edits.
Cheers, Erik Zachte
[1] https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution [2] BTW I use the term wikipedians overly inclusive in that report. A person who edited once or twice isn't a wikipedian in my book, just like a person who writes two post-it notes per month and nothing else isn't called a writer. Some terms only apply above some threshold.
-----Original Message----- From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Andrew Gray Sent: Tuesday, October 27, 2015 11:06 To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. Subject: Re: [Analytics] User statistics for video marking ENWP 5m article milestone
To a very crude approximation, there are approximately 8.2 million accounts which have at least one edit on English Wikipedia - at least assuming my SQL query is correct! http://quarry.wmflabs.org/query/1911
This is all user accounts with one or more edits in the contributions record; it does not contain IPs, and it does not contain any accounts whose sole contributions have since been deleted (which is probably quite a substantial number). Conversely, it includes a vast panoply of single-use vandalism accounts, sockpuppets, etc etc etc. And bots, of course.
Andrew.
On 27 October 2015 at 05:50, Pine W wiki.pine@gmail.com wrote:
Is there a way to get counts for the number of accounts, including or excluding IPs, that have ever edited English Wikipedia? It would be preferable to know the number of unique people, but of course that's impossible.
Thanks, Pine
Aha, that is important for me to know. Thanks Andrew.
Pine
On Thu, Sep 17, 2015 at 11:07 AM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
On 11 September 2015 at 19:19, James Forrester jforrester@wikimedia.org wrote:
Does it include editors on all Wikimedia projects
No.
or just those who have registered and/or edited on ENWP?
Registered, regardless of having edited.
James is of course correct, but one small caveat worth adding: because of SUL, a substantial proportion of these will be "autocreated" accounts from other projects - so even 'registration' may not mean what it seems.
--
- Andrew Gray andrew.gray@dunelm.org.uk
800+ wikis and 280+ Wikipedias
I thought we were talking about editor counts here.
On Tue, Oct 27, 2015 at 12:18 PM, Erik Zachte ezachte@wikimedia.org wrote:
Aaron, the example of countries doesn't seem fitting to me. In many bodies like the UN all countries have one vote, and small countries are disproportionately powerful. That's part of why 196 has meaning.
Let me put it this way, if you think our 800+ wikis and 280+ Wikipedias story is not misleading, then we're pretty far apart on what constitutes meaningful communication.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Leila Zia
Sent: Tuesday, October 27, 2015 18:11
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article milestone
On Tue, Oct 27, 2015 at 9:56 AM, Aaron Halfaker ahalfaker@wikimedia.org wrote:
If we want to critique how we communicate about something, we can't do it in such general terms as "use 5+ edits". We need to know what meaning is intended to be expressed. Only within the context of "meaning" can we talk about "deception" and "misunderstanding". As an empiricist, I'd like to challenge the speculation about the low competencies of our audience.
For the purpose of the 5M report, our audience is very large and comes from very different walks of life; the report will be translated into many languages and will be read worldwide. We are not challenging the competency of our audience; instead we are trying to find a way to help more of our audience hear a story closer to the real story.
So, if we're going to communicate how people contribute to Wikipedia and not "mislead", we're going to need to give people a primer on powerlaws of participation and discuss the implications of the best fit pareto index https://en.wikipedia.org/wiki/Pareto_index for Wikipedia edits.
That's one option, but it is too hard, to the extent that it is impossible. The suggestion is that we do better, with the understanding that many people will still not get the full picture, but many more will know a story that is closer to the reality of Wikipedia.
Leila
-Aaron
On Tue, Oct 27, 2015 at 11:03 AM, Erik Zachte ezachte@wikimedia.org wrote:
I do agree that we reject good contributions. I also agree this is a messy filter.
The main point, however, is: do we want to communicate to the general public using such messy, fuzzy, (partially) inflated, easy-to-misunderstand numbers?
We have a history of using vanity metrics (800+ wikis, 280+ Wikipedias). Not untrue in some very formal sense, but totally misleading in that they play on expectations which are totally false.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron Halfaker
Sent: Tuesday, October 27, 2015 16:41
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article milestone
I don't agree. There are a lot of good-faith page creations that get deleted every day. There are also many edits that get reverted. Arguably, those edits aren't productive either, but they don't disappear from the dumps like article drafts do. This is a messy filter at best.
On Tue, Oct 27, 2015 at 10:28 AM, Erik Zachte ezachte@wikimedia.org wrote:
As Aaron says. I'd like to add that if almost 3 million accounts disappeared from the dumps altogether (vandals? school kids?), that makes the case for not using such a count even more convincing.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron Halfaker
Sent: Tuesday, October 27, 2015 15:48
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article milestone
user_editcount includes edits to deleted pages and revdeleted edits. Erik's perl scripts use the XML dumps that do not include edits to deleted pages.
Strictly speaking, user_editcount is a better proxy for the number of people who have "ever edited". Erik's is the number of people whose edits appear in the history of a page at the time of an XML dump.
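One way to picture the gap between the two counts is a toy example. The sketch below is not Erik's Perl processing or the Quarry query; the `Edit` class and the sample data are invented for illustration, and it only assumes what is stated above: user_editcount keeps counting edits to pages that were later deleted, while the XML dumps no longer contain them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    user: str           # account name (IPs are left out of this toy data)
    page_deleted: bool  # True if the edited page was later deleted


# Hypothetical edit history, invented for illustration.
edits = [
    Edit("Alice", page_deleted=False),
    Edit("Alice", page_deleted=True),
    Edit("Bob", page_deleted=True),    # only contribution was to a deleted page
    Edit("Carol", page_deleted=False),
]

# user_editcount-style view: every saved edit counts, deleted or not.
ever_edited = {e.user for e in edits}

# XML-dump-style view: only edits still present in page histories are seen.
visible_in_dump = {e.user for e in edits if not e.page_deleted}

# Accounts that "ever edited" but leave no trace in the dump.
missing_from_dump = ever_edited - visible_in_dump

print(len(ever_edited), len(visible_in_dump), sorted(missing_from_dump))
```

In this toy data the account whose only edit was to a deleted page shows up in the first count but not the second, which is the kind of account that would explain a higher user_editcount-based figure.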
-Aaron
On Tue, Oct 27, 2015 at 9:34 AM, Jonathan Morgan jmorgan@wikimedia.org wrote:
I also wonder about this discrepancy. I ran a more explicit version of Andrew's query, trying to eliminate some possible edge cases, and came up with the same number.
Now I'm curious. Are there junk rows in our user table, retained for legacy reasons maybe? Is user_editcount inaccurate? Erik, can you describe the processing you perform to winnow down from 8.2 million?
J
On Tue, Oct 27, 2015 at 7:06 AM, Andrew Gray andrew.gray@dunelm.org.uk wrote:
Interesting - wonder why my query's giving a higher number?
I agree entirely that we should be very careful with quoting these figures. I think you'd probably be safe to say that more than a million people have edited... but even then I'd be cautious.
Andrew.
On 27 October 2015 at 11:11, Erik Zachte ezachte@wikimedia.org wrote:
Wikistats has it that 5,644,681 registered accounts published at least once till Oct 1, 2015, and 2,181,006 three or more times. It used to publish that on [1][2] but I just removed it.
I've been campaigning against us publishing overly inflated counts for about two years (since Wikimania London).
Since this thread is going on and on, I'll repost my (reworded) reservations on this particular metric, for newcomers.
Even if we state explicitly that this is not unique people, any audience will think it may be close and we are overly correct by adding the caveat. It may not be so close. For that reason imo such a metric would be of questionable value, to put it mildly.
Pine:
Is there a way to get counts for the number of accounts, including or excluding IPs, that have ever edited English Wikipedia?
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)
Yes we were talking about editor counts, then we moved on to countries, that's what one calls an analogy ;-)
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron Halfaker
Sent: Tuesday, October 27, 2015 18:19
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article milestone
800+ wikis and 280+ Wikipedias
I thought we were talking about editor counts here.
Just to complicate this more...
since Pine's question was "accounts... that have ever edited English Wikipedia", we might consider restricting our counts to namespace 0 only.
Either way, it's safe to say that the total number is in the millions.
J
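A minimal sketch of what restricting the count to namespace 0 could look like, using an in-memory SQLite database. The tables are toy stand-ins loosely modeled on MediaWiki's `page` and `revision` tables; the column names, the sample rows, and the convention that `rev_user = 0` marks an IP edit are assumptions for illustration, not the query anyone in this thread actually ran.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Toy schema (assumed): one row per page and one row per revision.
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER);
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_user INTEGER);

    -- Page 1 is an article (namespace 0); pages 2 and 3 are in other namespaces.
    INSERT INTO page VALUES (1, 0), (2, 1), (3, 2);

    -- rev_user = 0 stands for an anonymous (IP) edit in this toy data.
    INSERT INTO revision VALUES
        (10, 1, 100),
        (11, 1, 101),
        (12, 2, 101),
        (13, 3, 102),
        (14, 1, 0);
""")

# Registered accounts with at least one edit in any namespace.
accounts_any_namespace = conn.execute(
    "SELECT COUNT(DISTINCT rev_user) FROM revision WHERE rev_user != 0"
).fetchone()[0]

# Registered accounts with at least one edit to a namespace-0 page.
accounts_namespace_0 = conn.execute(
    """
    SELECT COUNT(DISTINCT r.rev_user)
    FROM revision AS r
    JOIN page AS p ON p.page_id = r.rev_page
    WHERE r.rev_user != 0 AND p.page_namespace = 0
    """
).fetchone()[0]

print(accounts_any_namespace, accounts_namespace_0)
conn.close()
```

In the toy data one account only ever edited outside namespace 0, so the restricted count is smaller; how much smaller it would be on the real tables is not something this sketch can say.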
Jonathan Morgan, 27/10/2015 18:53:
Either way, it's safe to say that the total number is in the millions.
+1. It's correct to say that millions have edited Wikipedia, and editors for Wikimedia projects are probably on the order of 10^7. There is no information gain in trying to give more precise numbers.
The point here is very simple: Wikimedia wikis are the most massively multi-author work ever created. Sure, the number may be misleading if associated with different claims.
Nemo