I do agree that we reject good contributions. I also agree this is a messy filter.
The main point, however, is: do we want to communicate to the general public using numbers
that are this messy, fuzzy, (partially) inflated, and easy to misunderstand?
We have a history of using vanity metrics (800+ wikis, 280+ Wikipedias). Not untrue in
some very formal sense, but totally misleading in that they play on expectations which are
totally false.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron
Halfaker
Sent: Tuesday, October 27, 2015 16:41
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article
milestone
I don't agree. There are a lot of good-faith page creations that get deleted every
day. There are also many edits that get reverted. Arguably, those edits aren't
productive either, but they don't disappear from the dumps like article drafts do.
This is a messy filter at best.
On Tue, Oct 27, 2015 at 10:28 AM, Erik Zachte <ezachte(a)wikimedia.org> wrote:
As Aaron says. I'd like to add that if almost 3 million accounts disappeared from the
dumps altogether (vandals? school kids?), that makes the case for not using such a count
even more convincing.
Erik
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Aaron
Halfaker
Sent: Tuesday, October 27, 2015 15:48
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: Re: [Analytics] [Spam] Re: User statistics for video marking ENWP 5m article
milestone
user_editcount includes edits to deleted pages and revdeleted edits. Erik's perl
scripts use the XML dumps that do not include edits to deleted pages.
Strictly speaking, user_editcount is a better proxy for the number of people who have
"ever edited". Erik's is the number of people whose edits appear in the
history of a page at the time of an XML dump.
-Aaron
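The distinction drawn above can be sketched on toy data. This is only an illustration, not the actual MediaWiki schema or Erik's perl scripts: the `Edit` record, its `page_deleted` flag, and the sample rows are all hypothetical. It shows why a count based on every edit ever made comes out higher than a count based on what remains in a dump.

```python
from typing import NamedTuple


class Edit(NamedTuple):
    """Hypothetical edit record; not the real MediaWiki table layout."""
    user: str
    page: str
    page_deleted: bool  # True if the page was deleted after the edit


def users_ever_edited(edits):
    # Analogous to counting accounts with user_editcount > 0:
    # edits to deleted pages still count.
    return {e.user for e in edits}


def users_visible_in_dump(edits):
    # Analogous to a dump-based count: only edits that still appear
    # in the history of an existing page are seen.
    return {e.user for e in edits if not e.page_deleted}


SAMPLE_EDITS = [
    Edit("alice", "Tree", False),
    Edit("bob", "My band", True),    # only edit was to a deleted page
    Edit("carol", "My band", True),
    Edit("carol", "Tree", False),    # carol also has a surviving edit
]

ever = users_ever_edited(SAMPLE_EDITS)
in_dump = users_visible_in_dump(SAMPLE_EDITS)
missing = ever - in_dump  # accounts that "disappear" from the dump
print(len(ever), len(in_dump), sorted(missing))
```

In this toy set the account whose only edit was to a deleted page is counted by the first function but not by the second, which is the kind of gap being discussed in this thread.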
On Tue, Oct 27, 2015 at 9:34 AM, Jonathan Morgan <jmorgan(a)wikimedia.org> wrote:
I also wonder about this discrepancy. I ran a more explicit version of Andrew's query,
trying to eliminate some possible edge cases, and came up with the same number.
Now I'm curious. Are there junk rows in our user table, retained for legacy reasons
maybe? Is user_editcount inaccurate? Erik, can you describe the processing you perform to
winnow down from 8.2 million?
J
On Tue, Oct 27, 2015 at 7:06 AM, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:
Interesting - wonder why my query's giving a higher number?
I agree entirely that we should be very careful with quoting these
figures. I think you'd probably be safe to say that more than a
million people have edited... but even then I'd be cautious.
Andrew.
On 27 October 2015 at 11:11, Erik Zachte <ezachte(a)wikimedia.org> wrote:
Wikistats has it that 5,644,681 registered accounts
published at least once till Oct 1, 2015, and 2,181,006 three or more times.
It used to publish that on [1][2] but I just removed it.
I have been campaigning against us publishing overly inflated counts for about two years
(since Wikimania London).
Since this thread is going on and on, I'll repost my (reworded) reservations on this
particular metric, for newcomers.
Even if we state explicitly that this is not the number of unique people, any audience will
think it may be close and that we are just being overly careful by adding the caveat. It may
not be so close. For that reason imo such a metric would be of questionable value, to put it
mildly.
Pine:
Is there a way to get counts for the number of
accounts, including or excluding IPs, that have ever edited English Wikipedia?
First the anon contributors: if we counted every IP address that shows up in the
dumps, we'd count *very* many people who were just vandalizing willfully, or just
pressing edit for fun, or who forgot to log in once, and also people who moved from one IP
address to another over the years. On top of that, many people get a new IP address (from a
pool) for every session, depending on provider policy.
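A minimal sketch of the overcounting described above, using made-up data: the `(person, ip_address)` pairs are hypothetical (in reality the person behind an address is exactly what cannot be observed), and the addresses come from ranges reserved for documentation.

```python
# Hypothetical log of anonymous edits as (person, ip_address) pairs.
ANON_EDITS = [
    ("person_1", "203.0.113.5"),
    ("person_1", "203.0.113.77"),   # new address from the provider's pool
    ("person_1", "198.51.100.9"),   # moved to another provider later
    ("person_2", "198.51.100.20"),
]

distinct_ips = {ip for _, ip in ANON_EDITS}
distinct_people = {person for person, _ in ANON_EDITS}

# Counting addresses overstates the number of people in this toy set.
print(len(distinct_ips), len(distinct_people))
```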
As for registered editors the number Wikistats used to publish may be a rather empty
metric for several reasons:
- How many casual editors will have forgotten their password and just created a new user
id? Only veteran editors know about sockpuppeting and that one is not supposed to do it.
- How many people will have registered in good faith just out of habit, or to tweak
presentation preferences, and then played with the edit button just to see what happens?
Note that roughly 2 out of 3 accounts don't even reach 3 edits.
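The share of accounts that never reach three edits can be worked out from the two Wikistats figures quoted earlier in this message (5,644,681 accounts with at least one edit, 2,181,006 with three or more). The sketch below just does that arithmetic; the variable names are illustrative.

```python
# Figures quoted in this message (registered accounts, up to Oct 1, 2015).
accounts_with_one_or_more_edits = 5_644_681
accounts_with_three_or_more_edits = 2_181_006

# Accounts that edited once or twice and then stopped.
accounts_below_three = (
    accounts_with_one_or_more_edits - accounts_with_three_or_more_edits
)
share_below_three = accounts_below_three / accounts_with_one_or_more_edits

print(f"{accounts_below_three:,} accounts ({share_below_three:.1%}) "
      "never reached three edits")
```

On these figures the share is a little over three in five, which is the basis for the rough "2 out of 3" above.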
Cheers,
Erik Zachte
[1]
https://stats.wikimedia.org/EN/TablesWikipediaEN.htm#editdistribution
[2] BTW I use the term wikipedians overly inclusively in that report. A person who edited
once or twice isn't a wikipedian in my book, just like a person who writes two post-it
notes per month and nothing else isn't called a writer. Some terms only apply above
some threshold.
-----Original Message-----
From: Analytics [mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Andrew Gray
Sent: Tuesday, October 27, 2015 11:06
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: Re: [Analytics] User statistics for video marking ENWP 5m article milestone
To a very crude approximation, there are 8.2 million accounts which have at
least one edit on English Wikipedia - at least assuming my SQL query is correct!
http://quarry.wmflabs.org/query/1911
This is all user accounts with one or more edits in the contributions record; it does not
contain IPs, and it does not contain any accounts whose sole contributions have since been
deleted (which is probably quite a substantial number). Conversely, it includes a vast
panoply of single-use vandalism accounts, sockpuppets, etc etc etc. And bots, of course.
Andrew.
On 27 October 2015 at 05:50, Pine W <wiki.pine(a)gmail.com> wrote:
Is there a way to get counts for the number of accounts, including or
excluding IPs, that have ever edited English Wikipedia? It would be
preferable to know the number of unique people, but of course that's
impossible.
Thanks,
Pine
Aha, that is important for me to know. Thanks Andrew.
Pine
On Thu, Sep 17, 2015 at 11:07 AM, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:
On 11 September 2015 at 19:19, James Forrester <jforrester(a)wikimedia.org> wrote:
> Does it include editors on all Wikimedia projects
No.
> or just those who have registered and/or edited on ENWP?
Registered, regardless of having edited.
James is of course correct, but one small caveat worth adding:
because of SUL, a substantial proportion of these will be "autocreated"
accounts from other projects - so even 'registration' may not mean
what it seems.
--
- Andrew Gray
andrew.gray(a)dunelm.org.uk
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>