On Sun, Nov 29, 2009 at 9:32 PM, Carl (CBM) cbm.wikipedia@gmail.com wrote:
This is slightly misleading as it is actually referring to logged-in users only.
[snip]
~600,000 distinct usernames and IP addresses recorded at least 1 edit
~100,000 distinct usernames and IP addresses recorded at least 6 edits
~150,000 distinct usernames recorded at least 1 edit
~42,000 distinct usernames recorded at least 6 edits
Those are all per-month numbers. Logged-out editing is quite significant.
It's not misleading in juxtaposition against the press claims based on Felipe Ortega's thesis:
"As a result, anonymous users will be consistently filtered out throughout this thesis work"
Unfortunately, it is a lot harder to reason about the number of 'anonymous' editors: a single IP could be less than one person (e.g. a dynamic IP pool) or thousands of people. But I agree with what you're thinking: it's not appropriate to simply discard that data. For example, a possible long-term trend is that some users have decided to edit logged out rather than logging in (something that I've done, since it's easier to avoid getting pulled into meta-discussions as an IP; I have no clue if it's a significant phenomenon).
I don't have a good suggestion for how to perform an editor-count analysis that includes anons, but any conclusion about a change in the editor count ought at least to attempt to control for changes in the logged-in / logged-out state of editors, as there are plenty of reasons to expect the proportion of logged-in vs. logged-out editing to change over time.
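As a rough sketch of the bookkeeping involved (not a solution to the IP-to-person problem above), something like the following would report, per month, the distinct-editor counts at a given edit threshold alongside the share of edits made logged out. The input format, a list of (month, editor, is_registered) tuples, and the function name are assumptions for illustration only; a real dump would need to be parsed into that shape first.

```python
from collections import Counter, defaultdict


def monthly_editor_counts(edits, threshold=1):
    """Summarise editor activity per month.

    `edits` is a hypothetical iterable of (month, editor, is_registered)
    tuples, one per edit. `editor` is a username or an IP address string.
    """
    # month -> Counter of edits keyed by (editor, is_registered)
    per_month = defaultdict(Counter)
    for month, editor, is_registered in edits:
        per_month[month][(editor, is_registered)] += 1

    summary = {}
    for month, counts in per_month.items():
        # Editors (usernames and IPs) with at least `threshold` edits.
        active = [key for key, n in counts.items() if n >= threshold]
        registered = sum(1 for _, is_reg in active if is_reg)
        total_edits = sum(counts.values())
        anon_edits = sum(n for (_, is_reg), n in counts.items() if not is_reg)
        summary[month] = {
            "all_editors": len(active),
            "registered_editors": registered,
            # Share of edits made logged out; a shift in this ratio over
            # time is a warning that username-only counts may mislead.
            "anon_edit_share": anon_edits / total_edits,
        }
    return summary
```

Note that "all_editors" still counts each IP as one editor, so it inherits the ambiguity described above; the edit-share figure is only a crude signal that the logged-in / logged-out mix has moved.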
I assume your numbers also ignore deleted articles, as Felipe's did, though it appears that he was unaware of the exclusion of deleted articles ("However, Wikipedia dump files include a complete list of all contributions performed within the period of analysis, so we do not have to deal with other distinct types of censoring here"). While I haven't checked lately, I'm quite confident that on EnWP deleted articles account for an enormous number of one-edit accounts, due to the requirement of an account for article creation as well as other obvious factors.