[WikiEN-l] BBC blog on WSJ study

Gregory Maxwell gmaxwell at gmail.com
Mon Nov 30 03:36:35 UTC 2009


On Sun, Nov 29, 2009 at 9:32 PM, Carl (CBM) <cbm.wikipedia at gmail.com> wrote:
> This is slightly misleading as it is actually referring to logged-in
> users only.
[snip]

> ~600,000 distinct usernames and IP addresses recorded at least 1 edit
> ~100,000 distinct usernames and IP addresses recorded at least 6 edits
> ~150,000 distinct usernames recorded at least 1 edit
> ~42,000 distinct usernames recorded at least 6 edits
>
> Those are all per-month numbers. Logged-out editing is quite significant.

It's not misleading in juxtaposition against the press claims based on
Felipe Ortega's thesis:

"As a result, anonymous users will be consistently filtered out
throughout this thesis work"

Unfortunately it is a lot harder to reason about the number of
'anonymous' editors: a single IP could be less than one person (i.e. a
dynamic IP pool) or thousands of people.  But I agree with what you're
thinking: It's not appropriate to simply discard that data. For
example, a possible long term trend is that some users have decided to
edit logged out rather than logging in (something that I've done,
since it's easier to avoid getting pulled into meta-discussions as an
IP; I have no clue if it's a significant phenomenon).

I don't have any good suggestion on how to perform an editor-count
analysis which includes anons, but any conclusions about a change in
the editor count ought to at least attempt to control for changes in
the logged-in / logged-out state of editors, as there are plenty of
reasons to expect the proportion of logged-in vs. logged-out editing
to change over time.
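
As a rough illustration of the kind of control I mean, here is a
minimal Python sketch that computes what share of the distinct editing
identities in a month were usernames, using the approximate per-month
figures quoted above. The function and variable names are made up for
this example, and it treats each distinct IP as one identity, which
(as noted above) is not the same thing as one person.

```python
# Approximate per-month figures quoted earlier in this thread.
# "all" = distinct usernames plus distinct IP addresses.
ALL_1PLUS = 600_000    # identities with at least 1 edit
ALL_6PLUS = 100_000    # identities with at least 6 edits
NAMED_1PLUS = 150_000  # usernames with at least 1 edit
NAMED_6PLUS = 42_000   # usernames with at least 6 edits


def logged_in_share(named: int, total: int) -> float:
    """Fraction of distinct editing identities that are usernames.

    Hypothetical helper: an IP is counted as one identity, so this
    says nothing certain about the number of actual people.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return named / total


share_1plus = logged_in_share(NAMED_1PLUS, ALL_1PLUS)
share_6plus = logged_in_share(NAMED_6PLUS, ALL_6PLUS)

print(f"logged-in share, >=1 edit:  {share_1plus:.0%}")
print(f"logged-in share, >=6 edits: {share_6plus:.0%}")
```

Computing the same share for each month of a dump would show whether
the logged-in proportion is drifting. If it is, a change in the
username count alone can't be read as a change in the number of
editors.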

I assume your numbers are also ignoring deleted articles as Felipe's
did, though it appears that he was unaware of the exclusion of deleted
articles ("However, Wikipedia dump files include a complete list of all
contributions performed within the period of analysis, so we do not
have to deal with other distinct types of censoring here"). While I
haven't checked lately, I'm quite confident that on EnWP deleted
articles reflect an enormous number of one-edit accounts due to the
requirement of an account for article creation as well as other
obvious factors.


