[Foundation-l] Wikimedia Audited Financial Statements for 2009-10 Fiscal Year Now Available

Erik Zachte erikzachte at infodisiac.com
Sun Oct 31 03:46:33 UTC 2010


This thread drifted off topic into a discussion about how we measure our 
editor base.

[Summary: editor stats will never be precise, but filtering duplicates 
will be a good next step]

Some earlier comments noted that the count is inaccurate, or even 
systematically skewed.
I agree with most comments. I have little hope that an accurate count 
will ever be possible.
We can try to get to a more robust and realistic (and certainly lower) 
approximation.

Let me first explain two reasons why we present a count of
+/- 85,000 active editors compared to 100,000 some months ago.

1st: In July a bug fix stopped double counting of editors on Commons
(for a while the Commons wiki had been listed on two queues).
Since wikistats always regenerates counts for all months, no trace of 
this bug has been left. [1]

2nd: Starting in August, editors on Commons are no longer included at all
in the overall editor total, on the assumption that most editors on Commons
also edit on one or more other projects. [2]

Of course this very rough way to get to a more conservative editor count
is far from ideal, but pending better analysis of our user tables,
this is a step closer to a count of unique registered human contributors.

What we really need to do is to ignore confirmed duplicates.

Only since Single User Login (SUL), where many users have formally merged 
accounts from multiple wikis, has it been possible to check whether
user John Doe on English Wikipedia and user John Doe on German Wikipedia
are really the same person.
Once private SUL dumps are available (a long-standing request) this will 
be looked into asap.

Caveats: the user may have left before SUL was introduced,
or decided not to merge accounts for whatever reason.
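
A minimal sketch of what filtering confirmed duplicates could look like, 
assuming a SUL dump can be reduced to a set of (wiki, username) pairs 
that are attached to a global account. The function name and both inputs 
are hypothetical; this is not how wikistats works today. Accounts that 
were never merged stay counted separately, which is exactly the caveat 
above.

```python
def count_unique_editors(active_accounts, sul_merged):
    """Count distinct editors among active (wiki, username) accounts.

    active_accounts: iterable of (wiki, username) pairs that met the
        activity threshold (hypothetical input).
    sul_merged: set of (wiki, username) pairs confirmed as attached to a
        global account with that username (hypothetical input).
    """
    global_names = set()   # one entry per merged global account
    unmerged = set()       # local accounts we cannot confirm as duplicates
    for wiki, name in active_accounts:
        if (wiki, name) in sul_merged:
            global_names.add(name)
        else:
            unmerged.add((wiki, name))
    return len(global_names) + len(unmerged)


accounts = [
    ("enwiki", "John Doe"),
    ("dewiki", "John Doe"),
    ("frwiki", "John Doe"),   # never merged: still counted on its own
    ("enwiki", "Jane Roe"),
]
merged = {("enwiki", "John Doe"), ("dewiki", "John Doe")}
unique_count = count_unique_editors(accounts, merged)
```

Only confirmed merges reduce the total, so the result is still an upper 
bound on the number of real people.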

[1] http://stats.wikimedia.org/reportcard/RC_2010_07_synopsis.html
[2] http://stats.wikimedia.org/reportcard/RC_2010_08_synopsis.html

---------------------------

As mentioned in an earlier post in this thread,
the total number of active editors that made 5+ edits
in at least one month of the year will be higher than for any month alone,
as e.g. Mr X only qualifies in odd months and Mrs Y only in even months.

True, but in the context of this thread the statuses at the end of 
consecutive budget years matter most.

We do have a metric that counts total registered editors *for all time*,
albeit with a different threshold: editors need to have at least 10 
edits, not necessarily in the same month.
For English Wikipedia alone, 608,000 accounts already qualified by the 
end of June 2010.
This metric and the one discussed above (5+ edits in any given month) 
are apples and oranges.
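
To make the difference between these definitions concrete, here is a 
small sketch that computes all three from the same list of edits. The 
input format (one (editor, month) pair per edit) and the function names 
are assumptions for illustration only, not the wikistats implementation.

```python
from collections import Counter


def monthly_active(edits, min_edits=5):
    """Map each month to the set of editors with min_edits+ edits in it."""
    per_editor_month = Counter(edits)  # (editor, month) -> edit count
    active = {}
    for (editor, month), n in per_editor_month.items():
        if n >= min_edits:
            active.setdefault(month, set()).add(editor)
    return active


def active_in_any_month(edits, min_edits=5):
    """Editors who qualified in at least one month of the period."""
    return set().union(*monthly_active(edits, min_edits).values())


def all_time_editors(edits, min_edits=10):
    """Editors with min_edits+ edits in total, regardless of month."""
    totals = Counter(editor for editor, _ in edits)
    return {editor for editor, n in totals.items() if n >= min_edits}


# Made-up data: X and Y each qualify in one month only; Z never
# reaches 5 edits in a month but has 10 edits overall.
edits = (
    [("X", "2010-01")] * 5
    + [("Y", "2010-02")] * 5
    + [("Z", "2010-01")] * 3
    + [("Z", "2010-02")] * 3
    + [("Z", "2010-03")] * 4
)
by_month = monthly_active(edits)
any_month = active_in_any_month(edits)
all_time = all_time_editors(edits)
```

In this made-up data each month has at most one active editor, the 
any-month total is two, and the all-time metric selects a different 
person altogether.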

--------------------------

To what extent certain persons created multiple accounts on the same wiki
(so-called sock puppets) is not known.
Yet another reason to work towards a conservative estimate.

-------------------------

Anonymous editors are no longer counted at all.
Counting them would have resulted in millions of addresses
(nowhere near the 75,000 someone stated earlier in this thread).
Add to this the difficulties of matching people to unique editors:
- One IP address can serve a whole school or cafe
- Some providers assign a new temporary IP address on every session
- Many users edit from different PCs over time (e.g. work, home)

We can, however, follow anonymous edits over time, rather than editors.
See e.g. http://stats.wikimedia.org/EN/PlotsPngEditHistoryTop.htm

----------------------------
Bottom line:
Given all the ambiguities in our data, the arbitrary thresholds, and the 
architectural changes,
we will never have an accurate count of the number of active editors.

Personally I would rather publish a conservative estimate than an 
inflated one.
We aren't there yet.

I think it is even more important that any definition stays simple,
and any methodology is consistent over time.
The latter in particular is needed to allow meaningful trend analysis.

Erik Zachte




