[Wikimedia-l] article bytes more meaningful than users or revisions (was Re: Updates on VE data analysis)
Mark
delirium at hackish.org
Sat Jul 27 20:50:10 UTC 2013
On 7/27/13 10:29 AM, Denny Vrandečić wrote:
> I still would worry, though: our content is increasing linearly, as you
> say, but the number of active contributors is not. If we take for granted
> that active contributors are the ones who provide quality control for the
> articles, this means that since 2006 or so the ratio of content per
> contributor is linearly declining, which would mean that our quality would
> suffer.
>
One useful bit of information is what *kind* of editors there are, not
just the raw numbers.
For example, here is a hypothetical situation, of the sort I think James
and John are contemplating, that would produce a numerical decline in
editors-per-article with no real change in the actual editorial attention
the article receives:
* Article in 2007, with 19 editors: Initial content written by 1 person,
moderate expansions from 3 people, copyediting from 5 people,
vandalism-rollback from 10 people
* Similar article in 2013, with 12 editors: Initial content written by 1
person, moderate expansions from 3 people, copyediting from 3 people and
1 typo-fixing bot, vandalism-rollback from 2 people and 2 anti-vandal bots
Basically, all that happened in this hypothetical is that two of the
typo-fixers were replaced by a typo-fixing bot, and 8 rollbacks that
would've once been done by recent-changes patrollers were instead done
by just 2 anti-vandal bots. Maybe that's not what the real change looks
like, but I don't think the raw edit-count data can tell us either way.
I think this is also a potential issue with how active users are
defined: 5 edits/month counts as "active" and 100 edits/month as "very
active". The latter in particular much more heavily favors people who
make many small edits over people who make fewer, larger ones. And are
there editors contributing substantial amounts of content to Wikipedia
who don't even hit the lower threshold? One possible group is people
whose main contribution is writing new articles, and who do
little to no other editing. Some people write offline and then
contribute a new, well-referenced article in a single edit. If that's
their only involvement in Wikipedia, they wouldn't be counted as active
Wikipedians in the numbers, even if they're sending us a steady stream
of 1-2 new articles/month.
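To make the edit-count definition concrete, here's a rough Python sketch
of classifying editors purely by monthly edit count. The `Edit` record,
its fields, and the sample users are all made up for illustration, and
I'm assuming "5 edits/month" means at least 5 edits in a calendar month.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Edit(NamedTuple):
    """One revision in a hypothetical edit log (fields are illustrative)."""
    user: str
    month: str        # calendar month, e.g. "2013-06"
    size_delta: int   # page size after the edit minus size before, in bytes


# Thresholds from the active-user definition; using ">=" is an assumption.
ACTIVE_EDITS = 5
VERY_ACTIVE_EDITS = 100


def classify_by_edit_count(edits: Iterable[Edit], month: str) -> dict[str, str]:
    """Label each user who edited in `month` by edit count alone, ignoring size."""
    counts = Counter(e.user for e in edits if e.month == month)
    labels = {}
    for user, n in counts.items():
        if n >= VERY_ACTIVE_EDITS:
            labels[user] = "very active"
        elif n >= ACTIVE_EDITS:
            labels[user] = "active"
        else:
            labels[user] = "not counted"
    return labels


# Hypothetical month: an offline article writer, a copyeditor, a patroller.
edits = (
    [Edit("writer", "2013-06", 18000), Edit("writer", "2013-06", 12000)]
    + [Edit("copyeditor", "2013-06", 5)] * 6
    + [Edit("patroller", "2013-06", 250)] * 120
)
print(classify_by_edit_count(edits, "2013-06"))
```

In this made-up month the writer who added about 30,000 bytes in two
edits comes out as "not counted", while the patroller with 120 small
edits is "very active".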
I'm not sure how to best answer those questions automatically. Bytes, as
James suggests, could be one possible proxy, but in addition to total
bytes, we could look at the editor level. Has there been a decline in
"active editors" if we define active editing as changing more than N
bytes in the encyclopedia in a month, not counting rollbacks? That would
count people who wrote substantial new articles as active, even if they
did it in only 1 or 2 edits/month (although on the other hand, it
wouldn't count people who made 100 rollbacks and no other edits).
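A minimal sketch of that byte-based definition, in Python. The `Edit`
record, the `is_rollback` flag, and the choice of N = 5000 are
hypothetical, and summing the absolute page-size change of each edit is
just one assumed way to measure "bytes changed" (it undercounts edits
that rewrite text without changing its length).

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Edit(NamedTuple):
    """One revision in a hypothetical edit log (fields are illustrative)."""
    user: str
    month: str         # calendar month, e.g. "2013-06"
    size_delta: int    # page size after the edit minus size before, in bytes
    is_rollback: bool  # True if the edit was a rollback/revert


def active_by_bytes(edits: Iterable[Edit], month: str, min_bytes: int) -> set[str]:
    """Users whose non-rollback edits in `month` changed more than `min_bytes` bytes.

    "Bytes changed" is approximated by the absolute size delta of each edit.
    """
    totals: dict[str, int] = defaultdict(int)
    for e in edits:
        if e.month == month and not e.is_rollback:
            totals[e.user] += abs(e.size_delta)
    return {user for user, total in totals.items() if total > min_bytes}


# Hypothetical month: an offline article writer, a copyeditor, a patroller.
edits = (
    [Edit("writer", "2013-06", 18000, False), Edit("writer", "2013-06", 12000, False)]
    + [Edit("copyeditor", "2013-06", 5, False)] * 6
    + [Edit("patroller", "2013-06", -250, True)] * 120
)
print(active_by_bytes(edits, "2013-06", min_bytes=5000))  # N = 5000 is arbitrary
```

With these numbers only the writer clears the bar: the copyeditor's 30
bytes fall short of N, and the patroller's rollbacks are excluded
entirely, which is exactly the trade-off mentioned above.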
Another possibility would be to sample a subset of either articles or
editors, and manually annotate what kind of editing is going on. That's
more tedious, and would necessarily cover only a small subset of the
encyclopedia, but it might avoid papering over things that are obvious
when you look at them but tend to get lost in big-data analyses.
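The sampling step itself is easy to make reproducible. Here's a rough
Python sketch that draws a fixed-seed random sample of articles (or
editors) and emits a CSV sheet with blank columns for the hand
annotation. The column names and article titles are hypothetical;
annotators might fill `editing_kind` with categories like the ones in
the hypothetical above (initial content, expansion, copyedit, rollback,
human or bot).

```python
import csv
import io
import random
from typing import Iterable

# Columns for a hand-annotation sheet; the column names are hypothetical.
ANNOTATION_FIELDS = ["item", "editing_kind", "notes"]


def sample_for_annotation(items: Iterable[str], k: int, seed: int = 0) -> list[dict[str, str]]:
    """Draw a reproducible random sample of k articles (or editors) to annotate by hand."""
    rng = random.Random(seed)
    # Sort first so the same seed gives the same sample regardless of input order.
    chosen = rng.sample(sorted(items), k)
    return [{"item": x, "editing_kind": "", "notes": ""} for x in chosen]


def to_csv(rows: list[dict[str, str]]) -> str:
    """Render the sample as CSV so annotators can fill in the blank columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=ANNOTATION_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


articles = ["Article A", "Article B", "Article C", "Article D", "Article E"]
print(to_csv(sample_for_annotation(articles, k=3, seed=42)))
```

Fixing the seed means someone else can redraw the same sample and check
the annotations.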
-Mark