[Wikimedia-l] article bytes more meaningful than users or revisions (was Re: Updates on VE data analysis)

Mark delirium at hackish.org
Sat Jul 27 20:50:10 UTC 2013


On 7/27/13 10:29 AM, Denny Vrandečić wrote:
> I still would worry, though: our content is increasing linearly, as you
> say, but the number of active contributors is not. If we take for granted
> that active contributors are the ones who provide quality control for the
> articles, this means that since 2006 or so the ratio of content per
> contributor is linearly declining, which would mean that our quality would
> suffer.
>

One useful bit of information is what *kind* of editors there are, not 
just the raw numbers.

For example, here is a hypothetical situation, which I think James and 
John are contemplating, which would result in a numerical decline in 
editors-per-article with no real change in actual editorial attention to 
the article:

* Article in 2007, with 19 editors: Initial content written by 1 person, 
moderate expansions from 3 people, copyediting from 5 people, 
vandalism-rollback from 10 people

* Similar article in 2013, with 12 editors: Initial content written by 1 
person, moderate expansions from 3 people, copyediting from 3 people and 
1 typo-fixing bot, vandalism-rollback from 2 people and 2 anti-vandal bots

Basically all that happened in this hypothetical is that two of the 
copyeditors were replaced by a typo-fixing bot, and the rollbacks that 
8 recent-changes patrollers would once have done were instead done by 
2 anti-vandal bots. Maybe that's not what the change looks like, but I 
don't think the raw edit-count data can tell us either way.

I think this is also a potential issue with the definition of active 
users: 5 edits/month for "active" and 100 edits/month for "very 
active". The latter in particular heavily favors people who make many 
small edits over people who make fewer large ones. And are there editors 
contributing substantial amounts of content to Wikipedia who don't even 
hit the lower threshold? One possible group is people whose main 
contribution is writing new articles, and who do little to no other 
editing. Some people write offline and then 
contribute a new, well-referenced article in a single edit. If that's 
their only involvement in Wikipedia, they wouldn't be counted as active 
Wikipedians in the numbers, even if they're sending us a steady stream 
of 1-2 new articles/month.

I'm not sure how to best answer those questions automatically. Bytes, as 
James suggests, could be one possible proxy, but in addition to total 
bytes, we could look at the editor level. Has there been a decline in 
"active editors" if we define active editing as changing more than N 
bytes in the encyclopedia in a month, not counting rollbacks? That would 
count people who wrote substantial new articles as active, even if they 
did it in only 1 or 2 edits/month (although on the other hand, it 
wouldn't count people who made 100 rollbacks and no other edits).
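
To make the comparison concrete, here is a rough sketch of the two 
definitions side by side. Everything in it is an assumption for 
illustration: the record fields, the made-up editors, treating the 
"active" threshold as at least 5 edits, and measuring "bytes changed" as 
the absolute size difference per edit (which would undercount rewrites 
that leave the article length about the same). The value of N is 
arbitrary.

```python
from collections import defaultdict

# Hypothetical revision records; the field names are assumptions made for
# this sketch, not an actual MediaWiki schema.
def rev(editor, month, bytes_changed, is_rollback=False):
    return {"editor": editor, "month": month,
            "bytes_changed": bytes_changed, "is_rollback": is_rollback}

def active_by_edit_count(revisions, min_edits=5):
    """(editor, month) pairs with at least min_edits edits of any kind."""
    counts = defaultdict(int)
    for r in revisions:
        counts[(r["editor"], r["month"])] += 1
    return {key for key, n in counts.items() if n >= min_edits}

def active_by_bytes(revisions, min_bytes):
    """(editor, month) pairs that changed more than min_bytes, ignoring rollbacks."""
    totals = defaultdict(int)
    for r in revisions:
        if r["is_rollback"]:
            continue  # rollbacks do not count toward the byte total
        totals[(r["editor"], r["month"])] += abs(r["bytes_changed"])
    return {key for key, total in totals.items() if total > min_bytes}

# Made-up month: one person posts a single large new article, another
# makes 100 rollbacks and nothing else.
revisions = [rev("offline_writer", "2013-07", 12000)]
revisions += [rev("patroller", "2013-07", -40, is_rollback=True) for _ in range(100)]

by_edits = active_by_edit_count(revisions)
by_bytes = active_by_bytes(revisions, min_bytes=2000)  # N = 2000 is arbitrary
```

With this made-up data, the edit-count definition counts only the 
patroller as active, and the byte-based one counts only the offline 
writer, which is the tradeoff described above.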

Another possibility could be to sample a subset of either articles or 
editors, and manually annotate what kind of editing is going on. That 
would be more tedious, and would of necessity cover only a small subset 
of the encyclopedia, but it might avoid papering over things that are 
obvious when you look at them but tend to get lost in big-data analyses.
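
Drawing the subset could be as simple as a seeded random sample, so that 
other people can reproduce the same list and check the annotations. A 
minimal sketch, where the function name, the sample size, and the seed 
are all just placeholders:

```python
import random

def sample_for_annotation(items, k, seed=0):
    """Pick up to k distinct items uniformly at random, reproducibly."""
    rng = random.Random(seed)   # fixed seed so the sample can be re-drawn
    pool = sorted(set(items))   # sort so input order does not affect the result
    return rng.sample(pool, min(k, len(pool)))

# Hypothetical article titles standing in for a real list of pages.
titles = [f"Article {i}" for i in range(1000)]
to_annotate = sample_for_annotation(titles, k=20, seed=42)
```

The same function would work for a list of usernames if the sample is 
taken over editors instead of articles.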

-Mark


