[Wikimedia-l] article bytes more meaningful than users or revisions (was Re: Updates on VE data analysis)

James Salsman jsalsman at gmail.com
Sat Jul 27 06:38:50 UTC 2013


MZMcBride wrote:
>... the number of non-deleted revisions per day for the
> English Wikipedia. The results are here:
> https://en.wikipedia.org/wiki/Special:Permalink/565971356

So, that looks terrible: http://i.imgur.com/Z9lYCWj.png

It looks terrible in the same way as every other graph of active
users and of several related measures.

But it isn't. It doesn't account for the power law of practice, which
causes everyone who has ever edited Wikipedia to get better at it over
time. And since so many IP editors are obviously returning editors,
that effect is much larger than it would be under the false but very
common assumption that every IP editor is new.
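For readers who haven't met it: the power law of practice is usually
written as T = a * N^(-b), where T is the time taken on the N-th
repetition of a task and a, b > 0 are fitted constants. A minimal
sketch of that curve follows. The defaults a = 60 and b = 0.4 are
hypothetical, not estimates for Wikipedia editors, and the function
names are illustrative only.

```python
# Power law of practice: time per task falls as a power function of the
# number of practice trials, T(N) = a * N**(-b).
# The default constants a=60.0 and b=0.4 are HYPOTHETICAL, for illustration.

def practice_time(n_trials: int, a: float = 60.0, b: float = 0.4) -> float:
    """Modeled time (e.g. seconds) for the n-th repetition of a task."""
    if n_trials < 1:
        raise ValueError("n_trials must be >= 1")
    return a * n_trials ** (-b)


def edits_per_hour(n_trials: int, a: float = 60.0, b: float = 0.4) -> float:
    """Throughput implied by the model: tasks that fit in 3600 seconds."""
    return 3600.0 / practice_time(n_trials, a, b)


if __name__ == "__main__":
    # Print modeled time and throughput at a few experience levels.
    for n in (1, 10, 100, 1000):
        print(n, round(practice_time(n), 2), round(edits_per_hour(n), 1))
```

With these made-up constants the thousandth edit takes about 6% of the
time of the first (1000^-0.4 is roughly 0.063), which is the sense in
which a flat count of editors or revisions need not mean flat output.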

Here's what really matters, article-space size: http://i.imgur.com/TfaD99V.png

The size of the article text in bytes has been marching on linearly
since the beginning of Wikipedia, with extremely low variation, just
like the short popular vital articles and every other measure of
quality content.

There is no legitimate basis to worry about anything until the total
article bytes break out of their 12-year linear trend.
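One way to make "breaks out of its linear trend" concrete is to fit an
ordinary least-squares line to the historical series and flag a new
observation whose residual exceeds k standard deviations of the
historical residuals. The sketch below assumes you already have
(time, total bytes) pairs; the function names and the default k = 3
are hypothetical choices, not something the post specifies.

```python
# HYPOTHETICAL trend-breakout check: fit an OLS line to past
# (time, total_bytes) points, then flag a new point whose residual is
# larger than k standard deviations of the past residuals.
from statistics import mean, stdev


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope*x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def breaks_trend(xs, ys, x_new, y_new, k=3.0):
    """True if (x_new, y_new) lies more than k residual SDs off the fitted line."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = stdev(residuals)  # needs at least two historical points
    return abs(y_new - (slope * x_new + intercept)) > k * sigma
```

A stricter test would use a regression prediction interval, which
widens for points far from the fitted range; the plain k-sigma rule is
kept here for brevity.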

(If you multiply columns 'E' and 'I' from
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm, the resulting
database size shows a cusp at around 2006, corresponding to the two
growth modes, but two separate linear trends fit those modes far
better than any growth model fits the entire curve.)
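A minimal sketch of the "two linear trends" comparison: fit one straight
line to the whole series, then search every split point for the pair of
lines with the lowest combined sum of squared errors (SSE). It assumes
the product series is already in hand as (year, bytes) pairs sorted by
year, and it does not restate what columns E and I contain. It also
only contrasts one line against two; checking "any growth model" would
mean fitting those models (exponential, logistic, and so on) as well,
which is left out here. Function names are hypothetical.

```python
# HYPOTHETICAL sketch: compare one OLS line over the whole series with two
# separate OLS lines split at a breakpoint (the "cusp"), by SSE.
# xs must be sorted; the data fed in is illustrative, not real statistics.


def sse_line(xs, ys):
    """SSE of the ordinary least-squares line through (xs, ys); needs >= 2 distinct xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def best_two_segment(xs, ys, min_points=3):
    """Try every split index; return (split_index, total SSE) of the best split.

    Each segment keeps at least min_points points; returns None if the
    series is shorter than 2 * min_points.
    """
    best = None
    for i in range(min_points, len(xs) - min_points + 1):
        total = sse_line(xs[:i], ys[:i]) + sse_line(xs[i:], ys[i:])
        if best is None or total < best[1]:
            best = (i, total)
    return best
```

Note that two lines have more free parameters than one, so they always
fit at least as well; a fair comparison would penalize the extra
parameters, for example with an information criterion such as AIC.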
