[Wikimedia-l] article bytes more meaningful than users or revisions (was Re: Updates on VE data analysis)

James Salsman jsalsman at gmail.com
Sat Jul 27 15:56:05 UTC 2013


Denny Vrandečić wrote:
>...
> Is the graph <http://i.imgur.com/TfaD99V.png> based on actual data?

Yes, the precise sizes for the
dumps.wikimedia.org/enwiki/YYYYMMDD/enwiki-YYYYMMDD-pages-articles-multistream.xml.bz2
files, in bytes, are:

2012-07-02 9524994664
2012-08-02 9824345489
2012-09-02 9929910893
2012-10-01 10015876877
2012-11-01 10124555675
2012-12-01 10220499338
2013-01-02 10315766966
2013-02-04 10425240648
2013-03-04 10430830645
2013-04-03 10433658645
2013-05-03 10525475953
2013-06-04 10617572833
2013-07-08 10721955835
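
For a rough sense of scale, the first and last sizes listed above can be
turned into an average growth rate. This is a minimal Python sketch; the
variable names are mine, and since these are sizes of the bz2-compressed
dump files, the result is only a proxy for growth in article text.

```python
from datetime import date

# First and last entries from the list of dump sizes above
# (bytes of the compressed .xml.bz2 file, not raw article text).
first_day, first_size = date(2012, 7, 2), 9524994664
last_day, last_size = date(2013, 7, 8), 10721955835

days = (last_day - first_day).days   # elapsed days between the two dumps
growth = last_size - first_size      # change in compressed size, in bytes

print(f"{growth} bytes over {days} days")
print(f"about {growth / days / 1e6:.2f} MB per day (compressed)")
```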

The byte count approximations from multiplying columns 'E' and 'I'
from http://stats.wikimedia.org/EN/TablesWikipediaEN.htm are at the
end of this message. Again, that data best fits two linear trends,
with a cusp around 2006.

> our content is increasing... but the number of active
> contributors is not.

I'm becoming increasingly convinced that as contributors become more
experienced, they choose to do most of their work logged out. What are
the advantages of using a registered account? Theoretically you can
prove that you made contributions, but as far as I know only one
person so far has ever obtained professional credit for their
contributions (there is a recent thread on wiki-research-l about
this). What are the disadvantages of using a registered account to
edit? Anyone who opposes an edit politically is likely to examine the
entirety of the editor's contribution history and will all too often
stalk, punish by reverting old edits, or dispute the contributor's
work. Anonymous IP editors rarely face such time-wasting scrutiny and
hassles. For anyone whose primary goal is to build an encyclopedia as
opposed to socializing, amassing administrative power, or obtaining a
job with the Foundation, the choice is obvious.  Those who wish their
contributions to be remembered for posterity are more likely to become
serial puppeteers than registered editors, unless they want to spend
most of their time being hassled in article space.

John Vandenberg wrote:
>...
> I would love to see stats about quality rather than quantity....

It would be a mistake to rely on volunteer or Foundation assessments
of quality, because the likelihood that they would be biased is far too
great. We should rely only on third party assessments of article
quality, such as those in
http://en.wikipedia.org/wiki/Reliability_of_Wikipedia#Assessments
nearly all of which show continuous ongoing improvement.

Automatic measures of quality proposed so far have not really
impressed me, but I think http://arxiv.org/pdf/1206.2517.pdf has huge
potential and I am confident that the ideas it promotes will be easily
automated by bots after it is proven through peer review.

> Does anyone have stats for the number of blocked users per month

Yes, but it's almost meaningless because the vast majority of blocks
are for persistent vandalism, often at schools or libraries where we
really have no way to determine whether the editors involved ever
returned to do productive work.

---

Products of columns 'E' and 'I' from
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm :

Jan-10 11330500000
Dec-09 11262300000
Nov-09 11206500000
Oct-09 10788000000
Sep-09 10725000000
Aug-09 10653000000
Jul-09 10263100000
Jun-09 10213800000
May-09 9791600000
Apr-09 9718800000
Mar-09 9328500000
Feb-09 9301500000
Jan-09 9250200000
Dec-08 8855600000
Nov-08 8806200000
Oct-08 8415000000
Sep-08 8375000000
Aug-08 8317500000
Jul-08 7960800000
Jun-08 7941600000
May-08 7557800000
Apr-08 7498000000
Mar-08 7112600000
Feb-08 7068600000
Jan-08 6738900000
Dec-07 6699000000
Nov-07 6318000000
Oct-07 6256000000
Sep-07 5859600000
Aug-07 5823500000
Jul-07 5499000000
Jun-07 5181600000
May-07 5140800000
Apr-07 4793600000
Mar-07 4724800000
Feb-07 4662400000
Jan-07 4320000000
Dec-06 4257000000
Nov-06 3917200000
Oct-06 3871000000
Sep-06 3551600000
Aug-06 3510000000
Jul-06 3195600000
Jun-06 2896300000
May-06 2856700000
Apr-06 2557000000
Mar-06 2476177000
Feb-06 2312907000
Jan-06 2170049000
Dec-05 2013600000
Nov-05 1869076000
Oct-05 1746960000
Sep-05 1627864000
Aug-05 1526784000
Jul-05 1407976000
Jun-05 1300334000
May-05 1209984000
Apr-05 1002925000
Mar-05 924630000
Feb-05 872320000
Jan-05 838272000
Dec-04 861724000
Nov-04 806195000
Oct-04 743904000
Sep-04 689924000
Aug-04 644502000
Jul-04 595665000
Jun-04 552900000
May-04 511038000
Apr-04 476750000
Mar-04 440286000
Feb-04 403010000
Jan-04 375536000
Dec-03 350336000
Nov-03 329219000
Oct-03 310616000
Sep-03 294689000
Aug-03 278630000
Jul-03 261555000
Jun-03 244454000
May-03 230328000
Apr-03 217200000
Mar-03 204630000
Feb-03 193475000
Jan-03 182936000
Dec-02 171010000
Nov-02 162150000
Oct-02 150480000
Sep-02 80733000
Aug-02 66990000
Jul-02 59755000
Jun-02 55420000
May-02 49259000
Apr-02 47790000
Mar-02 44968000
Feb-02 39350000
Jan-02 30582000
Dec-01 26832000
Nov-01 21994000
Oct-01 17244000
Sep-01 10982000
Aug-01 7100000
Jul-01 4186000
Jun-01 3240000
May-01 2373600
Apr-01 1295800
Mar-01 596904
Feb-01 186636
Jan-01 33800
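
One rough way to check the two-trend description against the figures
above is to compare a single least-squares line with two independently
fitted segments. The sketch below is only an illustration under my own
assumptions: it uses just the January values from the list, the helper
names fit_line and best_split are hypothetical, and the two segments are
not forced to meet at the breakpoint. Where the best split lands on this
subsample is something to inspect, not something I am asserting here.

```python
# January values from the list above (products of columns 'E' and 'I').
JANUARY = [
    (2001, 33800), (2002, 30582000), (2003, 182936000),
    (2004, 375536000), (2005, 838272000), (2006, 2170049000),
    (2007, 4320000000), (2008, 6738900000), (2009, 9250200000),
    (2010, 11330500000),
]

def fit_line(xs, ys):
    """Ordinary least squares; returns (slope, intercept, sum of squared errors)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def best_split(xs, ys):
    """Try every breakpoint that leaves at least two points per side.

    Returns (total_sse, k), where the second segment starts at index k.
    The segments are fitted separately, so they need not be continuous.
    """
    best = None
    for k in range(2, len(xs) - 1):
        total = fit_line(xs[:k], ys[:k])[2] + fit_line(xs[k:], ys[k:])[2]
        if best is None or total < best[0]:
            best = (total, k)
    return best

xs = [float(year) for year, _ in JANUARY]
ys = [float(value) for _, value in JANUARY]

one_sse = fit_line(xs, ys)[2]
two_sse, k = best_split(xs, ys)

print("single line SSE: ", one_sse)
print("two segments SSE:", two_sse)
print("second segment starts at January", int(xs[k]))
```

Running the same comparison on the full monthly series would be the
obvious next step; the yearly subsample is used here only to keep the
example short.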


