It would be interesting to have some coarse characterisation of edits to see if any growth in edit count is spread uniformly across all contribution types or if the growth is disproportionate in some way. I suspect that the change in the length of the article is a poor man’s approximation for the nature of the edit. Using the “generalisation from single example” method :-) I took a look at my own recent contributions. As a rough characterisation:


An increase of over 200 bytes seems to equate to adding content in the form of new sentences, so is likely to be new facts. And most edits adding 100-200 bytes are content-related (or at least add citations).


Adding under 100 bytes seems to be more “housekeeping” of existing content. Nothing factually new, but I might be adding a section header, some wikilinks, copyediting, adding categories, etc.


Reductions of a small number of bytes (0-50) are most likely copyediting.


Reductions of more than 50 bytes are usually deleting content (although it might be part of a moving/merging process in which the content is actually preserved elsewhere, as I use section editing a lot in the source editor). Not being a deletionist, my larger deletions (where my intention is to remove the content entirely from WP) are usually pretty blatant vandalism or nonsense. Generally if I sense good faith, I try to see if I can fix it up rather than just chuck it out. As section blanking etc. is usually dealt with by ClueBot and similar, I rarely need to restore large amounts of inexplicably deleted content.
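
To make the rough buckets above concrete, here is a minimal Python sketch. The function name and category labels are mine and purely illustrative, and the treatment of the exact boundaries (0, 100, 200 and -50 bytes) is an assumption, since the description above is only approximate. It is based on one editor's habits and is not a validated classifier.

```python
def classify_edit(delta_bytes: int) -> str:
    """Map an article edit's size change (in bytes) to a coarse category.

    Thresholds follow the rough characterisation in this message;
    the boundary handling is an assumption.
    """
    if delta_bytes > 200:
        return "new content"          # new sentences, likely new facts
    if delta_bytes >= 100:
        return "content or citation"  # mostly content-related additions
    if delta_bytes >= 0:
        return "housekeeping"         # headers, wikilinks, categories
    if delta_bytes >= -50:
        return "copyedit"             # small reductions
    return "deletion"                 # larger removals (or moves/merges)


if __name__ == "__main__":
    for delta in (350, 150, 40, -20, -400):
        print(delta, classify_edit(delta))
```

A moved or merged section would show up here as a "deletion" on one page and "new content" on another, so the labels describe the size change only, not the editor's intent.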


The above comments relate to articles rather than talk pages where different patterns apply.


So I’d be curious to know if there’s any change to the proportion of (say) 200+ byte additions to articles (not talk, etc) over time, as I think that’s a reasonable indicator of new content rather than the maintenance of existing content.
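
For what it's worth, the calculation I have in mind could be sketched as below. This is only an illustration: the input format (tuples of timestamp, namespace number and size change in bytes) and the function name are assumptions of mine, not an existing dataset or tool. The one thing it relies on from MediaWiki is that namespace 0 is the article namespace.

```python
from collections import Counter
from datetime import datetime


def share_of_large_additions(edits, threshold=200):
    """Return {"YYYY-MM": fraction of article edits adding >= threshold bytes}.

    `edits` is an iterable of (timestamp, namespace, delta_bytes) tuples;
    this input shape is hypothetical.
    """
    totals = Counter()
    large = Counter()
    for timestamp, namespace, delta_bytes in edits:
        if namespace != 0:  # skip talk pages and other non-article namespaces
            continue
        month = timestamp.strftime("%Y-%m")
        totals[month] += 1
        if delta_bytes >= threshold:
            large[month] += 1
    return {month: large[month] / totals[month] for month in sorted(totals)}


if __name__ == "__main__":
    sample = [
        (datetime(2015, 1, 5), 0, 250),
        (datetime(2015, 1, 9), 0, 40),
        (datetime(2015, 1, 10), 1, 500),  # talk page edit, ignored
        (datetime(2015, 2, 1), 0, 200),
    ]
    print(share_of_large_additions(sample))
```

Comparing these monthly fractions year on year would show whether any growth in edit count is coming from new content or from maintenance.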






From: [] On Behalf Of Jonathan Morgan
Sent: Tuesday, 25 August 2015 2:48 AM
To: Research into Wikimedia content and communities <>
Subject: Re: [Wiki-research-l] Has the recent increase in English wikipedia's core community gone beyond a statistical blip?


I don't think Jonathan was saying we should buy a full page ad in the NYT and declare editor retention solved. I share his cautious optimism. The rate of the editor decline has decreased along several metrics, and we're seeing an intriguing uptick in 100+ editor activity.


Back in 2011, when he and I (and several others on this list) were participating in the Summer of Research, the month-over-month metrics were decreasing at a rate that was kind of alarming. Some combination of factors seems to have changed that pattern. Worth looking into.




On Mon, Aug 24, 2015 at 9:35 AM, Oliver Keyes <> wrote:

"Until we can prove it is good data we should treat it as good data"
is not how data works.

Absent exactly that analysis it is almost certainly a bad idea for us
to declare this to be good news; validate, /then/ celebrate.

On 24 August 2015 at 12:26, WereSpielChequers <> wrote:
> 100 edits a month does indeed have the disadvantage that all edits are not
> equal, there may be some people for whom that represents 100 hours
> contributed, others a single hour. So an individual month could be inflated
> by something as trivial as a vandalfighting bot going down for a couple of
> days and a bunch of oldtimers responding to a call on IRC by coming back and
> running huggle for an hour.
> But 7 months in a row where the total is higher than the same month the
> previous year looks to me like a pattern.
> Across the 3,000 or so editors on English wikipedia who contribute over a
> hundred edits per month there could be a hidden pattern of an increase in
> Huggle, stiki and AWB users more than offsetting a decline in manual
> editing, but unless anyone analyses that and reruns those stats on some
> metric such as "unique calendar hours in which someone saves an edit" I
> think it best to treat this as an imperfect indicator of community health.
> I'm not suggesting that we are out of the woods - there are other indicators
> that are still looking bad, and I would love to see a better proxy for
> active editors. But this is good news.
> On 23 August 2015 at 19:31, Mark J. Nelson <> wrote:
>> WereSpielChequers <> writes:
>> > Could you be more specific re "In general I'm not sure the 100+ count is
>> > among the most reliable." What in particular do you think is unreliable
>> > about that metric?
>> The main thing I have questions about with that metric is whether it's a
>> good proxy for editing activity in general, or is dominated by
>> fluctuations in "bookkeeping" contributions, i.e. people doing
>> mass-moves of categories and that kind of thing (which makes it quite
>> easy to get to 100 edits). This has long been a complaint about edit
>> counts as a metric, which have never really been solidly validated.
>> Looking through my own personal editing history, it looks like there's
>> an anti-correlation between hitting the 100-edit threshold and making
>> more substantial edits. In months when I work on article-writing I
>> typically have only 20-30 edits, because each edit takes a lot of
>> library research, so I can't make more than one or two a day. In months
>> where I do more bookkeeping-type edits I can easily have 500 or 1000
>> edits.
>> But that's just for me; it's certainly possible that Wikipedia-wide,
>> there's a good correlation between raw edit count and other kinds of
>> desirable activity measures. But is there evidence of that?
>> --
>> Mark J. Nelson
>> Anadrome Research
>> _______________________________________________
>> Wiki-research-l mailing list

Oliver Keyes
Count Logula
Wikimedia Foundation




Jonathan T. Morgan

Senior Design Researcher

Wikimedia Foundation