Further to Joan’s comment, there are some other ways to stratify edits:
- Whether an edit is vandalism, a vandalism revert, or an “actual” change. Vandal edits and reverts are both quick compared to good-faith additions and changes. Heavily vandalized articles will have long edit histories, even though sometimes not much effort was put into them.
- Whether the edit was made by a human or a bot.
- Whether a human edit was made with a tool such as AWB or HotCat. AWB in particular can be used to make very fast edits.
Another thought is that if you’re trying to measure contributor effort, why not look at article Talk pages as well? For controversial articles, a large proportion of editor time is spent on discussion.
Cheers, Su-Laine (longtime Wikipedia contributor)
On Oct 20, 2020, at 12:37 PM, Johan Jönsson brevlistor@gmail.com wrote:
A few comments from an editing perspective, in case anything here is useful:
I think Levenshtein distance might be a useful concept here, as an indication that I've read through and made some sort of decision about a whole article or a significant part of it, for both additions and subtractions.
When it comes to article content, the most important signifier of effort spent on an edit beyond text length that comes to mind is whether a new ref tag is added. If I'm referencing something, there's a fair chance that I've not only identified a shortage or deficiency, but potentially spent time both finding a source and reading through it to be able to reference it, even if it results in a short sentence.
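As a rough illustration of the ref-tag signal, here is a minimal Python sketch that compares two revisions of wikitext and reports how many more ref tags the newer one contains. The function names and the regular expression are my own assumptions, not an existing tool. The pattern also counts reuses of a named reference (such as a self-closing ref tag), so it would overstate the number of genuinely new sources.

```python
import re

# Matches an opening <ref> or <ref name="..."> tag (including self-closing
# ones), but not <references /> and not the closing </ref>.
REF_OPEN = re.compile(r"<ref[\s>/]", re.IGNORECASE)

def count_ref_tags(wikitext):
    """Count opening ref tags in a piece of wikitext."""
    return len(REF_OPEN.findall(wikitext))

def refs_added(old_wikitext, new_wikitext):
    """Net number of ref tags gained between two revisions (never negative)."""
    return max(0, count_ref_tags(new_wikitext) - count_ref_tags(old_wikitext))

old = "Text.<ref>A</ref>\n<references />"
new = 'Text.<ref>A</ref> More.<ref name="b">B</ref>\n<references />'
print(refs_added(old, new))  # 1
```

A net count like this misses edits that swap one reference for another, so it is only a lower bound on referencing activity.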
In some languages, translations of other Wikipedia articles are common; there might be a big difference between adding the same type of content translated from another language version and writing it from scratch.
//Johan Jönsson
On Tue, Oct 20, 2020 at 20:32, Nate E TeBlunthuis nathante@uw.edu wrote:
Greetings!
Quantifying effort is obviously a fraught prospect, but Geiger and Halfaker [1] used edit sessions defined as consecutive edits by an editor without a gap longer than an hour to quantify the total number of labor hours spent on Wikipedia. I'm familiar with other papers that use this approach to measure things like editor experience.
I'm curious about the amount of effort put into each particular article. Edit sessions seem like a good approach, but there are some problems:
- How much time does an edit session of length 1 take?
- Should article edit sessions be consecutive in the same article?
- What if someone makes an edit to a related article in the middle of their session?
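To make the session definition above concrete, here is a minimal Python sketch that groups one editor's edit timestamps into sessions using the one-hour gap. It is not Geiger and Halfaker's code; the function names and the `padding` parameter are hypothetical. The padding is simply one way to give a single-edit session a nonzero duration, and its value is an open choice.

```python
from datetime import datetime, timedelta

def split_sessions(timestamps, max_gap=timedelta(hours=1)):
    """Group edit timestamps into sessions: consecutive edits stay in the
    same session unless the gap between them is longer than max_gap."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions

def session_hours(sessions, padding=timedelta(0)):
    """Total hours across sessions; padding (an assumption) is added to
    every session so that single-edit sessions are not counted as zero."""
    total = sum(((s[-1] - s[0]) + padding for s in sessions), timedelta(0))
    return total.total_seconds() / 3600

edits = [
    datetime(2020, 10, 20, 10, 0),
    datetime(2020, 10, 20, 10, 30),
    datetime(2020, 10, 20, 11, 20),
    datetime(2020, 10, 20, 13, 0),   # more than an hour later: new session
]
sessions = split_sessions(edits)
print([len(s) for s in sessions])    # [3, 1]
```

Passing only the edits made to one article gives per-article sessions; passing all of an editor's edits and then filtering by article is the other option, and the two give different answers when the editor switches articles mid-session.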
I wonder what folks here think about alternatives for quantifying the effort put into an article, such as:
- Number of wikitext characters added/removed
- Levenshtein (edit) distance (of characters or tokens)
- Simply the number of edits
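For comparing the first two options, here is a small Python sketch of Levenshtein distance that works on either characters (strings) or tokens (lists), next to the net change in character count. The function names are my own, and whitespace splitting is only a stand-in for a real wikitext tokenizer.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens),
    counting insertions, deletions, and substitutions as cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def net_chars_changed(old, new):
    """Net change in length, as in a page history byte-difference column."""
    return len(new) - len(old)

print(levenshtein("kitten", "sitting"))                                # 3
print(levenshtein("the cat sat".split(), "the dog sat down".split()))  # 2
print(net_chars_changed("abc", "xyz"), levenshtein("abc", "xyz"))      # 0 3
```

The last line shows why the two measures differ: a rewrite that keeps the length the same has a net change of zero but a nonzero edit distance. This implementation takes time proportional to the product of the two lengths, so for full articles a token-level comparison is much cheaper than a character-level one.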
Thanks for your help!
[1] Geiger, R. S., & Halfaker, A. (2013). Using edit sessions to measure participation in Wikipedia. Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 861–870. http://dl.acm.org/citation.cfm?id=2441873
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l