Thanks for raising this question, Nate! I'm really interested in this discussion. Here is another option to throw into the mix, though it would require a fair bit of work:
The Growth team put together a taxonomy of tasks that editors do and their perceived difficulty level: https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Newcomer_tasks#... I'm not sure how complete the taxonomy is, but you could:
- come up with a complete-ish taxonomy of edit types (another option to consider: https://github.com/diyiy/Wiki_Semantic_Intention)
- assign each edit type a general difficulty level and time estimate (hopefully backed up with some empirical data from edit sessions or from which user groups engage in a given type of edit, though for all the reasons mentioned by you and others, that can be really hard to calculate)
- build detectors for each type of edit (unfortunately this is going to require parsing a lot of wikitext, but hopefully you can simply compare the number of links, images, templates, etc. in the previous and current revisions with mwparserfromhell)
- classify each edit by the changes it made, and from that its difficulty level and the estimated time/expertise involved.
Alternatively, you could just count the number of links etc. in the current version of the page and multiply each count by the estimated time to add one such item. This would be highly conservative, though, because it would miss all the collaboration, updating, adjusting, etc., so it would be more of an estimate of the minimum time to build a page.
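A minimal sketch of both ideas (comparing feature counts between two revisions, and a minimum-time estimate for a single version). It uses rough regular expressions from the standard library instead of mwparserfromhell, so it will miscount nested or unusual markup. The per-item minutes in `MINUTES_PER_ITEM` are made-up placeholders, not empirical values, and all function names are hypothetical.

```python
import re

# Rough patterns for a few wikitext features. These are simplifications;
# a real parser such as mwparserfromhell handles nesting and edge cases better.
PATTERNS = {
    "link": re.compile(r"\[\[(?!(?:File|Image):)", re.IGNORECASE),
    "image": re.compile(r"\[\[(?:File|Image):", re.IGNORECASE),
    "template": re.compile(r"\{\{"),
}

# Hypothetical minutes needed to add one item of each kind (placeholders only).
MINUTES_PER_ITEM = {"link": 0.5, "image": 5.0, "template": 2.0}


def count_features(wikitext):
    """Count occurrences of each feature in one revision's wikitext."""
    return {name: len(p.findall(wikitext)) for name, p in PATTERNS.items()}


def feature_delta(old_text, new_text):
    """Change in each feature count from the previous to the current revision."""
    old, new = count_features(old_text), count_features(new_text)
    return {name: new[name] - old[name] for name in PATTERNS}


def minimum_minutes(wikitext):
    """Lower-bound estimate: each item currently on the page times its cost."""
    counts = count_features(wikitext)
    return sum(counts[name] * MINUTES_PER_ITEM[name] for name in counts)


old = "Paris is in [[France]]."
new = "Paris is in [[France]]. {{Infobox city}} [[File:Paris.jpg|thumb]]"
delta = feature_delta(old, new)
estimate = minimum_minutes(new)
```

A positive entry in `delta` could feed a detector such as "added an image", while `estimate` ignores everything that was later removed or reworked.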
On Tue, Oct 20, 2020 at 5:43 PM Ziko van Dijk zvandijk@gmail.com wrote:
Hello Nate,
Thank you for your interesting question, and thank you for your paper with Shaw and Mako Hill 2018 on the rise and decline of populations.
Your endeavour seems very difficult, perhaps hardly possible. My thinking would be the following: there are certain patterns behind an edit, or rather behind editing activity. For example, imagine someone who reads an article and corrects some minor typos and linguistic issues along the way. How long is the article, and how long may it take to read it? How long may it take to make those edits (or one big edit)?
On the one hand, you may ask editors or observe them to find out how much time they need for this kind of activity. On the other hand, you may try to recognize this pattern in certain characteristics of the edit (an edit of the whole page; small changes of letters at several locations in the text).
It would be a philosophical question what is exactly part of the editing activity. If I read a whole article for my own purposes, as a reader, without intention to edit, and then I find a small error and quickly correct it - does that make my whole reading of the article a part of my editing activity? I would have read the article anyway.
There would be many other patterns. E.g., someone adds a picture. How much time this takes depends on whether the editor searched for it on Commons or took the same one he found in a different language version. So, if the picture does not appear in other language versions, you assume that the editor needed 10 minutes to find it, and otherwise that he needed only two minutes to take the picture from a different language version?
A last example: at a meeting of administrators, I remember an admin explaining that dealing with one vandalism report on the list of incidents costs him about half an hour. Maybe a useful starting point for further considerations?
Good luck, and kind regards Ziko
On Tue, Oct 20, 2020 at 23:18, Su-Laine Brodsky sulainey@gmail.com wrote:
Further to Johan's comment, there are some other ways to stratify edits:
- Whether an edit is vandalism, a vandalism revert, or an "actual" change. Vandal edits and reverts are both quick compared to good-faith additions and changes. Heavily vandalized articles will have long edit histories, even though sometimes not much effort was put into them.
- Whether the edit was made by a human or a bot.
- Whether a human edit was made with a tool such as AWB or HotCat. AWB in particular can be used to make very fast edits.
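A rough sketch of how such a stratification might look. It assumes each revision is a dict with hypothetical fields `sha1` (content checksum), `is_bot`, and `comment` (edit summary). Reverts are found as "identity reverts", i.e. a revision whose content checksum matches an earlier revision; detecting tools by looking for "AWB" or "HotCat" in the edit summary is only a heuristic assumption.

```python
TOOL_MARKERS = ("AWB", "HotCat")  # assumed to show up in edit summaries


def stratify(revisions):
    """Label each revision in a chronologically ordered page history."""
    last_seen = {}  # content checksum -> index of latest revision with it
    labels = []
    for i, rev in enumerate(revisions):
        sha = rev["sha1"]
        comment = rev.get("comment", "").lower()
        if sha in last_seen:
            # Content is identical to an earlier revision: an identity revert.
            labels.append("revert")
            # Everything between the restored revision and this one was undone.
            for j in range(last_seen[sha] + 1, i):
                labels[j] = "reverted"
        elif rev.get("is_bot"):
            labels.append("bot")
        elif any(marker.lower() in comment for marker in TOOL_MARKERS):
            labels.append("tool")
        else:
            labels.append("human")
        last_seen[sha] = i
    return labels


history = [
    {"sha1": "a", "comment": "new section"},
    {"sha1": "b", "comment": "typo fixes using AWB"},
    {"sha1": "c", "comment": ""},
    {"sha1": "b", "comment": "rv vandalism"},
    {"sha1": "d", "comment": "fix interwiki", "is_bot": True},
]
labels = stratify(history)
```

"Reverted" is not the same as "vandalism" (good-faith edits get reverted too), so this only separates out edits that likely took little effort or left no lasting trace.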
Another thought is that if you’re trying to measure contributor effort, why not look at article Talk pages as well? For controversial articles, a large proportion of editor time is spent on discussion.
Cheers, Su-Laine (longtime Wikipedia contributor)
On Oct 20, 2020, at 12:37 PM, Johan Jönsson brevlistor@gmail.com wrote:
A few comments from an editing perspective, in case anything here is useful:
I think Levenshtein distance might be a useful concept here, since it gives an indication that I've read through and made some sort of decision about a whole article or a significant part of an article – both for additions and subtractions.
When it comes to article content, the most important signifier of effort spent on an edit beyond text length that comes to mind is whether a new ref tag is added. If I'm referencing something, there's a fair chance that I've not only identified a shortage or deficiency, but potentially spent time both finding a source and reading through it to be able to reference it, even if it results in a short sentence.
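As a sketch of that signal, assuming plain `<ref>...</ref>` markup: collect the reference bodies in the previous and the current revision and report the ones that are new. The regular expression is a simplification (it skips self-closing `<ref name="x" />` reuses, and would miss tags whose attributes contain a slash), and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches <ref>...</ref> and <ref name="x">...</ref>, capturing the body.
REF_RE = re.compile(r"<ref\b[^>/]*>(.*?)</ref\s*>", re.IGNORECASE | re.DOTALL)


def added_refs(old_text, new_text):
    """Return reference bodies present in new_text but not in old_text."""
    old = Counter(body.strip() for body in REF_RE.findall(old_text))
    new = Counter(body.strip() for body in REF_RE.findall(new_text))
    # Counter subtraction keeps only positive counts, i.e. newly added refs.
    return list((new - old).elements())


old = "Water boils at 100 C.<ref>Smith 2001</ref>"
new = old + ' It freezes at 0 C.<ref name="j">Jones 2010, p. 4</ref>'
new_refs = added_refs(old, new)
```

An edit with a non-empty result could then be weighted more heavily than its character count alone would suggest.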
In some languages, translations of other Wikipedia articles are common; there might be a big difference between adding the same type of content translated from another language version and writing it from scratch.
//Johan Jönsson
On Tue, Oct 20, 2020 at 20:32, Nate E TeBlunthuis <nathante@uw.edu> wrote:
Greetings!
Quantifying effort is obviously a fraught prospect, but Geiger and Halfaker [1] used edit sessions defined as consecutive edits by an editor without a gap longer than an hour to quantify the total number of labor hours spent on Wikipedia. I'm familiar with other papers that use this approach to measure things like editor experience.
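For concreteness, here is a minimal sketch of that session definition: one editor's edit timestamps are grouped so that consecutive edits no more than an hour apart share a session. How much time to credit a single-edit session is left as a parameter (`single_edit`, a hypothetical name defaulting to zero); this is not a reproduction of the paper's exact method.

```python
from datetime import datetime, timedelta


def sessions(timestamps, max_gap=timedelta(hours=1)):
    """Group one editor's edit timestamps into sessions."""
    result = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][-1] <= max_gap:
            result[-1].append(ts)  # gap is short enough: same session
        else:
            result.append([ts])  # first edit, or gap too long: new session
    return result


def session_hours(timestamps, single_edit=timedelta(0),
                  max_gap=timedelta(hours=1)):
    """Total hours across sessions, measured from first to last edit."""
    total = timedelta(0)
    for s in sessions(timestamps, max_gap):
        total += (s[-1] - s[0]) if len(s) > 1 else single_edit
    return total.total_seconds() / 3600


edits = [
    datetime(2020, 10, 20, 10, 0),
    datetime(2020, 10, 20, 10, 20),
    datetime(2020, 10, 20, 10, 50),
    datetime(2020, 10, 20, 13, 0),
]
grouped = sessions(edits)
hours = session_hours(edits)
```

Passing only the timestamps of edits to one article would give a per-article variant, at the cost of splitting sessions that span several pages.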
I'm curious about the amount of effort put into each particular article. Edit sessions seem like a good approach, but there are some problems:
- How much time does an edit session of length 1 take?
- Should article edit sessions be consecutive in the same article?
- What if someone makes an edit to a related article in the middle of their session?
I wonder what folks here think about alternatives for quantifying the effort put into an article, such as:
- Number of wikitext characters added/removed
- Levenshtein (edit) distance (of characters or tokens)
- Simply the number of edits
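To make the first two alternatives concrete, here is a small sketch using only the standard library. `levenshtein` is the textbook dynamic-programming edit distance and works on strings (characters) or lists (tokens); `chars_added_removed` uses difflib, whose alignment is not guaranteed to be minimal, so its counts are an approximation. Both function names are hypothetical.

```python
import difflib


def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute or keep
        prev = cur
    return prev[-1]


def chars_added_removed(old, new):
    """Approximate counts of characters added and removed between revisions."""
    added = removed = 0
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


char_distance = levenshtein("kitten", "sitting")
token_distance = levenshtein("the cat sat".split(), "the dog sat down".split())
added, removed = chars_added_removed("abc", "abcdef")
```

The quadratic cost of the distance makes token-level comparison the more practical choice for long articles.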
Thanks for your help!
[1] Geiger, R. S., & Halfaker, A. (2013). Using edit sessions to measure participation in Wikipedia. Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 861–870. http://dl.acm.org/citation.cfm?id=2441873
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l