Dear folks,
Are there studies that have examined what might affect edit size (e.g., the number of words added, deleted, or modified in each revision)? I am especially interested in the impact of an editor's tenure/experience.
Thanks, Haifeng Zhang
Not answering your question about studies, but I think your assumption that an editor has some kind of "normal" edit size dictated solely by tenure/experience might not be valid.
I would note that even for the same contributor, there are different kinds of contribution and these will have different patterns and hence sizes. For example, I think of myself principally as a content writer but I also manage a large watchlist. I would be very surprised if my edit size didn't vary depending on the task. When content writing, I am likely to make large positive-size edits (as I am adding content), but I'm human and make mistakes, so a large edit might be followed by some smaller copyedits. But when I am managing my watchlist, my edits will most often be deleting material (vandalism, spam, uncited dubious claims or opinions), so I would imagine that I would mostly do negative-size edits. When I am doing some task in AutoWikiBrowser, usually maintenance across a set of articles (e.g. replacing a changed domain name in citation URLs or renaming links because of a page move), it will probably show a long run of same/similar-sized edits, which might be positive or negative in size depending on the relative length of the old/new text.
You may need to consider a couple more variables that come from the tags on the edits, such as use of the visual editor or a mobile editor, as the tool you use to edit does alter the way you edit. For example, if I click a section edit in the source editor, I only get to edit that section, so I may do a number of section edits to complete an overall task. If I am using the visual editor, it always opens the whole article, so I may do the complete task in a single edit. If I am on a mobile device, I will usually do the minimal edit necessary (because it is so hard to edit that way) and come back later on my laptop to finish the task properly. So I might remove some incorrect information with the mobile edit (as leaving it in place misleads the reader) but wait until later to add the correct information, as adding the citations for the correct information on a mobile device is just too hard for me.
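As a rough sketch of how edit tags could be brought in as a variable, the following groups edits by a coarse interface label and reports the median absolute byte change for each. The tag names, the `tags` and `size_delta` keys, and the sample figures are all assumptions made for illustration; the tags actually defined on a wiki are listed on its Special:Tags page.

```python
from collections import defaultdict
from statistics import median

# Assumed tag names, for illustration only; the tags actually defined on a
# given wiki are listed on its Special:Tags page.
MOBILE_TAGS = {"mobile edit", "mobile web edit", "mobile app edit"}
VISUAL_TAGS = {"visualeditor"}


def interface_of(tags):
    """Map an edit's tag list to a coarse editing-interface label."""
    tag_set = set(tags)
    if tag_set & MOBILE_TAGS:
        return "mobile"
    if tag_set & VISUAL_TAGS:
        return "visual editor"
    return "source or other"


def median_abs_size_by_interface(edits):
    """Median absolute byte change per interface.

    `edits` is an iterable of dicts with the hypothetical keys "tags"
    and "size_delta" (bytes added; negative for removals).
    """
    sizes = defaultdict(list)
    for edit in edits:
        sizes[interface_of(edit["tags"])].append(abs(edit["size_delta"]))
    return {label: median(values) for label, values in sizes.items()}


# Made-up sample data, not real measurements.
sample_edits = [
    {"tags": ["visualeditor"], "size_delta": 900},
    {"tags": ["visualeditor"], "size_delta": 300},
    {"tags": ["mobile edit", "mobile web edit"], "size_delta": -40},
    {"tags": [], "size_delta": 120},
    {"tags": [], "size_delta": -80},
]
print(median_abs_size_by_interface(sample_edits))
```

Mobile tags are checked first here because a mobile edit may carry other interface tags as well; that ordering is a design choice, not something the tags themselves dictate.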
Also, if I am on a poor Internet connection, I will tend to publish frequently for fear of losing my work. If I am on a good Internet connection, I become complacent and publish less frequently. If a person is only a Visual Editor user, then they probably rely on its ability to recover a partial edit if the session terminates unexpectedly and may be less inclined to publish frequently.
I also do training for new users. And new users exhibit a range of behaviours. Some publish very frequently. Add one sentence, publish, add the citation, publish, replace a word, publish. Others forget to publish at all.
And if you have an editor with edit-count-itis, expect them to do a lot of small edits using tools to implement lots of minor changes of little net value, because their goal is simply to increase their edit count (and hence their ego) in the guise of contributing. I often think it might be a good idea to hide the edit count statistic; while we might lose a lot of edits as a result, we probably wouldn't miss them, and the rest of us would waste less time as our watchlists would not get inflated by this massive number of trivial changes.
Finally, I note it is easier to know the number of bytes changed with each edit (the change in the size of the article wikitext) than it is to know the number of words changed, as that involves comparison of the text. That is easy, I guess, for straight text ("how now brown cow" is 4 words), but how many words change when using templates, citations, etc.? Is it the number of words in the wikitext or the number of words rendered to the reader? If I change a template definition, I can alter the number of words in thousands of Wikipedia articles that transclude it.
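To make the bytes-versus-words distinction concrete, here is a minimal sketch using Python's standard `difflib`. It assumes that a "word" is simply a whitespace-delimited token of the raw wikitext, which is only one of the possible definitions raised above; it says nothing about what is rendered to the reader. The function names are made up for this example.

```python
import difflib


def byte_delta(old: str, new: str) -> int:
    """Change in wikitext size in UTF-8 bytes between two revisions."""
    return len(new.encode("utf-8")) - len(old.encode("utf-8"))


def word_changes(old: str, new: str) -> dict:
    """Count whitespace-delimited wikitext tokens added and removed.

    This requires comparing the two texts, unlike the byte delta,
    which needs only the two sizes.
    """
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    added = removed = 0
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op in ("replace", "delete"):
            removed += i2 - i1
        if op in ("replace", "insert"):
            added += j2 - j1
    return {"added": added, "removed": removed}


old_text = "how now brown cow"
new_text = "how now brown cow {{citation needed}}"
print(byte_delta(old_text, new_text))
print(word_changes(old_text, new_text))
```

Note that the template call counts as two "words" of wikitext under this definition, although the reader sees something different once it is rendered.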
Kerry
-----Original Message----- From: Wiki-research-l [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Haifeng Zhang Sent: Saturday, 8 June 2019 7:44 AM To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org Subject: [Wiki-research-l] Research on Edit Size
Dear folks,
Are there studies that have examined what might affect edit size (e.g., # of words add/delete/modify in each revision). I am especially interested in the impact of editor's tenure/experience.
Thanks, Haifeng Zhang _______________________________________________ Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hi, Kerry,
Thanks a lot for sharing your insight!
The factors you mentioned do seem to be more sensible than tenure.
Best,
Haifeng Zhang ________________________________ From: Wiki-research-l wiki-research-l-bounces@lists.wikimedia.org on behalf of Kerry Raymond kerry.raymond@gmail.com Sent: Friday, June 7, 2019 11:12:02 PM To: 'Research into Wikimedia content and communities' Subject: Re: [Wiki-research-l] Research on Edit Size
Dear Haifeng Zhang,
If I were you, looking at this, I'd watch out for templates. Templates, particularly substituted ones, involve a lot of bytes that someone hasn't typed. I recently did an edit that involved me typing {{subst:Infobox academic}}; you might be surprised how many bytes that generated, and how many more key depressions that edit involved compared to my typical edit. Similarly, a reversion can involve adding a lot of bytes, but on further inspection the editor might simply be reverting a vandal who removed four paragraphs of text that others had contributed.
You might also want to look at an editor's edit rate per hour, and the time since their previous edit. If their previous edit was half an hour earlier, they might have been making a cup of tea, cutting the grass or taking a phone call, or they might have spent half an hour on that edit. But if they have made forty edits in that previous half hour, then you are pretty safe to assume that those edits on average represent less than a minute of work.
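Both quantities are easy to derive from an editor's edit timestamps. The sketch below is one way to do it, assuming the timestamps are already available as `datetime` objects; the function names and the sample burst are hypothetical.

```python
from datetime import datetime, timedelta


def gaps_since_previous(timestamps):
    """Time elapsed between each edit and the one before it."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def max_edits_in_window(timestamps, window=timedelta(minutes=30)):
    """Largest number of edits falling inside any span of length `window`."""
    ordered = sorted(timestamps)
    best = 0
    start = 0
    for end, ts in enumerate(ordered):
        # Slide the left edge forward until the span fits in the window.
        while ts - ordered[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


# Made-up data: forty edits 45 seconds apart, then one edit two hours later.
first = datetime(2019, 6, 22, 9, 0)
burst = [first + timedelta(seconds=45 * i) for i in range(40)]
sample_timestamps = burst + [first + timedelta(hours=2)]

gaps = gaps_since_previous(sample_timestamps)
print(max_edits_in_window(sample_timestamps))
print(gaps[0], gaps[-1])
```

A long gap on its own is ambiguous, as noted above; a dense burst is the more informative signal, because the gaps put an upper bound on the time spent per edit.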
As well as what Kerry said, there are two things you might want to take into consideration. Firstly, those of us with experience of breaking news stories quickly learn the hard way to save little and often, especially on a topical subject. Take, for example, the article on Sarah Palin in the hours after she was announced as John McCain's running mate. My memory is of multiple concurrent edit wars and a tidal wave of vandalism; I went back later and measured it as peaking at 25 edits per minute. I don't think we even log the edits lost to edit conflicts, but in practice anyone clicking the edit button at the top was going to get an edit conflict. Your only chance of getting an edit to save would have been to edit by section.
Secondly, over time editors pick up tools, some of which make a big difference to edit rates. Edit summaries are a good indicator of this; watch for words such as Twinkle, HotCat, Huggle and AWB. I haven't used Cat-a-lot on Wikipedia, but it is the reason why my edit count is higher on Wikimedia Commons, despite my spending rather more time on Wikipedia.
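A simple way to act on the edit-summary hint is a keyword scan. The patterns below are illustrative assumptions only: they cover the four tool names mentioned above plus the spelled-out "AutoWikiBrowser", and real summaries vary, so they would need checking against actual data. The function name is hypothetical.

```python
import re

# Illustrative patterns only; real edit summaries may word things differently.
TOOL_PATTERNS = {
    "Twinkle": re.compile(r"\btwinkle\b", re.IGNORECASE),
    "HotCat": re.compile(r"\bhotcat\b", re.IGNORECASE),
    "Huggle": re.compile(r"\bhuggle\b", re.IGNORECASE),
    "AWB": re.compile(r"\b(?:awb|autowikibrowser)\b", re.IGNORECASE),
}


def detect_tools(summary: str) -> list:
    """Return the sorted names of tools whose keyword appears in a summary."""
    return sorted(
        name for name, pattern in TOOL_PATTERNS.items() if pattern.search(summary)
    )


# Made-up example summaries.
for text in [
    "Reverted edits by Example using Huggle",
    "added [[Category:Foo]] using HotCat",
    "typo fixes using [[Project:AWB|AWB]]",
    "copyedit",
]:
    print(text, "->", detect_tools(text))
```

An editor who blanks or rewrites the default summary will not be caught this way, so this gives a lower bound on tool use rather than a complete count.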
Regards
Jonathan
Not only that, but we'd also have to exclude reverts. If someone replaces an article with "LULZ I HAX WIKI", and I revert that, the software will see me as "adding" all the text that was previously there, but of course I didn't in any reasonable sense actually do that.
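One common heuristic for this is the "identity revert": a revision whose full text is identical to some earlier revision of the same page is treated as a revert rather than as newly written content. The sketch below assumes you have the revision texts in chronological order and hashes them locally; the function name is made up, and partial or manual reverts will not be caught.

```python
import hashlib


def flag_identity_reverts(revision_texts):
    """Flag revisions that restore an earlier state of the page.

    `revision_texts` is the page's revision texts in chronological order.
    A revision is flagged when its content hash matches an earlier revision
    other than the immediately preceding one (which would be a null edit).
    """
    seen = set()
    flags = []
    previous_digest = None
    for text in revision_texts:
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        flags.append(digest in seen and digest != previous_digest)
        seen.add(digest)
        previous_digest = digest
    return flags


# Made-up page history.
history = [
    "Four paragraphs of article text.",
    "LULZ I HAX WIKI",
    "Four paragraphs of article text.",
    "Four paragraphs of article text. Plus a new sentence.",
]
print(flag_identity_reverts(history))
```

Flagged revisions can then be left out when attributing added bytes or words to an editor, so that restoring others' text is not counted as writing it.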
Todd
On Sat, Jun 22, 2019 at 9:38 AM WereSpielChequers < werespielchequers@gmail.com> wrote: