Hi all,
I wonder whether anybody has verified and/or further researched Aaron Swartz's thesis on the structure of Wikipedia participation. You can find it here: http://www.aaronsw.com/weblog/whowriteswikipedia
Best, Krzysztof Gajewski
Cory Doctorow may have talked about it...
On Tue, Jun 23, 2015 at 9:46 AM, Krzysztof Gajewski krzysztofgajewski@gmail.com wrote:
Hi all,
I wonder if you know if somebody verified and / or further researched Aaron Swartz's thesis on structure of Wikipedia participation. You can find it here: http://www.aaronsw.com/weblog/whowriteswikipedia
Best, Krzysztof Gajewski
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Thank you for your response. Could you give me some details? Google returns only the SOPA story.
On Tue, Jun 23, 2015 at 4:55 PM, Sam Katz smkatz@gmail.com wrote:
Cory Doctorow may have talked about it...
Krzysztof Gajewski, 23/06/2015 16:46:
I wonder if you know if somebody verified and / or further researched Aaron Swartz's thesis on structure of Wikipedia participation. You can find it here: http://www.aaronsw.com/weblog/whowriteswikipedia
http://wikipapers.referata.com/wiki/Authorship has some related studies, though the question can't have a clearcut answer.
Nemo
The issue was discussed a bit in 2008 under the title "Regular contributor", see the thread here:
https://lists.wikimedia.org/pipermail/wiki-research-l/2008-November/000672.h...
I have attempted to summarize the issue in the section "User contribution" here: "Wikipedia research and tools: Review and comments." http://www2.compute.dtu.dk/pubdb/views/edoc_download.php/6012/pdf/imm6012.pd...
There are also a few pointers in the "Participation Trends" section of our "The people's encyclopedia under the gaze of the sages: A systematic review of scholarly research on Wikipedia" http://orbit.dtu.dk/fedora/objects/orbit:119482/datastreams/file_73b48cd3-a7...
One interesting original study is this one: "Creating, Destroying, and Restoring Value in Wikipedia" from 2007 by Reid Priedhorsky and others. http://dx.doi.org/10.1145/1316624.1316663
They conclude:
"We show that 1/10th of 1% of editors contributed nearly half of the value, measured by words read."
best regards Finn Årup Nielsen
On Tue, Jun 23, 2015 at 9:08 AM, Finn Årup Nielsen fn@imm.dtu.dk wrote:
One interesting original study is this one: "Creating, Destroying, and Restoring Value in Wikipedia" from 2007 by Reid Priedhorsky and others. http://dx.doi.org/10.1145/1316624.1316663
Yes, this is the best study of which I'm aware.
- J
-- Finn Årup Nielsen http://people.compute.dtu.dk/faan/
Given what we know about active editors having been in decline since about 2006, I have to wonder whether a 2015 study would produce very different results from the earlier period.
From an entirely anecdotal perspective, I do observe that there are a lot of "housekeeping" edits that go on. I create a lot of new articles and would characterise my own editing as writing a lot of new content in new and existing articles; this is my primary interest. However, I am both amused and annoyed at the way that, within moments of my edit, there can be a rash of people wanting to add project tags, add esoteric categories that I cannot imagine being used for navigation by real readers, replace a dash of one length with a dash of another length, remove the word "comprised" (one of the most annoying!), and so on. Many of these folks have massive edit counts and appear (from a quick look at the last screen of recent contributions) to devote themselves entirely to this kind of editing. Indeed, I would go so far as to say many suffer from editcountitis, a condition that often can be diagnosed by the User page being largely devoted to reporting on their number of edits :-)
IMHO, the value-add of these housekeeping edits is mixed. Some are genuinely useful: people pick up mistakes I've made or add categories I am unaware of that are relevant to the topic. Some are useful if you happen to believe the reader experience is genuinely improved by rigid adherence to the Manual of Style (I would be interested in a study on how important the consistency of the use of various-length dashes and other MoS detail is to the reader experience). Some, like project tagging, appear to be utterly pointless as most of the projects involved are moribund. Other than meeting some deep need to "mark your territory" like a dog (or get your edit count up), what earthly point is there to project tagging unless the project has some active processes to improve articles? Some are just annoying (like the user who dislikes the word "comprised"), and many of these people create edit conflicts for me as I add further content, which is ****ing annoying. Edit conflicts are a particular problem when trying to do your second/third edit to a new article, as new articles attract housekeeping edits like vultures to a carcass. The folks I particularly despise are the ones who try to add multiple quality tags or speedy delete a new stub after its first edit (which is sometimes cut short because I am interrupted; folks, give me 5 minutes please to come back and do a little more work on it).
I teach Wikipedia editing (indeed I am off to a local university to do it this morning) and I see first hand how this kind of housekeeping behaviour is really disruptive to new contributors (even the more useful and well-intended housekeeping) because of the edit conflicts it creates. New contributors spend a long time writing and previewing before SAVE, which would probably be a desirable behaviour if it weren't for the housekeepers. By contrast, anyone who studied my patterns of edits would see me saving very frequently, because of this issue with edit conflicts from the housekeepers. I try to teach people to SAVE, SAVE, SAVE as often as possible. Having seen the impact of edit conflicts in edit training sessions where I am there to explain what's happening, I suspect that housekeeping edits are probably frightening off or frustrating away new contributors who don't have someone leaning over their shoulder to advise them on dealing with edit conflicts. Because it is quick and easy to do a housekeeping edit and slow to write good content with citations, the housekeepers can easily drive away a content contributor.
Kerry
_____
From: wiki-research-l-bounces@lists.wikimedia.org [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Jonathan Morgan Sent: Wednesday, 24 June 2015 3:24 AM To: Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] Aaron Swartz Hypothesis on WikipediaAuthorship
On Tue, Jun 23, 2015 at 9:08 AM, Finn Årup Nielsen fn@imm.dtu.dk wrote:
One interesting original study is this one: "Creating, Destroying, and Restoring Value in Wikipedia" from 2007 by Reid Priedhorsky and others. http://dx.doi.org/10.1145/1316624.1316663
Yes, this is the best study of which I'm aware.
- J
Dear Kerry,
Though the vast majority of my edits are precisely the sort of minor housekeeping edits that you describe, I agree with almost all that you say. But I would make three little observations.
1 The solution to the edit conflict problem is to fix the software so we have fewer edit conflicts. It wouldn't be a big change to have the software treat categories and project tags as their own sections and not reject newbies' edits as conflicts with the taggers and the categorisers. When you are training newbies you can minimise these problems by getting them to start articles in sandboxes and to create sections. But the real solution is to get a high priority for the various low-priority and won't-fix bugs on Phabricator that would reduce edit conflicts. For the research community the big opportunity is to do research on edit conflicts. If the research showed that they are, as I believe, the biggest biter of good-faith newbies, then there is a good chance that some programming resource could be allocated to them. If the research showed that they are not significant and that projects like AFT, VisualEditor, LiquidThreads, Flow and the media wiki viewer really were a better investment for the WMF than reducing edit conflicts, then I will be astonished, and the WMF somewhat vindicated.
2 Don't take the "editors have been in decline since 2006/7" claim too seriously. These are raw figures on edits; they don't take account of the edit filters, which during that era lost us most of our vandalism and, with it, the vandal reversion, vandal warnings, AIV reports and block messages that were generated in response. Nor do they allow for the migration to Wikidata of things like interwiki links. The truth is, I'm pretty sure no one has meaningful figures for community size in that era.
3 Project tagging, even for currently dormant projects, shouldn't cause edit conflicts on articles, as the tags go on talk pages. Whether project tagging has a use or not depends on your attitude about the health of the community. If we are experiencing uniform and irreversible decline, with a dwindling band of editors who aren't changing their editing interests and no new recruits, then I could see the argument that once a WikiProject has become moribund it won't revive. If, however, we are broadly stable but with a steady inflow of new editors, then I would see dormant WikiProjects as an opportunity for newish editors to take on a role within the community. Again, somebody could earn a doctorate studying this.
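The idea in point 1, treating categories as their own region so that a categoriser's save does not collide with a content edit, can be sketched roughly as below. This is only an illustration under stated assumptions, not a description of how MediaWiki actually merges edits: the function names split_categories and merge_edits are hypothetical, category links are assumed to sit on their own lines in the form [[Category:Name]], and project tags are left out because (as point 3 notes) they live on talk pages.

```python
import re

# Assumption: each category link occupies its own line, e.g. [[Category:Name]].
CATEGORY_LINE = re.compile(r"^\[\[Category:[^\]]+\]\]$")


def split_categories(wikitext):
    """Split wikitext into (body_text, list_of_category_lines)."""
    body, cats = [], []
    for line in wikitext.splitlines():
        if CATEGORY_LINE.match(line.strip()):
            cats.append(line.strip())
        else:
            body.append(line)
    return "\n".join(body).rstrip(), cats


def merge_edits(base, theirs, mine):
    """Three-way merge that treats the body and the categories separately.

    base   -- the revision both editors started from
    theirs -- the revision already saved (e.g. by a categoriser)
    mine   -- the revision the second editor is trying to save
    Returns the merged text, or None when both sides changed the body
    in different ways (a genuine conflict).
    """
    base_body, base_cats = split_categories(base)
    their_body, their_cats = split_categories(theirs)
    my_body, my_cats = split_categories(mine)

    # Body: accept whichever side changed it; conflict only if both did.
    if their_body == base_body:
        body = my_body
    elif my_body == base_body or my_body == their_body:
        body = their_body
    else:
        return None

    # Categories: keep every addition, drop anything either side removed.
    removed = (set(base_cats) - set(their_cats)) | (set(base_cats) - set(my_cats))
    merged_cats = []
    for cat in base_cats + their_cats + my_cats:
        if cat not in removed and cat not in merged_cats:
            merged_cats.append(cat)

    if merged_cats:
        return body + "\n\n" + "\n".join(merged_cats)
    return body
```

With this sketch, a newcomer who expands a stub while someone else adds a category gets both changes merged, and only two different rewrites of the body itself are reported as a conflict.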
Regards
Jonathan
On 23 Jun 2015, at 22:44, Kerry Raymond kerry.raymond@gmail.com wrote:
Given what we know about active editors having been in decline since about 2006, I have to wonder whether a 2015 study would produce very different results from the earlier period.
-- Jonathan T. Morgan Senior Design Researcher Wikimedia Foundation User:Jmorgan (WMF)
Hi folks,
thank you so much for your messages and a very interesting discussion. Special thanks to Finn for all the hints.
In my opinion, the results obtained by Priedhorsky's team don't falsify Swartz's hypothesis at all. The cases analyzed by Swartz showed that even when a user contributed a large amount of text, it could be a translation or a copy-and-paste of text found somewhere on the Internet. Swartz remarks that this kind of content was typical of very active users (editcountitis, as Kerry wrote). This phenomenon makes it impossible, or at least very difficult, to measure a user's valuable contribution with software. If we want to exclude translation or plagiarism, we must engage a human or try to create a quite sophisticated algorithm. As far as I remember, Priedhorsky's method counts translations and plagiarism as valuable content.
Best, Krzysztof
PS. BTW, has anybody measured how much of Wikipedia's text was copied and pasted from other sites, i.e. plagiarized?
On Wed, Jun 24, 2015 at 10:57 AM, WereSpielChequers werespielchequers@gmail.com wrote:
Dear Kerry,
Though the vast majority of my edits are precisely the sort of minor housekeeping edits that you describe, I agree with almost all that you say. But I would make three little observations.
Priedhorsky's method counts translations and plagiarism as valuable content.
Only if that translation and plagiarism sticks in the article without being edited or removed.
Honestly, my main concern with Priedhorsky's method is that it measures *actualized* value -- not value-added. So, if you were to make a contribution 5 years ago rather than today, your contribution would have *actualized* a ton of value in those 5 years that it couldn't have actualized today. This is a problematic property of the measurement strategy when trying to answer questions like "who adds value to Wikipedia?" as opposed to "who added the value that readers got out of Wikipedia?"
With some basic simplifications to the way that Priedhorsky looked at page views and /importance/ generally, I think we can adjust the strategy and maintain the benefits. This is something I'm working on right now. :) https://meta.wikimedia.org/wiki/Research:Measuring_value-added Collaborators welcome!
-Aaron
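To make the age effect described above concrete, here is a minimal sketch of a "words read" style measure. It is a toy illustration, not Priedhorsky's actual implementation and not the approach on the Research:Measuring_value-added page: the function name word_views, the tuple layout and the flat 100 views per day are all assumptions made for the example.

```python
from collections import defaultdict


def word_views(contributions, daily_views, today):
    """Toy 'actualized value': words multiplied by the page views they survived.

    contributions -- tuples of (editor, words, added_day, removed_day),
                     where removed_day is None if the text is still live
    daily_views   -- dict mapping day number to page views on that day
    today         -- current day number (exclusive upper bound)
    Returns (total word-views per editor, days of exposure per editor).
    """
    totals = defaultdict(int)
    days_live = defaultdict(int)
    for editor, words, added_day, removed_day in contributions:
        end = today if removed_day is None else removed_day
        views = sum(daily_views.get(day, 0) for day in range(added_day, end))
        totals[editor] += words * views
        days_live[editor] += end - added_day
    return dict(totals), dict(days_live)


# Hypothetical data: a flat 100 views per day over five years (1825 days).
daily = {day: 100 for day in range(1825)}
contribs = [
    ("early", 50, 0, None),     # 50 words added five years ago, still live
    ("recent", 50, 1795, None), # 50 words added 30 days ago, still live
    ("pasted", 500, 10, 11),    # 500 pasted words removed after one day
]
totals, days = word_views(contribs, daily, today=1825)
per_day = {editor: totals[editor] / days[editor] for editor in totals}
```

In this toy data the two identical 50-word contributions differ by a factor of about 60 in actualized value purely because of age, while the per-day figure (an illustrative normalisation only) is the same for both. The pasted text earns word-views only for the single day it stayed in the article.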
On Wed, Jun 24, 2015 at 11:28 AM, Krzysztof Gajewski < krzysztofgajewski@gmail.com> wrote:
Hi folks,
thank you so much for your messages and a very interesting discussion. Special thanks to Finn for all the hints.
In my opinion results acquired by Priedhorsky's team don't falsify Swartz's hypothesis at all. Cases analyzed by Swartz showed that even when a user contributed with a large amount of text, it could be a translation or a paste-and-copy of a text found somewhere in the Internet. Swartz remarks that this kind of content was typical for active users --- editcountitis, as Kerry wrote. This phenomenon makes impossible, or very difficult, to measure a valuable user contribution with a software. If you want to exclude translation or plagiarism, we must engage a human, or try to created quite sophisticated algorithm. As far as I remember Priedhorsky's method counts translations and plagiarism as a valuable content.
Best, Krzysztof
PS. BTW anybody measured how much of Wikipedia text was copied-and-pasted from another sites, i. e. plagiarized?
On Wed, Jun 24, 2015 at 10:57 AM, WereSpielChequers werespielchequers@gmail.com wrote:
Dear Kerry,
Though the vast majority of my edits are precisely the sort of minor housekeeping edits that you describe, I agree with almost all that you
say.
But would make three little observations.
1 the solution to the edit conflict problem is to fix the software so we have fewer edit conflicts. It wouldn't be a big change to have the
software
treat categories and project tags as their own sections and not reject newbies edits as conflicts with the taggers and the categorisers. When
you
are training newbies you can minimise these problems by getting them to start articles in sandboxes and to create sections. But the solution is
to
get a high priority for various low priority and won't fix bugs on phabricator that would reduce edit conflicts. For the research community
the
big opportunity is to do research on edit conflicts, if the research
showed
that they are as I believe the biggest biter of good faith newbies then there is a good chance that some programming resource could be allocated
to
them. If the research showed that they are not significant and that
projects
like AFT, Visual Editor, liquid threads, flow and the media wiki viewer really were a better investment for the WMF than reducing edit conflicts, then I will be astonished, and the WMF somewhat vindicated.
2 don't take the "editors have been in decline since 2006/7" too
seriously.
These are raw figures on edits, they don't take account of the edit
filters
which during that era lost us most of our vandalism and with it the
vandal
reversion, vandal warnings, aiv reports and block messages that were generated in response. Nor do they allow for the migration to wikidata of things like intrawiki links. The truth is I'm pretty sure no-one has meaningful figures for community size in that era.
3 project tagging even for currently dormant projects shouldn't cause
edit
conflicts on articles as the tags go on talk pages. Whether project
tagging
has use or not depends on your attitude about the health of the
community.
If we are experiencing uniform and irreversible decline with a dwindling band of editors who aren't changing their editing interests and no new recruits then I could see the argument that once a wiki project has
become
moribund it won't revive. If however we are broadly stable but with a
steady
in flow of new editors, then I would see dormant wiki projects as an opportunity for newish editors to take on a role within the community. Again, somebody could earn a doctorate studying this.
Regards
Jonathan
On 23 Jun 2015, at 22:44, Kerry Raymond kerry.raymond@gmail.com wrote:
Given what we that active editors have been declining since about 2006, I have to wonder if a 2015 study would produce very different results from
the
earlier period.
From an entirely anecdotal perspective, I do observe that there is a lot
of
“housekeeping” edits that go on. I create a lot of new articles and would characterise my own editing as writing a lot of new content in new and existing articles; this is my primary interest. However, I am both amused and annoyed at the way that within moments of my edit, there can be a
rash
of people wanting to add project tags, add esoteric categories that I
cannot
imagine being used for navigation by real readers, replace a dash of one length with a dash of another length, remove the word “comprised” (one of the most annoying!), and so on. Many of these folks have massive edit
counts
and appear (from a quick look at the last screen of recent
contributions) to
devote themselves entirely to this kind of editing. Indeed, I go so far
as
to say many suffer from editcountitis, a condition that often can be diagnosed by the User page being largely devoted to reporting on their number of edits J
IMHO, I would have to say that the value-add of these housekeeping edits is mixed. Some are genuinely useful (people pick up mistakes I’ve made) or add categories I am unaware of that are relevant to the topic. Some are useful if you happen to believe the reader experience is genuinely improved by rigid adherence to the Manual of Style (I would be interested in a study on how important the consistency of the use of various-length dashes and other MoS detail is to the reader experience). Some, like project tagging, appear to be utterly pointless as most of the projects involved are moribund. Other than meeting some deep need to “mark your territory” like a dog (or get your edit count up), what earthly point is there to project tagging unless the project has some active processes to improve articles? Some are just annoying (like the user who dislikes the word “comprised”) and many of these people create edit conflicts for me as I add further content, which is ****ing annoying. Edit conflicts are a particular problem when trying to do your second/third edit to a new article, as new articles attract housekeeping edits like vultures to a carcass. The folks I particularly despise are the ones who try to add multiple quality tags or speedy delete a new stub after its first edit (which is sometimes cut short because I am interrupted; folks, give me 5 minutes please to come back and do a little more work on it).
I teach Wikipedia editing (indeed I am off to a local university to do it this morning) and I see first hand how this kind of housekeeping behaviour is really disruptive to new contributors (even the more useful and well-intended housekeeping) because of the edit conflicts it creates. New contributors spend a long time writing and previewing before SAVE, which would probably be a desirable behaviour if it weren’t for the housekeepers. By contrast, anyone who studied my patterns of edits would see me saving very frequently, because of this issue with edit conflicts from the housekeepers. I try to teach people to SAVE, SAVE, SAVE as often as possible. Having seen the impact of edit conflicts in edit training sessions where I am there to explain what’s happening, I suspect that housekeeping edits are probably frightening off or frustrating away new contributors who don’t have someone leaning over their shoulder to advise them on dealing with edit conflicts. Because it is quick and easy to do a housekeeping edit and slow to write good content with citations, the housekeepers can easily drive away a content contributor.
Kerry
From: wiki-research-l-bounces@lists.wikimedia.org [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Jonathan Morgan Sent: Wednesday, 24 June 2015 3:24 AM To: Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] Aaron Swartz Hypothesis on WikipediaAuthorship
On Tue, Jun 23, 2015 at 9:08 AM, Finn Årup Nielsen fn@imm.dtu.dk wrote:
One interesting original study is this one: "Creating, Destroying, and Restoring Value in Wikipedia" from 2007 by Reid Priedhorsky and others. http://dx.doi.org/10.1145/1316624.1316663
Yes, this is the best study of which I'm aware.
- J
best regards Finn Årup Nielsen
On 06/23/2015 04:46 PM, Krzysztof Gajewski wrote:
Hi all,
I wonder if you know if somebody verified and / or further researched Aaron Swartz's thesis on structure of Wikipedia participation. You can find it here: http://www.aaronsw.com/weblog/whowriteswikipedia
Best, Krzysztof Gajewski
-- Finn Årup Nielsen http://people.compute.dtu.dk/faan/
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF)
-- Krzysztof Gajewski +48 698 793 756
I think there is a difference between measuring the value to the reader of a contribution and the effort of the editor of that same contribution. On that basis, plagiarism and translation (so long as they are not copyright violations and are appropriately attributed etc) are valuable contributions to the reader, regardless of whether they are less work for the writer.
The introduction of Google Knowledge and the drop in Wikipedia hits tell us that a lot of readers are only looking for a 1-2 sentence introduction. It may be that greater coverage of more topics at the stub level is more valuable to the reader than greater depth in existing articles. It would be nice to know more about what readers (as opposed to contributors) want from articles. I know there was a trial of a feedback mechanism, but AFAIK the idea was dumped as the feedback wasn't terribly useful.
Kerry
-----Original Message----- From: wiki-research-l-bounces@lists.wikimedia.org [mailto:wiki-research-l-bounces@lists.wikimedia.org] On Behalf Of Krzysztof Gajewski Sent: Thursday, 25 June 2015 2:28 AM To: Research into Wikimedia content and communities Subject: Re: [Wiki-research-l] Aaron Swartz Hypothesis on WikipediaAuthorship
Hi folks,
thank you so much for your messages and a very interesting discussion. Special thanks to Finn for all the hints.
In my opinion, the results obtained by Priedhorsky's team don't falsify Swartz's hypothesis at all. The cases analyzed by Swartz showed that even when a user contributed a large amount of text, it could be a translation or a copy-and-paste of text found somewhere on the Internet. Swartz remarks that this kind of content was typical for active users (editcountitis, as Kerry wrote). This phenomenon makes it impossible, or at least very difficult, to measure a valuable user contribution with software. If we want to exclude translation or plagiarism, we must engage a human or try to create a quite sophisticated algorithm. As far as I remember, Priedhorsky's method counts translations and plagiarism as valuable content.
Best, Krzysztof
PS. BTW, has anybody measured how much of Wikipedia's text was copied and pasted from other sites, i.e. plagiarized?
Hi Krzysztof, that is a very interesting point and I'm not aware of a study other than those pointed out by Finn/Nemo.
I actually thought about doing a larger study (covering whole language editions) on exactly that question myself with the wikiwho tool [1] that we developed for that purpose, but haven't found the time yet. If anybody is interested in conducting such a study, I would certainly be interested in assisting (or in knowing if you find someone who has already done it).
Cheers, Fabian
[1] https://github.com/maribelacosta/wikiwho/
On 23.06.2015, at 16:46, Krzysztof Gajewski <krzysztofgajewski@gmail.com> wrote:
Hi all,
I wonder if you know if somebody verified and / or further researched Aaron Swartz's thesis on structure of Wikipedia participation. You can find it here: http://www.aaronsw.com/weblog/whowriteswikipedia
Best, Krzysztof Gajewski
-- Fabian Flöck Research Associate Computational Social Science department @GESIS Unter Sachsenhausen 6-8, 50667 Cologne, Germany Tel: + 49 (0) 221-47694-208 fabian.floeck@gesis.org
www.gesis.org www.facebook.com/gesis.org