Reid Priedhorsky, Jilin Chen, Shyong (Tony) K. Lam, Katherine Panciera, Loren Terveen, John Riedl. "Creating, Destroying, and Restoring Value in Wikipedia." To appear in Proc. GROUP 2007. 10 pages.
http://www.cs.umn.edu/~reid/papers/group282-priedhorsky.pdf
Abstract: Wikipedia's brilliance and curse is that any user can edit any of the encyclopedia entries. We introduce the notion of the impact of an edit, measured by the number of times the edited version is viewed. Using several datasets, including recent logs of all article views, we show that an overwhelming majority of the viewed words were written by frequent editors and that this majority is increasing. Similarly, using the same impact measure, we show that the probability of a typical article view being damaged is small but increasing, and we present empirically grounded classes of damage. Finally, we make policy recommendations for Wikipedia and other wikis in light of these findings.
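For readers who want something concrete, here is a minimal sketch of the impact measure as the abstract words it: an edit's impact is the number of times the resulting version is viewed. The `Revision` record, the editor names and the view counts are all made up for illustration, and the paper's own word-level attribution is certainly more involved than this per-revision tally.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    editor: str  # account name or IP that saved this revision (hypothetical data)
    views: int   # times this exact revision was served to readers


def impact_by_editor(revisions):
    """Sum, per editor, the views received by the revisions they saved."""
    totals = {}
    for rev in revisions:
        totals[rev.editor] = totals.get(rev.editor, 0) + rev.views
    return totals


# Toy article history; none of these numbers come from the paper.
history = [
    Revision("alice", 120),
    Revision("203.0.113.7", 5),
    Revision("alice", 300),
    Revision("bob", 40),
]

impact = impact_by_editor(history)
for editor, views in sorted(impact.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{editor}: {views} views")
```

Under this simplified reading, an edit that is reverted within minutes scores close to zero, while an edit that stays visible on a popular page scores high.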
On 04/10/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
viewed. Using several datasets, including recent logs of all article views, we show that an overwhelming majority of the viewed words were written by frequent editors and that this majority is increasing.
During the Board elections, didn't someone wave around a figure that the majority of our content was written by infrequent contributors? The two aren't strictly incompatible, of course, but this is rather interesting as a counterpoint (and it certainly fits my gut feeling).
On 10/4/07, Andrew Gray shimgray@gmail.com wrote:
On 04/10/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
viewed. Using several datasets, including recent logs of all article views, we show that an overwhelming majority of the viewed words were written by frequent editors and that this majority is increasing.
During the Board elections, didn't someone wave around a figure that the majority of our content was written by infrequent contributors? The two aren't strictly incompatible, of course, but this is rather interesting as a counterpoint (and it certainly fits my gut feeling).
It has been claimed: http://www.aaronsw.com/weblog/whowriteswikipedia
And disputed: http://lists.wikimedia.org/pipermail/wikien-l/2006-September/053315.html
A related but mostly orthogonal claim (by myself) is that most articles have a single or very small number of authors. It would be interesting to see what direction that trend is moving in.
On 04/10/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
On 10/4/07, Andrew Gray shimgray@gmail.com wrote:
On 04/10/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
viewed. Using several datasets, including recent logs of all article views, we show that an overwhelming majority of the viewed words were written by frequent editors and that this majority is increasing.
During the Board elections, didn't someone wave around a figure that the majority of our content was written by infrequent contributors? The two aren't strictly incompatible, of course, but this is rather interesting as a counterpoint (and it certainly fits my gut feeling).
It has been claimed: http://www.aaronsw.com/weblog/whowriteswikipedia
And disputed: http://lists.wikimedia.org/pipermail/wikien-l/2006-September/053315.html
A related but mostly orthogonal claim (by myself) is that most articles have a single or very small number of authors. It would be interesting to see what direction that trend is moving in.
Note that "most text is by anons" and "most text *that people actually read* is by frequent editors" don't contradict.
- d.
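A toy calculation shows how both statements can hold at once. The three articles and every number below are invented for illustration; the only point is that weighting words by views can flip the majority.

```python
# Each tuple: (words written by anons, words written by frequent editors, views).
# All figures are hypothetical.
articles = [
    (900, 100, 10),    # long-tail article, mostly anonymous text, rarely read
    (800, 200, 20),    # another rarely read article
    (100, 900, 5000),  # popular article, mostly written by frequent editors
]

anon_words = sum(a for a, f, v in articles)
freq_words = sum(f for a, f, v in articles)
anon_share_text = anon_words / (anon_words + freq_words)

# Weight every word by how often its article is viewed ("page word views").
anon_pwv = sum(a * v for a, f, v in articles)
freq_pwv = sum(f * v for a, f, v in articles)
anon_share_viewed = anon_pwv / (anon_pwv + freq_pwv)

print(f"anon share of all text:    {anon_share_text:.0%}")
print(f"anon share of viewed text: {anon_share_viewed:.0%}")
```

Here anons wrote most of the words, yet most of the words readers actually see come from frequent editors, because the one heavily viewed article dominates the view-weighted total.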
Gregory Maxwell wrote:
Abstract: Wikipedia's brilliance and curse is that any user can edit any of the encyclopedia entries. We introduce the notion of the impact of an edit, measured by the number of times the edited version is viewed.
So I guess editors who delete articles are recorded as not contributing anything to Wikipedia by that action. Whether one sees this as a flaw in the study's methodology or as commentary on editing ethos probably depends on the eye of the beholder. :)
Would have been interesting to see that top ten list of "most impactful" editors they mention in the paper, though that may just be latent editcountitis talking.
On 10/4/07, Bryan Derksen bryan.derksen@shaw.ca wrote:
Gregory Maxwell wrote:
Abstract: Wikipedia's brilliance and curse is that any user can edit any of the encyclopedia entries. We introduce the notion of the impact of an edit, measured by the number of times the edited version is viewed.
So I guess editors who delete articles are recorded as not contributing anything to Wikipedia by that action.
Wouldn't the impact of a deletion be measured by the number of times "Wikipedia does not have an article with this exact name." is viewed?
On 10/4/07, Anthony wikimail@inbox.org wrote:
Wouldn't the impact of a deletion be measured by the number of times "Wikipedia does not have an article with this exact name." is viewed?
The negative impact, perhaps... but even that incompletely: Web searches don't normally take people to non-existent pages.
There is also a positive impact from some deletion, but that's harder to measure.
Gregory Maxwell wrote:
On 10/4/07, Anthony wikimail@inbox.org wrote:
Wouldn't the impact of a deletion be measured by the number of times "Wikipedia does not have an article with this exact name." is viewed?
The negative impact, perhaps... but even that incompletely: Web searches don't normally take people to non-existent pages.
There is also a positive impact from some deletion, but that's harder to measure.
Just count all the times the nonexistent pages aren't visited. Easy-peasy.
On 05/10/2007, Bryan Derksen bryan.derksen@shaw.ca wrote:
There is also a positive impact from some deletion, but that's harder to measure.
Just count all the times the nonexistent pages aren't visited. Easy-peasy.
"The impact of [[Gallery of Pokemon characters]] has been rated as i..."
On 10/4/07, Gregory Maxwell gmaxwell@gmail.com wrote:
On 10/4/07, Anthony wikimail@inbox.org wrote:
Wouldn't the impact of a deletion be measured by the number of times "Wikipedia does not have an article with this exact name." is viewed?
The negative impact, perhaps... but even that incompletely: Web searches don't normally take people to non-existent pages.
It only measures the impact according to the definition given by the researchers. That definition is what is incomplete. To be complete it'd have to measure how many people would have viewed the page had the edit not been made. For a deletion that lasts a long time that's significant, but the effect is there in every edit, especially ones that remove information.
There is also a positive impact from some deletion, but that's harder to measure.
I think the impact has to be measured separately from whether or not the edit was positive.
If "Wikipedia does not have an article with this exact name" is better than the previous text, then it's a positive deletion, and anyone who viewed that page would be impacted by it. The impact is the same whether the deletion was positive or negative. Whether a deletion is positive or negative is usually subjective, but then again so is whether or not most edits are positive or negative.
On 10/4/07, Bryan Derksen bryan.derksen@shaw.ca wrote:
Would have been interesting to see that top ten list of "most impactful" editors they mention in the paper...
I suspect that most of the names on that list would also appear on the top ten list of "least popular" editors whom the community perennially loves to hate. Except for Rambot, of course.
—C.W.
On 04/10/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
Abstract: Wikipedia's brilliance and curse is that any user can edit any of the encyclopedia entries. We introduce the notion of the impact of an edit, measured by the number of times the edited version is viewed.
Interesting points.
Seems to me, offhand, that this suggests:
- Wikipedia needs to keep track of article views (possibly over different time periods), if we aren't already (I don't remember seeing it).
- Watchlists should be sorted not by date but by article view rate, so that the most viewed articles come first.
- We should also develop an index that identifies articles that are both least watched and most viewed, and highlight any edits to them that are not being checked.
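The suggestions in this list could be prototyped roughly as follows. This is only a sketch: the `Article` record, the `exposure_index` formula (views per watcher) and the sample figures are assumptions made for illustration, not anything MediaWiki provides.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    daily_views: float  # hypothetical view rate
    watchers: int       # hypothetical number of watchlists containing the page


def sort_watchlist_by_views(articles):
    """Order a watchlist so the most viewed articles come first."""
    return sorted(articles, key=lambda a: a.daily_views, reverse=True)


def exposure_index(article):
    """Views per watcher; the +1 avoids dividing by zero for unwatched pages."""
    return article.daily_views / (article.watchers + 1)


def most_exposed(articles, n=10):
    """Articles that are heavily viewed but lightly watched, worst first."""
    return sorted(articles, key=exposure_index, reverse=True)[:n]


pages = [
    Article("Popular and well watched", 50000, 400),
    Article("Popular but unwatched", 8000, 1),
    Article("Obscure stub", 3, 0),
]

for page in most_exposed(pages):
    print(f"{page.title}: {exposure_index(page):.1f} views per watcher")
```

A ratio is just one possible index; any score that rises with views and falls with watchers would serve the same purpose, and edits to the top-ranked pages are the ones to highlight for review.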
Using several datasets, including recent logs of all article views, we show that an overwhelming majority of the viewed words were written by frequent editors and that this majority is increasing.
This depends on your definition of "frequent editors". The paper doesn't make this terribly clear, but as I read it these are the key numbers:
* 25 trillion out of 34 trillion (73%) "page word views" (PWVs) are from content contributed by registered users. 27% is from anons.
* The top 10% of editors (by edit count) account for 86% of the 25 trillion PWVs. But that's only 63% of total PWVs when anon contributions are included. Also, that 10% is ca. 420,000 editors. We definitely don't have that many "frequent editors", for most values of frequent.
* The top 1% of editors account for about 70% of the 25 trillion PWVs, or 51% of the total. So the top 42,000 editors account for about half. According to Erik Zachte's statistics, that's also about the number of users making more than 5 edits in a given month (as of October 2006, the end date of the study).
* The top .1% (4,200) of editors account for 44% of the 25 trillion, or 32% of the total. That's the approximate number of editors making more than 100 edits per month.
* The PWV shares of the top 10% and top 1% groups were decreasing, not increasing. Only the top .1% group was increasing, and they don't account for a majority of content.
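The conversions in the list above (share of registered-user PWVs to share of all PWVs) can be checked with a few lines of arithmetic. This sketch takes the rounded figures quoted in this message at face value and assumes the percentiles are over registered editors only, which is the reading used here; the variable names are mine.

```python
# Rounded figures as quoted in this message.
TOTAL_PWV = 34e12       # all page word views
REGISTERED_PWV = 25e12  # page word views of text by registered users

registered_share = REGISTERED_PWV / TOTAL_PWV  # about 0.735

# Share of the registered-user PWVs attributed to each group of editors.
share_of_registered = {"top 10%": 0.86, "top 1%": 0.70, "top 0.1%": 0.44}

# Rescale to a share of all PWVs, anon contributions included.
share_of_total = {
    group: share * registered_share
    for group, share in share_of_registered.items()
}

# "10% is ca. 420,000 editors" implies the sizes of the other groups.
total_editors = 420_000 * 10
group_sizes = {
    "top 10%": total_editors // 10,
    "top 1%": total_editors // 100,
    "top 0.1%": total_editors // 1000,
}

for group, share in share_of_total.items():
    print(f"{group}: {group_sizes[group]:,} editors, {share:.1%} of all PWVs")
```

Note that 25/34 is closer to 73.5% than 73%, so the last digit of each figure depends on how the underlying numbers were rounded.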
As David Gerard points out, this doesn't fully solve the question of "who writes Wikipedia", but it's certainly relevant (and I would say it points to a middle ground between the positions of Jimbo and Aaron Swartz). Unfortunately, it's terribly out of date; Wikipedia and its readership have changed a lot in the past year.
Yours in discourse, Sage (User:Ragesoss)
On 10/8/07, Sage Ross ragesoss+wikipedia@gmail.com wrote:
Using several datasets, including recent logs of all article views, we show that an overwhelming majority of the viewed words were written by frequent editors and that this majority is increasing.
This depends on your definition of "frequent editors". The paper doesn't make this terribly clear, but as I read it these are the key numbers:
* 25 trillion out of 34 trillion (73%) "page word views" (PWVs) are from content contributed by registered users. 27% is from anons.
* The top 10% of editors (by edit count) account for 86% of the 25 trillion PWVs. But that's only 63% of total PWVs when anon contributions are included. Also, that 10% is ca. 420,000 editors. We definitely don't have that many "frequent editors", for most values of frequent.
* The top 1% of editors account for about 70% of the 25 trillion PWVs, or 51% of the total. So the top 42,000 editors account for about half. According to Erik Zachte's statistics, that's also about the number of users making more than 5 edits in a given month (as of October 2006, the end date of the study).
* The top .1% (4,200) of editors account for 44% of the 25 trillion, or 32% of the total. That's the approximate number of editors making more than 100 edits per month.
* The PWV shares of the top 10% and top 1% groups were decreasing, not increasing. Only the top .1% group was increasing, and they don't account for a majority of content.
As David Gerard points out, this doesn't fully solve the question of "who writes Wikipedia", but it's certainly relevant (and I would say it points to a middle ground between the positions of Jimbo and Aaron Swartz). Unfortunately, it's terribly out of date; Wikipedia and its readership have changed a lot in the past year.
I think I misinterpreted the paper; I think the percentiles are based on both registered and anonymous editors. See http://en.wikipedia.org/wiki/Wikipedia_talk:Wikipedia_Signpost/2007-10-08/Va... for more.
-Sage
Maveric is apparently the number 1 most influential editor -- he's contributed 0.5% of all viewed words, cementing my image of him as some sort of soft-spoken nerd superhero.
The people who write Wikipedia are the registered users that you can see on AFD, RFA, the reference desk and the help desk. There are only about 6,000 of them, so that's a tiny number. Also, I read somewhere that those with more than 3200 contributions account for 50% of our edits.
Phoenix 15
On 09/10/2007, Phoenix wiki phoenix.wiki@gmail.com wrote:
The people who write Wikipedia are the registered users that you can see on AFD, RFA, the reference desk and the help desk.
Well, no. The people who write Wikipedia are the people who go and write articles. Participation in community support activities (RD, HD), internal community processes (RFA) and editorial content meta-debates (AFD) is ancillary to writing articles, and there is no shortage of people who spend far more time there than they do actually, you know, writing.
There's only about 6,000 of them so that's a tiny amount. Also, I read somewhere that those with more than 3200 contributions account for 50% of our edits
But those with more than 3200 contributions have a higher proportion of automated or semi-automated edits with little or no editorial significance than the rest of the population, and invariably a far, far higher proportion of simple vandalism reversion (which, whilst a useful role, is completely "content neutral" and adds nothing new to the encyclopedia).
Editcount and community involvement can be general indicators of a "significant user", and it's rare to name one without these factors, but they certainly do not *make* someone one.
On 10/10/07, Andrew Gray shimgray@gmail.com wrote:
On 09/10/2007, Phoenix wiki phoenix.wiki@gmail.com wrote:
The people who write wikipedia are the registered users that you can see on AFD, RFA, The reference desk and the help desk.
Well, no. The people who write Wikipedia are the people who go and write articles. Participation in community support activities (RD, HD), internal community processes (RFA) and editorial content meta-debates (AFD) is ancillary to writing articles, and there is no shortage of people who spend far more time there than they do actually, you know, writing.
What I meant was that you see them in those places. Most of them do loads of mainspace work too. The fact that many of the edits are automated and insignificant makes no difference because there are so many of them. IPs mainly do typo work.
Phoenix 15
On 10/11/07, Phoenix wiki phoenix.wiki@gmail.com wrote:
What I meant was that you see them in those places. Most of them do loads of mainspace work too. The fact that many of the edits are automated and insignificant makes no difference because there are so many of them. IPs mainly do typo work.
Mainly, perhaps, but that's like saying registered editors mainly do formatting, metadata and vandalism reversion. IPs are responsible for 27% of what gets read (as measured by "page word views"), and probably a larger proportion of total prose. It's not overwhelming, but it's definitely not insignificant.
-Sage
On 10/11/07, Sage Ross ragesoss+wikipedia@gmail.com wrote:
Mainly, perhaps, but that's like saying registered editors mainly do formatting, metadata and vandalism reversion. IPs are responsible for 27% of what gets read (as measured by "page word views"), and probably a larger proportion of total prose. It's not overwhelming, but it's definitely not insignificant.
Not to mention that a lot of people who are not IPs today *started* as IPs...
'Outsiders', 'Non-community members' are responsible for all of the content: We just became the community in the process.
Deep, enh?