Hello All,
Can you point me to research on metrics of user performance, or share your own ideas? I know edit count and total bytes have their limitations. Right now I am counting the occurrences of "thank", "appreciate", and "barnstar" for a user in the User talk namespace (and its recursive subpages). What else is there?
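For concreteness, here is a minimal sketch of one way to do that count against the live API. The modules and parameters (list=allpages, prop=revisions) are standard MediaWiki API calls, but the helper itself and the keyword handling are just illustrative, and continuation paging is omitted:
----------------------------
import re
import requests

API = "https://en.wikipedia.org/w/api.php"
PRAISE = re.compile(r"thank|appreciate|barnstar", re.IGNORECASE)

def count_praise(username):
    # Enumerate "User talk:<username>" and its subpages (namespace 3).
    # Note: prefix matching may also catch similarly-named users.
    pages = requests.get(API, params={
        "action": "query", "list": "allpages", "apnamespace": 3,
        "apprefix": username, "aplimit": 500, "format": "json",
    }).json()["query"]["allpages"]

    total = 0
    for page in pages:
        # Fetch the latest wikitext of each page and count keyword hits.
        resp = requests.get(API, params={
            "action": "query", "prop": "revisions", "rvprop": "content",
            "titles": page["title"], "format": "json",
        }).json()
        for p in resp["query"]["pages"].values():
            revs = p.get("revisions")
            if revs:
                total += len(PRAISE.findall(revs[0].get("*", "")))
    return total
----------------------------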
Let me explain a bit more about my current project.
I am trying to develop some new techniques to measure user and article performance. I am repurposing the bipartite countries-products trade model from economics, but using editors-articles instead. This yields a new metric for both users and articles. Now I am calibrating some of the variables in this model by comparing my results to exogenous variables. For pages, I use the metrics that this list pointed me to last time, like the actionable metrics from GroupLens and the cleanup tags from Stein. (Thanks, list!) When I rank articles in a category using my economics method versus the article-text methods, I achieve a Spearman correlation of 0.7. Using my count-thanks-on-user-talk method for users, I achieve a Spearman rank correlation of 0.50, which is still quite good, but I want to make sure there aren't better baselines to compare against.
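For reference, the comparison step itself is just a rank correlation. A minimal sketch using scipy.stats.spearmanr (the scores below are made-up placeholders, not my actual data):
----------------------------
from scipy.stats import spearmanr

# Hypothetical quality scores for the same five articles under
# the two methods being compared.
economic_scores = [0.9, 0.4, 0.7, 0.2, 0.5]
article_text_scores = [0.8, 0.5, 0.6, 0.1, 0.4]

# spearmanr ranks both lists and correlates the ranks.
rho, p = spearmanr(economic_scores, article_text_scores)
print(rho)
----------------------------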
Thanks,
Maximilian Klein Wikipedian in Residence, OCLC +17074787023
Sort of related: an ongoing discussion on the education@ list, "student evaluation criteria". http://thread.gmane.org/gmane.org.wikimedia.education/854
Nemo
Hey Max,
There's a class of metrics that might be relevant to your purposes. I refer to them as "content persistence" metrics and wrote up some docs about how they work, including an example. See https://meta.wikimedia.org/wiki/Research:Content_persistence.
I gathered the list of papers below to provide a starting point, with links to open-access versions where I could find them. These metrics are a little painful to compute due to the computational complexity of diffs, but I have some hardware to throw at the problem and another project that's pulling me in this direction, so I'd be interested in collaborating.
Priedhorsky, R., et al. (2007). Creating, destroying, and restoring value in Wikipedia. In *Proceedings of the 2007 International ACM Conference on Supporting Group Work*. ACM. http://reidster.net/pubs/group282-priedhorsky.pdf
- Describes "Persistent word views" which is a measure of value added per editor. (IMO, value *actualized*)
Adler, B. T., Chatterjee, K., de Alfaro, L., Faella, M., Pye, I., & Raman, V. (2008). Assigning trust to Wikipedia content. In *Proceedings of the 4th International Symposium on Wikis* (WikiSym '08), Article 26, 12 pages. ACM. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.141.2047&rep=re...
- Describes a complex strategy for assigning trustworthiness to content based on implicit review. See http://wikitrust.soe.ucsc.edu/
Halfaker, A., Kittur, A., Kraut, R., & Riedl, J. (2009). A jury of your peers: Quality, experience and ownership in Wikipedia. In *Proceedings of the 5th International Symposium on Wikis and Open Collaboration* (p. 15). ACM. http://www-users.cs.umn.edu/~halfak/publications/A_Jury_of_Your_Peers/halfak...
- Describes the use of "Persistent word revisions per word" as a measure of article contribution quality.
Halfaker, A., Kittur, A., & Riedl, J. (2011). Don't bite the newbies: How reverts affect the quantity and quality of Wikipedia work. In *Proceedings of the 7th International Symposium on Wikis and Open Collaboration* (pp. 163-172). ACM. http://www-users.cs.umn.edu/~halfak/publications/Don%27t_Bite_the_Newbies/ha...
- Describes the use of raw "Persistent word revisions" as a measure of editor productivity.
- Looking back on the study, I think I'd rather use log(# of revisions a word persists) * words. (A sketch of that variant follows.)
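One reading of that last formula, as a minimal sketch (the word-persistence mapping here is hypothetical; producing it requires the diff analysis described on the meta page above):
----------------------------
import math

def log_persistence_score(word_persistence):
    """Sum log(# revisions persisted) over an editor's added words.

    `word_persistence` is a hypothetical mapping from each word the
    editor added to the number of later revisions it survived.
    """
    return sum(math.log(revs)
               for revs in word_persistence.values() if revs > 0)

# e.g. three added words surviving 20, 20, and 1 revisions
# (the word surviving only 1 revision contributes log(1) = 0):
print(log_persistence_score({"alpha": 20, "beta": 20, "gamma": 1}))
----------------------------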
-Aaron
Thanks Nemo, I'll re-read that discussion. I think that conversation is where I became wary of using bytes or edit counts.
Aaron, in my own search I also noticed the paper you wrote with Geiger about counting edit hours and edit sessions. [1] Calculating content persistence is a bit too heavyweight for me right now, since I am trying to submit to ACM Web Science in two weeks (whose CFP was just on this list). The technique looks great, though, and I would like to help support making a WMFlabs tool that can return this measure.
It seems like I could calculate approximate edit-hours just by looking at Special:Contributions timestamps. Is that correct? Would you suggest this route?
[1] http://www-users.cs.umn.edu/~halfak/publications/Using_Edit_Sessions_to_Meas...
Maximilian Klein Wikipedian in Residence, OCLC +17074787023
I talked to Max on IRC, but I'm posting here for the lurkers :)
I think that measuring labor hours via edit sessions is a great idea, and I have a Python library to help extract sessions from edit histories. See https://bitbucket.org/halfak/mediawiki-utilities.
Assuming that you have a list of a user's revisions from the API, using the session extractor to build a set of session start and end timestamps for a user would look like this:
----------------------------
from mwutil.lib import sessions

# Get your revisions ordered by timestamp
# revisions = <some API call result>

events = ((rev['user'], rev['timestamp'], rev) for rev in revisions)

for user, session in sessions.sessions(events):
    # write out a TSV row per session: user, # of edits, start, end
    print "\t".join(
        str(v) for v in [user, len(session),
                         session[0]['timestamp'],
                         session[-1]['timestamp']])
----------------------------
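For the `revisions = <some API call result>` placeholder above, a minimal sketch using the standard list=usercontribs API module (pagination via uccontinue is omitted, and depending on what timestamp format the sessions module expects, the ISO strings may need parsing):
----------------------------
import requests

resp = requests.get("https://en.wikipedia.org/w/api.php", params={
    "action": "query", "list": "usercontribs",
    "ucuser": "ExampleUser", "ucprop": "timestamp",
    "ucdir": "newer",   # oldest first, i.e. ordered by timestamp
    "uclimit": 500, "format": "json",
})
revisions = [{"user": "ExampleUser", "timestamp": rev["timestamp"]}
             for rev in resp.json()["query"]["usercontribs"]]
----------------------------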
However, measuring productivity by the difference between the times of first and last edits won't do much for those of us who work on pages for hours before pressing the save button and only save once. (: It also doesn't measure time spent on private wikis or in discussions over email and IRC, which likewise won't count as productivity if you look only at public edit counts and logged actions.
I'm assuming that login and logout times on the wikis are not available for research use. If they were, there would be privacy issues, although mitigation is possible.
Pine
However, measuring productivity by the difference of the times of first and last edits won't do much for those of us who work on pages for hours before pressing the save button and only save once.
Agreed, this is a limitation. However, if you're doing other work while writing the article, or making intermittent saves as you go, then it will be captured. Ethnographic work suggests that what you describe is uncommon, but present. For this reason and others, it's important to see this "labor hours" estimate as a lower bound. There's a lot of off-wiki work that isn't accounted for in any candidate measure using Wikipedia data; for example, you'd have the exact same issue with edit counts and content persistence.
Pine, that's a good point. The entire purpose of my research is to propose a new, better metric for editors, because the ones we have at the moment are incomplete, as you pointed out. The reason I need these imperfect ones is to calibrate my model against the best metrics currently available.
This economic model tries to capture hidden capabilities. The intuition is that edits to articles that are less heavily edited by others show that you are doing good background reading, or have some obscure knowledge. You know, it's only the Swiss that make Swiss watches, but every country and their mother can export apples. Similarly, in this case I would say your edits to Obama's page are less important than your edits to "Non-Euclidean geometry". How much should the number of other editors on the articles you edit count? Well, I "calibrate" that variable so that my model correlates most closely with the best metrics available so far.
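Assuming the countries-products model meant here is the Hidalgo-Hausmann "method of reflections" from the economic complexity literature (an assumption; the variable names and iteration count are illustrative), the core iteration on an editor-article matrix looks roughly like this:
----------------------------
import numpy as np

def method_of_reflections(M, iterations=10):
    # M is a binary editor-by-article matrix: M[e, a] = 1 if editor e
    # edited article a. Assumes no all-zero rows or columns.
    M = M.astype(float)
    k_e0 = M.sum(axis=1)   # editor "diversity": articles touched
    k_a0 = M.sum(axis=0)   # article "ubiquity": editors involved
    k_e, k_a = k_e0.copy(), k_a0.copy()
    for _ in range(iterations):
        # Each editor's score becomes the mean score of their articles,
        # and each article's the mean score of its editors, so
        # rarely-edited articles weigh their editors up.
        k_e, k_a = M.dot(k_a) / k_e0, M.T.dot(k_e) / k_a0
    return k_e, k_a
----------------------------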
Side note: it is still very network-heavy to compute labour-hours via the API. I did not even manage 600 users in 8 hours, so it makes sense to use the SQL replicas instead. I have been trying to do this over an SSH tunnel, both locally and on wmflabs directly.
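For reference, the replica route might look like this sketch (the enwiki.labsdb host, enwiki_p database, ~/replica.my.cnf credentials file, and the rev_user_text column reflect the 2014 Labs setup as I understand it; adjust for your tunnel):
----------------------------
import os
import MySQLdb  # or pymysql with the same interface

conn = MySQLdb.connect(
    host="enwiki.labsdb", db="enwiki_p",
    read_default_file=os.path.expanduser("~/replica.my.cnf"))
cur = conn.cursor()
cur.execute("""
    SELECT rev_timestamp
    FROM revision
    WHERE rev_user_text = %s
    ORDER BY rev_timestamp
""", ("ExampleUser",))
timestamps = [row[0] for row in cur.fetchall()]
----------------------------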
Maximilian Klein Wikipedian in Residence, OCLC +17074787023