Hi all,
I'm working on a study for which I'd like to know more about editors' watchlisting practices. Of course what I'd really like is to know who had what page on their watchlist when, but I understand the obvious privacy issues there. I assume those issues explain why that information is not (AFAIK) available in dumps etc.
I have read some great qualitative pieces which discuss watchlisting [e.g. 1], which are very helpful (please don't hesitate to suggest others), but haven't seen quantitative data, which our study calls for.
Failing exact data, what do we know about the distribution of practices of watchlisting?
Currently my plan is to assume that anyone who has edited an article in the past 6 months has it on their watchlist. Obviously a very coarse assumption. If we had any empirical knowledge about these practices then I could use a distribution (e.g. editors have the page on their watchlist at some % chance, varying with their number/tenure of edits to that page). I also don't have any way to estimate whether someone who has never edited a page has that page on their watchlist (or, assuming that some do, whether there's any useful way to guess which pages they are likely to have on their watchlists).
Grateful for any suggestions or reactions,
Thanks, James Howison
[1]: Bryant, S., Forte, A., and Bruckman, A. (2005). Becoming Wikipedian: transformation of participation in a collaborative online encyclopedia. In Proceedings of the 2005 international ACM SIGGROUP conference on Supporting group work, page 10. ACM.
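For concreteness, the six-month assumption above could be sketched roughly as follows. This is only an illustration: the `(editor, timestamp)` revision pairs, the function name `assumed_watchers`, and the 182-day window are assumptions made for the example, not a real dump format or an established method.

```python
from datetime import datetime, timedelta


def assumed_watchers(revisions, as_of, window_days=182):
    """Return the editors assumed to be watching one page at `as_of`.

    `revisions` is an iterable of (editor, timestamp) pairs for that page.
    An editor counts as a watcher if they edited the page within
    `window_days` before `as_of`. This encodes the coarse assumption only;
    it does not use any real watchlist data.
    """
    cutoff = as_of - timedelta(days=window_days)
    return {editor for editor, ts in revisions if cutoff <= ts <= as_of}


# Hypothetical revision history for a single page.
revisions = [
    ("Alice", datetime(2010, 1, 15)),
    ("Bob", datetime(2009, 6, 1)),
    ("Carol", datetime(2010, 6, 20)),
]
print(sorted(assumed_watchers(revisions, as_of=datetime(2010, 7, 1))))
```

Changing `window_days` is one simple way to check how sensitive later results are to this assumption.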
I notice that the longer I edit, the less I use the watchlist, and the less investment I have in my edits. If I'm actively editing I'll use the editing history, and if sourced material has been removed, even years ago, I'll just find it and put it back. Day-to-day monitoring is kind of a nuisance, essentially whack-a-mole.
Investment in articles varies greatly.
Fred Bauder
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Neat project idea, James!
You can ask for volunteers to share their watchlists, but I don't know how you get a representative sample. But I think it's worth trying since no one has done it as far as I know. And then you just clearly describe your sample as volunteer selected and do some descriptive statistics on it and show how those users compare to known Wikipedia averages.
The database will tell you how many people are watching a page, if it's more than 30. You could compare that to the number of editors of the page.... Well, it's a start.
For what it's worth, I think my watchlisting behavior changed when I switched from the default mode of showing just the last edit for each article to the mode that shows all edits. Stuff that changes too often got unwatched fast. :-) So if you do get volunteers, I'd collect watchlist settings they use.
By the way, I wonder if the recent UI changes (moving from the word "watch" to the star icon for watching a page) may have affected things at all. A mundane but possibly non-trivial factor....
Good luck with it!
-- Amy
Cool idea, James! Let's chat about it one of these days if you have time. One problem I would foresee with using the number of accounts watching a page is that you don't know when they last logged in or actually checked their "watched pages" feed. I know I watch quite a few pages, but in effect I don't actually check on the changes unless I am expecting a message (that is, on watched Talk pages). There may be editors who have left Wikipedia but are still listed among the people "watching" a page even though the account is inactive.
As for asking some editors for their lists: I know some people voluntarily expose / make visible their watchlists. As Amy said, that's probably not representative, but it would be a start, and they are probably more likely to be willing to talk to you about their watching behavior too.
Good luck. Andreea
(apologies for cross-posting)
Extended deadline: 18 July 2010
2nd International Workshop on Quality in Techno-Social Systems
at the Fourth IEEE International Conference on Self-Adaptive and Self-Organizing Systems SASO 2010 - September 28, 2010, Budapest, Hungary
Techno-social systems are ICT systems in which many people collectively coordinate and cooperate to achieve their goals without central control. These systems, for example Wikipedia, eBay, Web2.0 sites, social networks and peer-to-peer networks, have both self-organizing and self-adaptive aspects.
In this workshop we aim to put the quality aspect of these complex systems into focus. Quality outcome can be produced by the individual users through selecting, producing and rating certain kinds of content. For example, trust and reputation may emerge among a community and be used to enhance quality. The workshop will consider mechanisms by which individual peers can be brought together automatically into collectives whose members share interests and agree about the evaluation of quality in the domain.
The SASO conference focuses on how to make computer systems operate autonomously in a reliable, efficient and useful way with minimal user or operator intervention. The workshop addresses this very problem narrowing the focus down to techno-social systems. The question we ask is: how can one let a system self-organize to a high quality, desirable state, where users and their behavior form an integral part of the system (i.e., a techno-social system), and where self-organization at the system level has to be aligned with self-organization at the social level.
The workshop is centered around the following technical issues:
* incentive mechanisms for self-organizing and self-adaptive systems
* evolving social interaction structures for quality
* ranking, rating, reputation and recommendation in distributed systems
* conflict and consensus detection, correction in distributed systems
* realistic models of user behaviour and the dynamic social structures that they create
* analysis of empirical and experimental data for quality
* computational sociology of online communities
* distributed social networks
* quality metrics - how to measure in distributed systems
Papers are welcome from the fields of theoretical and algorithmic foundations, algorithm design and simulation, as well as empirical data-sets collection, processing and validation.
Audience
The workshop is inherently interdisciplinary. Relevant areas include: sociology and psychology, in particular the evolution of cooperation, opinion dynamics, the evolution of norms and trust relationships, etc.; physics, in particular complex (social and technical) networks and models of the dynamics of group behavior; computer science, in particular P2P systems, data mining (ranking and recommendation), information retrieval, and distributed systems.
Paper Submission
Authors are invited to submit original unpublished papers that are neither accepted nor submitted elsewhere. All submissions will be carefully peer reviewed by the program committee.
Inclusion in Proceedings
The proceedings of all SASO workshops will be published by IEEE Computer Society Press as a bundle with the main conference proceedings, and made available as a part of the IEEE digital library.
IMPORTANT DATES
Paper submission: July 18, 2010
Acceptance Notification: August 6, 2010
Camera-ready version: August 20, 2010
Early registration deadline: August 13, 2010
Workshop: September 28, 2010
Author Guidelines
Submissions should not exceed 6 pages, should be formatted according to the IEEE Computer Society Press proceedings style guide, and should be submitted electronically in PDF format. Please register as authors and submit your papers using the conference management system that will be linked at the http://qlectives.eu/qteso website well in advance of the submission deadline.
Organization Committee
Nigel Gilbert, University of Surrey, UK
Mark Jelasity, Hungarian Academy of Science and University of Szeged, Hungary
Tamas Vinko, Delft University of Technology, The Netherlands
Program Committee
Fred Amblard, Université Toulouse 1 Capitole, France
Nazareno Andrade, Delft University of Technology, The Netherlands
Alastair Gill, University of Surrey, UK
David Hales, Delft University of Technology, The Netherlands
Dirk Helbing, ETH Zurich, Switzerland
Sergi Lozano, ETH Zurich, Switzerland
Matus Medo, University of Fribourg, Switzerland
Andrzej Nowak, University of Warsaw, Poland
Johan Pouwelse, Delft University of Technology, The Netherlands
Camille Roth, CNRS, France
Dario Taraborelli, University of Surrey, UK
Yi-Cheng Zhang, University of Fribourg, Switzerland
-- Dario Taraborelli
Research Fellow Centre for Research in Social Simulation University of Surrey, Guildford GU2 7XH United Kingdom http://cress.soc.surrey.ac.uk http://nitens.org/taraborelli
Editor, Review of Philosophy and Psychology http://www.springer.com/13164
I agree on the "neat project". You probably already know this, but I'll mention it just in case: the API gives you a way of knowing how many users (but not which users) follow a certain page. For example, the following link
http://toolserver.org/~mzmcbride/watcher/?db=enwiki_p&titles=September_1...
will tell you that these pages are followed by X people:
- September 11 attacks: 1314
- Palestine: 360
- Israel: 1164
- Sex: 965
- Homepage: 51
- Wikipedia: 3303
- Main Page: 68347
- 2010 eruptions of Eyjafjallajökull: 96
I guess you can treat it as an indication of global interest. Unfortunately, of course, this does not solve your problem of knowing individual interests ;(
P.
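As a rough sketch of how per-page watcher counts might be fetched programmatically, here is one way to build and parse such a query in Python. It assumes the MediaWiki action API's `prop=info` module with `inprop=watchers`, not the Toolserver tool linked above. The endpoint URL, the helper names, and the hand-written sample response are illustrative assumptions, and the API leaves out the `watchers` field when a count is withheld.

```python
from urllib.parse import urlencode

# Assumed endpoint; any MediaWiki site exposes a similar api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def watcher_query_url(titles):
    """Build a query URL asking for the watcher count of each title."""
    params = {
        "action": "query",
        "prop": "info",
        "inprop": "watchers",
        "titles": "|".join(titles),
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def parse_watcher_counts(response):
    """Map each page title to its watcher count, or None if withheld."""
    pages = response.get("query", {}).get("pages", {})
    return {page["title"]: page.get("watchers") for page in pages.values()}


# Hand-written sample in the shape of an API response (not real output).
sample = {
    "query": {
        "pages": {
            "1": {"pageid": 1, "title": "Palestine", "watchers": 360},
            "2": {"pageid": 2, "title": "Some quiet page"},
        }
    }
}
print(watcher_query_url(["Palestine", "Israel"]))
print(parse_watcher_counts(sample))
```

The response would come from fetching the URL with any HTTP client; that step is left out so the sketch runs offline.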
Ah hah, thanks Paolo, this I did not know. In fact, as I received this I was following up Amy's suggestion of the database telling if it's > 30, which I think was leading to this tool :)
I've posted on that guy's talk page. Perhaps this can be done historically, in which case I can build up a (small but random) sample time-series, compare it to editing counts, and have some vague idea of the ratios/tracking.
Thanks all (but if there are other suggestions or even experiences, keep 'em coming ;)
Cheers, James
I'm very interested in the topic as well, so if you find anything else, might I ask you to forward it along? WikiThanks!!! ;)
P.
Watchlist behavior is complex confidential behavior. It is probably impossible to obtain a representative sample. What could be done is investigate patterns of watchlist, and article maintenance, behavior.
Fred
James Howison wrote:
Currently my plan is to assume that anyone who has edited an article in the past 6 months has it on their watchlist. Obviously a very coarse assumption.
I don't think that most people add most articles they edit to their watchlist. I don't (my watchlist has about 3k and I've edited ~30k distinct articles, IIRC).
I'd assume that people who watchlist the article are, usually:
- their creator
- editors who contributed to it extensively
- editors who discussed it on talk
- editors who have made 1+ revert on it
That said, I'll stress again that none of those is certain.
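The four criteria above could be turned into a simple rule of thumb along these lines. This is a sketch only: the `EditorActivity` fields, the function name, and the threshold of 10 edits for "extensive" contribution are assumptions chosen for illustration, and as noted none of the criteria is certain.

```python
from dataclasses import dataclass


@dataclass
class EditorActivity:
    """Hypothetical per-editor, per-article activity summary."""
    created_page: bool = False
    article_edits: int = 0
    talk_edits: int = 0
    reverts: int = 0


def likely_watcher(activity, extensive_threshold=10):
    """Guess whether an editor watches the article.

    True if the editor created the article, edited it extensively
    (at least `extensive_threshold` edits, an arbitrary cut-off),
    discussed it on talk, or made at least one revert on it.
    """
    return (
        activity.created_page
        or activity.article_edits >= extensive_threshold
        or activity.talk_edits > 0
        or activity.reverts >= 1
    )


print(likely_watcher(EditorActivity(article_edits=2)))
print(likely_watcher(EditorActivity(article_edits=2, reverts=1)))
```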
-- Piotr Konieczny
"Lennier, get us the hell out of here."
"Initiating 'getting the hell out of here' maneuver."
-- Ivanova and Lennier in Babylon 5: "The Hour of the Wolf"
I have always been willing to share my watchlist, even publicly, and I think it should be provided as an option. I can see two reasons people might not want to: not to show what all of their interests are, and not to show who on Wikipedia they may be tracking. As for me, I do such a variety (in both) and for multiple reasons, that I see no reason not to share it & I think that many of the more active users might think similarly. However, it has now grown so large that I almost never look at it.
On 7/2/10, James Howison james@howison.name wrote:
Currently my plan is to assume that anyone who has edited an article in the past 6 months has it on their watchlist. Obviously a very coarse assumption.
A better assumption is that a page is on user A's watchlist if they edit the page within 10 mins of another user editing the page.
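A minimal sketch of that heuristic, assuming a page history given as `(editor, timestamp)` pairs. The function name and the sample history are made up for illustration, and this version compares each edit only with the edit immediately before it.

```python
from datetime import datetime, timedelta


def quick_responders(revisions, max_gap=timedelta(minutes=10)):
    """Return editors who edited within `max_gap` of another user's edit.

    `revisions` is a list of (editor, timestamp) pairs for one page.
    Each edit is compared only with the immediately preceding edit,
    which is a simplification of the heuristic.
    """
    ordered = sorted(revisions, key=lambda rev: rev[1])
    responders = set()
    for (prev_editor, prev_ts), (editor, ts) in zip(ordered, ordered[1:]):
        if editor != prev_editor and ts - prev_ts <= max_gap:
            responders.add(editor)
    return responders


# Hypothetical page history.
history = [
    ("Alice", datetime(2010, 7, 2, 12, 0)),
    ("Bob", datetime(2010, 7, 2, 12, 5)),
    ("Carol", datetime(2010, 7, 2, 13, 0)),
    ("Alice", datetime(2010, 7, 2, 13, 8)),
    ("Alice", datetime(2010, 7, 2, 13, 10)),
]
print(sorted(quick_responders(history)))
```

Here Bob and Alice are flagged as likely watchers, while Carol, who edited 55 minutes after the previous edit, is not.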
Also worth considering are the public watchlists which are created using the "related changes" feature. For example, I have a separate watchlist for pages I create, as this is public information anyway:
https://secure.wikimedia.org/wikipedia/en/wiki/Special:RecentChangesLinked/U...
With respect to the watchlist, it is only possible to know which pages are on a watchlist as of _now_, so the data would need to be snapshotted periodically in order to analyse how an individual manages their watchlist, etc. I would love to know when I added a page to my watchlist, but the schema doesn't record this information.
http://www.mediawiki.org/wiki/Manual:Watchlist_table
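To illustrate what periodic snapshotting would allow, here is a small sketch that compares consecutive snapshots and reports approximate watch and unwatch events. The snapshot format (a label plus a set of `(user, page)` pairs) and all names are assumptions for the example. An event's time is known only to within the interval between two snapshots.

```python
def watch_events(snapshots):
    """Derive approximate watch/unwatch events from ordered snapshots.

    `snapshots` is a list of (label, pairs) in time order, where `pairs`
    is a set of (user, page) tuples on watchlists at that moment.
    Returns (label, action, user, page) tuples, where `label` is the
    snapshot in which the change was first seen.
    """
    events = []
    for (_, before), (label, after) in zip(snapshots, snapshots[1:]):
        events += [(label, "watch", u, p) for u, p in sorted(after - before)]
        events += [(label, "unwatch", u, p) for u, p in sorted(before - after)]
    return events


# Hypothetical snapshots taken one day apart.
snapshots = [
    ("2010-07-01", {("Alice", "Oregon"), ("Bob", "Portland")}),
    ("2010-07-02", {("Alice", "Oregon"), ("Alice", "Salem")}),
]
for event in watch_events(snapshots):
    print(event)
```

A page added and removed between two snapshots would be missed entirely, which is why the snapshot frequency matters.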
There are quite a few watchlist related bugs, which may also give you some useful information about how users want to use their watchlist, and hints into how they are currently using it. ;-)
https://bugzilla.wikimedia.org/buglist.cgi?quicksearch=watchlist
-- John Vandenberg
Thanks to all for a great discussion on this. I appreciated all the suggestions, including people discussing their own practices. Fantastic. I just never would have thought of things like checking the bug tracker to learn about practices :)
Just wanted to update you on discussions I had with the author of the watchers tool on toolserver, MZMcBride. I include the full thread below (with his permission), but the short story is that even Wikipedia doesn't collect data on when people add a page to or remove a page from their watchlists; they only maintain who has what pages on their lists at any one time (data which they don't release). So the only way to build up a true empirical picture of this would be lots of rapid snapshots of the watchlist table. It's an interesting example of the challenges in doing research on systems designed to run a busy website rather than to collect research data!
OTOH if you are just interested in watcher counts below the minimum of 30 watchers, then it seems there is scope for that data to be released to researchers.
Most likely I will run a sensitivity analysis looking at the robustness of our results to different assumptions.
Thanks again, James
On Jul 5, 2010, at 20:40, MZMcBride wrote:
Jameshowison wrote:
I'm working on a study for which I'd like to know more about editors' watchlisting practices.
Apologies for the delayed response. The study certainly sounds interesting.
Thanks.
Currently my plan was to assume that anyone who has edited an article in the past 6 months has it on their watchlist (or is otherwise informed about new edits). Obviously a very coarse assumption.
Very coarse, for a variety of reasons. The biggest reason, I imagine, is that unregistered users make up a decent-sized percentage of contributions, and unregistered users cannot have watchlists. A lot of people also use automated or semi-automated tools to do "batch editing" (vandalism reversion, code fixes, etc.) in which they'll edit a page, but never return to it or have any interest in watching it.
Yes, that's helpful. Will probably run the analysis without the unregistered users. Ah, the tradeoffs of research.
What would be fantastic would be to know the empirical distribution (e.g. editors put the page they edit on their watchlist at some % chance, perhaps changing based on their number/tenure of edits to that page, especially if they had never edited the page at that point).
The watchlist table is made up of a few columns.[1] The two columns that are exposed to Toolserver users are wl_namespace and wl_title. In the public Wikimedia dumps[2] the watchlist table is excluded entirely.
If you are able to share more specific information, I'm happy to do the required computations, only ever releasing aggregate information. What were you able to share with the other researcher?
The other researcher wanted aggregate information about all page titles in all namespaces and their corresponding categories. The public "watcher" tool[3] has an imposed restriction to only show the number of watchers if the number is 30 or greater. The data I released to her had this restriction lifted.
Right, got it. So this would give a count of watchers for each page, at a single time. Never the actual watchers.
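To make the shape of that aggregate data concrete, here is a minimal sketch in Python. It assumes the input is simply a list of (wl_namespace, wl_title) rows, one per watchlist entry, which is how the two exposed columns are described above. The function names, the sample rows, and the use of None for a suppressed count are illustrative assumptions, not the watcher tool's actual code.

```python
from collections import Counter

# Display threshold mentioned in the thread; the tool's real logic is not shown there.
MIN_WATCHERS = 30


def watcher_counts(rows):
    """Count watchlist entries per page from (wl_namespace, wl_title) rows.

    Each row stands for one watchlist entry; no user identity is involved.
    """
    return Counter((namespace, title) for namespace, title in rows)


def thresholded(counts, minimum=MIN_WATCHERS):
    """Keep a count only when it is at or above `minimum`; otherwise report None."""
    return {page: (n if n >= minimum else None) for page, n in counts.items()}


# Hypothetical sample data: 42 entries for one page, 7 for another.
rows = [(0, "Oregon")] * 42 + [(0, "Portland,_Oregon")] * 7
counts = watcher_counts(rows)        # unthresholded, as released to the researcher
public_view = thresholded(counts)    # what a thresholded public view would show
```

Either way the result is a per-page count at a single point in time, never a list of who is watching.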
FWIW I've been using the pages associated with WikiProject Oregon as my prototype. A list of dated watch/unwatch events for those pages that I could link to revisions (edits) would be ideal; not sure if that's the format of the data or if it can be changed to such a format (seems to require the full history of the watchlist table?).
When a page was added to a watchlist is not stored, and whose watchlist it was added to is not available.
Yeah, that makes sense. So the only way to get what I was interested in would be to frequently track the list of editors and their watched pages (which is collected but not available); such lists would let you approximate when they added and removed pages. I do understand the reasons why this data is not available, so I'm not angling for that.
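For what it's worth, the snapshot idea could be sketched roughly as below. This is purely hypothetical: it assumes each snapshot is a timestamp plus a set of (user, page) pairs, which is exactly the data that is not available, and all names here are made up for illustration.

```python
def diff_snapshots(earlier, later):
    """Return (added, removed) sets of (user, page) pairs between two snapshots."""
    return later - earlier, earlier - later


def approximate_events(snapshots):
    """Infer watch/unwatch events from consecutive snapshots.

    `snapshots` is a list of (timestamp, set of (user, page)) sorted by time.
    Each event is (kind, user, page, after_ts, by_ts): the change happened
    some time after `after_ts` and no later than `by_ts`.
    """
    events = []
    for (t0, s0), (t1, s1) in zip(snapshots, snapshots[1:]):
        added, removed = diff_snapshots(s0, s1)
        events += [("watch", u, p, t0, t1) for u, p in sorted(added)]
        events += [("unwatch", u, p, t0, t1) for u, p in sorted(removed)]
    return events


# Hypothetical snapshots at times 1, 2 and 3.
snapshots = [
    (1, {("A", "Oregon")}),
    (2, {("A", "Oregon"), ("B", "Oregon")}),
    (3, {("B", "Oregon")}),
]
events = approximate_events(snapshots)
```

The timing of each event is only known to within the gap between snapshots, and a page that is watched and then unwatched between two snapshots is missed entirely, hence the need for frequent snapshots.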
Perhaps relatively frequent snapshots of watcher numbers (not thresholded at 30) would provide some data, although not the type that could be correlated with editing actions.
Most likely I will simply vary the length of my coarse assumption and see if the results are sensitive to that. It's not ideal, but at least I'm sure I haven't missed something obvious.
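A minimal sketch of what varying the window might look like, assuming the edit history is available as (user, page, timestamp) tuples with user set to None for unregistered edits. The function names, the window lengths, and the sample edits are illustrative assumptions rather than the study's actual setup.

```python
from datetime import datetime, timedelta


def assumed_watchers(edits, page, at, window_days):
    """Registered editors assumed to be watching `page` at time `at`.

    An editor counts if they edited the page within `window_days` before `at`.
    Unregistered edits (user is None) are skipped, since unregistered users
    cannot have watchlists.
    """
    start = at - timedelta(days=window_days)
    return {
        user
        for user, edited_page, ts in edits
        if user is not None and edited_page == page and start <= ts <= at
    }


def sensitivity(edits, page, at, windows=(30, 90, 180, 365)):
    """Number of assumed watchers under each candidate window length (in days)."""
    return {w: len(assumed_watchers(edits, page, at, w)) for w in windows}


# Hypothetical edit history for one page.
edits = [
    ("Alice", "Oregon", datetime(2010, 6, 20)),
    ("Bob", "Oregon", datetime(2010, 3, 1)),
    (None, "Oregon", datetime(2010, 6, 25)),   # unregistered edit
    ("Carol", "Oregon", datetime(2009, 9, 1)),
]
result = sensitivity(edits, "Oregon", datetime(2010, 7, 1))
```

The downstream analysis would then be rerun once per window length to see whether the conclusions change.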
PS: the wl_notificationtimestamp column is intriguing. I checked the manual, and it seems to hold only the last notification, not a history of notifications, right?
This is explained at the manual page for the watchlist table.[1] Essentially this column is only used on wikis where e-mail notifications (ENotif) are enabled. Sites like the English Wikipedia do not have this feature enabled for performance reasons.
Thanks a million for your help.
On Jul 6, 2010, at 17:58, MZMcBride wrote:
James Howison wrote:
There was some interest in this question on the wiki-research-l@lists.wikimedia.org list; would you mind if I quoted parts of your email in my summary to that list? Happy to run a draft by you to ensure I'm not misrepresenting or otherwise stepping on toes.
Please feel free to pass along anything you found helpful. No need to run it by me first.
MZMcBride
Not that this is a solution to the general problem, but perhaps we might consider lowering the level at which we display counts to 20, without invading privacy.
On Fri, Jul 9, 2010 at 1:28 PM, James Howison james@howison.name wrote:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l