Hello Research,
Is it possible to query for the watchers of a page? It does not seem to be in the API, nor is the "watchers" or "wl_user" table in the database replicas (where I thought MediaWiki stores it). I imagine this is for privacy reasons, correct? If so, how would one gain access?
I have been talking with an "econophysicist" who thinks that we could apply a "contagion" algorithm to see which edits are "contagious". (I met this econophysicist at the Berkeley Data Science Faire at which Wikimedia Analytics presented, so it was worth it in the end.)
Maximilian Klein Wikipedian in Residence, OCLC +17074787023
No, you can't, for reasons of privacy. See:
https://en.wikipedia.org/wiki/Help:Watching_pages#Privacy
But, I concur with your theory that edits are contagious. I often find that when I get the notification that a watched page has changed, I go and look at the page. While I am there, I often spot a "little thing that needs doing", which sometimes is just a simple single edit and other times initiates a marathon of editing activity for the next couple of days :-)
If you want to test this theory, I think using the set of editors of the page might be a pretty good approximation of the watchlist. A lot of people have the "add the pages and files I edit to my watchlist" option set in their preferences (I know I do).
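As a rough illustration of the editor-set proxy, the sketch below builds a MediaWiki API query for the authors of a page's recent revisions and collects the distinct usernames from a response of the usual `query.pages.*.revisions` shape. The function names and the sample response are invented for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def revisions_query_url(title, limit=50):
    """Build a URL asking for the user and timestamp of a page's recent revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "user|timestamp",
        "rvlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def editors_from_response(response):
    """Collect the distinct usernames found in a parsed JSON response."""
    editors = set()
    for page in response.get("query", {}).get("pages", {}).values():
        for rev in page.get("revisions", []):
            if "user" in rev:  # the user field is absent when it has been hidden
                editors.add(rev["user"])
    return editors


# Hypothetical, hand-written response used only to show the expected shape.
sample_response = {
    "query": {
        "pages": {
            "12345": {
                "title": "Example",
                "revisions": [
                    {"user": "Alice", "timestamp": "2013-12-30T10:00:00Z"},
                    {"user": "Bob", "timestamp": "2013-12-29T09:00:00Z"},
                    {"user": "Alice", "timestamp": "2013-12-28T08:00:00Z"},
                ],
            }
        }
    }
}

proxy_watchers = editors_from_response(sample_response)
```

The resulting set is only a proxy: it misses people who watch without editing and includes editors who have since unwatched the page.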
For the purpose of declaring one edit as being contagious (that is, as causing another edit), what criteria would you use? I would assume you need some time bounds here. I think there need to be "kick-off" edits identified. These would be edits that occurred sufficiently long after the previous edit that contagion could not be a factor. Then after the kick-off edit, you would be looking for one or more "reaction" edits that occurred fairly quickly after one another, suggesting a contagion based on watchlists. So it seems there are two time parameters: the kick-off threshold and the reaction threshold. I don't think these are necessarily the same value (i.e. is there some grey zone in between where the edits can be categorised as neither kick-off nor reaction?).
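The two thresholds can be made concrete with a minimal sketch. It labels each edit on a single page purely by the gap since the previous edit, which simplifies the idea above (it does not check that a reaction ultimately traces back to a kick-off). The function name, the labels, the choice to treat the first edit as a kick-off, and the threshold values are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def classify_edits(timestamps, kickoff_gap, reaction_gap):
    """Label each edit by the time elapsed since the previous edit on the page.

    gap >= kickoff_gap   -> "kick-off"
    gap <= reaction_gap  -> "reaction"
    anything in between  -> "neither" (the grey zone)
    The first edit has no predecessor and is labelled "kick-off" by assumption.
    """
    if reaction_gap > kickoff_gap:
        raise ValueError("reaction_gap must not exceed kickoff_gap")
    labels = []
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous >= kickoff_gap:
            labels.append("kick-off")
        elif ts - previous <= reaction_gap:
            labels.append("reaction")
        else:
            labels.append("neither")
        previous = ts
    return labels


# Made-up edit times for one page.
base = datetime(2013, 12, 1, 12, 0)
edit_times = [
    base,
    base + timedelta(minutes=10),
    base + timedelta(minutes=30),
    base + timedelta(hours=5),
    base + timedelta(days=40),
]
labels = classify_edits(
    edit_times,
    kickoff_gap=timedelta(days=7),    # illustrative value only
    reaction_gap=timedelta(hours=1),  # illustrative value only
)
```

With these made-up values the edit five hours in falls into the grey zone: too late to count as a reaction, too early to count as a fresh kick-off.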
In terms of setting these threshold(s), you might need some real-life data to train on. So maybe you could start by asking if some editors would send you a copy of their watchlist, and you could write a script that compares it with their edit history over the same time frame (plus a bit to cater for burstiness). From that you could come up with a set of edits that look like contagious ones, and you could ask the editors to say "yes / no / don't remember" to try to see 1) whether contagion appears to be happening and 2) what the time thresholds need to be. Then test it on a bigger set of data using edit history as a proxy for watchlists.
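A script of that kind might look roughly like the sketch below. It takes a volunteered watchlist, the volunteer's own edits, and other editors' edits on the same pages, and returns the volunteer's edits that followed someone else's edit within a reaction window; those are the candidates to put in the "yes / no / don't remember" survey. The data layout, function name and window length are hypothetical.

```python
from datetime import datetime, timedelta


def candidate_contagious_edits(watchlist, own_edits, other_edits, reaction_gap):
    """Find the volunteer's edits that may have been prompted by a watched change.

    watchlist    -- set of page titles the volunteer watches
    own_edits    -- list of (title, time) for the volunteer's edits
    other_edits  -- list of (title, time) for edits by other users
    reaction_gap -- maximum delay between a trigger edit and the reaction

    Returns a list of (title, own_time, trigger_time), where trigger_time is
    the most recent qualifying edit by someone else.
    """
    candidates = []
    for title, own_time in own_edits:
        if title not in watchlist:
            continue
        triggers = [
            t
            for other_title, t in other_edits
            if other_title == title and timedelta(0) < own_time - t <= reaction_gap
        ]
        if triggers:
            candidates.append((title, own_time, max(triggers)))
    return candidates


# Made-up data for one volunteer.
t0 = datetime(2013, 12, 30, 9, 0)
watchlist = {"Page A", "Page B"}
own_edits = [
    ("Page A", t0 + timedelta(minutes=30)),  # soon after a watched change
    ("Page B", t0 + timedelta(days=3)),      # watched, but long after
    ("Page C", t0 + timedelta(minutes=5)),   # not on the watchlist
]
other_edits = [("Page A", t0), ("Page B", t0), ("Page C", t0)]

survey_items = candidate_contagious_edits(
    watchlist, own_edits, other_edits, reaction_gap=timedelta(hours=1)
)
```

The survey answers could then be used to adjust `reaction_gap` before switching to edit history as the watchlist proxy.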
Kerry
In reply to: Klein,Max, Tuesday, 31 December 2013 2:26 PM. Subject: [Wiki-research-l] Polling the watcher's of a page. Possible?
Check out Michael Kummer's paper that looks at a similar topic ("contagion" in pageviews among linked articles) from an econometrics perspective: "Spillovers in Networks of User Generated Content – Evidence from 23 Natural Experiments on Wikipedia"
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2356199
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
How many watchlisters a page has is a sensitive issue; we've already had one incident where a "researcher" acquired a list of unwatched pages for a vandalism experiment.
However, anyone who watches a page will also have that page's talk page on their watchlist, so while you can't directly contact everyone who has that page on their watchlist, you could conceivably attract the attention of some of them with a message on its talk page. But if you were doing this on more than one or two pages, you would need your note to be very relevant to the watchlisters of each page.
Regards
Jonathan
Jonathan,
So is that it, then? Is the Foundation feeling too burned to ever give out the data again? Has there been any other precedent since then of releasing data to academics?
Brian,
Thanks for the link to the paper. I just saw this in the latest newsletter.
Kerry,
The idea of sending a script to follow other editors and then survey them would be a good way to train a learning algorithm. I hadn't thought of that; mostly I expected to just pore over some old edits. Thanks for the idea.
Maximilian Klein Wikipedian in Residence, OCLC +17074787023
Max,
I wouldn't know if the Foundation was even aware of the incident; they weren't the source of the data. But it was rather high profile in the community.
I expect there have been other instances of data being extracted for researchers, but watchlist data is for some people a sensitive issue, hence my alternative suggestion. If you want to go forward with this, I'd suggest either finding a better way to look at how groups of editors focus on the same articles, or doing something with anonymised watchlist data. You might get that from the WMF, or indeed by posting your credentials as a researcher and inviting contributors to email you their watchlists for some research that you will anonymise.
I think it would be interesting to see some research on how closely an editor's watchlist reflects their editing, and how large a watchlist gets before it becomes so big that an editor no longer stays on top of it. But you'd also need to ask a few questions such as "under what circumstances do you take a page off your watchlist?"
Why does it matter who has the page on their watchlist? One can also catch the "contagion" by looking at recent changes, by stumbling upon the change while reading the page, or while looking at a subsequent diff, e.g. by a vandal. Also, unless you plan to try a contagion yourself (not a good idea), you'll be looking at past data, but we don't have the watchlists for the past. Moreover, you'd need to find a way to establish a causal connection between two similar edits in any case, or they could both just be caused by a coordinated effort like a policy change, a cleanup sprint or a collaboration between two users. Speaking of which, an edit (especially if repeated on multiple pages) can also cause "retortion" edits in the opposite direction on other articles, or policy changes in either direction, so the "contagion" may not spread directly but rather by triggering a coordinated effort, spreading either the virus or its antivirus.
Nemo