Hello Everyone,
I am the first author of the paper that Denny has referred to. Firstly, I want to thank Denny for asking me to join this list and learn more about this discussion.
1. Regarding quality, we know that there are issues, and even at the conference, I repeatedly told the audience that I am not satisfied with the quality of the content generated. However, the percentage of articles that were not removed when the paper was submitted was minimal. I have sent Denny a list of the accounts that were used, and it is possible that several of the articles created from those accounts have been removed within the last couple of months. I was not aware of the multiple account policy.
2. The area of Wikipedia article generation has been explored by others in the past. [http://www.aclweb.org/anthology/P09-1024, http://wwwconference.org/proceedings/www2011/companion/p161.pdf] We were not aware of any rules regarding this sort of experiment. However, we do understand that such experiments can harm the general quality of this great encyclopedic resource, hence we did our analysis on bare minimum articles. In fact, we did our initial work on it back in 2014, and Wikimedia research even covered details about our paper here -- https://blog.wikimedia.org/2015/02/02/wikimedia-research-newsletter-january-2015/#Bot_detects_theatre_play_scripts_on_the_web_and_writes_Wikipedia_articles_about_them
If questions had been raised at that point, we would surely not have done anything further on this, or would rather have done things offline without creating or adding any content on Wikipedia.
I understand your point about imposing rules and I think it makes sense. However, during this research, we were not aware of any rules, and hence continued our work. As I have told Denny, our purpose was to check whether we could create bare minimal articles which could eventually be improved by authors on Wikipedia, and also to see whether they would be removed altogether. But it was done with a few articles and we did not create anything beyond that point. Also, we did not make any manual modifications to the articles, although we saw quality issues, because that would void our analysis and claims.
Thanks everyone for your time and the great work you are doing for the Wikipedia community.
Regards, Sidd
I have proposed https://en.wikipedia.org/wiki/Mazaua for deletion - I assume it was one of the others involved.
Our newpage patrollers are pretty experienced at tagging for deletion the pages of spam and clearly non-notable articles that get created by the hundred every day. If someone were to waste everyone's time by creating a bunch of articles that look like press releases from over-enthusiastic marketing departments, and appeals for a drummer in time for the first rehearsal of the next big thing on the Bournemouth grunge scene, then I've no doubt they would be deleted pdq. Easier still: watch a hundred articles at the start of the NPP process, predict how they'd fare and then test your prediction against the result.
If you successfully produce a bunch of flawed articles that look like the sort of articles we accept from good-faith newbies with idiosyncratic English, then that doesn't tell us anything about our ability to filter out the sort of stuff that we need to delete, but it could mean that patrollers will be less tolerant of what appears to be someone with limited English writing an article about an island that probably merits an article. https://en.wikipedia.org/wiki/Mazaua
Jonathan
[The above replies to the message from siddhartha banerjee (sidd2006@gmail.com) of 9 August 2016 at 22:30.]
* The previous work you cite appears to have created articles in the draft namespace rather than the article namespace. This is a very important and very relevant detail, meaning your situation is in no way comparable to the previous work from my point of view.
* You appear to be solving a problem that the community of Wikipedia editors does not have. We have enough low-quality stub articles that need human effort to improve, and we're not really interested in more unless either (a) they demonstrably combat some of the systematic biases we're struggling with or (b) they demonstrably attract new cohorts of users to do that improvement. Note that the examples discussed in the research newsletter are a non-English writer and a woman writer. These are important details.
* Your paper appears not to make any attempt to measure the statistical significance of your results; this isn't science.
* Most of your sources are _really_ _really_ bad. https://en.wikipedia.org/wiki/Talonid contains 8 unique refs, one of which is good, one of which is passable, and the others should be removed immediately (but I won't because it'll make it harder for third parties reading this conversation to follow it).
If you want to properly evaluate your technique, try this: Randomly pick N articles from https://en.wikipedia.org/wiki/Category:Articles_lacking_sources subcats, splitting them into control and subjects randomly. Parse each subject article for sentences that your system appears to understand. For each sentence your system understands, look for reliable sources to support that sentence. Add a single ref to a single statement in each article. Add all the refs using a single account with a message on the user page about the nature of the edits. If you're not able to add any refs, mark it as a failure. Measure article lifespan for each group.
If you're in a hurry and want fast results, work with articles less than a week old (hint: article IDs are a numerically increasing sequence) or the intersection of https://en.wikipedia.org/wiki/Category:Articles_lacking_sources subcats and Category:Articles_for_deletion. Both of these groups of articles are actively being considered for deletion.
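The random assignment and lifespan measurement described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the fixed seed, the even split and the placeholder titles are all hypothetical, and a real run would take its titles from the category listing rather than a generated list.

```python
import random
from datetime import date

def split_control_subject(titles, n, seed=0):
    """Randomly pick n titles and split them evenly into control and subject groups."""
    rng = random.Random(seed)              # fixed seed (an assumption) makes the assignment reproducible
    sample = rng.sample(list(titles), n)   # n distinct titles in random order
    half = n // 2
    return sample[:half], sample[half:]

def lifespan_days(created, deleted, study_end):
    """Days an article survived; articles never deleted are counted up to study_end."""
    end = deleted if deleted is not None else study_end
    return (end - created).days

def mean_lifespan(lifespans):
    """Average lifespan in days for one group."""
    values = list(lifespans)
    return sum(values) / len(values)

# Hypothetical placeholder titles standing in for category members.
titles = [f"Article_{i}" for i in range(100)]
control, subject = split_control_subject(titles, n=20)
```

Comparing mean_lifespan for the two groups gives only a raw difference; whether that difference is statistically significant would still need a separate test.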
cheers stuart
-- ...let us be heard from red core to black sky
[The above replies to the message from siddhartha banerjee (sidd2006@gmail.com) of Wed, Aug 10, 2016 at 9:30 AM.]
So here's the list of accounts that were used in order to create the articles:
https://en.wikipedia.org/wiki/Special:Contributions/Brownweepy
https://en.wikipedia.org/wiki/Special:Contributions/Theatremania
https://en.wikipedia.org/wiki/Special:Contributions/Bhopebhai
https://en.wikipedia.org/wiki/Special:Contributions/Dicdac123
https://en.wikipedia.org/wiki/Special:Contributions/MightyPepper
Also some edits may have been done through IPs.
In discussion with Sidd it was clear that they did not plan to ever mass-create a large number of articles, and it is only these 50 articles or so we can clean up now. I am not terribly worried about this particular work (according to the paper there were 47 surviving articles at the time of writing, i.e. in Spring).
What I am concerned about is the fact that there will be more such experiments from other groups. It would be great to set up a few rules for this kind of behavior, so that we can at least point to them. If the only rule that was broken here was the "don't use multiple accounts" rule, I am not sure whether that would be sufficient.
Cheers, Denny
[The above replies to the message from Stuart A. Yeates (syeates@gmail.com) of Wed, Aug 10, 2016 at 1:47 AM.]
Hello,
Do we have a collection of already existing and relevant policies and statements, at least for English Wikipedia? On Meta I found this page https://meta.wikimedia.org/wiki/Research:Wikipedia_Research_Management whose main statement is that research is too various and complex to give a few recommendations.
At first sight, I find it difficult to read something relevant from https://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not
I imagine that guidelines could be helpful with regard to a) research that includes editing wiki pages, b) the editing of students or pupils for educational purposes.
Research and educational activity should not disturb the efforts of the Wikipedia community to create and improve encyclopedic content. Disturbance can occur from creating substandard content and engaging in activities that disrupt workflows. ...
These guidelines could only be a recommendation, as long as the Wikipedia communities don't change their rules. But it'd be great, anyway, if the guidelines could be based somehow on existing Wikipedia rules.
Kind regards Ziko
[The above replies to the message from Denny Vrandečić (vrandecic@gmail.com) of 2016-08-12 0:41 GMT+02:00.]
I think you misunderstand the nature of en.wiki.
en.wiki is not a rule-based automaton; en.wiki is an autonomous community that works by consensus.
I cannot imagine a set of research rules constructed outside en.wiki that lets you 'safely' interact with it. Observe it, maybe, but not interact with it. I can also imagine certain kinds of observation (or certain results coming out of observation) making further observation difficult.
The best advice I can provide is to team up with an experienced editor or two.
[For editing for educational rather than research purposes see https://en.wikipedia.org/wiki/Wikipedia:Education_program ]
cheers stuart
-- ...let us be heard from red core to black sky
[The above replies to the message from Ziko van Dijk (zvandijk@gmail.com) of Fri, Aug 12, 2016 at 11:04 AM.]
I've sent this to ANI https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Incide...
cheers stuart
-- ...let us be heard from red core to black sky
On Fri, Aug 12, 2016 at 11:45 AM, Stuart A. Yeates syeates@gmail.com wrote:
I think you misunderstand the nature of en.wiki.
en.wiki is not a rule-based automaton; en.wiki is an autonomous community that works by consensus.
I cannot imagine a set of research rules constructed outside en.wiki that lets you 'safely' interact with it. Observe it, maybe, but not interact with it. I can also imagine certain kinds of observation (or certain results coming out of observation) making further observation difficult.
The best advice I can provide is to team up with an experienced editor or two.
[For editing for educational rather than research purposes see https://en.wikipedia.org/wiki/Wikipedia:Education_program ]
cheers stuart
-- ...let us be heard from red core to black sky
On Fri, Aug 12, 2016 at 11:04 AM, Ziko van Dijk zvandijk@gmail.com wrote:
Hello,
Do we have a collection of already existing and relevant policies and statements, at least for English Wikipedia? On Meta I found this page https://meta.wikimedia.org/wiki/Research:Wikipedia_Research_Management whose main statement is that research is too varied and complex to be covered by a few recommendations.
At first sight, I find it difficult to find anything relevant in https://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not
I imagine that guidelines could be helpful with regard to a) research that includes editing wiki pages, b) the editing of students or pupils for educational purposes.
Research and educational activity should not disturb the efforts of the Wikipedia community to create and improve encyclopedic content. Disturbance can occur from creating substandard content and engaging in activities that disrupt workflows. ...
These guidelines could only be a recommendation, as long as the Wikipedia communities don't change their rules. But it'd be great, anyway, if the guidelines could be based somehow on existing Wikipedia rules.
Kind regards Ziko
2016-08-12 0:41 GMT+02:00 Denny Vrandečić vrandecic@gmail.com:
So here's the list of accounts that were used in order to create the articles:
https://en.wikipedia.org/wiki/Special:Contributions/Brownweepy
https://en.wikipedia.org/wiki/Special:Contributions/Theatremania
https://en.wikipedia.org/wiki/Special:Contributions/Bhopebhai
https://en.wikipedia.org/wiki/Special:Contributions/Dicdac123
https://en.wikipedia.org/wiki/Special:Contributions/MightyPepper
Also some edits may have been done through IPs.
In discussion with Sidd it was clear that they did not plan to ever mass-create a large number of articles, and it is only these 50 articles or so we can clean up now. I am not terribly worried about this particular work (according to the paper there were 47 surviving articles at the time of writing, i.e. in Spring).
What I am concerned about is the fact that there will be more such experiments from other groups. It would be great to set up a few rules for this kind of behavior, so that we can at least point to them. If the only rule that was broken here was the "don't use multiple accounts" rule, I am not sure whether that would be sufficient.
Cheers, Denny
On Wed, Aug 10, 2016 at 1:47 AM Stuart A. Yeates syeates@gmail.com wrote:
- The previous work you cite appears to have created articles in the draft namespace rather than the article namespace. This is a very important and very relevant detail, meaning your situation is in no way comparable to the previous work from my point of view.
- You appear to be solving a problem that the community of wikipedia editors does not have. We have enough low-quality stub articles that need human effort to improve and we're not really interested in more unless either (a) they demonstrably combat some of the systematic biases we're struggling with or (b) they demonstrably attract new cohorts of users to do that improvement. Note that the examples discussed in the research newsletter are a non-English writer and a woman writer. These are important details.
- Your paper appears not to make any attempt to measure the statistical significance of your results; this isn't science.
- Most of your sources are _really_ _really_ bad. https://en.wikipedia.org/wiki/Talonid contains 8 unique refs, one of which is good, one of which is passable and the others should be removed immediately (but I won't because it'll make it harder for third parties reading this conversation to follow it).
If you want to properly evaluate your technique, try this: Randomly pick N articles from the subcats of https://en.wikipedia.org/wiki/Category:Articles_lacking_sources and split them randomly into control and subject groups. Parse each subject article for sentences that your system appears to understand. For each sentence your system understands, look for reliable sources to support that sentence. Add a single ref to a single statement in each article. Add all the refs using a single account with a message on the user page about the nature of the edits. If you're not able to add any refs, mark it as a failure. Measure article lifespan for each group.
If you're in a hurry and want fast results, work with articles less than a week old (hint: article IDs are a numerically increasing sequence) or the intersection of the https://en.wikipedia.org/wiki/Category:Articles_lacking_sources subcats and Category:Articles_for_deletion. Both of these groups of articles are actively being considered for deletion.
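The random control/subject split and the lifespan comparison described above can be sketched in a few lines. This is only an illustrative sketch, not something taken from the paper or from any Wikipedia tooling: the function names, the idea of recording lifespan as days until deletion, and the choice of a permutation test on the difference in mean lifespan are all assumptions. It also ignores articles that are still alive when measurement stops; a proper survival analysis would handle that case better.

```python
import random
from statistics import mean


def split_control_subject(article_titles, seed=0):
    """Randomly split articles into (control, subject) groups.

    Hypothetical helper: the subject group would receive one added
    reference per article, the control group would be left untouched.
    """
    rng = random.Random(seed)
    shuffled = list(article_titles)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_test(control_days, subject_days, rounds=10_000, seed=0):
    """Two-sided permutation test on the difference in mean lifespan.

    Lifespans are assumed to be given in days. Returns a tuple of
    (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(subject_days) - mean(control_days)
    pooled = list(control_days) + list(subject_days)
    n_control = len(control_days)
    extreme = 0
    for _ in range(rounds):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (rounds + 1)
```

A run would call split_control_subject on the sampled article titles first, record a lifespan for every article in both groups once the observation window closes, and then pass the two lists of lifespans to permutation_test. A small p-value would suggest that the added references changed how long the articles survived.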
cheers stuart
-- ...let us be heard from red core to black sky
Presuming the research is being conducted under the usual ethics regimes, putting articles into mainspace Wikipedia is putting them in front of readers and into the workflows and activities of editors. This would appear to me to constitute experiments on human subjects, which usually introduces issues of informed consent and potential to harm them. Can we be shown the ethical approval documents for this particular project to see how these concerns were addressed?
Kerry