Hi all,
I found a paper at IJCAI 2016, which left me quite curious: https://siddbanpsu.github.io/publications/ijcai16-banerjee.pdf
In short, they find red links, classify them, find the closest similar existing articles, use the section titles from those articles to decide on a section structure, search for content for each section, paraphrase it, and so assemble complete Wikipedia articles.
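To make the pipeline concrete, here is a rough sketch in Python of how I read it. Every function below is a placeholder of my own invention for the stage it names - this is not the authors' code, and their actual classification, retrieval, and paraphrasing components are of course far more involved:

def classify_topic(title):
    # Stage 1: assign the red link to a coarse category (person, taxon, ...).
    return "person"  # placeholder

def find_similar_articles(title, category):
    # Stage 2: retrieve the most similar existing articles in that category.
    return ["Hypothetical similar article A", "Hypothetical similar article B"]

def section_template(similar_articles):
    # Stage 3: borrow the most common section titles as a template.
    return ["Early life", "Career", "References"]  # placeholder

def search_content(title, section):
    # Stage 4a: search the web for text relevant to this section.
    return [f"Some snippet about {title} found for the section '{section}'."]

def paraphrase(snippets):
    # Stage 4b: paraphrase the retrieved text to avoid verbatim copying.
    return " ".join(snippets)  # placeholder: no actual paraphrasing

def generate_article(red_link_title):
    # Stage 5: assemble the paraphrased sections into a complete draft.
    category = classify_topic(red_link_title)
    similar = find_similar_articles(red_link_title, category)
    sections = {s: paraphrase(search_content(red_link_title, s))
                for s in section_template(similar)}
    body = "\n\n".join(f"== {name} ==\n{text}" for name, text in sections.items())
    return f"'''{red_link_title}''' ...\n\n{body}"

print(generate_article("Dick Barbour"))

Even this skeleton makes clear how much rests on the quality of the retrieval and paraphrasing stages.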
Then they uploaded the articles to Wikipedia, and of the 50 uploaded articles, only 3 got deleted. The rest stayed. I was rather excited when I heard that - were the articles really that good?
Then I took a look at the articles and... well, judge for yourself. The paper mentions only three of the 47 surviving articles:
https://en.wikipedia.org/wiki/Dick_Barbour
https://en.wikipedia.org/wiki/Atripliceae (here is the last version as created by the bot before significant human clean-up: https://en.wikipedia.org/w/index.php?title=Atripliceae&oldid=697456858 )
https://en.wikipedia.org/wiki/Talonid
I have been in contact with the first author, and he promised to send me a list of all the articles as soon as he can get it, which will be in a few weeks because he is away from his university computer right now. He was able to name one more article, though:
https://en.wikipedia.org/wiki/Sonia_Bianchetti_Garbato
(Also, see the article history for the extent of human clean-up.)
I am not writing this to speak badly of the authors, of the reviewing practices at IJCAI, or of the state of research in this area. I also really do not want to discourage research in this area.
I have a few questions, though:
1) The fact that so many of these articles have survived for half a year indicates that there are some problems with our review processes. Would someone like to investigate why these articles survived in the given state?
2) As far as I know we don't have rules for this kind of experiment, but maybe we should. In particular, I feel that BLPs (biographies of living persons) should not be created by an experimental approach like this one. Should we set up rules for such experiments?
3) Wikipedia contributors are participating in these experiments without their consent. I find that worrisome, and would like to hear what others think.
I have invited the first author to join this list.
I understand the motivation: had they disclosed from the beginning that these articles were created by bots, the articles would have been scrutinized differently than articles written by humans. Therefore they kept quiet about it (but they are willing to reveal it now that the experiment is over - they also explicitly have no intention of expanding the scope of the experiment at this point in time).
Cheers, Denny
Hello Denny,
I agree with all three points. The experiment reminds me of the "babelfish accidents", as we called them on de.WP, and of the experiments by Google and Microsoft to "support" "translations" between Wikipedias.
Very strange, this repeated "Dick Barbour is legendary in..."
Kind regards Ziko
There appears to be out-of-policy use of multiple accounts involved in this work. If that is the case, all of these articles may be subject to deletion on procedural grounds, completely independently of any real or perceived notability, article quality, or research quality.
I HIGHLY recommend against such use of multiple accounts.
cheers stuart
-- ...let us be heard from red core to black sky
Denny Vrandečić, 09/08/2016 20:29:
1) The fact that so many of these articles have survived for half a year indicates that there are some problems with our review processes. Would someone like to investigate why these articles survived in the given state?
Looks like the good old trick of making sure that the most prominent parts are fine (first line, headers, footnotes) and then adding mere filler for the rest...
Nemo