Re: [Wiki-research-l] Research on automatically created articles

12 Aug 2016

I've sent this to ANI
https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard/Incid…

cheers
stuart

--
...let us be heard from red core to black sky

On Fri, Aug 12, 2016 at 11:45 AM, Stuart A. Yeates &lt;syeates(a)gmail.com&gt;
wrote:

...
  I think you misunderstand the nature of en.wiki.

 en.wiki is not a rule-based automata; en.wiki is an autonomous community
 that works by consensus.

 I cannot imagine a set of research rules constructed outside en.wiki that
 lets you 'safely' do interact with it. Observe it, maybe, but not interact
 with it. I can also imagine that certain kinds of observation (or certain
 results coming out of observation) making further observation difficult.

 The best advice I can provide is to team up with an experienced editor or
 two.

 [For editing for educational rather than research purposes see
 https://en.wikipedia.org/wiki/Wikipedia:Education_program ]

 cheers
 stuart

 --
 ...let us be heard from red core to black sky

 On Fri, Aug 12, 2016 at 11:04 AM, Ziko van Dijk &lt;zvandijk(a)gmail.com&gt;
 wrote:

  Hello,

 Do we have a collection of already existing and relevant policies and
 statements, at least for English Wikipedia? On Meta I found this page
 https://meta.wikimedia.org/wiki/Research:Wikipedia_Research_Management
 which main statement is that research is too various and complex to give
 some few recommendations.

 At first sight, I find it difficult to read something relevant from
 https://en.wikipedia.org/wiki/Wikipedia:What_Wikipedia_is_not

 I imagine that guidelines could be helpful with regard to a) research
 that includes editing wiki pages, b) the editing of students or pupils for
 educational purposes.

 Research and educational activity should not disturb the efforts of the
 Wikipedia community to create and improve encyclopedic content.
 Disturbance can occur from creating sub standard content and involving in
 activities that disrupts work flows. ...

 This guidelines could be only a recommendation, as long the Wikipedia
 communities don't change their rules. But it'd be great, anyway, if the
 guidelines can be based somehow on existing Wikipedia rules.

 Kind regards
 Ziko

 2016-08-12 0:41 GMT+02:00 Denny Vrandečić &lt;vrandecic(a)gmail.com&gt;om>:

  So here's the list of accounts that were used
in order to create the
 articles:

 https://en.wikipedia.org/wiki/Special:Contributions/Brownweepy
 https://en.wikipedia.org/wiki/Special:Contributions/Theatremania
 https://en.wikipedia.org/wiki/Special:Contributions/Bhopebhai
 https://en.wikipedia.org/wiki/Special:Contributions/Dicdac123
 https://en.wikipedia.org/wiki/Special:Contributions/MightyPepper

 Also some edits may have been done through IPs.

 In discussion with Sidd it was clear that they did not plan to ever
 mass-create a large number of articles, and it is only these 50 articles or
 so we can clean up now. I am not terribly worried about this particular
 work (according to the paper there were 47 surviving articles at the time
 of writing, i.e. in Spring).

 What I am concerned about is the fact that there will be more such
 experiments from other groups. It would be great to set up a few rules for
 this kind of behavior, so that we can at least point to them. If the only
 rule that was broken here was the "don't use multiple accounts" rule, I am
 not sure whether that would be sufficient.

 Cheers,
 Denny

 On Wed, Aug 10, 2016 at 1:47 AM Stuart A. Yeates &lt;syeates(a)gmail.com&gt;
 wrote:

  * The previous work you cite appears to have
created articles in the
 draft namespace rather than the article namespace. This is a very important
 and very relevant detail, meaning your situation is in no way comparable to
 the previous work from my point of view
 * You appear to be solving a problem that the community of wikipedia
 editors does not have. We have enough low-quality stub articles that need
 human effort to improve and we're not really interested in more unless
 either (a) they demonstrably combat some of the systematic biases we're
 struggling with or (b) they demonstrably attract new cohorts users to do
 that improvement. Note that the examples discussed in the research
 newsletter are a non-English writer and a women writer. These are important
 details.
 * Your paper appears not to attempt to make any attempt to measure the
 statistical significance of your results; this isn't science.
 * Most of your sources are _really_ _really_ bad.
 https://en.wikipedia.org/wiki/Talonid Contains 8 unique refs, one of
 which is good, one of which is a passable and the others should be removed
 immediately (but I won't because it'll make it harder for third parties
 reading this conversation to follow it.).

 If you want to properly evaluate your technique, try this: Randomly
 pick N articles from https://en.wikipedia.org/wiki/
 Category:Articles_lacking_sources subcats splitting them into control
 and subjects randomly. Parse each subject article for sentences that your
 system appears to understand. For each sentence your thing you understand
 look for reliable sources to support that sentence. Add a single ref to a
 single statement in each article. Add all the refs using a single account
 with a message on the user page about the nature of the edits. If you're
 not able to add any refs, mark it as a failure. Measure article lifespan
 for each group.

 If you're in a hurry and want fast results, work with articles less
 than a week old (hint: articles IDs are numerically increasing sequence) or
 the intersection of https://en.wikipedia.org/wiki/
 Category:Articles_lacking_sources subcats and
 Category:Articles_for_deletion Both of these groups of articles are
 actively being considered for deletion.

 cheers
 stuart

 --
 ...let us be heard from red core to black sky

 On Wed, Aug 10, 2016 at 9:30 AM, siddhartha banerjee <
 sidd2006(a)gmail.com&gt; wrote:

> Hello Everyone,
>
> I am the first author of the paper that Denny has referred. Firstly, I
> want to thank Denny for asking me to join this list and know more about
> this discussion.
>
> 1. Regarding quality, we know that there are issues, and even in the
> conference, I have repeatedly told the audience that I am not satisfied
> with the quality of the content generated. However, the percentage of
> articles that were not removed when the paper was submitted was minimal. I
> have sent Denny a list of accounts that were used and it might have been
> possible that several articles created have been removed from those
> accounts within the last couple of months. I was not aware of the multiple
> account policy.
>
> 2. The area of Wikipedia article generation have been explored by
> others in the past. [http://www.aclweb.org/anthology/P09-1024,
> http://wwwconference.org/proceedings/www2011/companion/p161.pdf] We
> were not aware of any rules regarding these sort of experiments. However,
> we do understand that such experiments can harm the general quality of this
> great encyclopedic resource, hence we did out analysis on bare minimum
> articles. In fact, we did our initial work on it back in 2014, and
> Wikimedia research even covered details about our paper here --
> https://blog.wikimedia.org/2015/02/02/wikimedia-research-
> newsletter-january-2015/#Bot_detects_theatre_play_scripts_on
> _the_web_and_writes_Wikipedia_articles_about_them
>
> If questions were raised at that point, we would surely not have done
> anything further on this, or rather do things offline without creating or
> adding any content on Wikipedia.
>
> I understand your point about imposing rules and I think it makes
> sense. However, during this research, we were not aware of any rules, hence
> continued our work.
> As I have told Denny, our purpose was to check whether we could create
> bare minimal articles which could be eventually improved by authors on
> Wikipedia, and also to see if they are totally removed. But, it was done
> with a few articles and we did not create anything beyond that point. Also,
> we did not do any manual modifications to the articles although we saw
> quality issues because it would void our analysis and claims.
>
> Thanks everyone for your time and the great work you are doing for the
> Wikipedia community.
>
> Regards,
> Sidd
>
>
>
>
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
>
 _______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

 _______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

 _______________________________________________
 Wiki-research-l mailing list
 Wiki-research-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [Wiki-research-l] Research on automatically created articles