Hello Everyone,
I am the first author of the paper that Denny has referred to. Firstly, I
want to thank Denny for asking me to join this list and learn more about
this discussion.
1. Regarding quality, we know that there are issues, and even in the
conference, I have repeatedly told the audience that I am not satisfied
with the quality of the content generated. However, the percentage of
articles that had been removed when the paper was submitted was minimal. I
have sent Denny a list of the accounts that were used, and it is possible
that several of the articles created from those accounts have been removed
within the last couple of months. I was not aware of the multiple
account policy.
2. The area of Wikipedia article generation has been explored by others in
the past. [http://www.aclweb.org/anthology/P09-1024,
http://wwwconference.org/proceedings/www2011/companion/p161.pdf] We were
not aware of any rules regarding this sort of experiment. However, we do
understand that such experiments can harm the general quality of this great
encyclopedic resource, hence we did our analysis on bare minimum articles.
In fact, we did our initial work on it back in 2014, and Wikimedia research
even covered details about our paper here --
https://blog.wikimedia.org/2015/02/02/wikimedia-research-newsletter-january…
If questions had been raised at that point, we would surely not have done
anything further on this, or would rather have done things offline without
creating or adding any content on Wikipedia.
I understand your point about imposing rules and I think it makes sense.
However, during this research we were not aware of any rules, and hence
continued our work.
As I have told Denny, our purpose was to check whether we could create bare
minimum articles which could eventually be improved by authors on
Wikipedia, and also to see whether they would be removed entirely. But it
was done with only a few articles and we did not create anything beyond that
point. Also, we did not make any manual modifications to the articles,
although we saw quality issues, because that would void our analysis and
claims.
Thanks everyone for your time and the great work you are doing for the
Wikipedia community.
Regards,
Sidd
Hi all,
I found a paper at IJCAI 2016, which left me quite curious:
https://siddbanpsu.github.io/publications/ijcai16-banerjee.pdf
In short, they find red links, classify them, find the closest similar
articles, use the section titles from these articles to decide on sections,
search for content for the sections, paraphrase it, and write complete
Wikipedia articles.
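(To make one of these steps concrete, here is a minimal sketch of how
"use the section titles from these articles to decide on sections" could
work. It is not the authors' code and I have not checked it against the
paper: the function name, the frequency-based ranking, and the sample
articles are all assumptions made up for illustration.)

```python
from collections import Counter

def pick_section_titles(similar_articles, top_k=3):
    """Return the top_k most common section titles across similar articles.

    similar_articles maps an article name to its list of section titles.
    This is a hypothetical helper, not taken from the paper.
    """
    counts = Counter()
    for sections in similar_articles.values():
        # Count each title once per article so long articles do not dominate.
        counts.update({title.strip().lower() for title in sections})
    # Sort by descending count, then alphabetically for a deterministic result.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_k]]

# Made-up section lists for three articles similar to a red-linked topic.
similar = {
    "Article A": ["Early life", "Career", "Awards"],
    "Article B": ["Career", "Personal life", "Awards"],
    "Article C": ["Career", "Early life"],
}
print(pick_section_titles(similar, top_k=3))
# prints ['career', 'awards', 'early life']
```

The remaining steps (finding red links, classification, content search,
paraphrasing) are not covered by this sketch.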
Then they uploaded the articles to Wikipedia, and of the 50 uploaded
articles, only 3 got deleted. The rest stayed. I was rather excited when I
heard that - were the articles really that good?
Then I took a look at the articles and... well, judge for yourself. The
paper only mentions three articles of the 47 survivors:
https://en.wikipedia.org/wiki/Dick_Barbour
https://en.wikipedia.org/wiki/Atripliceae (here is the last version as
created by the bot before significant human clean-up:
https://en.wikipedia.org/w/index.php?title=Atripliceae&oldid=697456858 )
https://en.wikipedia.org/wiki/Talonid
I have contacted the first author and he promised to give me a list of
all articles as soon as he can get it, which will be in a few weeks because
he is away from his university computer right now. He was able to produce
one more article though:
https://en.wikipedia.org/wiki/Sonia_Bianchetti_Garbato
(Also, see history for the extent of human clean-up)
I am not writing to talk badly about the authors or about the reviewing
practice at IJCAI, or about the state of research in that area. Also, I
really do not want to discourage research in this area.
I have a few questions, though:
1) The fact that so many of these articles have survived for half a year
indicates that there are some problems with our review processes. Does
someone want to investigate why these articles survived in the given state?
2) As far as I know we don't have rules for this kind of experiment, but
maybe we should. In particular, I feel that BLPs should not be created by
an experimental approach like this one. Should we set up rules for this
kind of experiment?
3) Wikipedia contributors are participating in these experiments without
consent. I find that worrisome, and would like to hear what others think.
I have invited the first author to join this list.
I understand the motivation: by exposing from the beginning that these
articles were created by bots, they would have been scrutinized differently
than articles written by humans. Therefore they remained quiet about the
fact (but are willing to reveal it now, now that the experiment is over -
they also explicitly don't have any intentions of expanding the scope of
the experiment at the given point of time).
Cheers,
Denny
SEMANTiCS 2016 - The Linked Data Conference
Workshops, Tutorials and the DBpedia Day
12th International Conference on Semantic Systems
Leipzig, Germany
September 12-15, 2016
_http://2016.semantics.cc/_
*Workshops/Tutorials*
This year's SEMANTiCS is starting on September 12th with a full day of
exciting and interesting satellite events. In _6 parallel tracks_
<http://2016.semantics.cc/satellite-events> scientific and industrial
workshops and tutorials are scheduled to provide a forum for groups of
researchers and practitioners to discuss and learn about hot topics in
Semantic Web research.
Attending the SEMANTiCS workshops and tutorials is _free of charge_, but
you need to register. Feel free to have a closer look and register for
the events here: _http://2016.semantics.cc/satellite-events_.
*DBpedia Day - Call for Participation*
Following our successful meetings in Europe and the US, our next DBpedia
meeting will be held in Leipzig on September 15th, co-located with
SEMANTiCS.
_Highlights_
- Keynote #1: Wikidata: bringing structured data to Wikipedia with 16000
volunteers by Lydia Pintscher, product manager of Wikidata
- Keynote #2: Harald Sack, (title TBA) (Hasso-Plattner-Institut)
- A session for the “_DBpedia references and citations challenge_
<http://wiki.dbpedia.org/ideas/idea/261/dbpedia-citations-reference-challeng…>”
- A _session on DBpedia ontology_
<http://mappings.dbpedia.org/index.php/DBpedia_Ontology_Committee> by
members of the DBpedia ontology committee
- Tell us what cool things you do with DBpedia: _https://goo.gl/AieceU_
- As always, there will be tutorials to learn about DBpedia and a
DBpedia showcase session
_Quick facts_
- Web URL: _http://wiki.dbpedia.org/meetings/Leipzig2016_
- When: September 15th, 2016
- Where: University of Leipzig, Augustusplatz 10, 04109 Leipzig
- Call for Contribution: _https://goo.gl/AieceU_ (submission form)
- Registration: Free to participate but only through registration
(Option for DBpedia support tickets):
_https://event.gg/3396-7th-dbpedia-community-meeting-in-leipzig-2016_
We are looking forward to your contributions and to seeing you at the
SEMANTiCS in Leipzig!
-------------------------------------------------------------------------------
WSDM Cup 2017: Call for Participation
-------------------------------------------------------------------------------
We invite you to take part in one of the following shared tasks:
Task 1.
Vandalism Detection -- Given a Wikidata revision, is it damaging?
This task is about detecting vandalism as well as all other kinds of
damaging edits to Wikidata. In doing so, not only is Wikidata's integrity
protected, but also that of all information systems making use of the
knowledge base.
Task 2.
Triple Scoring -- Compute relevance scores for triples from type-like
relations.
For example, the triple "Johnny_Depp profession Actor" should get a high
score, because acting is Depp's main profession, whereas "Quentin_Tarantino
profession Actor" should get a low score, because Tarantino is more of a
director than an actor. Such scores are a basic ingredient for ranking
results in entity search.
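As a rough illustration of what a triple score might look like, the sketch
below normalizes evidence counts per subject and relation. This is only a
toy baseline under stated assumptions, not the task's official scoring
scheme or evaluation: the function name, the use of mention counts as
evidence, the [0, 1] score range, and all numbers are made up.

```python
def score_triples(evidence_counts):
    """Turn raw evidence counts per triple into relative scores in [0, 1].

    evidence_counts maps (subject, relation, object) to a non-negative count,
    e.g. how often a text source associates the subject with the object.
    Hypothetical helper; the counts below are invented for illustration.
    """
    # Sum the evidence over all objects of the same subject and relation.
    totals = {}
    for (subject, relation, _obj), count in evidence_counts.items():
        key = (subject, relation)
        totals[key] = totals.get(key, 0) + count
    # Each triple's score is its share of that total.
    scores = {}
    for triple, count in evidence_counts.items():
        total = totals[(triple[0], triple[1])]
        scores[triple] = count / total if total else 0.0
    return scores

counts = {
    ("Johnny_Depp", "profession", "Actor"): 9,
    ("Johnny_Depp", "profession", "Musician"): 1,
    ("Quentin_Tarantino", "profession", "Actor"): 2,
    ("Quentin_Tarantino", "profession", "Director"): 8,
}
scores = score_triples(counts)
for triple, score in sorted(scores.items()):
    print(" ".join(triple), round(score, 2))
```

With these invented counts, "Johnny_Depp profession Actor" scores higher
than "Quentin_Tarantino profession Actor", matching the intuition in the
example above.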
Learn more at http://www.wsdm-cup-2017.org
Register now at https://goo.gl/forms/JaVQwFFewLtVFCik2
-------------------------------------------------------------------------------
Important Dates
-------------------------------------------------------------------------------
now open          Registration
Sep 1, 2016       Training data release
Dec 8, 2016       Final software submission
Dec 22, 2016      Announcement of evaluation results
Jan 5, 2017       Paper submission
Feb 6-10, 2017    Conference and WSDM Cup workshop
All deadlines are 11:59 PM, anywhere on earth (AoE).
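For participants converting a deadline to their own time zone: AoE
corresponds to UTC-12, so an 11:59 PM AoE deadline falls at 11:59 AM UTC on
the following day. A small sketch of the conversion follows; the helper name
is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# "Anywhere on Earth" is the UTC-12 offset.
AOE = timezone(timedelta(hours=-12), name="AoE")

def aoe_deadline_to_utc(year, month, day):
    """Return 11:59 PM AoE on the given date as a UTC datetime."""
    local = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local.astimezone(timezone.utc)

# Final software submission deadline from the list above.
print(aoe_deadline_to_utc(2016, 12, 8).isoformat())
# prints 2016-12-09T11:59:00+00:00
```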
-------------------------------------------------------------------------------
Special Announcements
-------------------------------------------------------------------------------
Evaluation as a Service.
For the sake of reproducibility, we ask you to submit your software instead
of just its run output. Software submissions allow for preserving your
software in working condition, and for re-evaluating it as new datasets
appear.
To facilitate software submissions, we will make use of the cloud-based
evaluation platform TIRA (www.tira.io).
Open Source Proceedings.
We encourage the open source release of your software. To maximize the
impact of your software, we collect it at a central repository on GitHub:
https://github.com/wsdm-cup-2017
Private repositories can be assigned to you on request during the
competition.
Benefits for early birds.
Submitting your software or your notebook early, as well as registering
early for the conference, will be rewarded. Check out the specific benefits
on our web page at http://www.wsdm-cup-2017.org
Hello Research,
I would like to set up surveys at Wikimedia Deutschland. Currently we have
Google Forms as a possibility. We wonder if there are any other solutions
which are ideally open source and/or self-hostable. What I found was
https://www.limesurvey.org/de/ and I wonder if anybody has experience with
it.
Jan
--
Jan Dittrich
UX Design/ User Research
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
http://wikimedia.de
Imagine a world, in which every single human being can freely share in the
sum of all knowledge. That's our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
(Society for the Promotion of Free Knowledge).
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg (district court) under number 23855 B. Recognized as
a non-profit organization by the Finanzamt für Körperschaften I Berlin (tax
office for corporations), tax number 27/029/42207.