Hi Leila,
First of all, thanks for your input!
Therefore, we worked on producing sentences from the information on
Wikidata in the given language. We trained a neural network model; the
details can be found in the preprint of the NAACL paper here:
https://arxiv.org/abs/1803.07116
It would be good to do human (both readers and editors, and perhaps
both sets) evaluations for this research, too, to better understand
how well the model is doing from the perspective of the experienced
editors in some of the smaller languages as well as their readers. (I
acknowledge that finding experienced editors when you go to small
languages can become hard.)
We worked with editors in the follow-up study, to be published at ESWC.
https://2018.eswc-conferences.org/wp-content/uploads/2018/02/ESWC2018_paper…
We also asked native speakers for their input on the fluency of the
sentences. However, I agree it would be interesting to dive deeper into the
question of how the community perceives the ArticlePlaceholder in general and
the generated summary in particular.
Furthermore, we would love to hear your input: Do you believe
one-sentence summaries are enough, or can we serve the communities'
needs better with more than one sentence?
This is a hard question to answer. :) The answer may rely on many
factors including the language you want to implement such a system in
and the expectation the users of the language have in terms of online
content available to them in their language.
I agree. The best approach would therefore probably be to study the current
usage of ArticlePlaceholder and the communities targeted, and to draw
conclusions about real needs from those findings.
Is this still true if longer abstracts would be of lower text quality?
Same as above. You are signing yourself up for more experiments. ;)
I would be interested to know:
* What is the perception of the readers of a given language about
Wikipedia if a lot of articles that they go to in their language have
one sentence (to a good extent accurate), a few sentences but with
some errors, more sentences with more errors, versus not finding the
article they're interested in at all?
* Related to the above: what is the error threshold beyond which the
brand perception will turn negative (to be defined, maybe by
measuring whether the user returns in the coming week or month)? This
may well be different in different languages and cultures.
* Depending on the result of the above, we may want to look at
offering the user the option to access that information, but outside
of Wikipedia, or inside Wikipedia but very clearly labeled as Machine
Generated as you do to some extent in these projects.
The questions are very interesting, and in part formalize what we discussed
already as well. The best way would be to actually study this with the
communities involved, as we started in the ESWC paper, but focus on the
different interest groups in particular: readers of Wikipedia, readers
coming from outside Wikipedia, editors of Wikipedia and new editors.
What other interesting use cases for such a technology in the
Wikimedia world can you imagine?
The technology itself can have a variety of use-cases, including
providing captions or summaries of photos even without layers of image
processing applied to them.
This sounds like a very interesting idea. I saw that WMF has already started
work on image captions; I will be following this with great curiosity :)
Best,
Lucie
Best,
Leila
Placeholders_from_Wikidata_for_Wikipedia_-_Increasing_
Access_to_Free_and_Open_Knowledge.pdf
Wikidata_Multilingual.pdf
--
Lucie-Aimée Kaffee
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l