Hi Lucie-Aimée,
Nice to see work in this direction is progressing. Some comments in-line.
On Wed, Apr 4, 2018 at 7:49 AM, Lucie-Aimée Kaffee kaffee@soton.ac.uk wrote:
Therefore, we worked on producing sentences from the information on Wikidata in the given language. We trained a neural network model; the details can be found in the preprint of the NAACL paper here: https://arxiv.org/abs/1803.07116
It would be good to run human evaluations (with readers, editors, and perhaps both sets) for this research, too, to better understand how well the model is doing from the perspective of experienced editors in some of the smaller languages as well as their readers. (I acknowledge that finding experienced editors can become hard when you go to small languages.)
Furthermore, we would love to hear your input: Do you believe one-sentence summaries are enough, or can we serve the communities' needs better with more than one sentence?
This is a hard question to answer. :) The answer may depend on many factors, including the language you want to implement such a system in and the expectations users of that language have in terms of the online content available to them in their language.
Would this still be true if longer abstracts were of lower text quality?
Same as above. You are signing yourself up for more experiments. ;)
I would be interested to know:

* What is the perception of readers in a given language about Wikipedia if many of the articles they go to in their language have one sentence (accurate to a good extent), a few sentences but with some errors, or more sentences with more errors, versus not finding the article they're interested in at all?
* Related to the above: what is the error threshold beyond which brand perception turns negative (to be defined: maybe by measuring whether the user returns in the coming week or month)? This may well differ across languages and cultures.
* Depending on the results of the above, we may want to look at offering users the option to access that information either outside of Wikipedia, or inside Wikipedia but very clearly labeled as machine generated, as you do to some extent in these projects.
What other interesting use cases for such a technology in the Wikimedia world can you imagine?
The technology itself can have a variety of use cases, including providing captions or summaries for photos even without layers of image processing applied to them.
Best, Leila
--
Lucie-Aimée Kaffee
Web and Internet Science Group
School of Electronics and Computer Science
University of Southampton

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l