Hi Lucie-Aimée,
Nice to see work in this direction is progressing. Some comments in-line.
On Wed, Apr 4, 2018 at 7:49 AM, Lucie-Aimée Kaffee <kaffee(a)soton.ac.uk> wrote:
Therefore, we worked on producing sentences from the information on
Wikidata in the given language. We trained a neural network model; the
details can be found in the preprint of the NAACL paper here:
https://arxiv.org/abs/1803.07116
It would be good to run human evaluations (with both readers and
editors, and perhaps both sets) for this research, too, to better
understand how well the model is doing from the perspective of
experienced editors in some of the smaller languages as well as their
readers. (I acknowledge that finding experienced editors can become
hard when you go to small languages.)
Furthermore, we would love to hear your input: Do you believe
one-sentence summaries are enough, or can we serve the communities'
needs better with more than one sentence?
This is a hard question to answer. :) The answer may depend on many
factors, including the language you want to implement such a system in
and the expectations that users of the language have in terms of the
online content available to them in their language.
Would this still be true if longer abstracts were of lower text
quality?
Same as above. You are signing yourself up for more experiments. ;)
I would be interested to know:
* What is the perception of the readers of a given language about
Wikipedia if a lot of the articles they go to in their language have
one sentence (accurate to a good extent), a few sentences but with
some errors, more sentences with more errors, versus not finding the
article they're interested in at all?
* Related to the above: what is the error threshold beyond which the
brand perception will turn negative (to be defined: perhaps by
measuring whether the user returns in the coming week or month)? This
may well be different in different languages and cultures.
* Depending on the results of the above, we may want to look at
offering the user the option to access that information, but outside
of Wikipedia, or inside Wikipedia but very clearly labeled as
machine-generated, as you do to some extent in these projects.
What other interesting use cases for such a technology in the
Wikimedia world can you imagine?
The technology itself can have a variety of use cases, including
providing captions or summaries of photos even without layers of
image processing applied to them.
Best,
Leila