Re: [Wikimedia-l] Quality assurance of articles

17 Apr 2017

Hoi,
When you consider Wikidata's data as a predictor of relevance and interest,
the biggest problem is that Wikidata does not hold enough data at this
time. The one thing I find missing from the approach you discuss in your
presentation is local and timely information. Of relevance here are awards
but also local events like elections. The problem is that for many
countries we do not even know about such awards. They indicate what is of
local relevance. There are many ways we can open up our community to these
awards. I will come up with ideas in the future.

So yes, your approach is good but like the translation tool it relies on
English content. It will be much better when we promote translation from
French, Russian, German and Chinese as well.

Another aspect I am totally missing is bot-generated articles. We can and
should have stubs generated from data, cached and not saved as an article.
Basically they serve as a stepping stone towards an article. They are there
to inform in any language about what we do know.

I am missing it because the Wikimedia Foundation is not the "Wikipedia
Foundation"; our aim is to share in the sum of all available knowledge and
that is what we could do when we use well presented cached data when we do
not have an article. When people dismiss the Cebuano Wikipedia effort they
are typically trolling, but what I resent most is that we do not even
study the effect of bot-generated articles and their value to readers.

Another approach is that we consider the use our content has for external
parties. This is where Wikidata can benefit from sources that care to share
what they have. I have written about quality assurance but the bottom line
is that most of the external sources may have flaws but are no worse than
what we have. A perspective you may be able to confirm. Yet another reason
to consider external parties is that sharing our data with them can be of
benefit to our readers. When we are able to link into local library
systems, as we already can in the Netherlands, it becomes valuable to
our readers to include data on authors so that they can find them in their
local library. The point is that once you are adding data, one more
statement is quickly added.

So yes I do like your presentation, I like it very much. It does not cover
everything and that is imho a consequence of the ingrained Wikipedia and
editor bias. We have largely forgotten that what we do is not about either
but about sharing information. If there is something that I wish for 2030
it is that we care about providing and sharing information, providing and
sharing the sum of all knowledge. Yes, well written text is to be preferred
and we should indeed do everything to get as much of this as we can.
Thanks,
       GerardM

On 17 April 2017 at 18:40, Leila Zia <leila(a)wikimedia.org> wrote:

...
  Hoi Gerard,

 On Mon, Apr 17, 2017 at 7:54 AM, Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:

 > When you analyse articles and find that some things are missing, it will
 > help a lot when you can target these articles to the people who are likely
 > interested. When people interested in soccer learn that a soccer player
 > died, they are more likely to edit or even write an article.

 You are absolutely right. This is what we even tested in the article
 creation recommendation experiment and you could see that providing
 personalized recommendations (where personalization was on the basis of
 matching editors' interests based on their history of contributions) does
 better than random important recommendations. A few pointers for you:

 * Check out section 2.3 of the paper at https://arxiv.org/pdf/1604.03235.pdf to
 see how this was done.
 * I talk briefly about how we do the editor interest modeling at
 https://youtu.be/lHbnvRwFC_A?t=20m44s

 In general, we have at least two ways for recommending to people what they
 like to edit: one would be using the information in their past edit history
 and building topical models that can help us learn what topics an editor is
 interested in. The other is by asking the editor to provide some seeds of
 interest to us. For example, we ask you to tell us what kind of article you
 would edit, and we give you recommendations similar to the seed you
 provide. Each has its own advantages and you sometimes have to mix the two
 approaches (and more) to give the editor enough breadth and depth of topics
 to choose from.
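
 As a rough illustration of mixing the two approaches, here is a minimal
 Python sketch. It is not the code behind the experiment or the paper: the
 topic vectors, the `mix` weight and all function names are assumptions made
 up for this example. It builds an interest profile from an editor's past
 edits, compares each candidate to that profile and to an optional seed
 article, and blends the two similarity scores.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def interest_from_history(edited_articles, topics):
    """Sum the topic vectors of every article the editor has touched."""
    profile = Counter()
    for title in edited_articles:
        profile.update(topics.get(title, {}))
    return dict(profile)


def recommend(candidates, topics, history=(), seed=None, mix=0.5, k=3):
    """Rank candidates by a blend of history similarity and seed similarity.

    mix=1.0 uses only the edit history, mix=0.0 uses only the seed.
    """
    profile = interest_from_history(history, topics)
    seed_vec = topics.get(seed, {}) if seed else {}
    scored = []
    for title in candidates:
        vec = topics.get(title, {})
        score = mix * cosine(profile, vec) + (1 - mix) * cosine(seed_vec, vec)
        scored.append((score, title))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]


# Hypothetical topic weights, invented purely for illustration.
topics = {
    "Johan Cruyff": {"soccer": 1.0, "netherlands": 0.5},
    "Ajax": {"soccer": 1.0, "netherlands": 0.8},
    "Marco van Basten": {"soccer": 1.0, "netherlands": 0.6},
    "Rembrandt": {"painting": 1.0, "netherlands": 0.7},
    "Claude Monet": {"painting": 1.0, "france": 0.6},
}
history = ["Johan Cruyff", "Ajax"]
candidates = ["Marco van Basten", "Rembrandt", "Claude Monet"]

print(recommend(candidates, topics, history=history, mix=1.0))
print(recommend(candidates[:2], topics, seed="Claude Monet", mix=0.0))
```

 A real system would presumably learn the topic vectors from article text
 rather than use hand-written weights; the blending step is the only part
 this sketch tries to show.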

 > The approach for finding a subject that could do with more attention is one
 > I applaud. When you want to do this across languages, think Wikidata to
 > define the area of interest for users. It will always include all the
 > articles in all the languages. As you have seen with the Listeria lists,
 > showing red links and Wikidata items is trivial.

 Yes, finding what is missing in a Wikipedia language by comparing language
 editions is relatively easy, thanks to Wikidata. :) What is hard is ranking
 these millions of missing articles in any language based on some notion of
 importance. We developed a ranking system for the research I mentioned
 above. You can read about it in Section 2.2 of the paper at
 https://arxiv.org/pdf/1604.03235.pdf. I talk about it in less detail at
 https://youtu.be/lHbnvRwFC_A?t=16m58s. In a nutshell: we built a prediction
 model that aims to predict the number of pageviews the article would
 receive had it existed in the destination language where it's missing
 today. The higher this predicted number for a missing article in a
 language, the more important it is to create it.
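
 To make the two steps concrete (find what is missing, then rank it), here
 is a minimal Python sketch. The item identifiers, the sitelink and pageview
 numbers, and the median-based predictor are all assumptions for
 illustration; the predictor is only a stand-in and not the prediction model
 described in the paper.

```python
from statistics import median


def missing_in(sitelinks, source, target):
    """Items that have an article in `source` but none in `target`."""
    return [qid for qid, langs in sitelinks.items()
            if source in langs and target not in langs]


def rank_missing(sitelinks, source, target, predict_views):
    """Sort missing items by predicted target-language pageviews, highest first."""
    missing = missing_in(sitelinks, source, target)
    return sorted(missing, key=lambda qid: (-predict_views(qid, target), qid))


def median_views_predictor(pageviews):
    """Stand-in predictor: median of the item's pageviews in other languages."""
    def predict(qid, target):
        observed = [views for lang, views in pageviews.get(qid, {}).items()
                    if lang != target]
        return median(observed) if observed else 0.0
    return predict


# Placeholder item ids and made-up numbers, not real Wikidata items.
sitelinks = {
    "QA": {"en", "fr"},
    "QB": {"en"},
    "QC": {"en", "nl"},
    "QD": {"fr"},
}
pageviews = {
    "QA": {"en": 900, "fr": 300},
    "QB": {"en": 1500},
    "QC": {"en": 50, "nl": 10},
}

ranked = rank_missing(sitelinks, "en", "nl", median_views_predictor(pageviews))
print(ranked)
```

 Passing the predictor in as a function keeps the ranking step independent
 of how the pageview estimate is produced, so a trained model could be
 swapped in without touching the rest.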

 Best,
 Leila

  Thanks,
      Gerard

 On 17 April 2017 at 02:04, Leila Zia <leila(a)wikimedia.org> wrote:

 > Hi John,
 >
 > This may be of interest to you:
 >
 > We are working on building recommendation systems that can help editors
 > identify how to expand already existing articles in Wikipedia. This
 > includes but is not limited to identifying what sections are missing from
 > an article, what citations, what images, infobox information, etc. This is
 > research in its early days; if you'd like to follow up with it please visit
 > https://meta.wikimedia.org/wiki/Research:Expanding_Wikipedia_stubs_across_languages

 Best,
 Leila

 Leila Zia
 Senior Research Scientist
 Wikimedia Foundation

 On Sat, Apr 15, 2017 at 2:50 PM, John Erling Blad <jeblad(a)gmail.com> wrote:

 > > Is anyone doing any work on automated quality assurance of articles? Not
 > > the ORES-stuff, that is about creating hints from measured features. I'm
 > > thinking about verifying existence and completeness of citations, and
 > > structure of logical arguments.
 > >
 > > John
 > > _______________________________________________
 > > Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
 > > and https://meta.wikimedia.org/wiki/Wikimedia-l
 > > New messages to: Wikimedia-l(a)lists.wikimedia.org
 > > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
 > > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
