Hi Ellery and Gerard,
Interesting thread!
I don't know if this could be of interest to you, but I have a recent
paper where we explore a very simple approach to fact-checking of a
knowledge base. We compute the shortest path between entities and convert
the distance into a semantic proximity. It's very rudimentary (for one, it
doesn't take into account different relations), but it seems to work well
on core topics (like geography, US presidents, movies, etc), and it even
gives a signal when you ask about stuff that is not contained in the
knowledge network itself:
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0128193
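The approach above can be sketched in a few lines. This is a minimal illustration, not the exact formulation from the paper (the toy graph, the entities, and the 1/(1+d) proximity formula are all my own assumptions here):

```python
from collections import deque

# Toy knowledge graph as an adjacency list. The entities and edges are
# invented for illustration; a real run would load them from a DBpedia
# or Wikidata dump.
GRAPH = {
    "Barack Obama": ["United States", "Hawaii"],
    "United States": ["Barack Obama", "Washington, D.C.", "Hawaii"],
    "Washington, D.C.": ["United States"],
    "Hawaii": ["Barack Obama", "United States"],
}

def shortest_distance(graph, src, dst):
    """Breadth-first search; returns hop count, or None if unreachable."""
    if src not in graph or dst not in graph:
        return None
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nb in graph.get(node, []):
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, dist + 1))
    return None

def semantic_proximity(graph, a, b):
    """Map shortest-path distance to a (0, 1] score.

    Returns 0.0 when either entity is missing or disconnected, which is
    the 'signal' for statements not covered by the network.
    """
    d = shortest_distance(graph, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

print(semantic_proximity(GRAPH, "Barack Obama", "United States"))  # 0.5
print(semantic_proximity(GRAPH, "Hawaii", "Washington, D.C."))     # ~0.333
print(semantic_proximity(GRAPH, "Barack Obama", "Mars"))           # 0.0
```

As noted above, this ignores edge labels entirely; the paper's actual path metric also discounts paths through high-degree hub nodes.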
In the paper we use data from DBpedia dumps, but adapting to Wikidata would
be straightforward. There is a lot of research on this problem, and it goes
under several related names (link prediction, relation extraction,
knowledge base construction, etc). A recent overview of the literature can
be found here:
http://arxiv.org/abs/1503.00759
The review is authored, among others, by some of the people working on
Google's Knowledge Vault.
Cheers,
Giovanni
Giovanni Luca Ciampaglia
Center for Complex Networks and Systems Research
School of Informatics and Computing, Indiana University
✎ 919 E 10th ∙ Bloomington 47408 IN ∙ USA
☞
http://www.glciampaglia.com/
✆ +1 812 855-7261
✉ gciampag(a)indiana.edu
On Tue, Aug 25, 2015 at 5:11 PM, Ellery Wulczyn <ewulczyn(a)wikimedia.org>
wrote:
Hi Gerard,
Your blog post got me thinking about designing a Wikidata fact checking
tool. The idea would be to rank facts to be checked by a human by some
combination of a fact importance score and a fact uncertainty score. Do you
know of any work that has already been done in this space? Do you think
such a tool would be used? What are the current systems for quality control
in Wikidata?
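To make the ranking idea concrete, here is a hypothetical sketch. The facts, scores, and the multiplicative combination are all invented for illustration; how to actually estimate importance and uncertainty is the open question:

```python
# Candidate facts with assumed importance and uncertainty scores in [0, 1].
facts = [
    {"fact": "Q42 P106 writer",        "importance": 0.9, "uncertainty": 0.1},
    {"fact": "Q64 P1082 3600000",      "importance": 0.6, "uncertainty": 0.8},
    {"fact": "Q1 P31 universe",        "importance": 0.3, "uncertainty": 0.2},
]

def review_priority(f):
    # One simple combination: important AND uncertain facts come first.
    # A weighted sum or any other monotone combination would also work.
    return f["importance"] * f["uncertainty"]

# Human reviewers would work through this queue from the top.
queue = sorted(facts, key=review_priority, reverse=True)
print(queue[0]["fact"])  # the important-but-uncertain fact ranks first
```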
As an aside, estimating fact uncertainty may reduce to estimating Wikidata
quality as a whole.
Best,
Ellery
On Mon, Aug 24, 2015 at 11:24 PM, Gerard Meijssen <
gerard.meijssen(a)gmail.com> wrote:
Hoi,
There is a lot of knowledge on quality in online databases. It is known
that all of them have a certain error rate. This is true for Wikidata as
much as any other source.
My question is: is there a way to track Wikidata quality improvements
over time? I blogged about one approach [1]. It is, however, only an
approach to improving quality, not an approach to determining quality and
tracking its improvement.
The good news is that there are many dumps of Wikidata so it is possible
to compare current Wikidata with how it was in the past.
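A minimal sketch of what comparing two dumps could look like, treating each snapshot as a set of (item, property, value) claims. The triples here are invented, and real dumps would of course need to be parsed and streamed rather than held in memory:

```python
# Two hypothetical snapshots of a tiny slice of Wikidata.
old_dump = {
    ("Q42", "P106", "Q36180"),
    ("Q42", "P569", "1952-03-11"),
    ("Q64", "P1082", "3400000"),
}
new_dump = {
    ("Q42", "P106", "Q36180"),
    ("Q42", "P569", "1952-03-11"),
    ("Q64", "P1082", "3600000"),  # value updated between snapshots
    ("Q64", "P17", "Q183"),       # new claim added
}

# Set differences give the raw churn between snapshots; judging whether
# each change is an improvement is the hard, separate problem.
added = new_dump - old_dump
removed = old_dump - new_dump
print(len(added), len(removed))
```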
Would this be something that makes sense for Wikimedia research to get
into, particularly in light of Wikidata becoming more easily available to
Wikipedia?
Thanks,
GerardM
[1]
http://ultimategerardm.blogspot.nl/2015/08/wikidata-quality-probability-and…
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l