[Wikidata] Generating info-boxes from Wikidata: the importance of values!

7 Mar 2018


Hi all,
Tomás and I would like to share a paper that might be of interest to the 
community. It presents some preliminary results from work on fully 
automated methods to generate Wikipedia info-boxes from Wikidata. 
The main focus is on deciding what information from Wikidata to include, 
and in what order. The results are based on asking users (students) to 
rate some prototypes of generated info-boxes.
Tomás Sáez, Aidan Hogan "Automatically Generating Wikipedia Infoboxes 
from Wikidata". In the Proceedings of the Wiki Workshop at WWW 2018, 
Lyon, France, April 24, 2018.
- Link: http://aidanhogan.com/docs/infobox-wikidata.pdf
We understand that populating info-boxes is an important goal of 
Wikidata and hence we thought we'd share some lessons learned.
Obviously a lot of work is being put into populating info-boxes from 
Wikidata, but the main methods at the moment seem to be template-based 
and require a lot of manual labour. Moreover, defining these templates 
seems to be a difficult problem for classes such as person (where 
different information will have different priorities for people of 
different professions, notoriety, etc.).
We were just interested to see how far we could get with a fully 
automated approach using some generic ranking methods. Also we thought 
that something like this could perhaps be used to generate a "default" 
info-box for articles with no info-box and no associated template 
mapping. The paper presents preliminary results along those lines.
One interesting result is that a major factor in the evaluation of the 
generated info-boxes was the importance of the value. For example, 
Barack Obama has lots of awards, but perhaps only something like the 
Nobel Peace Prize might be of relevance to show in the info-box (<- 
being intended as an illustrative example rather than a concrete 
assertion of course!). Another example is that sibling might not be an 
important attribute in a lot of cases, but when that sibling is Barack 
Obama, then that deserves to be in the info-box (<- how such cases could 
be expressed in a purely template-based approach, we are not sure, but 
it would seem difficult).
We assess the importance of values with PageRank. Assessing the 
importance not only of attributes, but of values, turned out to be a 
major influence on how highly our evaluators assessed the quality of the 
generated info-boxes.
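For anyone unfamiliar with how PageRank yields an importance score per 
value, here is a minimal power-iteration sketch. It is only an 
illustration, not the implementation used in the paper: it assumes the 
graph is given as a list of (source, target) edges between items, and the 
node names (PersonA, AwardX, etc.) are hypothetical stand-ins for 
Wikidata items, not real data.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed graph of (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: AwardX is linked from three items, AwardY from one.
edges = [
    ("PersonA", "AwardX"), ("PersonB", "AwardX"), ("PersonC", "AwardX"),
    ("PersonA", "AwardY"),
    ("PersonB", "PersonA"), ("PersonC", "PersonA"),
]
ranks = pagerank(edges)
```

In this toy graph AwardX ends up with a higher score than AwardY, which 
is the kind of signal that lets one value of a multi-valued attribute be 
preferred over another.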
This initial/isolated observation might be interesting since, to the 
best of our understanding, the current wisdom on populating info-boxes 
from Wikidata focuses on what attributes to present and in which order, 
but does not consider the importance of values (aside from the Wikidata 
rank feature, which we believe is intended to capture 
relevance/timeliness rather than importance).
Hence one of the most interesting (and, for us at least, surprising) 
results of the work is the suggestion that it is important to rank 
*values* by importance (not just attributes) when deciding what 
information the user might be interested in.
(There are limitations to PageRank measures, however, in that they 
cannot assess, for example, the importance of a particular date, or, 
more generally, datatype values.)
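To make the idea of ranking values (and not just attributes) a bit more 
concrete, here is a rough sketch of how per-value scores could drive 
which values are shown and in what order. This is a hypothetical 
illustration, not the method from the paper: the function name 
build_infobox, the max_values cutoff, and the scores are all made up, 
and attributes are simply ordered by the score of their best value. Note 
that the date has no score and defaults to 0.0, which mirrors the 
limitation above for datatype values.

```python
def build_infobox(claims, value_score, max_values=1):
    """Keep the top-scoring values per attribute and order attributes by their best value.

    claims: dict mapping attribute -> list of values (hypothetical format).
    value_score: dict mapping value -> importance score (e.g., PageRank);
    values without a score (such as dates) default to 0.0.
    """
    rows = []
    for attribute, values in claims.items():
        ranked = sorted(values, key=lambda v: value_score.get(v, 0.0), reverse=True)
        top = ranked[:max_values]
        best = value_score.get(top[0], 0.0) if top else 0.0
        rows.append((best, attribute, top))
    rows.sort(key=lambda row: row[0], reverse=True)  # stable sort keeps ties in input order
    return [(attribute, top) for _, attribute, top in rows]


# Hypothetical claims and scores, for illustration only.
claims = {
    "award received": ["Award A", "Award B", "Award C"],
    "sibling": ["Famous Sibling"],
    "date of birth": ["2000-01-01"],
}
scores = {"Award A": 0.02, "Award B": 0.30, "Award C": 0.01, "Famous Sibling": 0.50}
infobox = build_infobox(claims, scores)
```

Here the usually minor "sibling" attribute rises to the top because its 
value scores highly, while only the most important award is kept.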
In any case, we are looking forward to presenting these results at the 
Wiki Workshop at WWW 2018, and any feedback or thoughts are welcome!
Cheers,
Aidan

