Generating info-boxes from Wikidata: the importance of values!

Aidan Hogan

7 Mar 2018

5:53 a.m.

Hi all,

Tomás and I would like to share a paper that might be of interest to the community. It presents some preliminary results from work on fully automated methods to generate Wikipedia info-boxes from Wikidata. The main focus is on deciding what information from Wikidata to include, and in what order. The results are based on asking users (students) to rate some prototypes of generated info-boxes.

Tomás Sáez, Aidan Hogan "Automatically Generating Wikipedia Infoboxes from Wikidata". In the Proceedings of the Wiki Workshop at WWW 2018, Lyon, France, April 24, 2018.

- Link: http://aidanhogan.com/docs/infobox-wikidata.pdf

We understand that populating info-boxes is an important goal of Wikidata and hence we thought we'd share some lessons learned.

Obviously a lot of work is being put into populating info-boxes from Wikidata, but the main methods at the moment seem to be template-based and require a lot of manual labour; plus the definition of these templates seems to be a difficult problem for classes such as person (where different information will have different priorities for people of different professions, notoriety, etc.).

We were just interested to see how far we could get with a fully automated approach using some generic ranking methods. Also we thought that something like this could perhaps be used to generate a "default" info-box for articles with no info-box and no associated template mapping. The paper presents preliminary results along those lines.

One interesting result is that a major factor in the evaluation of the generated info-boxes was the importance of the value. For example, Barack Obama has lots of awards, but perhaps only something like the Nobel Peace Prize might be of relevance to show in the info-box (<- being intended as an illustrative example rather than a concrete assertion of course!). Another example is that sibling might not be an important attribute in a lot of cases, but when that sibling is Barack Obama, then that deserves to be in the info-box (<- how such cases could be expressed in a purely template-based approach, we are not sure, but it would seem difficult).

We assess the importance of values with PageRank. Assessing the importance not only of attributes but also of values turned out to have a major influence on how highly our evaluators rated the quality of the generated info-boxes.
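To make the PageRank idea a little more concrete, here is a minimal Python sketch; it is not the code used in the paper. It assumes a hypothetical toy graph in which an edge from item a to item b means that some statement on a has b as its value, runs plain PageRank by power iteration (a damping factor of 0.85 and 50 iterations are assumed, commonly used defaults), and then orders the values of one attribute by score. "Laureate A/B/C" and "Hypothetical Local Award" are made-up placeholders.

```python
from typing import Dict, List, Tuple


def pagerank(edges: List[Tuple[str, str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Plain PageRank by power iteration over (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links: Dict[str, List[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes,
        # together with the usual (1 - damping) random-jump share.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every other node splits its rank evenly among the items it points to.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical toy graph: (a, b) means a statement on item a has item b as its value.
edges = [
    ("Barack Obama", "Nobel Peace Prize"),
    ("Barack Obama", "Hypothetical Local Award"),
    ("Laureate A", "Nobel Peace Prize"),
    ("Laureate B", "Nobel Peace Prize"),
    ("Laureate C", "Nobel Peace Prize"),
    ("Laureate A", "Barack Obama"),
]
scores = pagerank(edges)

# Order the values of one attribute ("award received") by the PageRank of each value.
awards = ["Hypothetical Local Award", "Nobel Peace Prize"]
ranked_awards = sorted(awards, key=lambda value: scores[value], reverse=True)
print(ranked_awards)
```

On this toy graph the Nobel Peace Prize comes first because several items point to it, whereas the placeholder award is reached from only one item. On the full Wikidata graph one would more likely rely on an existing, scalable PageRank implementation than on a hand-rolled loop like this.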

This initial, isolated observation might be interesting since, to the best of our understanding, the current wisdom on populating info-boxes from Wikidata focuses on which attributes to present and in which order, but does not consider the importance of values (aside from the Wikidata rank feature, which we believe is intended more to capture relevance/timeliness than importance).

Hence one of the most interesting (and, for us at least, surprising) results of the work is that it appears to be important to rank *values* by importance (not just attributes) when considering what information the user might be interested in.

(There are limitations to PageRank measures, however, in that they cannot assess, for example, the importance of a particular date, or, more generally, datatype values.)
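One way to read the point about ranking values, and not only attributes, is as a scoring function over (attribute, value) pairs. The Python sketch below is a hypothetical illustration under several assumptions: the statements, attribute weights and value scores are invented numbers (the value scores stand in for precomputed PageRank), the two scores are combined by a simple product, and datatype values such as dates, which PageRank cannot score, receive a fixed fallback score.

```python
from typing import Dict, List, Tuple

# Hypothetical statements for an imaginary person: (attribute, value) pairs.
statements: List[Tuple[str, str]] = [
    ("award received", "Nobel Peace Prize"),
    ("award received", "Hypothetical Local Award"),
    ("sibling", "Barack Obama"),
    ("date of birth", "1950-01-01"),
]

# Hypothetical attribute weights (e.g., how often each attribute is shown for similar items).
attribute_weight: Dict[str, float] = {
    "award received": 0.6,
    "sibling": 0.3,
    "date of birth": 0.9,
}

# Stand-in PageRank scores for values that are items in the graph (invented numbers).
value_score: Dict[str, float] = {
    "Nobel Peace Prize": 0.08,
    "Hypothetical Local Award": 0.001,
    "Barack Obama": 0.09,
}

# Datatype values such as dates are not graph nodes and so have no PageRank score;
# this sketch gives any value missing from value_score a fixed fallback (an assumption).
LITERAL_FALLBACK = 0.05


def score(attribute: str, value: str) -> float:
    """Combine attribute importance and value importance (here, a simple product)."""
    return attribute_weight.get(attribute, 0.0) * value_score.get(value, LITERAL_FALLBACK)


def build_infobox(stmts: List[Tuple[str, str]], k: int) -> List[Tuple[str, str]]:
    """Keep the k highest-scoring (attribute, value) pairs, best first."""
    return sorted(stmts, key=lambda s: score(*s), reverse=True)[:k]


infobox = build_infobox(statements, k=3)
for attribute, value in infobox:
    print(f"{attribute}: {value}")
```

With these made-up numbers the low-weight `sibling` attribute still makes the cut because its value is important, while the minor award is dropped even though `award received` carries more weight than `sibling`. How attribute and value scores are best combined (product, sum, learned weights) is an open design choice here, not something taken from the paper.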

In any case, we are looking forward to presenting these results at the Wiki Workshop at WWW 2018, and any feedback or thoughts are welcome!

Cheers, Aidan

Gerard Meijssen

7 Mar

7:22 a.m.

Hoi,

As they say in real estate: "position, position". In my opinion, the value of your research lies less in presenting static info-boxes and more in being able to show info-boxes that reflect the context of the item involved. You may be interested in law professors, senators or presidents, and in each case you may be presented with different information about the same person; in your example, Professor Obama.

It is the same with awards. Consider the George Polk Award, a notable journalism award. You can view it from the perspective of the award winner, but also from the perspective of the publication the awardees work(ed) for. The Polk Award has "categories"; they are not yet included in Wikidata, but they would show awardees in the same category in different years.

When you make info-boxes static, you have to sit in judgement and kill off the "excess", but that may be just what people are looking for. When you make them smart, you will be able to provide the information that people are likely to be looking for. So please consider the smart application of your research.
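One possible way to make info-boxes "smart" in the sense described here, offered as a hypothetical sketch rather than anything proposed in the thread, is personalized PageRank: random jumps land only on items that represent the reader's context, so values close to that context score higher. The graph below is a simplified, made-up fragment, "Hypothetical Law Professor" and "Hypothetical Senator" are placeholder seed items standing for the two contexts, and the damping factor and iteration count are assumed defaults.

```python
from typing import Dict, List, Set, Tuple


def personalized_pagerank(edges: List[Tuple[str, str]], seeds: Set[str],
                          damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """PageRank by power iteration where random jumps (and dangling mass) go only to the seeds."""
    nodes = sorted({node for edge in edges for node in edge} | seeds)
    out_links: Dict[str, List[str]] = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    # Jump distribution: uniform over the seed items, zero everywhere else.
    jump = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in nodes}
    rank = dict(jump)
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: ((1.0 - damping) + damping * dangling) * jump[node] for node in nodes}
        # Each node with out-links splits its rank evenly among the items it points to.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Simplified, hypothetical graph fragment: (item, value) edges.
edges = [
    ("Barack Obama", "University of Chicago Law School"),
    ("Barack Obama", "United States Senate"),
    ("Hypothetical Law Professor", "University of Chicago Law School"),
    ("Hypothetical Senator", "United States Senate"),
]
obama_values = ["United States Senate", "University of Chicago Law School"]


def values_in_context(seeds: Set[str]) -> List[str]:
    """Order the item's values by personalized PageRank for the given context seeds."""
    scores = personalized_pagerank(edges, seeds)
    return sorted(obama_values, key=lambda value: scores[value], reverse=True)


law_view = values_in_context({"Hypothetical Law Professor"})
senate_view = values_in_context({"Hypothetical Senator"})
print(law_view)
print(senate_view)
```

In this toy setting the same two values swap places depending on the context seeds. How such contexts would be chosen in practice (for example, from the list or category a reader arrives from) is left open here.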

In these examples we have a lot of information for the items involved. There are over 500 Polk Award winners, for instance, but for many of them there is not even an article. With generated info-boxes you may be able to provide information anyway. It has just one prerequisite: the red links must be linked to Wikidata.

Thanks,
GerardM

Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata

Aidan Hogan

8 Mar

6:53 p.m.

Hi Gerard,

Yes, this is very much along the lines of what we ultimately ended up realising! At first, we simply set out to propose a way to generate info-boxes without any templates or existing info-box resources. Later we came to realise that the template approach could, in fact, probably benefit from some of the "smart features" along the lines of our work, for the sorts of reasons you outline.

If anyone will be at the Wiki Workshop (WWW) in Lyon in April, we would be happy to discuss!

Cheers, Aidan

Lydia Pintscher

7 Mar

10:03 a.m.

Hi Aidan and Tomás,

Thanks a lot for sharing your research. It'll be valuable input as we look into making it easier for smaller Wikipedias to generate infoboxes based on Wikidata.

Cheers Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations at the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.

Raphaël Troncy

5:43 p.m.

Hey Aidan,

Great work, I loved it! You may want to look at (and cite) what we did four years ago, where we tried to reverse-engineer a bit of what Google does when choosing which properties (and values) to show in its rich panels alongside popular entities.

The paper is entitled "What Are the Important Properties of an Entity? Comparing Users and Knowledge Graph Point of View", https://www.eurecom.fr/~troncy/Publications/Assaf_Troncy-eswc14.pdf

... and the code is on GitHub, so the results can be replicated: https://github.com/ahmadassaf/KBE

Raphaël

--
Raphaël Troncy
EURECOM, Campus SophiaTech
Data Science Department
450 route des Chappes, 06410 Biot, France.
e-mail: raphael.troncy@eurecom.fr & raphael.troncy@gmail.com
Tel: +33 (0)4 - 9300 8242
Fax: +33 (0)4 - 9000 8200
Web: http://www.eurecom.fr/~troncy/

Aidan Hogan

8 Mar

7 p.m.

Hey Raphaël,

Thanks for the comments and the reference! And sorry we missed discussing your paper (which indeed looks at largely the same problem in a slightly different context). If there is a next time, we will be sure to include it in the related work.

I am impressed, by the way, to see a third-party evaluation of a Google tool. It also seems that Google has room for improvement. :)

Cheers, Aidan

Magnus Knuth

13 Mar

10:28 a.m.

Hey all,

Thanks for sharing the paper; this is an interesting topic. I just wanted to point to some of our own prior work on entity summarization, which is related to what you have done: https://link.springer.com/chapter/10.1007/978-3-642-35173-0_24

All the best Magnus

--
Magnus Knuth
Hasso-Plattner-Institut für Digital Engineering gGmbH
Prof.-Dr.-Helmert-Str. 2-3
14482 Potsdam
Amtsgericht Potsdam, HRB 12184
Managing Director: Prof. Dr. Christoph Meinel
tel: +49 331 5509 547
email: magnus.knuth@hpi.de
web: http://www.hpi.de/
webID: http://magnus.13mm.de/

participants (5)

Aidan Hogan
Gerard Meijssen
Lydia Pintscher
Magnus Knuth
Raphaël Troncy