On Tue, Jul 1, 2014 at 10:38 PM, Markus Krötzsch
<markus(a)semantic-mediawiki.org> wrote:
On 01/07/14 22:00, Lydia Pintscher wrote:
...
Is there any documentation on how it chooses which entities to
suggest?
It basically creates a table of correlations for properties over all
items in Wikidata. So if, say, date of birth and place of birth are used
together a lot, they get a high correlation. When you then have an item
with a date of birth but no place of birth, it will suggest place of
birth because of the high correlation.
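
The co-occurrence idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the suggester's actual code: the thread says only that a "table of correlations" is built, so the scoring used here (summed conditional frequencies), the function names, and the toy items are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(items):
    """Count how often each property occurs and how often ordered pairs co-occur."""
    prop_count = Counter()
    pair_count = Counter()
    for props in items:
        unique = set(props)
        prop_count.update(unique)
        # Ordered pairs (a, b): "a is present and b is present on the same item".
        pair_count.update(permutations(unique, 2))
    return prop_count, pair_count


def suggest(item_props, prop_count, pair_count, top_n=3):
    """Rank missing properties by how often they accompany the present ones."""
    present = set(item_props)
    scores = Counter()
    for (a, b), n in pair_count.items():
        if a in present and b not in present:
            # Fraction of items having a that also have b (assumed scoring).
            scores[b] += n / prop_count[a]
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [prop for prop, _ in ranked[:top_n]]


# Toy data (hypothetical): each item is just the set of properties it uses.
items = [
    {"date of birth", "place of birth", "instance of"},
    {"date of birth", "place of birth"},
    {"date of birth", "place of birth", "image"},
    {"coordinate location", "country", "instance of"},
]
prop_count, pair_count = build_cooccurrence(items)
suggestions = suggest({"date of birth"}, prop_count, pair_count)
```

In this toy data set, an item that has only a date of birth gets place of birth as its top suggestion, because the two properties appear together on every item that has a date of birth.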
Oh! I have a suggestion to make ...
Looking at properties that co-occur is good, but for P31 and P279, you must
use the values instead (assuming that you can cope with the size: there are
about 20k different values for these properties right now; seems doable). It
does not tell you much if an item has "instance of" (P31), but it is very
informative to know that you have "instance of: historic house museum".
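
A minimal sketch of this kind of feature extraction, assuming items are represented as a mapping from property to a list of values. The representation, the function name, and the two toy items are hypothetical; only P31 and P279 come from the discussion.

```python
# Properties whose values, not mere presence, become features.
VALUE_PROPS = {"P31", "P279"}


def to_features(statements):
    """Turn an item's statements into a set of features for the recommender."""
    feats = set()
    for prop, values in statements.items():
        if prop in VALUE_PROPS:
            # One feature per (property, value) pair, e.g. "P31=human".
            feats.update(f"{prop}={value}" for value in values)
        else:
            # For all other properties, only presence is recorded.
            feats.add(prop)
    return feats


# Two toy items that use exactly the same properties.
museum = {
    "P31": ["historic house museum"],
    "country": ["United Kingdom"],
    "image": ["house.jpg"],
}
person = {
    "P31": ["human"],
    "country": ["United Kingdom"],
    "image": ["portrait.jpg"],
}

same_properties = set(museum) == set(person)
museum_feats = to_features(museum)
person_feats = to_features(person)
```

Looking at properties alone, the two toy items are indistinguishable; with the P31 value included, their feature sets differ, which is the extra signal a recommender needs to tell a building from a person.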
If you look at Q4810979, you can see that it really has no property that
suggests that we are looking at an historic building: instance of, Commons
category, coordinate location, country, Freebase identifier, image. Based on
properties alone, this could really be anything, including a person. Note
that even the new suggestions seem to miss most of the "typical" properties
that I listed in my other email ("English Heritage list number" being the
most obvious one for Q4810979).
My algorithm uses values of P31 as its main information. Maybe this is why
it performs better at first sight. Should be fixable with some feature
engineering using the infrastructure you have now (where I trust that your
recommender system backend has no problem with a slightly bigger number of
features).
Yep, work is already under way to also take values into account :)
Suggestions for qualifiers and sources will hopefully come very soon.
Cheers
Lydia
--
Lydia Pintscher -
http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
(Society for the Promotion of Free Knowledge). Registered in the register
of associations of the Amtsgericht Berlin-Charlottenburg under number
23855 Nz. Recognized as a non-profit organization by the Finanzamt für
Körperschaften I Berlin, tax number 27/681/51985.