On 01/07/14 22:00, Lydia Pintscher wrote:
...
>> Is there any documentation on how it chooses which entities to
>> suggest?
>
> It basically creates a table of correlations for properties over all
> items in Wikidata. So if say date of birth and place of birth are used
> together a lot they get a high correlation. When you then have an item
> with no place of birth but a date of birth it will suggest that
> because of the high correlation.

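
To make the idea above concrete, here is a rough toy sketch of a
co-occurrence-based suggester. It is only an illustration under my own
assumptions, not the actual suggester code: the function names
(build_counts, suggest), the scoring rule (a summed conditional
frequency) and the four toy items are all made up. The property IDs
are real ones (P569 date of birth, P19 place of birth, P31 instance of,
P625 coordinate location).

```python
from collections import Counter
from itertools import permutations


def build_counts(items):
    """Count single properties and ordered property pairs.

    `items` maps an item ID to the set of property IDs used on it.
    (Hypothetical input format, chosen for this sketch.)
    """
    prop_count = Counter()
    pair_count = Counter()
    for props in items.values():
        prop_count.update(props)
        # Ordered pairs (a, b) of distinct properties on the same item.
        pair_count.update(permutations(props, 2))
    return prop_count, pair_count


def suggest(props, prop_count, pair_count, top_n=3):
    """Rank missing properties by how often they co-occur with present ones."""
    scores = Counter()
    for (a, b), n in pair_count.items():
        if a in props and b not in props:
            # Fraction of items with property a that also have property b.
            scores[b] += n / prop_count[a]
    return [p for p, _ in scores.most_common(top_n)]


# Toy data (invented): three person-like items and one place-like item.
toy_items = {
    "item1": {"P569", "P19", "P31"},
    "item2": {"P569", "P19"},
    "item3": {"P569", "P19", "P31"},
    "item4": {"P31", "P625"},
}
prop_count, pair_count = build_counts(toy_items)

# An item with a date of birth but no place of birth:
suggestions = suggest({"P569"}, prop_count, pair_count)
print(suggestions)  # P19 (place of birth) ranks first
```
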
Oh! I have a suggestion to make ...
Looking at properties that co-occur is good, but for P31 and P279, you
must use the values instead (assuming that you can cope with the size:
there are about 20k different values for these properties right now;
seems doable). Knowing merely that an item has "instance of" (P31)
does not tell you much, but it is very informative to know that it has
"instance of: historic house museum".
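
A minimal sketch of what I mean, assuming a simple co-occurrence
scorer (this is a toy, not your backend and not my own code): for P31
and P279 the feature is the (property, value) pair, for everything else
it is just the property. The names (features, train, suggest_for), the
scoring rule and the toy data are invented for illustration;
"Q_house_museum" and "P_heritage_list_number" are placeholder labels,
not real Wikidata IDs. Q5 (human), P17 (country), P19, P569 and P625
are real.

```python
from collections import Counter, defaultdict

# Properties whose *values* are used as features (assumption of this sketch).
TYPE_PROPS = {"P31", "P279"}


def features(statements):
    """Turn {property: set_of_values} into a set of features."""
    feats = set()
    for prop, values in statements.items():
        if prop in TYPE_PROPS:
            feats.update((prop, v) for v in values)  # e.g. ("P31", "Q5")
        else:
            feats.add(prop)  # the property alone
    return feats


def train(items):
    """Count how often each feature occurs together with each property."""
    feat_count = Counter()
    cooc = defaultdict(Counter)
    for statements in items:
        for f in features(statements):
            feat_count[f] += 1
            cooc[f].update(statements.keys())
    return feat_count, cooc


def suggest_for(statements, feat_count, cooc, top_n=3):
    """Rank properties the item lacks by summed conditional frequency."""
    scores = Counter()
    for f in features(statements):
        if feat_count[f] == 0:
            continue  # feature never seen in training data
        for prop, n in cooc[f].items():
            if prop not in statements:
                scores[prop] += n / feat_count[f]
    return [p for p, _ in scores.most_common(top_n)]


HOUSE = "Q_house_museum"  # placeholder, not a real item ID
HUMAN = "Q5"
toy_items = [
    {"P31": {HOUSE}, "P625": {"x"}, "P17": {"y"}, "P_heritage_list_number": {"1"}},
    {"P31": {HOUSE}, "P625": {"x"}, "P_heritage_list_number": {"2"}},
    {"P31": {HUMAN}, "P569": {"d"}, "P19": {"p"}},
    {"P31": {HUMAN}, "P569": {"d"}, "P19": {"p"}},
]
feat_count, cooc = train(toy_items)

house_suggestions = suggest_for({"P31": {HOUSE}, "P625": {"x"}}, feat_count, cooc)
human_suggestions = suggest_for({"P31": {HUMAN}}, feat_count, cooc)
print(house_suggestions)
print(human_suggestions)
```

With plain properties as features, both test items would look alike as
far as P31 is concerned; with the values, the museum-like item gets the
heritage list number suggested and the human gets the birth properties.
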
If you look at Q4810979, you can see that none of its properties
suggests that we are looking at a historic building. It has: instance
of, Commons category, coordinate location, country, Freebase identifier,
image. Based on properties alone, this could really be anything,
including a person. Note that even the new suggestions seem to miss most
of the "typical" properties that I listed in my other email ("English
Heritage list number" being the most obvious one for Q4810979).
My algorithm uses values of P31 as its main information. Maybe this is
why it performs better at first sight. This should be fixable with some
feature engineering on the infrastructure you have now (I trust that
your recommender system backend has no problem with a slightly larger
number of features).
Cheers,
Markus