Hi!
That presents a problem. While you see
"instance of": "human", the data
is P31:Q5. We can, of course, put "instance of": "human" in the
index.
But what if the label for Q5 changes? Now we have to re-index 10 million
records.
I haven't thought this through, but would it be possible to index just
Q5, and then, when someone searches for "human", first look up all the
items with the label "human", so that the search becomes "human OR
Q5"?
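To make the proposal concrete, here is a rough sketch of that expansion
step. It assumes a small in-memory label table (LABELS) and two helper
names (expand_term, to_query) that are made up for illustration; it is
not how the actual search backend works.

```python
# Toy stand-in for a label lookup; nothing here queries real Wikidata.
LABELS = {
    "Q5": ["human"],
    "Q11424": ["film", "movie"],
}


def expand_term(term, labels=LABELS):
    """Return the term followed by every item ID whose label equals it."""
    wanted = term.lower()
    ids = sorted(qid for qid, names in labels.items() if wanted in names)
    return [term] + ids


def to_query(term):
    """Build the expanded search string, e.g. 'human OR Q5'."""
    return " OR ".join(expand_term(term))


print(to_query("human"))
```

With this, only Q5 is stored in the index, and a label change touches
the label table rather than the 10 million indexed records.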
That has the potential to explode pretty quickly. Consider a query like
"movie Bruce Willis" - where obviously you want all movies where Bruce
Willis starred. Now, if we search for "movie", we get tons of potential
matches. If we search for "Bruce" and "Willis" - even more. Now if we
stuff all those IDs we've received into our query, we'll get something very
far from what you intended, and the relevance would be pretty bad. Not
to mention you have to actually run four queries instead of one (4x
load) and the last one is pretty fat, stuffed with all the IDs we've
gathered.
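A minimal sketch of that blow-up, under stated assumptions: the label
index below (TOY_LABEL_INDEX) and its IDs are placeholders invented for
illustration, and expand_query is a hypothetical helper. A real label
lookup would return far more IDs per term.

```python
# Toy label index: term -> placeholder item IDs whose label contains it.
TOY_LABEL_INDEX = {
    "movie": ["Q-m1", "Q-m2", "Q-m3"],
    "bruce": ["Q-b1", "Q-b2"],
    "willis": ["Q-w1", "Q-w2"],
}


def expand_query(text, label_index=TOY_LABEL_INDEX):
    """Return (expanded query string, total number of queries run)."""
    terms = text.lower().split()
    clauses = []
    for term in terms:  # one label lookup per search term
        clauses.append(term)
        clauses.extend(label_index.get(term, []))
    # One lookup per term, plus the final search with everything OR-ed in.
    return " OR ".join(clauses), len(terms) + 1


query, total_queries = expand_query("movie Bruce Willis")
print(total_queries, "queries;", len(query.split(" OR ")), "clauses")
```

Even with this tiny table, three words turn into four queries, and the
last one ORs together every ID that matched any single word, with
nothing tying "Bruce" to "Willis" or either of them to "movie".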
But that's not the end of it - you don't just want any item that is
somehow related to movies - you want items that *are* movies. And you
don't want any item that is somehow related to somebody named "Bruce" or
"Willis". You want the ones in which the famous actor Bruce Willis acted
(or maybe directed). But there's no such information in the query.
--
Stas Malyshev
smalyshev(a)wikimedia.org