Hi!
I think we already index way more than P31 and P279.
Oh yes, all the string properties.
So I think that the increase is smaller than what you
anticipate.
What I'd try to avoid in general is indexing terms that have only one doc,
since they are pretty useless.
For unique string properties, that would be a frequent occurrence. But I
am not sure why it's useless: isn't looking up something by external ID a
legitimate use case?
I think we should investigate what kind of data we may have here, and, at
least for statement_keywords, I would not index data that contain random
text (esp. natural language), since such values are prone to be unique and
impossible to search.
Yes, we definitely should not do that. I tried to exclude such
properties, but if you notice more of them, let's add them to the exclusion
config.
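To make the trade-off concrete, here is a rough Python sketch. It is not the actual CirrusSearch/Wikibase code. It assumes statement_keywords terms look like "P<id>=<value>" and uses a hypothetical EXCLUDED_PROPERTIES set in place of the real exclusion config. It counts document frequency per term and shows that external-ID values end up as single-doc terms, even though looking them up is legit, while excluded free-text properties never become terms at all.

```python
from collections import Counter

# Hypothetical exclusion set standing in for the real exclusion config:
# properties whose values are free text and should not become keywords.
# The choice of P1476 here is illustrative only.
EXCLUDED_PROPERTIES = {"P1476"}


def statement_keywords(statements, excluded=EXCLUDED_PROPERTIES):
    """Turn (property, value) pairs into assumed "P<id>=<value>" terms, skipping excluded properties."""
    return [f"{prop}={value}" for prop, value in statements if prop not in excluded]


def singleton_terms(docs, excluded=EXCLUDED_PROPERTIES):
    """Return the terms that occur in exactly one document."""
    df = Counter()
    for statements in docs.values():
        # Count each term at most once per document (document frequency).
        df.update(set(statement_keywords(statements, excluded)))
    return {term for term, count in df.items() if count == 1}


if __name__ == "__main__":
    # Made-up sample data: two items sharing P31=Q5, each with its own external ID.
    docs = {
        "Q1": [("P31", "Q5"), ("P214", "113230702"), ("P1476", "Some title")],
        "Q2": [("P31", "Q5"), ("P214", "29540092")],
    }
    print(sorted(singleton_terms(docs)))
```

With the sample data, P31=Q5 is shared and not a singleton, both P214 values are singletons, and the P1476 free-text value is dropped before counting. So "only one doc" alone can't tell a useful external-ID term from a useless random-text one. The exclusion list has to carry that distinction.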
--
Stas Malyshev
smalyshev(a)wikimedia.org