+1 to Magnus on years of birth and death (but perhaps /only/ years of birth and death, or close surrogates, e.g. years of baptism and burial, plus inception or publication dates for things; otherwise the search would lose its specificity, swamped by too many other 'significant event' dates).
Over the last few weeks I have found myself using the External ID value search a lot, from its search box on the talk page of a property's main page.
I'm finding this works very well, so I wonder whether people think that the ability to search for one of these strings directly in the general search box would actually add anything, or whether the custom search, e.g. via the talk-page search box, is already enough.
-- James.
On 27/07/2018 12:49, Magnus Manske wrote:
Hi, and thanks for working on this!
My subjective view:
- We don't need P2860/P1433 indexed, at least not at the moment
- I would really like dates (mainly born/died), especially if they work for "greater units", that is, I search for a year and get an item back, even though the statement is month- or day-precise
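One way to picture the "greater units" behaviour is to index each date at every coarser granularity as well, so that a query for just the year still matches a day-precise statement. The sketch below is purely illustrative: `DateStatement` and `index_terms` are hypothetical names, and nothing here reflects how the on-wiki search actually stores dates.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DateStatement:
    """Hypothetical stand-in for a date value with year, month or day precision."""
    year: int
    month: Optional[int] = None
    day: Optional[int] = None


def index_terms(stmt: DateStatement) -> List[str]:
    """Return the date at its own precision and at every coarser one."""
    terms = [f"{stmt.year:04d}"]
    if stmt.month is not None:
        terms.append(f"{stmt.year:04d}-{stmt.month:02d}")
        if stmt.day is not None:
            terms.append(f"{stmt.year:04d}-{stmt.month:02d}-{stmt.day:02d}")
    return terms


def matches(stmt: DateStatement, query: str) -> bool:
    """A query matches if it equals any of the indexed granularities."""
    return query in index_terms(stmt)
```

With this scheme, `matches(DateStatement(1952, 3, 11), "1952")` is true, while a year-precise statement does not match a month-level query such as "1952-03".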
Cheers, Magnus
On Thu, Jul 26, 2018 at 10:48 PM Stas Malyshev smalyshev@wikimedia.org wrote:
Hi!
Today we are indexing in ElasticSearch almost all string properties (except a few) and select item properties (P31 and P279). We've been asked to extend this set and index more item properties (https://phabricator.wikimedia.org/T199884). We did not do it from the start because we did not want to add too much data to the index at once, and wanted to see how the index behaves. To evaluate what this change would mean, some statistics:
All usage of item properties in statements comes to about 231 million uses (according to the SQID tool database). Of those, about 50M uses are "instance of", which we are already indexing. Another 98M uses belong to two properties: published in (P1433) and cites (P2860). That leaves about 86M for the rest of the properties.
So, if we index all the item properties except P2860 and P1433, we'll be a little more than doubling the amount of data we're storing for this field, which seems OK. But if we index those too, we'll be essentially quadrupling it, which may be OK too, but is a bigger jump and one that may potentially cause some issues.
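As a rough check of those growth estimates, the quoted figures can be plugged into a few lines of Python. This is only a back-of-the-envelope sketch: the numbers are the rounded ones from this message (they do not sum exactly to 231M), and the baseline assumes only the "instance of" uses, since no separate count is given for P279.

```python
# Figures quoted in the message, in millions of statement uses (all rounded).
INSTANCE_OF_USES = 50   # P31, already indexed; P279 is not counted separately
CITATION_USES = 98      # P1433 and P2860 combined
REST_USES = 86          # all other item properties, as quoted


def growth_factor(current: float, added: float) -> float:
    """Size of the indexed data after the change, relative to the current size."""
    return (current + added) / current


# Index everything except P1433 and P2860.
without_citations = growth_factor(INSTANCE_OF_USES, REST_USES)

# Index everything, including P1433 and P2860.
with_citations = growth_factor(INSTANCE_OF_USES, REST_USES + CITATION_USES)

print(f"without P1433/P2860: {without_citations:.2f}x")
print(f"with P1433/P2860:    {with_citations:.2f}x")
```

Under these assumptions the factors come out at about 2.7x and 4.7x, the same order of magnitude as the "more than doubling" and "quadrupling" described above.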
So, we have two questions:
- Do we want to enable indexing for all item properties? Note that if you just want to find items with certain statement values, Wikidata Query Service matches this use case best. It's only in combination with actual fulltext search where on-wiki search is better.
- Do we need to index P2860 and P1433 at all, and if so, would it be ok if we omit indexing for now?
Would be glad to hear thoughts on the matter.
Thanks,
Stas Malyshev smalyshev@wikimedia.org
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata