Hi!
Wikidata’s birthday is still a few days away but since there are no deployments on Sundays we’ll get started with an early present ;-)
Wikidata and Search Platform teams are happy to announce that Wikidata prefix search (aka wbsearchentities API aka the thing you use when you type into that box on the top right or any time you edit an item or property and use the selector widget) is now using a new and improved Elasticsearch backend. You should not see any changes except for relevancy and ranking improvements.
Specifically improved are:
- better language support (matches along the language fallback chain, and can also match in any other language, with a lower score)
- flexibility - we can now use Elasticsearch rescore profiles, which can be tuned to take advantage of any fields we index for both matching and boosting, including link counts, statement counts, label counts, (some) statement values, etc. More improvements are coming soon in this area, e.g. scoring disambiguation pages lower, scoring units higher in the proper context, etc.
- optimization - we no longer need to store all search data in both DB tables and Elastic indexes; all the data needed for search and retrieval of the results is stored in the Elastic index and retrieved in a single query.
- maintainability - since it is now part of the general Wikimedia search ecosystem, it can be maintained together with the rest of the search mechanisms, using the same infrastructure, monitoring, etc.
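To make the "rescore profile" idea above a bit more concrete, here is a rough sketch of what an Elasticsearch request body with a rescore phase can look like. It is only an illustration and not the actual Wikidata profile: the field names (`labels.<lang>`, `incoming_links`, `statement_count`), the weights and the window size are all assumptions made up for this example. Only the general shape (`match_phrase_prefix`, `rescore`, `function_score`, `field_value_factor`) is standard Elasticsearch query DSL.

```python
import json


def build_prefix_search_body(prefix, language="en", window_size=50):
    """Build an illustrative search body: prefix match first, then rescore.

    All field names and weights below are HYPOTHETICAL; they are not taken
    from the real Wikidata index mapping or rescore profiles.
    """
    label_field = f"labels.{language}"  # hypothetical per-language label field
    return {
        # First pass: cheap prefix match on the label in the user's language.
        "query": {"match_phrase_prefix": {label_field: prefix}},
        # Second pass: re-rank only the top `window_size` hits per shard.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "function_score": {
                        "functions": [
                            # Boost by (hypothetical) incoming link count.
                            {"field_value_factor": {
                                "field": "incoming_links",
                                "modifier": "log2p",
                                "missing": 0,
                            }},
                            # Boost by (hypothetical) statement count.
                            {"field_value_factor": {
                                "field": "statement_count",
                                "modifier": "log2p",
                                "missing": 0,
                            }},
                        ],
                        "score_mode": "sum",      # add the two boosts together
                        "boost_mode": "replace",  # use only the function score
                    }
                },
                # Final score = query_weight * original score
                #             + rescore_query_weight * rescore score
                "query_weight": 1.0,
                "rescore_query_weight": 0.5,
            },
        },
    }


body = build_prefix_search_body("dougl", language="en")
print(json.dumps(body, indent=2))
```

Tuning a profile then mostly means changing which fields feed the rescore functions and how the two weights are balanced, without touching the first-pass match.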
Please tell us if you have any suggestions or comments, or if you experience any problems with it.
Thanks!
Stas Malyshev, 25/10/2017 23:22:
Wikidata and Search Platform teams are happy to announce that Wikidata prefix search (aka wbsearchentities API aka the thing you use when you type into that box on the top right or any time you edit an item or property and use the selector widget)
Is the "selector widget" some gadget or non-default preference, or do you just mean the dropdown suggestions in the field for the value of a property? When I select the property I still see a wbsgetsuggestions request (which is good because I get suggestions of common properties); only when I switch to the next field do I see some wbsearchentities/wbgetentities/other requests.
- better language support (matches along fallback chain and also can match in any language, with lower score)
Useful! I tested with https://www.wikidata.org/wiki/Q20241614 , Q12756715 and Q997741 (random items without any label or description in my languages or fallbacks thereof) and I can still match them when I try to add them as values on another item.
Federico
Thanks a lot Stas for this present. Could you please share any pointers on how to integrate it into other tools?
Cheers,
Marco
On 10/25/17 22:22, Stas Malyshev wrote:
Wikidata and Search Platform teams are happy to announce that Wikidata prefix search is now using new and improved ElasticSearch backend.
On 26.10.2017 at 11:36, Marco Fossati wrote:
Thanks a lot Stas for this present. Could you please share any pointers on how to integrate it into other tools?
Just keep using wbsearchentities. It now uses Cirrus as a backend, instead of SQL. That should provide better performance, and better ranking.
Hi!
Thanks a lot Stas for this present. Could you please share any pointers on how to integrate it into other tools?
It's the same API as before, wbsearchentities. If you need additional profiles (i.e., different scoring/filtering), talk to me and/or file a Phabricator task and we can look into it.
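For tool authors, a minimal sketch of calling wbsearchentities from Python might look like the following. It uses only the standard library; the helper names (`build_search_url`, `search_entities`), the default limit and the User-Agent string are assumptions for illustration, so please adapt them to your own tool.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_search_url(prefix, language="en", entity_type="item", limit=7):
    """Build a wbsearchentities request URL for the given prefix."""
    params = {
        "action": "wbsearchentities",
        "search": prefix,        # the text typed so far
        "language": language,    # language to search labels/aliases in
        "uselang": language,     # language for the returned labels
        "type": entity_type,     # "item" or "property"
        "limit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def search_entities(prefix, **kwargs):
    """Run the search and return a list of (entity id, label) pairs."""
    request = urllib.request.Request(
        build_search_url(prefix, **kwargs),
        # Placeholder User-Agent; replace with your tool name and contact.
        headers={"User-Agent": "example-tool/0.1 (hypothetical contact)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        data = json.load(response)
    # Matches are returned in the "search" list of the response.
    return [(hit["id"], hit.get("label", "")) for hit in data.get("search", [])]


# Example usage (performs a live request, so it is left commented out):
# for entity_id, label in search_entities("Douglas Ad", language="en"):
#     print(entity_id, label)
```

Since the backend switch happens behind the same API, existing callers written like this should not need any changes.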
Sounds good, thank you Daniel and Stas. Best,
Marco