Hi!
The Search Platform team would like to present a prototype test site for the new and improved Wikidata fulltext search:
http://wikidata-wdsearch.wmflabs.org/wiki/Special:Search
Please try your favorite searches on it and tell us whether the results look good and what problems you notice.
Important notes for this prototype:
- The search data was imported from the Wikidata index and is not updated after the import, so it may be slightly out of date.
- The search is in English by default, but you can try other languages by using the uselang parameter, e.g.: http://wikidata-wdsearch.wmflabs.org/w/index.php?search=Wien&title=Speci... Since this is a test site, this is probably the best way to test non-English searches, as logins etc. may not work properly there.
- Search works properly only for the main and Property namespaces (0 and 120).
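As a rough illustration of the uselang approach, here is a small Python sketch that builds a search URL for the test site. The helper name `search_url` is hypothetical, and the `title=Special:Search`, `fulltext` and `ns0`/`ns120` parameters are assumptions based on the usual MediaWiki search form rather than anything specified for this prototype.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL of the prototype test site, taken from the example link above.
BASE = "http://wikidata-wdsearch.wmflabs.org/w/index.php"


def search_url(query, uselang="en", namespaces=(0, 120)):
    """Build a fulltext search URL (hypothetical helper, not part of the prototype)."""
    params = {
        "search": query,
        "title": "Special:Search",  # assumed target page
        "fulltext": "1",            # assumed: request fulltext results
        "uselang": uselang,         # interface/search language to test
    }
    # Assumed namespace flags: 0 = main (items), 120 = Property.
    for ns in namespaces:
        params[f"ns{ns}"] = "1"
    return BASE + "?" + urlencode(params)


url = search_url("Wien", uselang="de")
print(url)

# Round-trip check: the query string decodes back to the values we passed in.
decoded = parse_qs(urlsplit(url).query)
print(decoded["uselang"], decoded["search"])
```

Changing only the `uselang` value should be enough to compare how the same query behaves in different languages without logging in.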
What kind of problems are we looking for?
- Ranking and retrieval problems, i.e. result X appears too low or too high in a specific search, or does not appear at all (please tell us the specific search query and the expected result)
- UI problems, i.e. the ranking is fine but the highlighting, label, or description is broken or looks bad, or the result that should be highlighted is not
Of course, if some search worked spectacularly better for you, it would be nice to know too :)
What should work?
Any search in Special:Search in the main namespace and the Property namespace should produce sensible results. Searches without advanced syntax should give better results than before, and searches with advanced syntax (+, -, *, quotes, etc.) should work no worse than before.
Please note that this is a test wiki, so nothing but search is expected to work; clicking on other links, editing, browsing to other pages, etc. may fail. Short disruptions are also possible when we update or change things or fix bugs reported by you :)
How to provide feedback?
Several ways are possible:
- Reply to this list or personally to me if you prefer
- On-wiki message on my talk page: https://www.wikidata.org/wiki/User_talk:Smalyshev_(WMF)
- Talk to us on IRC: #wikimedia-discovery
Thanks!
Awww… this is awesome! It works really well, I can't wait to see this deployed.
This is going to give a huge boost to the OpenRefine reconciliation service.
Where can I learn about the internals of this jewel? (which search engine, what metrics are used to rank items, and so on).
Antonin
On 18/12/2017 18:23, Stas Malyshev wrote:
Hi!
Where can I learn about the internals of this jewel? (which search engine, what metrics are used to rank items, and so on).
Thanks for your kind words. You can track it here:
https://phabricator.wikimedia.org/T125500
and associated tasks like this one: https://phabricator.wikimedia.org/T178851
which contain links to the patches. The search runs on the same Elasticsearch we use for search on other sites, but the prototype has specific code to deal with the Wikidata-specific data structure and the fact that it is, unlike most other Wikimedia sites, multilingual by design.
The rankings are currently hand-tuned and kind of hard to read (we're working on improving this). They are contained here: https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/config/ and the specific functions we're using are here: https://phabricator.wikimedia.org/diffusion/EWBA/browse/master/repo/config/E...
Basically it's a combination of match score (how well the string matches the query), incoming link count, sitelink count, and special adjustments such as demoting disambiguation pages.
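To make that combination a bit more concrete, here is a toy Python sketch of how such signals could be blended into one score. It is not the actual configuration linked above: the weights, the log scaling, the multiplicative demotion, and the sample numbers are all invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    label: str
    match_score: float          # how well the text matches the query
    incoming_links: int         # number of pages linking to the item
    sitelinks: int              # number of sitelinks on the item
    is_disambiguation: bool = False


# Hypothetical weights, chosen only so the example is easy to follow.
W_MATCH = 1.0
W_LINKS = 0.3
W_SITELINKS = 0.5
DISAMBIG_FACTOR = 0.1  # assumed demotion multiplier for disambiguation pages


def rank_score(c: Candidate) -> float:
    """Blend text relevance with popularity signals, then apply the demotion."""
    # log1p dampens very large counts so popularity cannot grow without bound.
    popularity = (W_LINKS * math.log1p(c.incoming_links)
                  + W_SITELINKS * math.log1p(c.sitelinks))
    score = W_MATCH * c.match_score + popularity
    if c.is_disambiguation:
        score *= DISAMBIG_FACTOR
    return score


# Made-up candidates, not real Wikidata statistics.
candidates = [
    Candidate("disambiguation page", 9.0, 20, 40, is_disambiguation=True),
    Candidate("obscure item", 8.0, 300, 30),
    Candidate("well-known item", 8.0, 50000, 250),
]

ranked = sorted(candidates, key=rank_score, reverse=True)
for c in ranked:
    print(f"{rank_score(c):6.2f}  {c.label}")
```

In this sketch the disambiguation page has the best raw text match but still ends up last, and the two items with equal match scores are ordered by their link and sitelink counts.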
Hi Stas,
I guess it's using an older index from a few weeks ago? It doesn't seem to have the latest properties that have landed, but that's OK if the ES index isn't current yet and you're just experimenting and getting feedback.
http://wikidata-wdsearch.wmflabs.org/w/index.php?search=partition&title=...
I didn't see https://www.wikidata.org/wiki/Property:P4653
On Mon, Dec 18, 2017 at 1:20 PM Stas Malyshev smalyshev@wikimedia.org wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hmm, it seems hit or miss. Perhaps your sitelinks scoring algorithm is having too much of an impact here? It seems that several times a nearly full phrase match is ranked much lower than an incomplete or partial phrase match.
For example, Cart World Series is ranked lower in this query, where I expected it to be nearly first given "cart+world":
http://wikidata-wdsearch.wmflabs.org/w/index.php?search=cart+world&title...
-Thad +ThadGuidry https://plus.google.com/+ThadGuidry
Hi!
I guess it's using an older index from a few weeks ago? It doesn't seem to have the latest properties that have landed, but that's OK if the ES index isn't current yet and you're just experimenting and getting feedback.
Yes, exactly. The Wikidata index is big, and we cannot use the main index for our experiments, so we make a copy and use that. Of course, the copy gets out of date :) This one is a couple of weeks old.
http://wikidata-wdsearch.wmflabs.org/w/index.php?search=partition&title=...
Didn't see https://www.wikidata.org/wiki/Property:P4653
Yes, too recent :)
Hi, it seems very good. What about suggestions?
Riccardo
2017-12-18 22:31 GMT+01:00 Stas Malyshev smalyshev@wikimedia.org: