Hey all,
In the continued quest to make the search bar a better tool, the Wikimedia Foundation's Discovery Department https://www.mediawiki.org/wiki/Wikimedia_Discovery has put a completion suggester into Beta Features. The tool works as you type and tolerates small typos and spacing differences when finding results. Possible matches are displayed in a drop-down menu as you type, ideally removing the need to run a full-text search and load the results page. You can read more details at mediawiki.org https://www.mediawiki.org/wiki/Extension:CirrusSearch/CompletionSuggester; for now, please use its talk page for feedback.
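The announcement doesn't say how the typo tolerance works. As an illustration only, the sketch below scores each title by the smallest Levenshtein edit distance between the query and any prefix of that title, and keeps titles within a small edit budget. This is not CirrusSearch's implementation; `fuzzy_prefix_distance`, `suggest`, and `max_edits` are hypothetical names.

```python
def fuzzy_prefix_distance(query: str, title: str) -> int:
    """Smallest edit distance between `query` and any prefix of `title`.

    This is ordinary Levenshtein dynamic programming, except that the result
    is the minimum of the last row: the untyped rest of the title is free.
    """
    q, t = query.casefold(), title.casefold()
    # prev[j] = edit distance between the query so far and t[:j]
    prev = list(range(len(t) + 1))
    for i, qc in enumerate(q, start=1):
        cur = [i] + [0] * len(t)
        for j, tc in enumerate(t, start=1):
            cur[j] = min(
                prev[j] + 1,               # extra character in the query
                cur[j - 1] + 1,            # character missing from the query
                prev[j - 1] + (qc != tc),  # match (free) or substitution
            )
        prev = cur
    return min(prev)


def suggest(query: str, titles: list[str], max_edits: int = 1, limit: int = 5) -> list[str]:
    """Titles with a prefix within `max_edits` edits of the query, closest first."""
    scored = [(fuzzy_prefix_distance(query, title), title) for title in titles]
    hits = [(d, title) for d, title in scored if d <= max_edits]
    hits.sort(key=lambda pair: pair[0])  # stable sort: ties keep input order
    return [title for _, title in hits[:limit]]
```

For example, `suggest("sørau", ["Sør-Aurdal", "Søraurdøl", "Oslo"])` returns `["Søraurdøl", "Sør-Aurdal"]`: the exact prefix comes first, then the title that is one edit away (the missing dash).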
The tool is available now and, for the time being, is enabled only for the article namespace. Depending on feedback, it will hopefully move into full production in early 2016. It's important to hear from regular contributors who use search, so that basic feature requests for searching the main namespace can be addressed while the tool is still in Beta Features.
Thanks!
Dan
--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
If I'm, say, building a web app that could benefit from that kind of search suggestion tool, is there an API I can use?
-Sage
On Thu, Dec 17, 2015 at 5:09 PM, Dan Garry dgarry@wikimedia.org wrote:
[snip]
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I tried this on a search for "Sør-Aurdal" (a municipality in Norway): I dropped the dash, wrote "sørau", and got a hit on "Søraust-Svalbard naturreservat", among other things. The topmost hit was "søraurdøl", which is a demonym for someone from Sør-Aurdal. It seems to me that a spelling error is compensated for with a fuzzy search for long(est?) words, but that implies nearly completing the word if there is a spelling error.
What if the topmost entry in the list used a less aggressive fuzzy search and shorter words? I tried several other searches, and somehow "sørau" seems to be difficult. All searches were on nowiki.
I'm a bit impressed... :D
On Sun, Dec 20, 2015 at 9:55 PM, Sage Ross ragesoss+wikipedia@gmail.com wrote:
[snip]
On 20/12/2015 22:19, John Erling Blad wrote:
I tried this on a search for "Sør-Aurdal" (a municipality in Norway): I dropped the dash, wrote "sørau", and got a hit on "Søraust-Svalbard naturreservat", among other things. The topmost hit was "søraurdøl", which is a demonym for someone from Sør-Aurdal. It seems to me that a spelling error is compensated for with a fuzzy search for long(est?) words, but that implies nearly completing the word if there is a spelling error.
Thank you, this is exactly the kind of feedback we were looking for when we deployed this feature as a beta feature.
In this case, the first thing to note is that "søraurdøl" [1] is a redirect to "Sør-Aurdal" [2]. The completion suggester won't display multiple suggestions that have the same target page. Here it internally receives both "søraurdøl" and "Sør-Aurdal", but because both lead to "Sør-Aurdal" it has to decide which one to display, and it chooses "søraurdøl" because the query "sørau" is a perfect prefix hit for it. You can see when the algorithm will prefer "Sør-Aurdal" by continuing to type:
  "søraud" => "søraurdøl" (still a perfect prefix)
  "sørauda" => "Sør-Aurdal" (here neither is a perfect prefix, so it displays the canonical page "Sør-Aurdal")
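The query-time choice described above can be sketched as follows, assuming fuzzy retrieval has already happened and the candidates for one target page have been collected. The `Candidate` type and `pick_display` are hypothetical; "perfect prefix" is taken here to mean a case-insensitive `startswith` with no punctuation removed, and preferring the canonical title among several prefix hits is an extra assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical: one string that matched, and whether it is the page title."""
    text: str
    is_canonical: bool  # False for a redirect


def pick_display(query: str, candidates: list[Candidate]) -> str:
    """Pick the single suggestion to show for candidates sharing one target page."""
    if not candidates:
        raise ValueError("need at least one candidate")
    q = query.casefold()
    # Perfect prefix hits win; if there are none, consider every candidate.
    prefix_hits = [c for c in candidates if c.text.casefold().startswith(q)]
    pool = prefix_hits or candidates
    # Within the pool, prefer the canonical title when it is present.
    for c in pool:
        if c.is_canonical:
            return c.text
    return pool[0].text
```

With the candidates "Sør-Aurdal" (canonical) and "søraurdøl" (redirect), "sørau" yields "søraurdøl" and "sørauda" yields "Sør-Aurdal". Always preferring canonical pages would amount to dropping the `prefix_hits` filter.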
There are many knobs we could adjust to display better suggestions. Two of them are relevant here:
1. Grouping at index time: the suggester groups redirects that are very similar to their canonical title. On enwiki, the redirect "Albert Enstein" is grouped with its canonical page "Albert Einstein", so "Albert Enstein" is never offered to the suggester as a separate candidate, and the suggester never has to choose between "Albert Enstein" and "Albert Einstein": it always displays "Albert Einstein". This technique lets us display proper suggestions even if the user types something quite distant, like "alberensten". Here the suggester benefits from popular pages whose common typos editors have manually curated as redirects. Unfortunately, such arbitrary decisions also have drawbacks. A counterexample is "life a": on enwiki this query suggests "Life insurance" instead of "life assurance", because the redirect "Life assurance" has been wrongly grouped with "Life insurance". This is not completely wrong, since both suggestions lead to the same page, but it's not perfect. So we could fix the "sørau" problem by increasing the tolerance of this grouping step, but unfortunately that would also increase the number of cases like "life assurance".
2. Changing the decision at query time: we could also change the decision and always prefer canonical pages over redirects, even when the canonical page is not a perfect prefix hit. I'm not aware of a counterexample here, but since our ranking algorithm is far from perfect, we preferred to favor perfect prefix hits for now. In the coming months we should be able to include pageview statistics in the scoring formula; we hope this will bring positive improvements and allow us to revisit this decision.
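The trade-off in item 1 can be sketched with a similarity threshold. Assumptions: difflib's `SequenceMatcher.ratio()` stands in for whatever similarity measure CirrusSearch actually uses, and `build_index` and `threshold` are hypothetical names. Grouped redirects become extra match inputs of the canonical entry, so typed text can still match them, but only the canonical title is displayed.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1], using difflib's ratio."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def build_index(pages: dict[str, list[str]], threshold: float) -> dict[str, list[str]]:
    """Map each displayable suggestion to the strings it can be matched on.

    `pages` maps a canonical title to its redirects. A redirect at least
    `threshold` similar to its title is grouped: it becomes an extra input of
    the canonical entry and is never displayed itself. Other redirects keep
    their own entry, so the query-time step may still have to choose.
    """
    index: dict[str, list[str]] = {}
    for canonical, redirects in pages.items():
        inputs = [canonical]
        for redirect in redirects:
            if similarity(redirect, canonical) >= threshold:
                inputs.append(redirect)       # grouped with the canonical title
            else:
                index[redirect] = [redirect]  # stays a separate suggestion
        index[canonical] = inputs
    return index
```

Under this stand-in measure, a threshold of 0.9 folds "Albert Enstein" into "Albert Einstein" but keeps "Life assurance" and "Søraurdøl" as separate entries. Lowering it to 0.8 also folds "Søraurdøl" (fixing "sørau") and "Life assurance" (reproducing the "life a" problem).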
As you can see, the suggester makes arbitrary (sometimes risky) decisions that can be wrong, and this is the whole purpose of having this feature in beta. Depending on feedback like yours, we may review and adjust various parameters in the algorithm.
Thank you!
David.
[1] (Redirected from Søraurdøl): https://no.wikipedia.org/w/index.php?title=S%C3%B8raurd%C3%B8l&redirect=...
[2] https://no.wikipedia.org/w/api.php?action=query&list=backlinks&bltit...
On 20/12/2015 21:55, Sage Ross wrote:
If I'm, say, building a web app that could benefit from that kind of search suggestion tool, is there an API I can use?
The API endpoint is action=cirrus-suggest[1], and it accepts two parameters: text (the user input) and limit (5 by default).
Example : /w/api.php?action=cirrus-suggest&format=json&text=albert%20einstein&limit=5
Note that this API is highly experimental and subject to change. I'd suggest using it only for evaluation purposes at this point. We may provide better integration into the MediaWiki API ecosystem (e.g. generators[2]) in the coming weeks.
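For trying the endpoint from a script, here is a minimal Python sketch that builds the request shown in the example URL. Only `action=cirrus-suggest`, `text`, `limit`, and `format=json` come from the message; the host, the User-Agent string, and the function names are assumptions. Since the module was explicitly experimental, it may have changed or been removed, so the sketch assumes nothing about the response structure.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def cirrus_suggest_url(text: str, limit: int = 5, host: str = "en.wikipedia.org") -> str:
    """Build an action=cirrus-suggest URL like the example above."""
    params = {"action": "cirrus-suggest", "format": "json", "text": text, "limit": limit}
    # quote (not quote_plus) encodes spaces as %20, matching the example URL.
    return f"https://{host}/w/api.php?{urlencode(params, quote_via=quote)}"


def cirrus_suggest(text: str, limit: int = 5, host: str = "en.wikipedia.org"):
    """Fetch and decode the JSON response; its structure is not assumed here."""
    request = Request(
        cirrus_suggest_url(text, limit, host),
        # Wikimedia asks clients to send a descriptive User-Agent (placeholder).
        headers={"User-Agent": "suggest-demo/0.1 (https://example.org; you@example.org)"},
    )
    with urlopen(request, timeout=10) as response:
        return json.load(response)
```

`cirrus_suggest_url("albert einstein")` reproduces the example request path on en.wikipedia.org.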
[1] https://en.wikipedia.org/wiki/Special:ApiSandbox#action=cirrus-suggest&t...
[2] https://www.mediawiki.org/wiki/API:Query#Generators
David
On Mon, Dec 21, 2015 at 4:48 AM, David Causse dcausse@wikimedia.org wrote:
Note that this API is highly experimental and is subject to change.
You should have implemented isInternal() to return true in your module, so the auto-generated documentation would properly reflect that status.
I'd suggest using it only for evaluation purposes at this point. We may provide better integration into the MediaWiki API ecosystem (e.g. generators[2]) in the coming weeks.
Does your plan for "better integration" include making it the backend for action=opensearch when CirrusSearch is installed? That would allow browsers' search bars to benefit too.
I'd recommend against a non-beta CirrusSearch module for suggestions, versus something in core that Cirrus provides the backend for. That something is probably the existing list=prefixsearch.[1]
[1]: Which, despite the name,[2] doesn't really correspond to Special:PrefixIndex; that would be list=allpages with apprefix.
[2]: We may want to look into the increasingly inaccurate name of that module at some point, but I wouldn't block Cirrus's work on anything beyond updating the apihelp-query+prefixsearch-description message.
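For comparison, here is a sketch of the two core entry points mentioned above. The parameter names (`search` and `limit` for action=opensearch; `pssearch` and `pslimit` for list=prefixsearch) and the response shapes follow the MediaWiki API documentation as generally published; verify them against the target wiki's api.php help. Whether CirrusSearch backs these modules depends on the wiki's configuration. The host and function names are hypothetical.

```python
from urllib.parse import quote, urlencode

API = "https://en.wikipedia.org/w/api.php"  # assumed target wiki


def opensearch_url(search: str, limit: int = 5) -> str:
    """URL for core action=opensearch (the backend browsers' search bars use)."""
    params = {"action": "opensearch", "format": "json", "search": search, "limit": limit}
    return f"{API}?{urlencode(params, quote_via=quote)}"


def prefixsearch_url(search: str, limit: int = 5) -> str:
    """URL for core list=prefixsearch, the module suggested as the core home."""
    params = {"action": "query", "format": "json", "list": "prefixsearch",
              "pssearch": search, "pslimit": limit}
    return f"{API}?{urlencode(params, quote_via=quote)}"


def titles_from_opensearch(data: list) -> list[str]:
    # OpenSearch JSON is [query, [titles], [descriptions], [urls]].
    return list(data[1])


def titles_from_prefixsearch(data: dict) -> list[str]:
    # Results are listed under query.prefixsearch, one object per page.
    return [page["title"] for page in data["query"]["prefixsearch"]]
```

Keeping the parsing separate from the URL building makes it easy to swap backends (for example, cirrus-suggest today, prefixsearch later) without touching the calling code.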
On 21/12/2015 16:12, Brad Jorsch (Anomie) wrote:
You should have implemented isInternal() to return true in your module, so the auto-generated documentation would properly reflect that status.
I'll fix it, thanks for the advice.
I'd suggest using it only for evaluation purposes at this point. We may provide better integration into the MediaWiki API ecosystem (e.g. generators[2]) in the coming weeks.
Does your plan for "better integration" include making it the backend for action=opensearch when CirrusSearch is installed? That would allow browsers' search bars to benefit too.
That was the initial plan, but for simplicity I chose to bind the MediaWiki JS API searchSuggest to the internal cirrus-suggest API. If the completion suggester proves successful and useful, it will be a good candidate to replace TitlePrefixSearch in opensearch.
I'd recommend against a non-beta CirrusSearch module for suggestions, versus something in core that Cirrus provides the backend for. That something is probably the existing list=prefixsearch.[1]
I agree. On this point I will follow any recommendations from the API maintainers; my knowledge of the current API ecosystem is too limited for me to make a good decision here.
Thanks!
David.