+mobile-l
mobile-l recipients: if replying, please reply-all in case any people on the CC: line aren't on mobile-l - it would be appreciated.
Nik,
Thanks for the update. Glad to hear there's even faster performance coming, and also that there's no need to structure too much fallback stuff depending on whether the reflex time is okay. With any luck, it will be just fast enough. I don't think there'd be too much hammering on the suggest term; only if the resultset is insufficient does it seem like it would make sense to orchestrate the client-side (or server-side, for that matter) call. The apps do have a key-tap timer on them to help avoid spurious searching, so that should help. I think I understand the ellipsis-related stuff - parsing the snippet text is no problem, but if there's an even simpler way to get text condensed to the point where there's no work to avoid wrapping on most form factors, cool! And if I misunderstood, well, we'll get to the bottom of that on Friday.
-Adam
On Tue, Apr 1, 2014 at 3:53 PM, Dan Garry dgarry@wikimedia.org wrote:
I don't want to bloat the meeting into something massive, but I did invite Kenan and Howie since we're going to be talking about product consistency and that's something that should involve them.
Thanks for setting this up, Jared!
Dan
On 1 April 2014 15:05, Jared Zimmerman jared.zimmerman@wikimedia.org wrote:
*...but how about setting up a google hangout or something?*
done.
*Jared Zimmerman* \ Director of User Experience \ Wikimedia Foundation M: +1 415 609 4043 | @JaredZimmerman (https://twitter.com/JaredZimmerman)
On Tue, Apr 1, 2014 at 3:00 PM, Nikolas Everett neverett@wikimedia.org wrote:
On Tue, Apr 1, 2014 at 5:13 PM, Adam Baso abaso@wikimedia.org wrote:
My email got a little buried in the thread.
You guys on mobile-l? It would be nice to bring the conversation there if possible. Understood if not; maybe we can get mobile-tech and any other necessary lists here in that case?
I imagine the right thing is to add them to the email chain and we'll all keep reply-alling.
During the mobile quarterly planning kickoff this morning, I mentioned that I had started on a patchset for iOS and that I think it would be cool if we could try this first in apps, then hopefully roll to mobile web (to ease into load, but also to learn on any other fronts we haven't considered). Here's the WIP patchset.
https://gerrit.wikimedia.org/r/#/c/121562/
See in particular the comments in the first code file in that patchset for some of my thought process.
Chad, that patchset is the thing I was talking about the other day for list=search.
It queries like the following:
https://en.m.wikipedia.org/w/api.php?action=query&list=search&srsearch=population%20of%20san%20francisco&srprop=snippet%7Csectiontitle&srlimit=15&srbackend=CirrusSearch&format=json
It would be really neat to make the app the first place in a user's mind where s/he's going to search for factual information even when doing so via unstructured search terms. I think for people without the app, they will of course always go through more conventional channels to enter queries that aren't perfectly structured for title-starts-with; my hope is that if we give them this goodie early on they'll be pleasantly surprised and see it as a good reason to use the official apps.
Sounds good to me.
Sounds like we may need to reconcile caching and general load performance items when using CirrusSearch for the backend...although if it's possible to do this fulltext magic by default on *just* the apps to start, without making CirrusSearch come to a halt (!), that would be totally sweet.
So prefix search is on the order of 4-5 ms for Elasticsearch to service, and it is cached. Full text search varies from 30ms to 500ms for "acceptable" performance. Not great, but ok.
Some queries take even longer, but we're working on speeding them up. On Thursday I'm pushing something that'll shave about 25% off particularly slow queries. We'll get another 20%ish on top of that when we upgrade to Elasticsearch 1.1.0 next week, because that'll bring to bear some work I did upstream a few weeks ago. We'll also be able to start using some work I did back in January that can cut really nasty queries by orders of magnitude, but we'll need to make some Cirrus changes for that, so it'll probably hit enwiki in a few weeks. So, yeah, we're working on it.
But the upshot is we'll have to be really careful if we want actual full text search to be fast enough for find-as-you-type. We can save a bunch of time by not running the "did you mean" suggestions if you don't use them. Beyond that we'd have to look at things like the phrase match boost and highlighting the results text. You may want to be even more careful about the number of results you request, because once you start cutting to the bone, highlighting the results starts to show up (20ms for 50 results normally, and it can get higher).
One idea, although less than ideal just from a coding perspective (especially if perf is not an issue), would be to make the client side do lag detection or observe a server-issued feature flag (there will be several of these for the app already), or both, such that if lag is unacceptable client side it would fall back to opensearch.
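That fallback decision could be sketched roughly as follows. This is a hedged illustration only: the flag name, latency-window idea, and 500ms threshold are all made up, not anything from the patchset.

```python
def choose_search_backend(recent_latencies_ms, server_flag_fulltext=True,
                          lag_threshold_ms=500):
    """Pick fulltext search unless a server-issued feature flag disables
    it or recently observed latency is unacceptable; otherwise fall back
    to opensearch (prefix) search. Flag name and threshold are
    illustrative assumptions, not real app configuration."""
    if not server_flag_fulltext:
        return "opensearch"
    if recent_latencies_ms:
        mean = sum(recent_latencies_ms) / len(recent_latencies_ms)
        if mean > lag_threshold_ms:
            return "opensearch"
    return "fulltext"
```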
Probably not worth it initially. Maybe a good idea to keep in our back pocket if we find it is just too slow.
I don't have it in there yet for the GUI rendering (I was just working within the confines of the existing iOS code to see how it would play), but I was thinking to put the snippet text in a smaller font below the title text in this iOS POC to help the user have a little bit more context about *why* a result came back. That's helpful particularly if the page title in the result set isn't obviously related to the search stem or expansion, as you know! So instead of just
San Francisco
it would instead look like, say
San Francisco ...San Francisco City and county City and County...
The client-side code could even try to opportunistically slice the snippet text in some sensible fashion to try to provide reasonable context without wrapping text, and if that fails, just start from the beginning and add the ellipsis as appropriate to not wrap the result item's snippet text to the next line.
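That opportunistic slicing could look something like this minimal sketch, assuming a plain character budget as a stand-in for real pixel-width measurement:

```python
def truncate_snippet(snippet, max_chars=50, ellipsis="..."):
    """Cut a snippet to at most max_chars, preferring a word boundary,
    and append an ellipsis when text was dropped. A character budget is
    a rough proxy for avoiding line wrap; real code would measure the
    rendered width for the form factor."""
    if len(snippet) <= max_chars:
        return snippet
    budget = max_chars - len(ellipsis)
    cut = snippet.rfind(" ", 0, budget + 1)
    if cut <= 0:
        cut = budget  # no word break found; hard cut
    return snippet[:cut].rstrip() + ellipsis
```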
We should talk more about this. I've spent a bunch of time over the past two weeks working on a better highlighter than the one we are using now. It'll be faster and require less disk space. I wonder how stupid an idea it'd be to try to highlight within a pixel budget for a certain font.....
Any ideas if this is achievable? Fulltext search feels so much more natural. I guess there's maybe also the notion of search-within-title (it does look like srwhat=title is currently disabled for the CirrusSearch provider, at least in the API), with a suggest-term backing (ideally, the API would just magically augment results with suggest-term autolookup, but the orchestration is obviously possible client side, too) to help deal with misspelling, which is even more likely on the mobile app.
Okay, hope that helps a bit.
Any ideas for short, medium, and longer term approach?
Got to go, but how about setting up a google hangout or something?
-Adam
On Tue, Apr 1, 2014 at 1:21 PM, Adam Baso abaso@wikimedia.org wrote:
Let me reply-all in a couple minutes.
On Tue, Apr 1, 2014 at 1:15 PM, Chad Horohoe chorohoe@wikimedia.org wrote:
On Tue, Apr 1, 2014 at 1:10 PM, Jared Zimmerman <jared.zimmerman@wikimedia.org> wrote:
> I think that the mobile app is returning results that match user
> expectations (in this case RESULTS rather than NO RESULTS) so I'd urge the
> team to figure out how to resolve this issue even if there are
> technological or performance issues to overcome.

That is not consistent with how the search box has ever worked. It's meant to be a suggestion for page titles, not a list of full-text results (that may contain nothing in common between their title and what you typed). Once you complete the search (if you don't end on a direct title match), you'll get the full-text results.
If the mobile app is presenting full-text results as suggestions I'd say that's the wrong way to go. I'll also note our behavior is consistent with how Google works.
-Chad
--
Dan Garry
Associate Product Manager for Platform
Wikimedia Foundation
On Tue, Apr 1, 2014 at 7:19 PM, Adam Baso abaso@wikimedia.org wrote:
+mobile-l
mobile-l recipients: if replying, please reply-all in case any people on the CC: line aren't on mobile-l - it would be appreciated.
Nik,
Thanks for the update. Glad to hear there's even faster performance coming, and also that there's no need to structure too much fallback stuff depending on whether the reflex time is okay. With any luck, it will be just fast enough. I don't think there'd be too much hammering on the suggest term; only if the resultset is insufficient does it seem like it would make sense to orchestrate the client-side (or server-side, for that matter) call. The apps do have a key-tap timer on them to help avoid spurious searching, so that should help. I think I understand the ellipsis-related stuff - parsing the snippet text is no problem, but if there's an even simpler way to get text condensed to the point where there's no work to avoid wrapping on most form factors, cool! And if I misunderstood, well, we'll get to the bottom of that on Friday.
I suppose this is close to my heart because I just worked on it, but if you chop the snippet on the client side it defeats the logic used to pick the "best snippet". That logic isn't that great in Cirrus now but it'll get a whole lot better when we deploy the new highlighter. Right now the snippet is always 150 characters or something +/- 20ish characters on each side to find a word break. We pick the best snippet based on hits in the 150 character window. At minimum we should let you configure it to something that'll fit better. I suppose the best option would be to configure up font widths that matter and then use them to chop really accurately. For the most part that sounds pretty simple and quick to implement so long as we're ok with estimates that ignore stuff like ligatures.
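The configure-up-font-widths idea could be sketched like this, with made-up per-glyph pixel widths standing in for real font metrics, and ligatures deliberately ignored as the paragraph above suggests:

```python
# Hypothetical per-glyph widths in pixels for some UI font; anything
# not listed falls back to a default average width. Ligatures are
# ignored: each character is priced individually, which is the
# estimate-level accuracy discussed above.
CHAR_WIDTHS = {"i": 4, "l": 4, "j": 4, "m": 12, "w": 12, " ": 5}
DEFAULT_WIDTH = 8

def estimated_width(text):
    """Estimate rendered width in pixels by summing per-glyph widths."""
    return sum(CHAR_WIDTHS.get(c, DEFAULT_WIDTH) for c in text)

def chop_to_pixels(text, max_px):
    """Trim text so its estimated width fits within max_px pixels."""
    out = []
    used = 0
    for c in text:
        w = CHAR_WIDTHS.get(c, DEFAULT_WIDTH)
        if used + w > max_px:
            break
        out.append(c)
        used += w
    return "".join(out)
```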
Do the type timers fire the hit on the leading character or on a slight hesitation? Prefix search on the site is leading character and then it <cancels> the request if the user types more. That is silly because I can't cancel the request on the backend.... If it triggers on a hesitation then we should just plow ahead, I think. If it triggers on leading characters then we should totally cache requests shorter than N characters. 3 or 5 or something.
From the code:
// If we were constraining the namespace set, we would probably use
// 0|1|2|3|4|5|6|7|12|13|14|15|100|101|108|109|446|447
// to keep it related more to article, article talk, policy,
// policy talk, help, and help talk types of resources.
// The odd-numbered Talk pages could even be withheld, but that's
// sort of pointless when the number of backlinks to them is
// likely to be small, meaning they won't turn up too much
// unless they're a result(set) of last resort, or the user
// went to the trouble of prefix namespace searching such as Talk:Cats.
// But realistically, it's probably easier to just stick to
// not defining a namespace constraint set, and thereby (likely)
// getting more pre-cached responses, due to other consumers leading
// or following suit. There's a school of thought, or there could be,
// that says only namespace 0 should be searched here, as it's
// the core article content. But users may practically want
// categories, too. And such logic spirals out from there.
// If we were instead using the opensearch API and were seeking
// parity with the desktop and mobile web experience, we should
// indeed as of 27-March-2014 only be searching namespace 0.
// But as CirrusSearch will be the norm and server load is expected
// to handle things just fine (no fallback is necessary per Search team),
// higher quality search results can be obtained now anyway.
Cirrus searches all wgContentNamespaces by default and it is optimized to do so. All non-content namespaces are in another index so we don't have to pay attention to it during the request. We also don't have to filter by namespace at all.
Each namespace has a weight factor that influences its position. That factor often ends up being more important than links. Links are "score * log(incoming_links + 2)" and the weights vary from "score * 1" (MAIN) to "score * 0.0025" (TEMPLATE_TALK). Our power users expect these because lsearchd did it. Mobile users, who knows.
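That scoring can be written down directly as a sketch. Only the MAIN (1.0) and TEMPLATE_TALK (0.0025) weights come from the numbers above; the helper and dictionary shape are illustrative, not Cirrus code:

```python
import math

# Weights from the thread; a real config would cover every namespace.
NAMESPACE_WEIGHT = {
    "MAIN": 1.0,
    "TEMPLATE_TALK": 0.0025,
}

def weighted_score(base_score, incoming_links, namespace):
    """score * log(incoming_links + 2), scaled by the namespace weight.
    The log keeps heavily linked pages from dominating outright."""
    return base_score * math.log(incoming_links + 2) * NAMESPACE_WEIGHT[namespace]
```

Note how the namespace weight dwarfs the link factor: even a TEMPLATE_TALK page with thousands of backlinks scores below a MAIN page with none.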
// With all of this considered, we want a request of the following format:
//
// en.m.wikipedia.org/w/api.php?action=query&list=search&srsearch=cats&srprop=snippet|sectiontitle&srlimit=15&srbackend=CirrusSearch&format=json
//
// Note that MobileFrontend's use of opensearch has its result
// set limited at 15. Note also that 'srprop' only keeps 'snippet'
// and 'sectiontitle', plus the 'title' field which is always implicit.
// This buys us some additional features once we're ready for them,
// all the while populating the cache.
// We probably also will want to add 'srinterwiki=1' in some future
// state so that users don't have to change their language-to-search
// setting. As it is, 'srinterwiki' is not yet in place
// and the format of such results may look a bit different,
// so it's probably best to hold off on 'srinterwiki=1'. We are
// not yet using the snippets and section titles, but let's get the
// cache populated for our sake and everyone else's sake.
Interwiki is coming but I'd give it a few months, I think.
// NOTE:
// Although as of 27-March-2014 it seems that suggestions may not be coming
// back for CirrusSearch as frequently as for Lucene, that's probably
// just an artifact of relatively lower training of suggestions.
// In other words, it's likely that the suggestion pairing will grow.
// Currently, we're not examining [@"query"][@"searchinfo"][@"suggestion"],
// but we could. There are two cases for the suggestion.
// 1. When the result set is of length 0, just fire off a search with the suggestion.
//    This is the case where the user probably misspelled something.
// 2. When the result set is of short length (less than 5?), fire another search with
//    the suggestion, and then collate those search results /after/ the first result set.
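The two cases in that comment could be orchestrated client side roughly like this. The `search` callable and the threshold of 5 are assumptions taken from the comment, not real app code:

```python
def search_with_suggestion(search, query, thin=5):
    """Orchestrate the two suggestion cases from the comment above.
    `search` is assumed to return (results, suggestion_or_None)."""
    results, suggestion = search(query)
    if not suggestion:
        return results
    if len(results) == 0:
        # Case 1: probable misspelling; just retry with the suggestion.
        retried, _ = search(suggestion)
        return retried
    if len(results) < thin:
        # Case 2: thin result set; collate suggestion results after it.
        extra, _ = search(suggestion)
        return results + [r for r in extra if r not in results]
    return results
```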
The suggestion is actually better than you give it credit for: even if lots of results show up, the suggestion, when we provide one, might still be useful. It comes from redirect and title names and it'll suggest combinations that work. So if the user searches for "picket's charge" it'll suggest "pickett's charge" even though there are plenty of results for the first term. The results for the second term are better.
The reason you get different results is because the implementations are vastly different. The Cirrus implementation has less tuning but is "more modern". Whatever that is worth.
Nik
Thanks for the response. Mine inline below.
On Wed, Apr 2, 2014 at 8:11 AM, Nikolas Everett neverett@wikimedia.org wrote:
On Tue, Apr 1, 2014 at 7:19 PM, Adam Baso abaso@wikimedia.org wrote:
+mobile-l
mobile-l recipients: if replying, please reply-all in case any people on the CC: line aren't on mobile-l - it would be appreciated.
Nik,
Thanks for the update. Glad to hear there's even faster performance coming, and also that there's no need to structure too much fallback stuff depending on whether the reflex time is okay. With any luck, it will be just fast enough. I don't think there'd be too much hammering on the suggest term; only if the resultset is insufficient does it seem like it would make sense to orchestrate the client-side (or server-side, for that matter) call. The apps do have a key-tap timer on them to help avoid spurious searching, so that should help. I think I understand the ellipsis-related stuff - parsing the snippet text is no problem, but if there's an even simpler way to get text condensed to the point where there's no work to avoid wrapping on most form factors, cool! And if I misunderstood, well, we'll get to the bottom of that on Friday.
I suppose this is close to my heart because I just worked on it, but if you chop the snippet on the client side it defeats the logic used to pick the "best snippet". That logic isn't that great in Cirrus now but it'll get a whole lot better when we deploy the new highlighter. Right now the snippet is always 150 characters or something +/- 20ish characters on each side to find a word break. We pick the best snippet based on hits in the 150 character window. At minimum we should let you configure it to something that'll fit better. I suppose the best option would be to configure up font widths that matter and then use them to chop really accurately. For the most part that sounds pretty simple and quick to implement so long as we're ok with estimates that ignore stuff like ligatures.
Yeah, I think the snippet data is sweet. Two options - short (50 characters wide) and normal (150 characters) - seem sufficient. That way there are fewer cached objects, given the 3- or 5-char (or n-gram) thing discussed later.
Given that ligatures render non-monospaced (as far as I can tell), I think your hint to ignore ligature width is a pretty pragmatic approach.
Do the type timers fire the hit on the leading character or on a slight hesitation? Prefix search on the site is leading character and then it <cancels> the request if the user types more. That is silly because I can't cancel the request on the backend.... If it triggers on a hesitation then we should just plow ahead, I think. If it triggers on leading characters then we should totally cache requests shorter than N characters. 3 or 5 or something.
Oops, I may have spoken too soon.
On Android, after keypress it fires if 300ms have elapsed and another keypress hasn't occurred.
On iOS, it looks like it is actually firing immediately on each keypress (I'll double check that, though). The iOS client-side code does try to cancel fired search events such that the latest search string is used; that may or may not mean requests go unfired at the server. But, as you say, once a proper HTTP request has been received at the origin, server-side processing is likely to occur. From this I think we should probably look into giving the iOS app the same 300ms behavior, if I've not misread the code.
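The 300ms pause-before-firing behavior described above amounts to a debouncer, sketched here for illustration only (the real apps are Android and iOS code, not Python):

```python
import threading

class SearchDebouncer:
    """Fire a search callback only after the user has paused typing
    for `delay` seconds - the 300ms Android behavior described above.
    Each new keypress cancels the pending timer and starts a fresh one,
    so only the final query string reaches the server."""
    def __init__(self, callback, delay=0.3):
        self.callback = callback
        self.delay = delay
        self._timer = None

    def keypress(self, query):
        if self._timer is not None:
            self._timer.cancel()  # a newer keystroke supersedes it
        self._timer = threading.Timer(self.delay, self.callback, args=(query,))
        self._timer.start()
```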
Caching with an upper bound of 3 or 5 characters (or some upper-bounded n-gram value?) could be a very solid way to make search screaming fast for the common cases of these list=search searches, particularly with CirrusSearch. I really like that idea.
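A minimal sketch of that short-query caching idea, where only queries at or below the N-character bound are ever stored (short prefixes repeat across users, so they are the best cache candidates):

```python
class ShortQueryCache:
    """Cache results only for queries up to max_len characters.
    Longer queries are too varied to be worth caching, so puts
    for them are silently skipped."""
    def __init__(self, max_len=3):
        self.max_len = max_len
        self._store = {}

    def get(self, query):
        return self._store.get(query.lower())

    def put(self, query, results):
        if len(query) <= self.max_len:
            self._store[query.lower()] = results
```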
On a related matter, I'll try to remember to bring up title versus non-title statistical bias in the search ranking. Speaking of such algorithms...
From the code:

// If we were constraining the namespace set, we would probably use
// 0|1|2|3|4|5|6|7|12|13|14|15|100|101|108|109|446|447
// to keep it related more to article, article talk, policy,
// policy talk, help, and help talk types of resources.
// The odd-numbered Talk pages could even be withheld, but that's
// sort of pointless when the number of backlinks to them is
// likely to be small, meaning they won't turn up too much
// unless they're a result(set) of last resort, or the user
// went to the trouble of prefix namespace searching such as Talk:Cats.
// But realistically, it's probably easier to just stick to
// not defining a namespace constraint set, and thereby (likely)
// getting more pre-cached responses, due to other consumers leading
// or following suit. There's a school of thought, or there could be,
// that says only namespace 0 should be searched here, as it's
// the core article content. But users may practically want
// categories, too. And such logic spirals out from there.
// If we were instead using the opensearch API and were seeking
// parity with the desktop and mobile web experience, we should
// indeed as of 27-March-2014 only be searching namespace 0.
// But as CirrusSearch will be the norm and server load is expected
// to handle things just fine (no fallback is necessary per Search team),
// higher quality search results can be obtained now anyway.
Cirrus searches all wgContentNamespaces by default and it is optimized to do so. All non-content namespaces are in another index so we don't have to pay attention to it during the request. We also don't have to filter by namespace at all.
Each namespace has a weight factor that influences its position. That factor often ends up being more important than links. Links are "score * log(incoming_links + 2)" and the weights vary from "score * 1" (MAIN) to "score * 0.0025" (TEMPLATE_TALK). Our power users expect these because lsearchd did it. Mobile users, who knows.
Cool. Understood. If the main namespace is biased higher, I think mobile users will usually be pretty happy. And for power users on mobile I think they'll be pleasantly surprised that they can Namespace: search.
// With all of this considered, we want a request of the following format:
//
// en.m.wikipedia.org/w/api.php?action=query&list=search&srsearch=cats&srprop=snippet|sectiontitle&srlimit=15&srbackend=CirrusSearch&format=json
//
// Note that MobileFrontend's use of opensearch has its result
// set limited at 15. Note also that 'srprop' only keeps 'snippet'
// and 'sectiontitle', plus the 'title' field which is always implicit.
// This buys us some additional features once we're ready for them,
// all the while populating the cache.
// We probably also will want to add 'srinterwiki=1' in some future
// state so that users don't have to change their language-to-search
// setting. As it is, 'srinterwiki' is not yet in place
// and the format of such results may look a bit different,
// so it's probably best to hold off on 'srinterwiki=1'. We are
// not yet using the snippets and section titles, but let's get the
// cache populated for our sake and everyone else's sake.
Interwiki is coming but I'd give it a few months, I think.
Cool. Something to look at later, then, in terms of whether it's on by default, how results are biased based on current-wiki primary language, availability of articles on other wikis, charset, etc.
// NOTE:
// Although as of 27-March-2014 it seems that suggestions may not be coming
// back for CirrusSearch as frequently as for Lucene, that's probably
// just an artifact of relatively lower training of suggestions.
// In other words, it's likely that the suggestion pairing will grow.
// Currently, we're not examining [@"query"][@"searchinfo"][@"suggestion"],
// but we could. There are two cases for the suggestion.
// 1. When the result set is of length 0, just fire off a search with the suggestion.
//    This is the case where the user probably misspelled something.
// 2. When the result set is of short length (less than 5?), fire another search with
//    the suggestion, and then collate those search results /after/ the first result set.
The suggestion is actually better than you give it credit for: even if lots of results show up, the suggestion, when we provide one, might still be useful. It comes from redirect and title names and it'll suggest combinations that work. So if the user searches for "picket's charge" it'll suggest "pickett's charge" even though there are plenty of results for the first term. The results for the second term are better.
Okay, let's discuss on Friday!
The reason you get different results is because the implementations are vastly different. The Cirrus implementation has less tuning but is "more modern". Whatever that is worth.
Magic!
Nik
Thanks.
Here are summarized meeting notes:
* No-go on default fuzzy searching for mobile apps, so as to not hammer the server - prefix search (title-starts-with) to be used; if approaching default fuzzy search as the technology is refined, add a 300ms delay to iOS like for Android, though
* Possibly try this on beta mobile web, alpha mobile web, or a targeted language Wikipedia on the mobile web (maybe a larger beta mobile web language Wikipedia) to see how performance would go
* Search team to examine returning fewer fields in each search result record by default when the srprop mask is not specified (e.g., don't return snippets unless they're requested)
-Adam
On Wed, Apr 2, 2014 at 8:11 AM, Nikolas Everett neverett@wikimedia.org wrote:
On Tue, Apr 1, 2014 at 7:19 PM, Adam Baso abaso@wikimedia.org wrote:
+mobile-l
mobile-l recipients: if replying, please reply-all in case any people on the CC: line aren't on mobile-l - it would be appreciated.
Nik,
Thanks for the update. Glad to hear there's even faster performance coming, and also that there's no need to structure too much fallback stuff depending on whether the reflex time is okay. With any luck, it will be just fast enough. I don't think there'd be too much hammering on the suggest term; only if the resultset is insufficient does it seem like it would make sense to orchestrate the client-side (or server-side, for that matter) call. The apps do have a key-tap timer on them to help avoid spurious searching, so that should help. I think I understand the ellipsis-related stuff - parsing the snippet text is no problem, but if there's an even simpler way to get text condensed to the point where there's no work to avoid wrapping on most form factors, cool! And if I misunderstood, well, we'll get to the bottom of that on Friday.
I suppose this is close to my heart because I just worked on it, but if you chop the snippet on the client side it defeats the logic used to pick the "best snippet". That logic isn't that great in Cirrus now but it'll get a whole lot better when we deploy the new highlighter. Right now the snippet is always 150 characters or something +/- 20ish characters on each side to find a word break. We pick the best snippet based on hits in the 150 character window. At minimum we should let you configure it to something that'll fit better. I suppose the best option would be to configure up font widths that matter and then use them to chop really accurately. For the most part that sounds pretty simple and quick to implement so long as we're ok with estimates that ignore stuff like ligatures.
Do the type timers fire the hit on the leading character or on a slight hesitation? Prefix search on the site is leading character and then it <cancels> the request if the user types more. That is silly because I can't cancel the request on the backend.... If it triggers on a hesitation then we should just plow ahead, I think. If it triggers on leading characters then we should totally cache requests shorter than N characters. 3 or 5 or something.
From the code:

// If we were constraining the namespace set, we would probably use
// 0|1|2|3|4|5|6|7|12|13|14|15|100|101|108|109|446|447
// to keep it related more to article, article talk, policy,
// policy talk, help, and help talk types of resources.
// The odd-numbered Talk pages could even be withheld, but that's
// sort of pointless when the number of backlinks to them is
// likely to be small, meaning they won't turn up too much
// unless they're a result(set) of last resort, or the user
// went to the trouble of prefix namespace searching such as Talk:Cats.
// But realistically, it's probably easier to just stick to
// not defining a namespace constraint set, and thereby (likely)
// getting more pre-cached responses, due to other consumers leading
// or following suit. There's a school of thought, or there could be,
// that says only namespace 0 should be searched here, as it's
// the core article content. But users may practically want
// categories, too. And such logic spirals out from there.
// If we were instead using the opensearch API and were seeking
// parity with the desktop and mobile web experience, we should
// indeed as of 27-March-2014 only be searching namespace 0.
// But as CirrusSearch will be the norm and server load is expected
// to handle things just fine (no fallback is necessary per Search team),
// higher quality search results can be obtained now anyway.
Cirrus searches all wgContentNamespaces by default and it is optimized to do so. All non-content namespaces are in another index so we don't have to pay attention to it during the request. We also don't have to filter by namespace at all.
Each namespace has a weight factor that influences its position. That factor often ends up being more important than links. Links are "score * log(incoming_links + 2)" and the weights vary from "score * 1" (MAIN) to "score * 0.0025" (TEMPLATE_TALK). Our power users expect these because lsearchd did it. Mobile users, who knows.
// With all of this considered, we want a request of the following format:
//
// en.m.wikipedia.org/w/api.php?action=query&list=search&srsearch=cats&srprop=snippet|sectiontitle&srlimit=15&srbackend=CirrusSearch&format=json
//
// Note that MobileFrontend's use of opensearch has its result
// set limited at 15. Note also that 'srprop' only keeps 'snippet'
// and 'sectiontitle', plus the 'title' field which is always implicit.
// This buys us some additional features once we're ready for them,
// all the while populating the cache.
// We probably also will want to add 'srinterwiki=1' in some future
// state so that users don't have to change their language-to-search
// setting. As it is, 'srinterwiki' is not yet in place
// and the format of such results may look a bit different,
// so it's probably best to hold off on 'srinterwiki=1'. We are
// not yet using the snippets and section titles, but let's get the
// cache populated for our sake and everyone else's sake.
Interwiki is coming but I'd give it a few months, I think.
// NOTE:
// Although as of 27-March-2014 it seems that suggestions may not be coming
// back for CirrusSearch as frequently as for Lucene, that's probably
// just an artifact of relatively lower training of suggestions.
// In other words, it's likely that the suggestion pairing will grow.
// Currently, we're not examining [@"query"][@"searchinfo"][@"suggestion"],
// but we could. There are two cases for the suggestion.
// 1. When the result set is of length 0, just fire off a search with the suggestion.
//    This is the case where the user probably misspelled something.
// 2. When the result set is of short length (less than 5?), fire another search with
//    the suggestion, and then collate those search results /after/ the first result set.
The suggestion is actually better than you give it credit for: even if lots of results show up, the suggestion, when we provide one, might still be useful. It comes from redirect and title names and it'll suggest combinations that work. So if the user searches for "picket's charge" it'll suggest "pickett's charge" even though there are plenty of results for the first term. The results for the second term are better.
The reason you get different results is because the implementations are vastly different. The Cirrus implementation has less tuning but is "more modern". Whatever that is worth.
Nik