It seems we will have a number of different options to try. I wonder whether it's better to have independent rules or to tie them all together into a more generic rule.
For example:
- Underscore stripping
- Converting + into space (or just URL-decoding)
- Quote stripping (the bad `quot` ones, but also queries that are legitimately quoted where the quoted form has no results)
- Timestamp stripping?
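
To make the "independent rules" option concrete, here is a rough Python sketch. All function names are made up for illustration, and the handling of the bad `quot` case is a guess: it assumes they show up as a literal `&quot;` entity. Timestamp stripping is left out because it's not clear yet what those timestamps look like.

```python
from urllib.parse import unquote_plus


def strip_underscores(query: str) -> str:
    """Replace underscores with spaces, e.g. for a pasted page title."""
    return query.replace("_", " ")


def url_decode(query: str) -> str:
    """Decode percent-escapes and turn '+' into a space."""
    return unquote_plus(query)


def strip_quotes(query: str) -> str:
    """Drop double quotes and literal '&quot;' entities.

    Assumption: the bad `quot` queries contain the literal entity text.
    """
    return query.replace("&quot;", "").replace('"', "")


# Each rule is independent; order only affects the order of candidates.
RULES = [strip_underscores, url_decode, strip_quotes]


def candidate_rewrites(query: str) -> list[str]:
    """Apply every rule to the original query on its own and return the
    distinct rewrites that differ from it (whitespace-normalized)."""
    original = " ".join(query.split())
    seen = {original}
    candidates = []
    for rule in RULES:
        rewritten = " ".join(rule(query).split())
        if rewritten and rewritten not in seen:
            seen.add(rewritten)
            candidates.append(rewritten)
    return candidates
```

With this shape, each candidate could be retried only when the original query returns nothing, and a rule that turns out to hurt can be dropped without touching the others.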
A highly generic rule that would probably get more (but worse) results:
- Either remove, or convert into a space, everything that's not alphanumeric
- Maybe even join the words with 'OR' instead of 'AND' if there are enough tokens
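
The generic rule could look something like the sketch below. The function name and the `min_or_tokens` threshold are hypothetical; "enough tokens" would need tuning against real queries. It takes the "convert into a space" variant rather than removing characters outright.

```python
import re

# Anything that is not a letter or digit (Unicode-aware); '_' is included
# explicitly because \W treats underscore as a word character.
NON_ALNUM = re.compile(r"[\W_]+")


def generic_rewrite(query: str, min_or_tokens: int = 4) -> str:
    """Replace non-alphanumeric runs with spaces, then join the tokens.

    Tokens are joined with ' OR ' once there are at least `min_or_tokens`
    of them (an assumed threshold); otherwise they are joined with a plain
    space, which leaves the default AND behaviour in place.
    """
    tokens = NON_ALNUM.sub(" ", query).split()
    if len(tokens) >= min_or_tokens:
        return " OR ".join(tokens)
    return " ".join(tokens)
```

For example, `generic_rewrite("foo_bar+baz")` gives `foo bar baz`, while a five-token query such as `"a-b" c:d e` comes out as `a OR b OR c OR d OR e`.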
If we go the route of rewriting the query into something more plausible, would we build that into Elasticsearch or CirrusSearch? I could come up with plausible reasons for putting it on either side, but I am leaning towards some sort of custom suggester implementation that does our own thing (although that may be due to not knowing the internal API limitations there).