Steve Bennett wrote:
I think you have to be mindful of the original goal here: for each character a user is likely to enter from their keyboard in the search box, what possible range of characters would they expect to match?
So, apostrophe (U+0027) -> curved right single quote (U+2019): yes, probably. The other way around... probably not, unless U+2019 actually exists on some keyboards.
Hyphen-minus (U+002D) -> em dash (U+2014): I would say no. If you search for "clock-work", you probably don't want to match a sentence like "He was building a clock—work that is never easy—at the time." (contrived, sure)
Just saying you probably don't want the full range of "lookalikes" - the left side of each mapping should be a keyboard character, and the right side should be semantically equivalent, unless commonly used incorrectly.
Good point! Likewise, two consecutive hyphen-minus characters, "--", _could_ be treated as matching the em dash.
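For what it's worth, here is a rough Python sketch of such a one-directional mapping: the left side is what the user types, the right side is what it may additionally match. The table contents and the names EQUIVALENTS and query_to_pattern are made up for illustration and are not an existing API. The only entries are the two cases discussed in this thread.

```python
import re

# Hypothetical table: keyboard input -> extra characters it may also match.
# Deliberately one-directional; U+2019 and U+2014 are not keys.
EQUIVALENTS = {
    "--": ["\u2014"],  # two hyphen-minus may match an em dash
    "'": ["\u2019"],   # apostrophe may match a right single quotation mark
}

def query_to_pattern(query: str) -> re.Pattern:
    """Turn a typed query into a regex that also accepts mapped equivalents."""
    # Try longer keys first so "--" is seen before any single-character key.
    keys = sorted(EQUIVALENTS, key=len, reverse=True)
    parts = []
    i = 0
    while i < len(query):
        for key in keys:
            if query.startswith(key, i):
                alternatives = [key] + EQUIVALENTS[key]
                parts.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
                i += len(key)
                break
        else:
            # No mapping for this character: match it literally.
            parts.append(re.escape(query[i]))
            i += 1
    return re.compile("".join(parts))
```

With this table, a search for "don't" also finds the curly-quote spelling, a search for "a--b" finds the em dash form, and a search for "clock-work" does not match the em dash sentence, since a single hyphen-minus has no entry.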
Tim