2009/6/23 Brion Vibber brion@wikimedia.org:
Steve Bennett wrote:
So, apostrophe (U+0027) -> curved right single quote (U+2019): yes, probably. The other way around... probably not, unless U+2019 actually exists on some keyboard.
Hyphen-minus (U+002D) -> em dash (U+2014): I would say no. If you search for "clock-work", you probably don't want to match a sentence like "He was building a clock—work that is never easy—at the time." (contrived, sure)
Just saying you probably don't want the full range of "lookalikes" - the left side of each mapping should be a keyboard character, and the right side should be semantically equivalent, unless commonly used incorrectly.
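To make the one-way rule above concrete, here is a minimal sketch in Python. It is an illustration only, not how MediaWiki search works: the EQUIVALENTS table and the query_to_pattern helper are hypothetical names, and the table contains just the single mapping endorsed above (apostrophe also matches U+2019). Hyphen-minus is deliberately left unmapped, so it never matches an em dash.

```python
import re

# Hypothetical one-way table: a keyboard character on the left, and the
# characters a query containing it should also match on the right.
EQUIVALENTS = {
    "'": "'\u2019",  # apostrophe also matches the right single quotation mark
}

def query_to_pattern(query: str) -> "re.Pattern[str]":
    """Turn a typed query into a regex that also accepts mapped equivalents."""
    parts = []
    for ch in query:
        alternatives = EQUIVALENTS.get(ch)
        if alternatives:
            # Character class listing the keyboard character and its equivalents.
            parts.append("[" + re.escape(alternatives) + "]")
        else:
            # Unmapped characters (including hyphen-minus) match only themselves.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

if __name__ == "__main__":
    print(bool(query_to_pattern("don't").search("I don\u2019t know")))
    print(bool(query_to_pattern("clock-work").search("a clock\u2014work that")))
```

Because the table is keyed on the typed character only, a query that itself contains U+2019 is matched literally and will not find a plain apostrophe.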
Unless you cut and paste a term containing a fancy character from another window, but the page uses the plain character...
Indeed, keyboards are not the only place characters come from. Word processors often upgrade apostrophes, hyphens, and other characters; this is the general phenomenon of which "smart quotes" are a specific case. "Input methods" can also insert characters that are not directly on the keyboard. And then there is cutting and pasting from web pages where the author tried to choose specific characters with HTML entities and such.
I have definitely seen edits on Wikipedia where people were "correcting" various kinds of hyphens and dashes. And of course, while the English Wikipedia forbids curved quotes, each other wiki may well have its own policy.
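One way to cope with characters arriving from word processors, input methods, and pasted text is to fold both the query and the page text to the same plain form before comparing, so it no longer matters which side has the fancy character. A rough sketch in Python follows. The FOLD table, fold, and matches are hypothetical names chosen for illustration, and the choice of which characters to fold is an assumption, not a policy proposal. En and em dashes are left out on purpose, so "clock-work" still does not match text containing an em dash.

```python
# Hypothetical folding table: typographic variants -> plain keyboard characters.
FOLD = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201C": '"',   # left double quotation mark
    "\u201D": '"',   # right double quotation mark
    "\u2010": "-",   # hyphen
    "\u2011": "-",   # non-breaking hyphen
})

def fold(text: str) -> str:
    """Replace each character in the folding table with its plain form."""
    return text.translate(FOLD)

def matches(query: str, page_text: str) -> bool:
    """Substring search after folding both sides the same way."""
    return fold(query) in fold(page_text)

if __name__ == "__main__":
    # A pasted curly apostrophe finds text written with a plain one, and vice versa.
    print(matches("don\u2019t", "I don't know"))
    print(matches("don't", "I don\u2019t know"))
```

Since each wiki may have its own policy on quotes and dashes, a real table would presumably need to be configurable per wiki rather than fixed as it is here.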
Andrew Dunbar (hippietrail)
-- brion
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l