[Mediawiki-l] Character equivalence

Kyle Moore wikipediano at gmail.com
Sat Oct 15 20:03:01 UTC 2005


Hello, all fellow Wikimedians,

As the MediaWiki software is set up today, "Pokémon" has nothing to do
with "Pokemon", "Pinata" has nothing to do with "Piñata", and "deja vu" has
nothing to do with "déjà vu". The only reason these can be found by their
common misspellings in the English Wikipedia using the search function is
that our hard-working English-speaking friends have anticipated the
common spelling errors and created redirects from "Pokemon", for
example, to "Pokémon". Of course this is easier in English than in French,
because the only accented words in English are stolen from other languages.
In French, however (as well as Spanish, Galician, Portuguese, Catalan, and
plenty of others), accented words are common. If I use Wikipedia's search
engine to look for [[fr:Société Générale]] (
http://fr.wikipedia.org/wiki/Soci%C3%A9t%C3%A9_g%C3%A9n%C3%A9rale) by typing
"Societe Generale", or if I forget only one of the accents and type
"Societé Générale" in the search box, I get nothing I was looking for:
only pages that are slightly relevant because they made the same spelling
error I did, or pages that contain one of those words, surely with a completely
different meaning (
http://fr.wikipedia.org/wiki/Special:Search?search=Societe+Generale&fulltext=Rechercher).
However, if I use Google for the same task (
http://www.google.com/search?num=50&hl=en&lr=&domains=fr.wikipedia.org&q=Societe+Generale&btnG=Search&sitesearch=fr.wikipedia.org),
the page I'm looking for is the very first result. Nifty, eh?

This is because Google, understandably, is a bit smarter than MediaWiki:
Google has a list of equivalent (or at least similar) characters, so that
any time the gringo comes out in me and I decide I want to beat up a
"pinata", Google understands and gives me a "piñata" (thanks, Google). The
idea is to implement the same or a similar mechanism in Wikimedia wikis so
that, back to our old example, "societe generale", "sócíété générálé",
"societë gënëräl", etc. all return "société générale" instead of
nothing, as they do now. It is true that some false results may be
generated, but as things stand, the search is rather exclusionist.
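To make the idea concrete, here is a minimal sketch of one common way to get this kind of equivalence, assuming a simple title lookup stands in for the real search backend. Instead of a hand-written character table, it uses Unicode canonical decomposition (NFD), which splits a letter like "é" into "e" plus a combining accent, then drops the combining marks and case-folds. The helper names `fold`, `build_index`, and `search` and the sample titles are hypothetical, for illustration only; this is not how MediaWiki's search is actually implemented.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Return an accent- and case-insensitive search key for text (hypothetical helper)."""
    # NFD splits a precomposed letter such as "é" into "e" + U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn"), keeping the base letters.
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # casefold() is a more aggressive lower() intended for caseless matching.
    return base.casefold()


def build_index(titles):
    """Map each folded key to the list of page titles that share it."""
    index = defaultdict(list)
    for title in titles:
        index[fold(title)].append(title)
    return index


def search(index, query):
    """Return titles whose folded form equals the folded query (empty list if none)."""
    # dict.get does not insert missing keys, even on a defaultdict.
    return index.get(fold(query), [])


if __name__ == "__main__":
    # Hypothetical page titles standing in for a wiki's article list.
    index = build_index(["Société Générale", "Piñata", "Pokémon", "Déjà vu"])
    for q in ["Societe Generale", "Societé Générale", "sócíété générálé", "pinata", "deja vu"]:
        print(q, "->", search(index, q))
```

Every query in the demo maps to its accented title. Decomposition only covers letters that Unicode defines as base letter plus accent, so characters such as "ø" or "æ" would still need an explicit equivalence table like the one described above. Folding both the indexed text and the query is also where the false positives mentioned above come from: any two words that differ only in their accents become indistinguishable.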

The proposal is not to create any automatic redirects ([[Societe Generale]]
redirected to [[Société Générale]], for example), but only to change the
search function, so as not to rule out the possibility that "Societe Generale"
is the name of my sister's mom's dad's uncle's band... or whatever.

Thanks for your time.


