[Mediawiki-l] Character equivalence

Kent S. Larsen II kent at lusobraz.com
Sun Oct 16 01:06:55 UTC 2005


You are exactly right, Kyle.

As the owner of related Portuguese-language wikis (see
http://www.litinportuguese.com, http://www.poetsofbrazil, etc.), I know this
would be an important help for languages that use accented characters. A
search engine that can ignore accents would be very useful.
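The accent-insensitive matching described in the quoted proposal can be sketched in a few lines of Python. This is a minimal illustration, not MediaWiki code. It assumes that folding each string to a lowercase, accent-free key is an acceptable notion of "equivalent characters". The folding works by applying Unicode NFD decomposition and then dropping the combining marks. `fold` and `search` are hypothetical names chosen for this example.

```python
import unicodedata


def fold(text: str) -> str:
    """Return a case- and accent-insensitive search key for text.

    Assumption: NFD decomposition splits e.g. "é" into "e" plus a combining
    acute accent; dropping the combining marks leaves the bare letter.
    """
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search(query: str, titles: list[str]) -> list[str]:
    """Hypothetical title search: keep titles whose folded form equals the folded query."""
    key = fold(query)
    return [title for title in titles if fold(title) == key]


if __name__ == "__main__":
    titles = ["Société Générale", "Piñata", "Pokémon", "Déjà vu"]
    for q in ["Societe Generale", "Societé Générale", "sócíété générálé", "pinata"]:
        print(q, "->", search(q, titles))
```

Only the lookup key is folded. The stored titles are returned unchanged, so no redirect pages are needed.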

[But not always: in Portuguese, the words está ("is" or "he is", the
third-person singular of the verb "estar", to be) and esta ("this") are two
different words. While I agree a search engine should probably ignore
accents, in some cases doing so will lead to unwanted results.]
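A search could respect this caveat by ignoring accents when matching while still listing accent-exact hits first. The Python sketch below is a hypothetical illustration, not a MediaWiki feature. `fold` and `ranked_matches` are invented names, and the sketch assumes that NFD decomposition is an adequate way to strip accents.

```python
import unicodedata


def fold(text: str) -> str:
    """Case- and accent-insensitive key, so "está" and "esta" both become "esta"."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact_key(text: str) -> str:
    """Case-insensitive but accent-sensitive key (NFC so both encodings of "á" agree)."""
    return unicodedata.normalize("NFC", text.casefold())


def ranked_matches(query: str, words: list[str]) -> list[str]:
    """Return words equal to the query once accents are ignored, with words
    that also match the accents exactly listed first."""
    q_exact, q_folded = exact_key(query), fold(query)
    exact, folded_only = [], []
    for word in words:
        if exact_key(word) == q_exact:
            exact.append(word)
        elif fold(word) == q_folded:
            folded_only.append(word)
    return exact + folded_only


if __name__ == "__main__":
    print(ranked_matches("esta", ["está", "esta", "estar"]))
    print(ranked_matches("está", ["esta", "está"]))
```

Ranking rather than filtering keeps what the proposal asks for, because an unaccented query still finds the accented word. A reader who types the accent deliberately sees the exact word first.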

Let me know how I can be of assistance.

Kent

At 3:03 PM -0500 10/15/05, you wrote:
>Hello, all fellow Wikimedians,
>
>As the MediaWiki software is currently set up, "Pokémon" has nothing to do
>with "Pokemon", "Pinata" has nothing to do with "Piñata", and "deja vu" has
>nothing to do with "déjà vu". The only reason these pages can be found by
>their common misspellings in the English Wikipedia's search function is that
>our hard-working English-speaking friends have anticipated the common
>spelling errors and created redirects, for example from "Pokemon" to
>"Pokémon". Of course, this is easier in English than in French, because the
>only accented words in English are stolen from other languages.
>In French, however (as well as Spanish, Galician, Portuguese, Catalan, and
>plenty of others), accented words are common. If I use Wikipedia's search
>engine to look for [[fr:Société Générale]] (
>http://fr.wikipedia.org/wiki/Soci%C3%A9t%C3%A9_g%C3%A9n%C3%A9rale) by typing
>"Societe Generale", or if I forget just one of the accents and type
>"Societé Générale" in the search box, I get nothing I was looking for: only
>marginally relevant pages that either made the same spelling error I did or
>contain one of those words, surely with a completely different meaning (
>http://fr.wikipedia.org/wiki/Special:Search?search=Societe+Generale&fulltext=Rechercher).
>However, if I use Google for the same task (
>http://www.google.com/search?num=50&hl=en&lr=&domains=fr.wikipedia.org&q=Societe+Generale&btnG=Search&sitesearch=fr.wikipedia.org),
>the page I'm looking for is the very first result. Nifty, eh?
>
>This is because Google, understandably, is a bit smarter than MediaWiki:
>Google has a list of equivalent (or at least similar) characters, so that
>any time the gringo comes out in me and I decide I want to beat up a
>"pinata", Google understands and gives me a "piñata" (thanks, Google). The
>idea is to implement the same or a similar mechanism in Wikimedia wikis so
>that, back to our old example, "societe generale", "sócíété générálé",
>"societë gënëräl", etc. all return "société générale" instead of nothing,
>as they do now. True, some false results may be generated, but as things
>stand, the search is rather exclusionist.
>
>The proposal is not to create any automatic redirects ([[Societe Generale]]
>redirected to [[Société Générale]], for example) but only to change the
>search function, so as to avoid trouble in case "Societe Generale" turns out
>to be the name of my sister's mom's dad's uncle's band... or whatever.
>
>Thanks for your time.
>_______________________________________________
>MediaWiki-l mailing list
>MediaWiki-l at Wikimedia.org
>http://mail.wikipedia.org/mailman/listinfo/mediawiki-l



