Re: [Wikitech-l] additional pipe trick suggestion

19 May 2003


Andre Engels wrote:
...
...
> > Why not create [[Mexican]] as a redirect page?
>
> That would solve things for en:, but not for some other languages. For example,
> in Latin all of the following could be forms to redirect at [[domus]]:
> [[domi]], [[domum]], [[domo]], [[domos]], [[domorum]], [[domis]] (and perhaps
> also [[dome]]). And similar for almost every word. Then a trick like the
> above would be much welcome, it seems.

Swedish, Danish, and Norwegian are about as complicated as Latin in
this respect (Swedish words have 4.5 forms on *average*), while German
is a little lighter (perhaps 3 forms per word), and Finnish is a lot
heavier (I'd guess 10 forms per word on average).

Still, the free spell checker "ispell" is able to "stem" most forms
down to the correct basic word, except for the relatively few
ambiguous cases. The existing ispell dictionaries for various
languages already contain the information necessary for this.
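
To make the idea concrete, here is a minimal sketch of how a root-plus-affix
dictionary could be inverted into a form-to-root lookup table. It does not
read real ispell files: the flag "D", the AFFIX_RULES layout, and the
function names are all hypothetical, and the second entry (the Latin verb
"domo") is only there to illustrate an ambiguous case.

```python
# Hypothetical affix table: each flag maps to (strip, add) suffix pairs.
# Flag "D" covers the forms of "domus" listed earlier in this thread.
AFFIX_RULES = {
    "D": [("us", "i"), ("us", "um"), ("us", "o"),
          ("us", "os"), ("us", "orum"), ("us", "is")],
}

# Hypothetical dictionary: root word -> string of affix flags.
# "domo" (a verb) is included to show an ambiguous form.
DICTIONARY = {"domus": "D", "domo": ""}


def expand(root, flags):
    """Generate every form of a root that its affix flags allow."""
    forms = {root}
    for flag in flags:
        for strip, add in AFFIX_RULES[flag]:
            if root.endswith(strip):
                forms.add(root[: len(root) - len(strip)] + add)
    return forms


def build_stem_table(dictionary):
    """Invert the dictionary: inflected form -> set of possible roots."""
    table = {}
    for root, flags in dictionary.items():
        for form in expand(root, flags):
            table.setdefault(form, set()).add(root)
    return table


STEM_TABLE = build_stem_table(DICTIONARY)


def stem(word):
    """Return all candidate roots; more than one means the form is ambiguous."""
    return sorted(STEM_TABLE.get(word.lower(), set()))
```

With this toy table, stem("domum") yields a single root, while stem("domo")
yields two candidates, which is the kind of ambiguous case that would need
either a disambiguation page or a human decision.
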
Using stemming (based on ispell) could be useful both for automatic
URL redirection and during searches.  Commercial "text retrieval"
databases do this, but I haven't heard of any open websites or web
search engines or free software that use this technology.

For searching, you could either stem each word before you index the
text corpus, or you could "unstem" each search expression, so that a
search for "domus" actually searches for "domus OR domi OR domum OR
domo OR domos OR ...".
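
The two search strategies can be compared in a small sketch. The FORMS table,
the sample documents, and all function names below are assumptions made for
illustration; a real system would take its forms from a proper dictionary
rather than a hand-written list.

```python
from collections import defaultdict

# Hypothetical form table: basic word -> all of its known forms.
FORMS = {"domus": ["domus", "domi", "domum", "domo",
                   "domos", "domorum", "domis"]}
# Reverse lookup: any form -> its basic word.
LEMMA = {form: lemma for lemma, forms in FORMS.items() for form in forms}


def build_index(docs, stem_terms):
    """Build word -> set of doc ids; optionally stem each word first."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            key = LEMMA.get(token, token) if stem_terms else token
            index[key].add(doc_id)
    return index


def search_stemmed(index, word):
    """Strategy 1: the index holds stems, so stem the query word too."""
    word = word.lower()
    return index.get(LEMMA.get(word, word), set())


def search_expanded(index, word):
    """Strategy 2: plain index; "unstem" the query into an OR of all forms."""
    word = word.lower()
    lemma = LEMMA.get(word, word)
    hits = set()
    for form in FORMS.get(lemma, [word]):
        hits |= index.get(form, set())
    return hits


# Made-up sample corpus.
DOCS = {1: "in domo mea", 2: "domus aurea", 3: "villa rustica"}
```

Both strategies should return the same documents for the same query. The
trade-off in this sketch is that stemming at index time keeps queries cheap
but discards the exact form, while expanding the query leaves the index
untouched at the cost of a longer OR expression per search.
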
The next step would be to "stem" synonyms into "concepts", so that a
search for "car" returns hits on "automobile", as well.  At some point
of generalization, you just get too many hits.  So maybe perfect hits
should be prioritized over stemming hits, which in turn get prio over
synonym hits.  Just like title hits get prio over fulltext hits today.
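
The prioritization could look roughly like the following sketch, where each
document is ranked by the best kind of match it contains. The LEMMA and
SYNONYMS tables, the tier constants, and the function names are hypothetical
and only serve the example.

```python
# Match tiers, best first: lower numbers sort earlier.
EXACT, STEM, SYNONYM = 0, 1, 2

# Hypothetical lookup tables for the illustration.
LEMMA = {"cars": "car", "automobiles": "automobile"}
SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}


def match_tier(query, token):
    """Classify how a document word matches the query word, or None."""
    if token == query:
        return EXACT
    query_stem = LEMMA.get(query, query)
    token_stem = LEMMA.get(token, token)
    if query_stem == token_stem:
        return STEM
    if token_stem in SYNONYMS.get(query_stem, set()):
        return SYNONYM
    return None


def rank(query, docs):
    """Return matching doc ids, perfect hits first, synonym hits last."""
    query = query.lower()
    scored = []
    for doc_id, text in docs.items():
        tiers = [match_tier(query, tok) for tok in text.lower().split()]
        tiers = [t for t in tiers if t is not None]
        if tiers:
            # A document is ranked by its best (lowest) tier.
            scored.append((min(tiers), doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


# Made-up sample corpus.
DOCS = {
    "a": "an automobile museum",
    "b": "two cars collided",
    "c": "my car broke down",
    "d": "a bicycle shop",
}
```

For the query "car", document "c" matches exactly, "b" matches only after
stemming, and "a" matches only as a synonym, so rank("car", DOCS) lists them
in that order and leaves "d" out.
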
This is a new direction that I have thought about, but never got
around to implementing.  Does anybody have any experience to share?
-- 
  Lars Aronsson (lars@aronsson.se)
  Aronsson Datateknik - http://aronsson.se/
