I haven't read the entire thread but it seems the extension I was
working on for Wiktionary would be relevant here:
http://www.mediawiki.org/wiki/Extension:DidYouMean
It normalizes article names and traps page creations, deletions, and
moves to maintain a database table of normalized titles.
On every page view and search request this table is queried (unless
the result is already cached) and a list of similar titles is suggested
to the user.
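In outline the mechanism works something like the following Python
sketch. To be clear, the actual extension is PHP against a real
database table; the in-memory index, function names, and the toy
normalization here are all made up for illustration:

```python
# Hypothetical in-memory stand-in for the extension's database table,
# mapping each normalized form to the real article titles sharing it.
normalized_index: dict[str, set[str]] = {}

def normalize(title: str) -> str:
    # Placeholder for the real normalization (accent folding, etc.):
    # here we just drop spaces and hyphens and lowercase.
    return title.replace(" ", "").replace("-", "").lower()

def on_page_created(title: str) -> None:
    # Called from the page-creation hook to keep the index current;
    # page moves would be a delete of the old name plus a create.
    normalized_index.setdefault(normalize(title), set()).add(title)

def on_page_deleted(title: str) -> None:
    normalized_index.get(normalize(title), set()).discard(title)

def similar_titles(title: str) -> set[str]:
    # At page view / search time: every title sharing the normalized
    # form, except the title being viewed itself.
    return normalized_index.get(normalize(title), set()) - {title}
```

So after creating "Co-op" and "Coop", viewing either page would
suggest the other.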
The English Wiktionary already has a way (using templates) to suggest
similar article titles. DidYouMean combines this hand-edited list with
its generated list and displays them in the manner expected on
Wiktionary.
I had already considered adding limited pattern matching on top of the
normalization, to catch kinds of similar titles that normalization
alone can't find. This would be essential for a Wikipedia solution.
Currently DidYouMean normalizes accented characters to unaccented
characters, strips Hebrew and Arabic vowels, normalizes Japanese
fullwidth and halfwidth characters to normal width, etc. It also
strips spaces, hyphens, apostrophes, periods, etc.
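Most of those folds fall out of standard Unicode normalization. Here's
a rough Python sketch of that style of normalization; it is not the
extension's actual rule set, just an approximation using NFKD
decomposition:

```python
import unicodedata

# Characters stripped outright: space, hyphen, ASCII and curly
# apostrophes, period. (Illustrative; the real list differs.)
STRIP_CHARS = set(" -'.\u2019")

def normalize_title(title: str) -> str:
    # NFKD folds fullwidth/halfwidth forms to ordinary width and
    # splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", title)
    out = []
    for ch in decomposed:
        # Dropping combining marks (category Mn) removes Latin
        # accents, and also Hebrew and Arabic vowel points, which
        # are encoded as combining characters.
        if unicodedata.category(ch) == "Mn":
            continue
        if ch in STRIP_CHARS:
            continue
        out.append(ch)
    return "".join(out).lower()
```

With this, "Café", "cafe" and fullwidth "ｃａｆｅ" all normalize to
the same key.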
Obviously the matching heuristics for Wikipedia would be different.
They might include word stemming and stoplists, but perhaps also
hand-coded rules maintained in a special page.
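For the Wikipedia case, a stoplist-plus-stemming normalizer might look
roughly like this. The stoplist and the crude suffix stripping are
purely illustrative; a real deployment would use a proper stemmer such
as Porter's:

```python
# Illustrative stoplist; a real one would be per-language and longer.
STOPWORDS = {"the", "a", "an", "of", "in", "and"}

def wikipedia_normalize(title: str) -> str:
    words = title.lower().split()
    kept = [w for w in words if w not in STOPWORDS]
    stemmed = []
    for w in kept:
        # Crude stemming: strip one common suffix, only from words
        # long enough that something recognizable remains.
        for suffix in ("ing", "es", "s"):
            if w.endswith(suffix) and len(w) > len(suffix) + 2:
                w = w[: -len(suffix)]
                break
        stemmed.append(w)
    return " ".join(stemmed)
```

This would map, say, "Lists of Birds" and "List of Birds" to the same
key, so either could be suggested when the other is requested.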
It might even be possible to automate page disambiguation to some
degree using these methods.
Andrew Dunbar (hippietrail)