On 14 May 2011 04:33, Andrew Dunbar hippytrail@gmail.com wrote:
On 14 May 2011 01:48, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
On Fri, May 13, 2011 at 3:31 AM, M. Williamson node.ue@gmail.com wrote:
I still don't think page titles should be case sensitive. Last time I asked how useful this really was, back in 2005 or so, I got a tersely worded response that we need it to disambiguate certain pages. OK, but how many cases does that actually apply to? I would think that the increased usability from removing case sensitivity would far outweigh the benefit of natural disambiguation, which only applies to a tiny minority of pages and could easily be replaced with disambiguation pages.
From a software perspective, the way to do this would be to store a canonicalized version of each page's title, and require that to be unique instead of the title itself. This would be nice because we could allow underscores in page titles, for instance, in addition to being able to do case-folding. Note that Unicode capitalization is locale-dependent, but case-folding is not. Thus we could use the same case-folding on all projects, including international projects like Commons. There's only one exception -- Turkish, with its dotless and dotted i's. But that's minor enough that we should be able to work around it without too much pain.
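As a rough illustration of the canonicalized-title idea above, here is a minimal Python sketch. It is not MediaWiki code: `canonical_title` and `TitleIndex` are hypothetical names, and treating underscores as equivalent to spaces in the key (while the stored title keeps them) is an assumption about what the canonical form would do. Case-folding uses Python's `str.casefold`, which applies the default, locale-independent Unicode folding.

```python
import unicodedata


def canonical_title(title: str) -> str:
    """Return a hypothetical uniqueness key for a page title."""
    # Assumption: underscores and runs of whitespace are equivalent in the key,
    # so the displayed title is free to keep its underscores.
    spaced = " ".join(title.replace("_", " ").split())
    # Default Unicode case folding (locale-independent), wrapped in NFC so
    # that composed and decomposed spellings produce the same key.
    folded = unicodedata.normalize("NFC", spaced).casefold()
    return unicodedata.normalize("NFC", folded)


class TitleIndex:
    """Toy in-memory index that enforces uniqueness on the canonical key."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}

    def add(self, title: str) -> str:
        key = canonical_title(title)
        existing = self._by_key.get(key)
        if existing is not None and existing != title:
            raise ValueError(f"{title!r} collides with existing page {existing!r}")
        self._by_key[key] = title  # the title itself is stored unchanged
        return key


if __name__ == "__main__":
    index = TitleIndex()
    index.add("Main_Page")
    try:
        index.add("main page")
    except ValueError as err:
        print(err)
    # The Turkish case: default folding maps "I" to "i" and leaves dotless
    # "ı" alone, so the pair a Turkish reader expects to match does not.
    print(canonical_title("I") == canonical_title("ı"))
```

Under default folding, "I" and dotless "ı" end up with different keys, and dotted capital "İ" does not fold to plain "i". That is the Turkish exception mentioned above, and any workaround would have to be layered on top of a key like this one.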
I'm almost positive Azeri has the same dotless i issue, as perhaps do some of the other Turkic languages of Central Asia. One solution is to do accent/diacritic normalization too as part of the canonicalization.
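A sketch of what diacritic normalization might look like, again in Python and again only under stated assumptions: `fold_diacritics` is a hypothetical helper, not an existing MediaWiki function. One detail worth noting is that dotless "ı" (U+0131) has no Unicode decomposition, so stripping combining marks alone does not turn it into "i"; the sketch handles it with an explicit mapping, which is an assumption about the desired behaviour.

```python
import unicodedata

# Assumption: dotless i should collapse to plain "i" in the uniqueness key.
# U+0131 has no decomposition, so it needs an explicit entry.
_EXTRA_FOLDS = str.maketrans({"\u0131": "i"})


def fold_diacritics(title: str) -> str:
    """Case-fold a title and drop combining marks (hypothetical helper)."""
    folded = title.casefold().translate(_EXTRA_FOLDS)
    # NFD splits accented letters into base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", folded)
    # Category "Mn" (nonspacing mark) covers accents such as U+0301 and the
    # combining dot U+0307 that folding "İ" leaves behind.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    for sample in ("İstanbul", "ISI", "ısı", "Café"):
        print(sample, "->", fold_diacritics(sample))
```

The trade-off is that this also merges titles that differ only by an accent, so it removes more natural disambiguation than case-folding alone would.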
This is getting into "nirvana fallacy" territory - we can't have case-folding until every edge case works?
Instead, I would ask first: What does it take in English? Then work out from there.
- d.