On 14 May 2011 04:33, Andrew Dunbar <hippytrail(a)gmail.com> wrote:
On 14 May 2011 01:48, Aryeh Gregor <Simetrical+wikilist(a)gmail.com> wrote:
> On Fri, May 13, 2011 at 3:31 AM, M. Williamson <node.ue(a)gmail.com> wrote:
>> I still don't think page titles should be case sensitive. Last time I asked
>> how useful this really was, back in 2005 or so, I got a tersely-worded
>> response that we need it to disambiguate certain pages. OK, but how many
>> cases does that actually apply to? I would think that the increased
>> usability from removing case sensitivity would far outweigh the benefit of
>> natural disambiguation that only applies to a tiny minority of pages, and
>> which could easily be replaced with disambiguation pages.
> From a software perspective, the way to do this would be to store a
> canonicalized version of each page's title, and require that to be
> unique instead of the title itself. This would be nice because we
> could allow underscores in page titles, for instance, in addition to
> being able to do case-folding.
> Note that Unicode capitalization is locale-dependent, but case-folding
> is not. Thus we could use the same case-folding on all projects,
> including international projects like Commons. There's only one
> exception -- Turkish, with its dotless and dotted i's. But that's
> minor enough that we should be able to work around it without too much
> pain.
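
A rough sketch of the canonicalized-title idea described above, in Python rather than MediaWiki's own code. The names canonical_key and TitleIndex are hypothetical, and the in-memory dict only stands in for what would presumably be a unique index on a stored canonical column. It assumes underscores and spaces are treated as equivalent and that Python's str.casefold (locale-independent Unicode case folding) is an acceptable folding.

```python
def canonical_key(title: str) -> str:
    """Hypothetical canonical form: underscores equal spaces, then case-fold."""
    return title.replace("_", " ").casefold()


class TitleIndex:
    """Toy stand-in for a unique constraint on the canonical key."""

    def __init__(self) -> None:
        # Maps canonical key -> title exactly as the author wrote it.
        self._by_key: dict[str, str] = {}

    def add(self, title: str) -> None:
        key = canonical_key(title)
        existing = self._by_key.get(key)
        if existing is not None and existing != title:
            # Uniqueness is enforced on the key, not on the display title.
            raise ValueError(f"{title!r} collides with existing page {existing!r}")
        self._by_key[key] = title

    def resolve(self, requested: str):
        """Return the stored display title for any spelling of it, or None."""
        return self._by_key.get(canonical_key(requested))


index = TitleIndex()
index.add("Main_Page")
print(index.resolve("main page"))  # Main_Page
```

The display title keeps whatever case and underscores were entered; only the key is compared, so adding "MAIN PAGE" afterwards would be rejected as a collision.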
I'm almost positive Azeri has the same dotless i issue and perhaps
some of the other Turkic languages of Central Asia. One solution is to
do accent/diacritic normalization too as part of the canonicalization.
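
To make the dotless-i point concrete, here is a minimal Python sketch of folding diacritics as part of canonicalization. The function name fold_title is hypothetical. One wrinkle: dotted capital I (U+0130) decomposes under NFD into "I" plus a combining dot, so stripping combining marks handles it, but dotless i (U+0131) has no Unicode decomposition, so the sketch maps it to "i" explicitly. That explicit mapping is an assumption on my part, not something Unicode normalization provides.

```python
import unicodedata


def fold_title(title: str) -> str:
    """Hypothetical canonicalization: strip diacritics, then case-fold."""
    # Decompose accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", title)
    # Drop the combining marks (Unicode category Mn).
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Assumption: treat dotless i (U+0131) as plain i; NFD leaves it untouched.
    stripped = stripped.replace("\u0131", "i")
    return stripped.casefold()


# Plain case folding keeps the Turkish/Azeri letters distinct from "i":
print("I".casefold() == "\u0131".casefold())  # False

# With diacritic folding, all three spellings share one key:
print(fold_title("\u0130stanbul"))  # istanbul
print(fold_title("Istanbul"))       # istanbul
print(fold_title("\u0131stanbul"))  # istanbul
```

The cost is that titles differing only by an accent (for example "Café" and "Cafe") would also collide, which may or may not be wanted on a given project.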
This is getting into "nirvana fallacy" territory - we can't have
case-folding until every edge case works?
Instead, I would ask first: What does it take in English? Then work
out from there.
- d.