On Fri, May 13, 2011 at 3:31 AM, M. Williamson <node.ue(a)gmail.com> wrote:
I still don't think page titles should be case sensitive. Last time I asked
how useful this really was, back in 2005 or so, I got a tersely-worded
response that we need it to disambiguate certain pages. OK, but how many
cases does that actually apply to? I would think that the increased
usability from removing case sensitivity would far outweigh the benefit of
natural disambiguation that only applies to a tiny minority of pages, and
which could easily be replaced with disambiguation pages.
From a software perspective, the way to do this would be to store a
canonicalized version of each page's title, and require that to be
unique instead of the title itself. This would be nice because we
could allow underscores in page titles, for instance, in addition to
being able to do case-folding.
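
A rough sketch of that idea, in Python rather than PHP for brevity. The names
canonicalize and TitleIndex are hypothetical and are not MediaWiki code; the
point is only that uniqueness is enforced on the canonical key while the title
is stored exactly as typed:

```python
def canonicalize(title: str) -> str:
    """Hypothetical canonical form: separators unified, case folded."""
    # Treat underscores and runs of whitespace as a single space.
    unified = " ".join(title.replace("_", " ").split())
    # str.casefold() applies Unicode default (locale-independent) case folding.
    return unified.casefold()


class TitleIndex:
    """Hypothetical store that requires the canonical key to be unique."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}

    def add(self, title: str) -> None:
        key = canonicalize(title)
        if key in self._by_key:
            raise ValueError(
                f"{title!r} collides with existing page {self._by_key[key]!r}"
            )
        # The displayed title keeps its original case and underscores.
        self._by_key[key] = title

    def lookup(self, title: str) -> "str | None":
        return self._by_key.get(canonicalize(title))
```

With this, a page created as "Foo_bar" keeps its underscore in the displayed
title, a lookup for "foo bar" finds it, and an attempt to create "FOO BAR"
is rejected as a duplicate.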
Note that Unicode capitalization is locale-dependent, but case-folding
is not. Thus we could use the same case-folding on all projects,
including international projects like Commons. There's only one
exception -- Turkish, with its dotless and dotted i's. But that's
minor enough that we should be able to work around it without too much
pain.
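
To illustrate the Turkish exception, here is a small Python sketch. It relies
on str.casefold(), which implements Unicode default case folding; turkic_fold
is a hypothetical helper that applies the Turkic special case (I folds to
dotless ı, İ folds to i) before the default folding:

```python
def default_fold(s: str) -> str:
    # Unicode default case folding, the same for every locale.
    return s.casefold()


def turkic_fold(s: str) -> str:
    # Hypothetical Turkic variant: map the two special capitals first,
    # then let the default folding handle everything else.
    return s.replace("I", "\u0131").replace("\u0130", "i").casefold()


# Default folding works across languages without any locale setting.
print(default_fold("Straße") == default_fold("STRASSE"))

# Turkish "DIŞ" and "dış" are the same word in different case, but the
# default folding maps I to i, so the two do not compare equal.
print(default_fold("DIŞ") == default_fold("dış"))

# The Turkic variant folds both to the same string.
print(turkic_fold("DIŞ") == turkic_fold("dış"))
```

A work-around along these lines would only need to be switched on for the
affected projects; everything else could share the default folding.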
Some projects, like probably all Wiktionaries, would doubtless not
want case-folding at all, so we should support different
canonicalization algorithms. Even the ones that don't want
case-folding could still benefit from allowing underscores in titles.
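
One way to picture per-project canonicalization, again as a Python sketch with
made-up names (CANONICALIZERS, canonical_key, and the project labels are
assumptions for illustration, not an existing configuration mechanism):

```python
from typing import Callable, Dict


def normalize_separators(title: str) -> str:
    # Only unify underscores and whitespace; case stays significant.
    return " ".join(title.replace("_", " ").split())


def fold_case_and_separators(title: str) -> str:
    # Unify separators and additionally fold case.
    return normalize_separators(title).casefold()


# Hypothetical per-project choice of algorithm.
CANONICALIZERS: Dict[str, Callable[[str], str]] = {
    "wiktionary": normalize_separators,
    "wikipedia": fold_case_and_separators,
}


def canonical_key(project: str, title: str) -> str:
    return CANONICALIZERS[project](title)
```

Under this sketch a Wiktionary would still keep "Polish" and "polish" as
separate entries, while both kinds of project treat "foo_bar" and "foo bar"
as the same key.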
But all this would require a very intrusive rewrite. Assumptions like
"replace spaces by underscores to get dbkey" are hardwired into
MediaWiki all over the place, unfortunately. It's not clear that it's
worth it, since there are downsides to case-folding too. It might
make more sense to auto-generate redirects instead, which would be a
much easier project that wouldn't have the downsides.
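
The redirect approach could look roughly like the following Python sketch.
The function names and the particular set of case variants are assumptions
chosen for illustration; a real implementation would have to decide which
variants are worth generating:

```python
def case_variants(title: str) -> set:
    """Hypothetical set of alternative capitalizations for a title."""
    words = title.split(" ")
    variants = {
        title.lower(),                                         # all lowercase
        " ".join(w[:1].upper() + w[1:].lower() for w in words),  # Each Word
        title[:1].upper() + title[1:].lower(),                 # First only
    }
    variants.discard(title)  # the page itself needs no redirect
    return variants


def plan_redirects(title: str, existing: set) -> dict:
    """Map each free variant to the page it should redirect to."""
    # Variants that already exist as real pages are left alone, so
    # deliberately case-distinct pages are never overridden.
    return {v: title for v in case_variants(title) if v not in existing}
```

Because existing pages always win over generated redirects, titles that
differ only by case can continue to coexist, which is the main thing
case-folding would take away.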
Fortunately, I think most of the space/underscore switching done by code
is actually isolated to a subset of Title and perhaps a few other core
classes (probably ones like User and the filerepo stuff); most code
should be using the Title interface.
When I tried that first rewrite I had more of an issue with the wide use
of $user->getName() to test if two user objects were the same.
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [