On 11-05-13 08:48 AM, Aryeh Gregor wrote:
On Fri, May 13, 2011 at 3:31 AM, M. Williamson <node.ue@gmail.com> wrote:
I still don't think page titles should be case sensitive. Last time I asked how useful this really was, back in 2005 or so, I got a tersely-worded response that we need it to disambiguate certain pages. OK, but how many cases does that actually apply to? I would think that the increased usability from removing case sensitivity would far outweigh the benefit of natural disambiguation that only applies to a tiny minority of pages, and which could easily be replaced with disambiguation pages.
From a software perspective, the way to do this would be to store a canonicalized version of each page's title, and require that to be unique instead of the title itself. This would be nice because we could allow underscores in page titles, for instance, in addition to being able to do case-folding.
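To make the idea concrete, here is a minimal sketch in Python (not MediaWiki code) of storing a canonicalized key alongside each title and enforcing uniqueness on the key rather than on the title. The names `canonical_key` and `TitleIndex` are hypothetical, and the specific canonical form (underscores treated as spaces, whitespace collapsed, case folded) is only one assumption about what the canonicalization might do.

```python
import re


def canonical_key(title: str) -> str:
    """Hypothetical canonical form: '_' and ' ' are equivalent, case is folded."""
    spaced = title.replace("_", " ")
    collapsed = re.sub(r"\s+", " ", spaced).strip()
    return collapsed.casefold()


class TitleIndex:
    """Keeps titles exactly as entered, but requires unique canonical keys."""

    def __init__(self):
        self._by_key = {}

    def add(self, title: str) -> None:
        key = canonical_key(title)
        if key in self._by_key:
            raise ValueError(f"{title!r} collides with {self._by_key[key]!r}")
        self._by_key[key] = title

    def lookup(self, title: str):
        # Any spelling with the same canonical key finds the stored title.
        return self._by_key.get(canonical_key(title))


index = TitleIndex()
index.add("snake_case")            # the underscore survives in the stored title
print(index.lookup("Snake case"))  # finds the stored title via its canonical key
```

Because uniqueness lives on the key, the stored title can keep its underscore while "Snake case" and "snake_case" still resolve to the same page.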
Note that Unicode capitalization is locale-dependent, but case-folding is not. Thus we could use the same case-folding on all projects, including international projects like Commons. There's only one exception -- Turkish, with its dotless and dotted i's. But that's minor enough that we should be able to work around it without too much pain.
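As a rough illustration of the Turkish exception, the sketch below uses Python's built-in `str.casefold()`, which applies Unicode default case folding. The `fold_title` function and its `turkic` flag are hypothetical; the flag applies the Turkic-specific mappings (I to dotless ı, dotted İ to i) before the default folding, which is one possible workaround and not necessarily what MediaWiki would do.

```python
def fold_title(title: str, turkic: bool = False) -> str:
    """Case-fold a title; optionally apply the Turkic special cases first.

    Hypothetical helper: default Unicode case folding is locale-independent,
    so the Turkic mappings have to be requested explicitly.
    """
    if turkic:
        # Turkic folding: capital I pairs with dotless ı, dotted İ pairs with i.
        title = title.replace("I", "\u0131").replace("\u0130", "i")
    return title.casefold()


print(fold_title("ISTANBUL"))               # default folding
print(fold_title("ISTANBUL", turkic=True))  # Turkic folding keeps the dotless ı
```

Every project could share the default path, and only wikis in Turkic languages would need the flag.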
Some projects, like probably all Wiktionaries, would doubtless not want case-folding at all, so we should support different canonicalization algorithms. Even the ones that don't want case-folding could still benefit from allowing underscores in titles.
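Supporting different canonicalization algorithms could look roughly like a per-project strategy lookup. This is a sketch only: the mode names and the `CANONICALIZERS` table are made up for illustration and do not correspond to any existing MediaWiki setting.

```python
def normalize_separators(title: str) -> str:
    """Treat underscores as spaces and collapse runs of whitespace."""
    return " ".join(title.replace("_", " ").split())


def fold_case(title: str) -> str:
    """Normalize separators, then apply Unicode case folding."""
    return normalize_separators(title).casefold()


# Hypothetical per-project setting: a Wiktionary would pick "case-sensitive".
CANONICALIZERS = {
    "case-insensitive": fold_case,
    "case-sensitive": normalize_separators,
}


def canonicalize(title: str, mode: str) -> str:
    return CANONICALIZERS[mode](title)


# "Polish" and "polish" stay distinct entries on a case-sensitive project.
print(canonicalize("Polish", "case-sensitive") == canonicalize("polish", "case-sensitive"))
```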
But all this would require a very intrusive rewrite. Assumptions like "replace spaces by underscores to get dbkey" are hardwired into MediaWiki all over the place, unfortunately. It's not clear that it's worth it, since there are downsides to case-folding too. It might make more sense to auto-generate redirects instead, which would be a much easier project that wouldn't have the downsides.
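One way to picture the redirect alternative: when a page is created, compute a few case variants of its title and create redirects for those that are not already taken. The sketch below is illustrative only. `case_variants` and `auto_redirects` are hypothetical names, and the choice of variants is an assumption, not a proposal from this thread.

```python
def case_variants(title: str) -> set:
    """A few plausible case variants of a title, excluding the title itself."""
    variants = {
        title.lower(),
        title.upper(),
        title.title(),
        title[:1].upper() + title[1:].lower(),
    }
    variants.discard(title)
    return variants


def auto_redirects(new_title: str, existing: set) -> dict:
    """Map each free variant to the new page; titles already in use are left alone."""
    return {v: new_title for v in case_variants(new_title) if v not in existing}


existing_titles = {"Main Page", "MAIN PAGE"}
for source, target in sorted(auto_redirects("Main Page", existing_titles).items()):
    print(f"{source} -> {target}")
```

Titles stay case sensitive here, so pages that really do differ only by case keep working. The redirects just cover the common near-misses.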
Fortunately, I think most of the space/underscore switching done by code is actually isolated to a subset of Title and perhaps a few other core classes (probably ones like User and the filerepo stuff); most code should already be going through the Title interface. When I tried that first rewrite, I had more of an issue with the wide use of $user->getName() to test whether two user objects were the same.