On 14 May 2011 01:48, Aryeh Gregor <Simetrical+wikilist(a)gmail.com> wrote:
On Fri, May 13, 2011 at 3:31 AM, M. Williamson
<node.ue(a)gmail.com> wrote:
I still don't think page titles should be case sensitive. Last time I asked
how useful this really was, back in 2005 or so, I got a tersely-worded
response that we need it to disambiguate certain pages. OK, but how many
cases does that actually apply to? I would think that the increased
usability from removing case sensitivity would far outweigh the benefit of
natural disambiguation that only applies to a tiny minority of pages, and
which could easily be replaced with disambiguation pages.
From a software perspective, the way to do this would be to store a
canonicalized version of each page's title, and require that to be
unique instead of the title itself. This would be nice because we
could allow underscores in page titles, for instance, in addition to
being able to do case-folding.
Note that Unicode capitalization is locale-dependent, but case-folding
is not. Thus we could use the same case-folding on all projects,
including international projects like Commons. There's only one
exception -- Turkish, with its dotless and dotted i's. But that's
minor enough that we should be able to work around it without too much
pain.
I'm almost positive Azeri has the same dotless i issue and perhaps
some of the other Turkic languages of Central Asia. One solution is to
do accent/diacritic normalization too as part of the canonicalization.
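To make the dotless-i problem concrete, here is a rough Python sketch
using only the standard library. fold_loose is a hypothetical name, and
the explicit mapping of U+0131 to "i" is an assumption about how one
might work around it: dotless i has no canonical decomposition, so
stripping combining marks alone does not turn it into a plain "i".

```python
import unicodedata

DOTLESS_I = "\u0131"  # ı, LATIN SMALL LETTER DOTLESS I


def fold_loose(title: str) -> str:
    """Case-fold, decompose, drop nonspacing marks, then map ı to i."""
    decomposed = unicodedata.normalize("NFD", title.casefold())
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return stripped.replace(DOTLESS_I, "i")


# Default case-folding keeps I and ı apart ...
print("I".casefold() == DOTLESS_I.casefold())
# ... while the looser fold treats them as the same letter.
print(fold_loose("ISPARTA") == fold_loose("\u0131sparta"))
```

The cost is that this also conflates titles that differ only by accents,
so it would need the same per-project opt-out as case-folding itself.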
Andrew Dunbar (hippietrail)
Some projects, like probably all Wiktionaries, would doubtless not
want case-folding at all, so we should support different
canonicalization algorithms. Even the ones that don't want
case-folding could still benefit from allowing underscores in titles.
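One way to picture per-project canonicalization, as a sketch only: the
project names, the CANONICALIZERS table and key_for below are all
hypothetical and do not correspond to any existing MediaWiki setting.

```python
from typing import Callable


def keep_case(title: str) -> str:
    """Normalize underscores and whitespace only; case stays significant."""
    return " ".join(title.replace("_", " ").split())


def fold_case(title: str) -> str:
    """Same separator handling, plus locale-independent case-folding."""
    return keep_case(title).casefold()


# Hypothetical per-project choice of algorithm.
CANONICALIZERS: dict[str, Callable[[str], str]] = {
    "wiktionary": keep_case,
    "wikipedia": fold_case,
}


def key_for(project: str, title: str) -> str:
    return CANONICALIZERS[project](title)


# "Polish" and "polish" stay distinct entries on a Wiktionary ...
print(key_for("wiktionary", "Polish") == key_for("wiktionary", "polish"))
# ... but share one key on a project that opts into case-folding.
print(key_for("wikipedia", "Polish") == key_for("wikipedia", "polish"))
```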
But all this would require a very intrusive rewrite. Assumptions like
"replace spaces by underscores to get dbkey" are hardwired into
MediaWiki all over the place, unfortunately. It's not clear that it's
worth it, since there are downsides to case-folding too. It might
make more sense to auto-generate redirects instead, which would be a
much easier project that wouldn't have the downsides.
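A rough sketch of what auto-generating redirects could look like, in
Python rather than MediaWiki's own code. plan_redirects is a hypothetical
helper, and redirecting only from the case-folded spelling is an
assumption; a real version would have to decide which variants to cover.

```python
from collections import defaultdict


def plan_redirects(titles: list[str]) -> dict[str, str]:
    """Map each case-folded spelling to its page, skipping spellings that
    already exist as pages or that match more than one page."""
    existing = set(titles)
    by_variant: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        by_variant[title.casefold()].append(title)
    return {
        variant: targets[0]
        for variant, targets in by_variant.items()
        if variant not in existing and len(targets) == 1
    }


print(plan_redirects(["Main Page", "US", "Us"]))
# "main page" gets a redirect; "us" is ambiguous and is left alone
```

Titles that differ only by case keep working exactly as they do now,
which is why this avoids the downsides of folding.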
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l