On 21 July 2010 15:28, Aryeh Gregor <Simetrical+wikilist(a)gmail.com> wrote:
On Wed, Jul 21, 2010 at 6:18 PM, Daniel Kinzler
<daniel(a)brightbyte.de> wrote:
While we definitely need efficient retrieval by namespace, the default
sort key should *not* include the namespace prefix. It's very annoying
that all files currently get sorted under "F", or that pages from the
Wikipedia namespace all end up under "W".
That's totally orthogonal and is like a one-line change. Probably you
just have to change getPrefixedDBkey() to getDBkey() somewhere.
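
To illustrate the difference, here is a minimal Python sketch. The
`Title` class and `section_letter` helper are hypothetical stand-ins
that only mirror the names of MediaWiki's getPrefixedDBkey() and
getDBkey(); this is not the actual MediaWiki code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Title:
    """Hypothetical stand-in for a page title; not MediaWiki's class."""
    namespace: str  # empty string means the main namespace
    dbkey: str

    def get_db_key(self) -> str:
        # Title text without the namespace prefix.
        return self.dbkey

    def get_prefixed_db_key(self) -> str:
        # Title text with the namespace prefix, when there is one.
        return f"{self.namespace}:{self.dbkey}" if self.namespace else self.dbkey


def section_letter(sort_key: str) -> str:
    # Naive heading: the first character of the sort key, upper-cased.
    return sort_key[:1].upper()


pages = [
    Title("File", "Zebra.jpg"),
    Title("Wikipedia", "Help_desk"),
    Title("", "Apple"),
]

# Prefixed keys put every file under "F" and every project page under "W".
with_prefix = [section_letter(t.get_prefixed_db_key()) for t in pages]
# Unprefixed keys file each page under the first letter of its own name.
without_prefix = [section_letter(t.get_db_key()) for t in pages]
```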
On Wed, Jul 21, 2010 at 6:22 PM, Conrad Irwin <conrad.irwin(a)gmail.com> wrote:
Because there are multiple languages on each page, so you need lots
of different sort keys.
Could you point me to an example of some pages and categories where
this is an issue? I'm not clear on how categories/pages/sort keys are
being used here.
I don't have an example to hand (as the page is not yet complete on
Wiktionary). The Hungarian letter "cs" is a letter in its own right and
sorts after "c", so while in English "cs" (for centiseconds) should
come before "CV", in Hungarian the entry for the letter (which is
missing) should come afterwards. Both English and Hungarian would be on
the same Wiktionary page.
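
A rough Python sketch of why one page needs one sort key per language.
The alphabets are simplified and hand-written for illustration, and
`make_sort_key` is a hypothetical helper; real collation (for example
ICU with CLDR tailorings) handles far more than this.

```python
# Each tuple is one "letter": all spellings in it share a primary rank.
ENGLISH = [(c,) for c in "abcdefghijklmnopqrstuvwxyz"]
HUNGARIAN = [
    ("a", "á"), ("b",), ("c",), ("cs",), ("d",), ("dz",), ("dzs",),
    ("e", "é"), ("f",), ("g",), ("gy",), ("h",), ("i", "í"), ("j",),
    ("k",), ("l",), ("ly",), ("m",), ("n",), ("ny",), ("o", "ó"),
    ("ö", "ő"), ("p",), ("q",), ("r",), ("s",), ("sz",), ("t",),
    ("ty",), ("u", "ú"), ("ü", "ű"), ("v",), ("w",), ("x",), ("y",),
    ("z",), ("zs",),
]


def make_sort_key(alphabet):
    """Build a primary-level sort key function for a simplified alphabet."""
    rank = {}
    for index, spellings in enumerate(alphabet):
        for spelling in spellings:
            rank[spelling] = index
    longest = max(len(spelling) for spelling in rank)

    def sort_key(word):
        word = word.lower()
        key = []
        pos = 0
        while pos < len(word):
            # Greedy longest match, so "cs" is one letter and not "c" + "s".
            for size in range(longest, 0, -1):
                chunk = word[pos:pos + size]
                if chunk in rank:
                    key.append(rank[chunk])
                    pos += len(chunk)
                    break
            else:
                # Unknown character: sort it after every known letter.
                key.append(len(alphabet) + ord(word[pos]))
                pos += 1
        return tuple(key)

    return sort_key


english_key = make_sort_key(ENGLISH)
hungarian_key = make_sort_key(HUNGARIAN)

english_order = sorted(["CV", "cs"], key=english_key)
hungarian_order = sorted(["CV", "cs"], key=hungarian_key)
```

With these assumed alphabets the same two titles come out in opposite
orders: "cs" first under the English key, "CV" first under the
Hungarian one.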
If it's sorted as starting with "a" it should appear under "a". The
alternative would be to have different explicit sorting for the
sections in the category than for the words in the section, which I
think is unnecessary.
So if we have three pages "Áa", "Ab", "Ác" and they're sorted in the
category in that order, should they be in one section? I don't see
how you'd put them in two or three sections. If they're in one
section, what letter do you use for it, "Á" or "A"? We can figure out
"A" is correct here, but how do you do that in general automatically?
Some languages treat accented letters as the same primary letter and
distinguish them only in the secondary or tertiary sort key (which the
current category table's 80-byte keys are in danger of truncating).
Others have variations on the theme. Hungarian again makes a good
example: ö and ő are one letter with two accents, while o and ó are a
different letter. It should be possible to extract the first letter
automatically from the words to be sorted (I don't know if ICU covers
that; if not, just ask some people who speak the language, or
Wikipedia). But it's not possible to get that information from the sort
keys directly. So either we store both the user-provided sort key and
our derived sort key, so we can use the former to find the first letter
at render time, or we just store the first letter.
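
A small Python sketch of the "just store the first letter" option. It
assumes a language where accented letters share the primary letter of
their base form, and approximates that by stripping combining marks;
the row layout and the names `make_row` and `sections` are made up for
illustration and are not the actual category table schema.

```python
import unicodedata
from itertools import groupby


def strip_accents(text):
    # Decompose, then drop combining marks: "Á" becomes "A".
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_row(user_key):
    """Derive what would be stored for one category member."""
    primary = strip_accents(user_key).lower()
    return {
        "title": user_key,
        # Derived key: primary form first, original text as a tiebreaker.
        "sortkey": (primary, user_key),
        # Stored separately, since it cannot be read back from an
        # opaque derived sort key at render time.
        "first_letter": primary[:1].upper(),
    }


def sections(titles):
    """Sort members by derived key and group them under section letters."""
    rows = sorted((make_row(t) for t in titles), key=lambda r: r["sortkey"])
    return [
        (letter, [row["title"] for row in group])
        for letter, group in groupby(rows, key=lambda r: r["first_letter"])
    ]


one_section = sections(["Ác", "Ab", "Áa"])
two_sections = sections(["Ba", "Ác", "Ab"])
```

Under that assumption "Áa", "Ab", "Ác" land in a single "A" section in
that order. A language that treats "Á" as a separate letter would need
a different rule for deriving first_letter.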
Conrad
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l