Hey,
Is there any way to alter the alphabetical order used to sort lists of articles? In the OE wiki, the letter æ (a and e together) should be alphabetized after a, so that áetan, ániman, æfter show up in that order, and ð and þ are arranged after d and t, respectively. There are also accented vowels that should show up in their unaccented versions as well, so that and, ániman, and ánlíepig are all under the letter A.
James
James R. Johnson wrote:
Is there any way to alter the alphabetical order used to sort
lists of articles?
Alphabetical order is not used. In the rare cases where things are listed by name on the wiki, a binary string sort is used, which follows Unicode code point order.
No, it's not pretty. :)
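A minimal sketch of what sorting by code point does to titles like the ones in the question. Python is used here only for illustration (MediaWiki itself is PHP), and the title list is made up. Python compares strings by code point, which for valid UTF-8 gives the same order as a byte-wise binary comparison.

```python
# Hypothetical titles; "Apple" is included to show that case matters too.
titles = ["and", "zebra", "ániman", "æfter", "Apple"]

# Plain sorted() on str compares code points, like a binary string sort.
by_code_point = sorted(titles)

for title in by_code_point:
    # Show the code point of the first character that decides the position.
    print(f"U+{ord(title[0]):04X}  {title}")
```

Every accented or special letter lands after "z" because its code point is higher than that of any unaccented ASCII letter.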
In category listings you can fudge the order by manually specifying a sort key in the category link, e.g.: [[Category:Bigcat|000Foobar]]
-- brion vibber (brion @ pobox.com)
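One way to apply the sort-key workaround to the Old English case is to generate the key mechanically. The sketch below is an assumption-laden illustration, not anything MediaWiki provides: `oe_sort_key`, `category_link` and the choice of "~" as a filler character are all hypothetical. It strips accents and rewrites æ, ð and þ so that a binary sort places them right after a, d and t.

```python
import unicodedata

# Hypothetical mapping: "~" (U+007E) sorts after "z", so "a~" lands after every
# plain "a..." key but before "b".
SPECIAL = {"æ": "a~", "ð": "d~", "þ": "t~"}

def oe_sort_key(title):
    """Build an ASCII-ish sort key: accents dropped, special letters remapped."""
    out = []
    for ch in unicodedata.normalize("NFD", title.lower()):
        if unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        out.append(SPECIAL.get(ch, ch))
    return "".join(out)

def category_link(category, title):
    """Format a category link with an explicit sort key, as in the example above."""
    return f"[[Category:{category}|{oe_sort_key(title)}]]"

words = ["æfter", "ániman", "áetan", "and", "bæc"]
ordered = sorted(words, key=oe_sort_key)
print(ordered)
print(category_link("Bigcat", "ániman"))
```

The keys still have to be written into each page by hand or by a bot, so this only fudges category listings; it does not change how any other list is sorted.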
Note that there is a bug report about this in Bugzilla: http://bugzilla.wikimedia.org/show_bug.cgi?id=164 You can at least throw in a vote. :-)
-- [[:cs:User:Mormegil | Petr Kadlec]]
Petr Kadlec wrote:
Note that there is a bug report about this in Bugzilla: http://bugzilla.wikimedia.org/show_bug.cgi?id=164 You can at least throw in a vote. :-)
I can sympathize with the idea, but one has to keep in mind that the sort order should vary from one language to another. In English we would alphabetize "æ" as though it were "ae", while Danish treats it as a separate letter tacked on at the end of the alphabet.
Ec
On Mon, 21 Feb 2005 09:22:17 -0800, Ray Saintonge saintonge@telus.net wrote:
I can sympathize with the idea, but one has to keep in mind that the sort order should vary from one language to another. In English we would alphabetize "æ" as though it were "ae", while Danish treats it as a separate letter tacked on at the end of the alphabet.
Of course! The sorting order would have to be determined by the Language object.
I know that well, as Czech sorting has many rather peculiar features (e.g. in Czech we have a "letter", the digraph ch, that is sorted between H and I -- this in particular would probably be a problem for MediaWiki anyway). For a bit of amusement, our national standard governing proper sorting (ČSN 97 6030) is known for its lack of precision and for being impossible to implement, because its rules are considered to be AI-complete :-). For instance, numerals in text should be sorted according to their meaning. (In practice, simplified interpretations are normally used.)
-- [[:cs:User:Mormegil | Petr Kadlec]]
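The "ch" digraph shows why a per-character comparison is not enough: the sort key has to be built from collation elements, not single letters. A rough Python sketch follows, under the loud assumption that the alphabet is just a-z plus "ch" (the accented Czech letters and everything else in ČSN 97 6030 are ignored); `ALPHABET`, `RANK` and `czech_like_key` are hypothetical names.

```python
# Simplified alphabet: "ch" is one collation element placed between h and i.
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {element: position for position, element in enumerate(ALPHABET)}

def czech_like_key(word):
    """Split a word into collation elements and return their ranks."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])  # consume both characters as one element
            i += 2
        else:
            # Unknown characters are pushed after the known alphabet.
            key.append(RANK.get(word[i], len(ALPHABET) + ord(word[i])))
            i += 1
    return key

words = ["chata", "cukr", "hrad", "ihned", "cena"]
print(sorted(words))                      # binary order puts "chata" under c
print(sorted(words, key=czech_like_key))  # "chata" moves between h and i
```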
On Mon, Feb 21, 2005 at 09:22:17AM -0800, Ray Saintonge wrote:
I can sympathize with the idea, but one has to keep in mind that the sort order should vary from one language to another. In English we would alphabetize "æ" as though it were "ae", while Danish treats it as a separate letter tacked on at the end of the alphabet.
We can do a lot better than Unicode binary sort order.
The sort order should not vary between Latin-script languages, because it should be script-dependent, not language-dependent (sorted words don't have to come from the same language, but merely have to be in the same script - think proper names). It's very unfortunate that there are different traditions for sorting the Latin writing system.
In most languages the right place for base letter X with diacritical mark Y is somewhere after plain letter X (and we can choose an order of Ys that generates few conflicts).
In some it sorts the same way as X, in which case sorting it after X is still much better than at the end of the alphabet.
The languages with such letters at the end, or with even weirder orders, are few, and we won't break any more than we currently do if we adopt "base, then base+diacritics" sorting.
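One possible reading of "base, then base+diacritics" is a per-letter rule: each letter sorts by its base letter first and by its diacritic second. The Python sketch below assumes that reading. The function names are hypothetical, and the order among diacritics is simply the code point order of the combining marks, a placeholder rather than a considered choice.

```python
import unicodedata

def letter_key(ch):
    """Return (base letter, combining marks) for one precomposed character."""
    decomposed = unicodedata.normalize("NFD", ch)
    return (decomposed[0], decomposed[1:])

def base_then_diacritics_key(word):
    """Per-letter key: plain X sorts before X with any diacritic, before X+1."""
    composed = unicodedata.normalize("NFC", word.lower())
    return [letter_key(ch) for ch in composed]

words = ["ániman", "azure", "and", "bold"]
print(sorted(words))                                # code point order
print(sorted(words, key=base_then_diacritics_key))  # á right after a
```

Letters that have no Unicode decomposition (æ, ð, þ, ł and so on) are untouched by this and would still need an explicit table.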
"Tomasz" == Tomasz Wegrzanowski taw@users.sourceforge.net writes:
We can do a lot better than Unicode binary sort order.
The sort order should not vary between Latin-script languages, because it should be script-dependent, not language-dependent (sorted words don't have to come from the same language, but merely have to be in the same script - think proper names). It's very unfortunate that there are different traditions for sorting the Latin writing system.
In most languages the right place for base letter X with diacritical mark Y is somewhere after plain letter X (and we can choose an order of Ys that generates few conflicts).
Given that Danish has come up in the discussion, I will hasten to point out that the Danish letter "å" is interchangeable with "aa", in /some/ cases. For instance, the German city Aachen sorts at the start of any list, whereas the Danish city Aalborg (Ålborg) comes at the end.
In short, nothing is as simple as it seems.
In some it sorts the same way as X, in which case sorting it after X is still much better than at the end of the alphabet.
The languages with such letters at the end, or with even weirder orders, are few, and we won't break any more than we currently do if we adopt "base, then base+diacritics" sorting.
Probably not, but the Danes are still going to complain about those two examples, no matter what's done.
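The Aachen/Aalborg case can be sketched as a Danish-style key with an exception list. Everything here is an illustrative assumption: the alphabet is a-z followed by æ, ø, å, and `FOREIGN_AA` is a hypothetical, hand-maintained set of names where "aa" is not a spelling of "å". Deciding membership in that set is exactly the part that cannot be automated from the spelling alone.

```python
import unicodedata

DANISH_ALPHABET = list("abcdefghijklmnopqrstuvwxyz") + ["æ", "ø", "å"]
RANK = {letter: position for position, letter in enumerate(DANISH_ALPHABET)}

# Hypothetical exception list: names whose "aa" is really two a's.
FOREIGN_AA = {"aachen"}

def danish_like_key(word):
    """Treat 'aa' as 'å' (last letter) unless the word is a known exception."""
    w = unicodedata.normalize("NFC", word.lower())
    if w not in FOREIGN_AA:
        w = w.replace("aa", "å")
    # Characters outside the alphabet are pushed to the very end.
    return [RANK.get(ch, len(DANISH_ALPHABET)) for ch in w]

cities = ["Aalborg", "Odense", "Aachen", "Ålborg", "Berlin"]
print(sorted(cities, key=danish_like_key))
```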
On Tue, Feb 22, 2005 at 01:52:48AM +0100, Anders Wegge Jakobsen wrote:
In most languages the right place for base letter X with diacritical mark Y is somewhere after plain letter X (and we can choose an order of Ys that generates few conflicts).
Given that Danish has come up in the discussion, I will hasten to point out that the Danish letter "å" is interchangeable with "aa", in /some/ cases. For instance, the German city Aachen sorts at the start of any list, whereas the Danish city Aalborg (Ålborg) comes at the end.
In short, nothing is as simple as it seems.
The languages with such letters at the end, or with even weirder orders, are few, and we won't break any more than we currently do if we adopt "base, then base+diacritics" sorting.
Probably not, but the Danes are still going to complain about those two examples, no matter what's done.
We should just get it right for 80% of cases with 20% of effort. It's much better than having it broken for 100% of cases, as it is now.
Kaixo!
On Tue, Feb 22, 2005 at 01:39:32AM +0100, Tomasz Wegrzanowski wrote:
The sort order should not vary between Latin-script languages, because it should be script-dependent, not language-dependent
Then it is not a sort order, and it is not much better than the current binary sorting order.
And an ordering like the one you propose and a correct ordering for each language have the same level of complexity to implement, so why not do it right?
So, you want to ignore the problem of different collation orders for different languages? ...or perhaps we can use the proper locale data, which will tell us how to sort for each language.
This would also give us information on dates, which, for example, are not quite right in some of the files, especially for underdeveloped Wikipedias, and information on numbers and times ("PM"). It would also provide us with the appropriate native name of each language to use in interwiki links (iirc).
Mark
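The idea of letting per-language data decide the order can be sketched as a lookup from language code to key function. This is only a toy under stated assumptions: `COLLATORS`, `english_key`, `danish_key` and `sort_titles` are hypothetical names, the two rules are the ones mentioned earlier in the thread (English files æ as "ae", Danish puts æ, ø, å after z), and real locale data is far richer than a two-entry table.

```python
def english_key(title):
    """English convention from the thread: treat æ as though it were 'ae'."""
    return title.lower().replace("æ", "ae")

# "{", "|", "}" are the code points right after "z", so remapping the three
# Danish letters onto them moves those letters to the end of the alphabet.
DANISH_TAIL = str.maketrans({"æ": "{", "ø": "|", "å": "}"})

def danish_key(title):
    """Danish convention from the thread: æ, ø, å are letters after z."""
    return title.lower().translate(DANISH_TAIL)

# Hypothetical registry standing in for per-language collation data.
COLLATORS = {"en": english_key, "da": danish_key}

def sort_titles(titles, lang):
    """Sort with the key function registered for the wiki's language."""
    return sorted(titles, key=COLLATORS[lang])

titles = ["æther", "afar", "zone", "after"]
print(sort_titles(titles, "en"))
print(sort_titles(titles, "da"))
```

The same list comes out in two different orders, which is the disagreement between the script-based and language-based positions in a nutshell.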
On Sun, 20 Feb 2005 22:54:42 +0100, James R. Johnson modean52@comcast.net wrote:
Hey,
Is there any way to alter the alphabetical order used to sort
lists of articles? In the OE wiki, the letter æ (a and e together) should be alphabetized after a, so that "áetan, ániman, æfter" show up in that order, and ð and þ are arranged after d and t, respectively. There are also accented vowels that should show up in their unaccented versions as well, so that and, ániman, and ánlíepig are all under the letter "A".
James
Wikipedia-l mailing list Wikipedia-l@Wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
The zh community have a more serious problem: they want different sorting orders for Traditional and Simplified Chinese...
So I was looking at the related code, and it seems not too hard to implement a specific (but fixed) sorting order within one language. However, I only have limited time to work on this right now, and I don't fully understand how the category thing works. So I put up a test site with the basic implementation at http://tinyurl.com/5l24b. The test site is in English, and the sorting order is altered so that x, y, z come first, followed by a, b, c, etc.
The categorylinks tables will have to be rebuilt if this is to be deployed at the live sites. Not sure how expensive that will be.
Interested parties please visit the test site and provide comments either at the site or on this list. If this seems to be a reasonable solution I will check it into CVS. The test site is running 1.4 from CVS.
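Why a rebuild is needed can be shown with a toy table: if sort keys are stored alongside the links, changing the ordering rule means recomputing every stored key. The sketch below is not the test site's code and not the real MediaWiki schema; `demo_links`, `custom_sortkey` and `rebuild_sortkeys` are hypothetical stand-ins, and SQLite is used only so that the example runs on its own.

```python
import sqlite3

# Custom alphabet from the test site description: x, y, z first, then a..w.
CUSTOM_ORDER = "xyzabcdefghijklmnopqrstuvw"
# Map each letter to the letter holding the same position in a..z, so that a
# plain binary sort of the rewritten key yields the custom order.
REMAP = str.maketrans(CUSTOM_ORDER, "abcdefghijklmnopqrstuvwxyz")

def custom_sortkey(title):
    return title.lower().translate(REMAP)

def rebuild_sortkeys(conn):
    """Recompute the stored key of every row; cost grows with the table size."""
    rows = conn.execute("SELECT rowid, title FROM demo_links").fetchall()
    conn.executemany(
        "UPDATE demo_links SET sortkey = ? WHERE rowid = ?",
        [(custom_sortkey(title), rowid) for rowid, title in rows],
    )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_links (title TEXT, sortkey TEXT)")
sample = ["apple", "xylophone", "zebra", "banana", "yak"]
# Old keys: the lowercased title, i.e. plain binary order.
conn.executemany("INSERT INTO demo_links VALUES (?, ?)", [(t, t.lower()) for t in sample])

rebuilt = rebuild_sortkeys(conn)
listing = [row[0] for row in conn.execute("SELECT title FROM demo_links ORDER BY sortkey")]
print(rebuilt, listing)
```

The rebuild touches every row once, so its cost on a live site depends mainly on how many category links exist.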
A related issue: if I type into the address bar the URL of an article under its TC (Traditional Chinese) title, but the article is actually located at the SC (Simplified Chinese) title, I won't end up at the right place unless there is a redirect. Does this mean we need a redirect for most articles on zh:? Or can we have a computerised way to solve this?
And why is it that when I change from TC to SC, any time I click on a link it switches back?
Mark
On Sun, 6 Mar 2005 23:01:39 -0500, zhengzhu zhengzhu@gmail.com wrote:
The zh community have a more serious problem: they want different sorting orders for Traditional and Simplified Chinese...
So I was looking at the related code, and it seems not too hard to implement a specific (but fixed) sorting order within one language. However, I only have limited time to work on this right now, and I don't fully understand how the category thing works. So I put up a test site with the basic implementation at http://tinyurl.com/5l24b. The test site is in English, and the sorting order is altered so that x, y, z come first, followed by a, b, c, etc.
The categorylinks tables will have to be rebuilt if this is to be deployed at the live sites. Not sure how expensive that will be.
Interested parties please visit the test site and provide comments either at the site or on this list. If this seems to be a reasonable solution I will check it into CVS. The test site is running 1.4 from CVS.
-- zhengzhu
wikipedia-l@lists.wikimedia.org