On 29/10/12 21:55, Jane Darnell wrote:
Thanks for clearing that up! I was wondering how it was possible (and of course was worried that maybe others were seeing undesirable languages). Jane
I'm happy to do that. Things should "just work", but when they don't, it's worth digging to find the reason for the apparent misbehavior.
In this case your language preferences were a bit illogical, since you preferred US English to Dutch, but Dutch to any other kind of English. As a general rule, I don't think it makes much sense to give a completely different language a weight that places it in the middle of the preferences for variants of a single language. There are some cases where it could make sense, if the first variant is very different from the language specified by the generic tag. But en-us and en is not such a case.
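To make the ordering concrete, here is a minimal sketch of how an Accept-Language value is ranked by its q weights. The header string is a made-up illustration of the kind of preference list described above (the actual header was not quoted here), and parse_accept_language is a hypothetical helper, not the survey's real code.

```python
def parse_accept_language(header):
    """Return (language-range, q) pairs sorted by descending q weight."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # per RFC 2616, a missing q parameter means q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unparsable weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


# Hypothetical header: US English first, then Dutch, then generic English
print(parse_accept_language("en-US,nl;q=0.8,en;q=0.5"))
# [('en-us', 1.0), ('nl', 0.8), ('en', 0.5)]
```

With content available only as "en" and "nl", the range en-us matches neither of them under RFC 2616 rules (a range matches a tag only if it equals the tag or is a prefix of it), so nl at 0.8 outranks en at 0.5.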
I started logging Accept-Language values this morning, together with the language in which the survey was shown. In most cases the language shown is what should have been chosen given the whole Accept-Language header. I haven't seen other "shuffled" requests, but I do see a number of requests for a regional language tag (en-za, fr-FR, es-ES, es-CL, es-mx...) that don't include the parent tag. So the language choice algorithm, unable to match any accepted language, ends up showing the survey in English as a last resort. In most cases, the link followed from Wikimedia Commons, which is based on the local language, has fixed this problem. I will add another check on partial strings to cover those cases. But it's the user agent's task to guide the user towards sane choices, not the application's.
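A rough sketch of what such a partial-string check could look like, assuming the fallback only runs when strict matching finds nothing. The function names, the list of available languages, and the default are all hypothetical; this is not the survey's actual code, and it ignores the "*" range for brevity.

```python
def parse_ranges(header):
    """Return (language-range, q) pairs sorted by descending q weight."""
    ranges = []
    for item in header.split(","):
        tag, _, params = item.strip().partition(";")
        if not tag.strip():
            continue
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        ranges.append((tag.strip().lower(), q))
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


def choose_language(header, available, default="en"):
    """Pick a language from `available` for the given Accept-Language value."""
    ranges = [(rng, q) for rng, q in parse_ranges(header) if q > 0]
    # Pass 1: RFC 2616 matching. A range matches a tag if it is equal to
    # the tag, or is a prefix of the tag followed by "-".
    for rng, _ in ranges:
        for tag in available:
            low = tag.lower()
            if low == rng or low.startswith(rng + "-"):
                return tag
    # Pass 2 (the extra check): nothing matched, so retry with only the
    # primary subtag of each range, e.g. "es-mx" is tried again as "es".
    for rng, _ in ranges:
        primary = rng.split("-")[0]
        for tag in available:
            if tag.lower() == primary:
                return tag
    return default  # last resort


print(choose_language("es-MX", ["en", "es", "fr"]))  # es
print(choose_language("ja", ["en", "es", "fr"]))     # en
```

Because the partial check is only a fallback here, a header such as "fr-FR,en;q=0.5" would still select "en" when only "en" and "fr" are available; trying the parent tag of each range in preference order instead would select "fr". Which behaviour is better is a judgment call.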
From section 14.4 of rfc2616 [1]: Note: When making the choice of linguistic preference available to the user, we remind implementors of the fact that users are not familiar with the details of language matching as described above, and should provide appropriate guidance. As an example, users might assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. A user agent might suggest in such a case to add "en" to get the best matching behavior.
There must be some buggy browser out there in this respect. It is very unlikely that so many users misconfigured their browsers in this way.
I have added some more logging code and I'll try to catch the guilty UAs.
[1] http://tools.ietf.org/html/rfc2616#page-105
PS: We are at 1254 answers!