Hi all!
On 17 March 2016 the Portal team deployed a patch to Wikipedia Portal to log all user preferred languages. From the analysis of this additional data, we found that approximately 70% of Wikipedia Portal visitors only have English as their preferred language, and approx. 18% set a language other than English. Around 12% of our users are multilingual (according to their browser preferences), but many of those included English.
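For anyone who wants to reproduce this kind of grouping, here is a minimal sketch of how an Accept-Language header could be bucketed into the three groups described above. It is an illustration only, not the code the Portal patch or the report used: the function names, the group labels, and the rule "more than one distinct primary language means multilingual" are all assumptions.

```python
def parse_accept_language(header):
    """Return primary language subtags ordered by descending q-value.

    Hypothetical helper: region subtags are dropped ("en-US" -> "en"),
    wildcards and q=0 entries are ignored, ties keep header order.
    """
    entries = []
    for position, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag or tag == "*":
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if q <= 0:
            continue
        entries.append((-q, position, tag.split("-")[0]))
    entries.sort()
    languages = []
    for _, _, lang in entries:
        if lang not in languages:
            languages.append(lang)
    return languages


def classify(header):
    """Assumed grouping: English only, non-English only, or multilingual."""
    langs = parse_accept_language(header)
    if not langs:
        return "unknown"
    if len(langs) > 1:
        return "multilingual"
    return "English only" if langs[0] == "en" else "non-English only"


if __name__ == "__main__":
    for example in ["en-US,en;q=0.8", "de-DE", "vi,it;q=0.7,en;q=0.3"]:
        print(example, "->", classify(example))
```

Note that under this rule "en-US,en;q=0.8" counts as English only, because both tags share the same primary language.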
Users whose primary language is English or who included English as a preferred language clicked through and searched at a remarkably higher daily rate than users whose primary language is another language (60% vs 50% and 50% vs 30%, respectively), but the latter group used the language links at a much higher rate than the former (20% vs 10%).
In fact, English-speaking visitors' overall clickthrough rate was 12.5%-14.4% higher than non-English-speaking visitors', and English-speaking visitors were 1.3 times more likely to click through, indicating, perhaps, that increased localization efforts could better engage our non-English-speaking visitors. Based on the data and patterns observed, we strongly support and encourage the language detection and localization efforts the Portal team has begun pursuing.
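To make the two kinds of comparison above concrete (a difference in clickthrough rates and a "times more likely" ratio), here is a small sketch. It uses a plain normal-approximation interval for the difference and a simple ratio of proportions; this is an assumption for illustration and not necessarily the method used in the report. The counts in the usage example are made up and are not the Portal data.

```python
import math


def rate_difference_ci(clicks_a, visits_a, clicks_b, visits_b, z=1.96):
    """Difference in clickthrough rates (A minus B) with an approximate
    95% interval, using the normal (Wald) approximation."""
    p_a = clicks_a / visits_a
    p_b = clicks_b / visits_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / visits_a + p_b * (1 - p_b) / visits_b)
    return diff, diff - z * se, diff + z * se


def relative_risk(clicks_a, visits_a, clicks_b, visits_b):
    """How many times more likely group A is to click through than group B."""
    return (clicks_a / visits_a) / (clicks_b / visits_b)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation.
    diff, low, high = rate_difference_ci(600, 1000, 460, 1000)
    ratio = relative_risk(600, 1000, 460, 1000)
    print(f"difference: {diff:.3f} (approx. 95% interval {low:.3f} to {high:.3f})")
    print(f"relative risk: {ratio:.2f}")
```

The difference is expressed in percentage points, while the ratio is unitless, which is why the two numbers can look so different for the same data.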
These and other findings can be found in the report on Commons https://commons.wikimedia.org/wiki/File:Analysis_of_Clickthrough_Rates_and_User_Preferred_Languages_on_Wikipedia_Portal.pdf. The graphs are especially cool and informative, even if I do say so myself.
Thanks, Mikhail on behalf of Discovery Analytics
How do you interpret the 100% figures on page 7 and things like the 17% of "Vietnamese" users going to it.wiki per page 6? Such results IMHO mostly show that Accept-Language is a very poor indicator of the languages really understood by the user, as we've known for a long time.
The graph on p. 4 is more promising because it could tell us something about the relative advantages of search vs. manual selection that users see depending on their conditions. We could discover more with breakdowns other than en/non-en Accept-Language. The main takeaway seems to be that for one population search is 5 times more popular than the links, while for the other they are very similar... but why??
Nemo
For that example: of the visitors who clicked on one of the primary links, 17% of the users whose Accept-Language included Vietnamese as the first language went to the Italian Wikipedia. Not sure why! And while Accept-Language is not a perfect indicator, it is kind of our best bet for language detection and localization.
As for search being more popular than the links for one population vs the other: the US accounts for approx. 40% of the total traffic to wikipedia.org, with the UK coming in 2nd at about 8% and the remainder accounted for by the 200+ other countries (http://discovery.wmflabs.org/portal/#country_breakdown, https://commons.wikimedia.org/w/index.php?title=File:Analysis_of_Wikipedia_P...). Given that 90% of all traffic to Wikipedia is direct traffic (http://discovery.wmflabs.org/portal/#referrals_summary), I suppose English-speaking US & UK visitors (the biggest group of visitors to the Portal) have the page bookmarked, go to it, and search because they're not especially interested in visiting Wikipedia in other languages. On the other hand, users of other languages might prefer to bookmark the main page of the Wikipedia in their own language rather than go to wikipedia.org, or they are simply used to going to wikipedia.org and then clicking on their language, even though the search box detects the user's language.
There's only so much we can do with the data we have. Thanks for the questions! I hope I've answered them, or at least provided some additional insight.
Cheers, Mikhail
On Fri, Mar 25, 2016 at 3:08 PM, Federico Leva (Nemo) nemowiki@gmail.com wrote the message quoted in full above.
discovery mailing list discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery