Mediawiki-i18n July 2016

mediawiki-i18n@lists.wikimedia.org

2 participants
3 discussions

MediaWiki Language Extension Bundle launches

by Niklas Laxström

The Wikimedia Language Engineering team is pleased to announce the first release of the MediaWiki Language Extension Bundle. The bundle is a collection of selected MediaWiki extensions needed by any wiki that wants to be multilingual.

This first bundle release (2012.11) is compatible with MediaWiki 1.19, 1.20 and 1.21alpha. Get it from https://www.mediawiki.org/wiki/MLEB

The Universal Language Selector is a must-have, because it provides essential functionality for any user regardless of the number of languages they speak: language selection, font support for displaying scripts badly supported by operating systems, and input methods for typing languages that don't use the Latin (a-z) alphabet.

Maintaining multilingual content in a wiki is a mess without the Translate extension, which is used by Wikimedia, KDE and translatewiki.net, where hundreds of pieces of documentation and interface translations are updated every day; with Localisation Update your users will always have the latest translations freshly out of the oven.

The Clean Changes extension keeps your recent changes page uncluttered from translation activity and other distractions.

Don't miss the chance to practice your rusty language skills and use the Babel extension to mark the languages you speak and to find other speakers of the same language in your wiki. And finally the cldr extension is a database of language and country translations.

We are aiming to make new releases every month, so that you can easily stay on the cutting edge with the constantly improving language support. The bundle comes with clear installation and upgrade instructions. The bundle is tested against MediaWiki release versions, so you can avoid most of the temporary breaks that would happen if you were using the latest development versions instead.

Because this is our first release, there can be some rough edges. Please give us plenty of feedback so that we can improve for the next release.

-Niklas
--
Niklas Laxström


Providing the effective language of messages

by Adrian Heine

Hi everyone,

as some of you might know, I'm a software developer at Wikimedia Deutschland, working on Wikidata. I'm currently focusing on improving Wikidata's support for languages we as a team are not using on a daily basis. As part of my work I stumbled over a shortcoming in MediaWiki's message system that, as far as I see it, prevents me from doing the right thing(tm). I'm asking you to verify that the issue I see indeed is an issue and that we want to fix it. Subsequently, I'm interested in hearing your plans or goals for MediaWiki's message system so that I can align my implementation with them. Finally, I am hoping to find someone who is willing to help me fix it.

== The issue ==

On Wikidata, we regularly have content in different languages on the same page. We use the HTML lang and dir attributes accordingly. For example, we have a table with terms for an entity in different languages. For missing terms, we would display a message in the UI language within this table. The corresponding HTML (simplified) might look like this:

    <div id="mw-content-text" lang="UILANG" dir="UILANG_DIR">
      <table class="entity-terms">
        <tr class="entity-terms-for-OTHERLANG1" lang="OTHERLANG1" dir="OTHERLANG1_DIR">
          <td class="entity-terms-for-OTHERLANG1-label">
            <div class="wb-empty" lang="UILANG" dir="UILANG_DIR">  </div>
          </td>
        </tr>
      </table>
    </div>

This works great as long as the missing label message is available in the UI language. If that is not the case, though, the message is translated according to the defined language fallbacks. In that case, we might end up with something like this:

    <div class="wb-empty" lang="arc" dir="rtl">No label defined</div>

That's obviously wrong: the text is English, but it is marked up as Aramaic and right-to-left. I'd like to fix it.

== Fixing it ==

For fixing this, I tried to make MessageCache provide the language a message was taken from [1]. That's not too straightforward to begin with, but while working on it I realized that MessageCache is only responsible for following the language fallback chain for database translations. For file-based translations, the fallbacks are directly merged in by LocalisationCache, so the information is not there anymore at the time of translating a message.

I see some ways to fix this:

* Don't merge messages in LocalisationCache, but perform the fallback on request (possibly caching the result)
* Tag message strings in LocalisationCache with the language they are in (sounds expensive to me)
* Tag message strings as being a fallback in LocalisationCache (that way we could follow the fallback until we find a language in which the message string is not tagged as being a fallback)

What do you think?

[1] https://gerrit.wikimedia.org/r/282133

Thanks,
--
Adrian Heine né Lang
SOFTWARE DEVELOPER
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
http://wikimedia.de

Imagine a world, in which every single human being can freely share in the sum of all knowledge. That's our commitment.

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
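As an illustration of the first option listed in the message above (performing the fallback on request instead of merging in advance), here is a minimal Python sketch. It is not MediaWiki code: the message key `label-empty`, the tables `MESSAGES`, `FALLBACKS` and `DIRECTIONS`, and both function names are hypothetical, and the fallback chain shown for `arc` is an assumption made for the example. The point is only that a lookup which walks the chain can return the language the string actually came from, so the HTML lang and dir attributes can match the text.

```python
import html
from typing import Dict, List, Optional, Tuple

# Hypothetical, unmerged per-language message tables (illustration only).
MESSAGES: Dict[str, Dict[str, str]] = {
    "en": {"label-empty": "No label defined"},
    "arc": {},  # no translation of "label-empty" in this language
}
# Hypothetical fallback chains; English is appended as the last resort.
FALLBACKS: Dict[str, List[str]] = {"arc": []}
# Text direction per language code.
DIRECTIONS: Dict[str, str] = {"en": "ltr", "arc": "rtl"}


def resolve_message(key: str, ui_lang: str) -> Optional[Tuple[str, str]]:
    """Return (text, effective_language), or None if no language has the key."""
    chain = [ui_lang] + FALLBACKS.get(ui_lang, [])
    if "en" not in chain:
        chain.append("en")
    for lang in chain:
        text = MESSAGES.get(lang, {}).get(key)
        if text is not None:
            return text, lang
    return None


def render_empty_label(ui_lang: str) -> str:
    """Render the placeholder with lang/dir taken from the effective language."""
    resolved = resolve_message("label-empty", ui_lang)
    if resolved is None:
        return ""
    text, lang = resolved
    return (f'<div class="wb-empty" lang="{lang}" dir="{DIRECTIONS[lang]}">'
            f"{html.escape(text)}</div>")


print(render_empty_label("arc"))
# <div class="wb-empty" lang="en" dir="ltr">No label defined</div>
```

The trade-off mentioned in the message still applies: walking the chain on every request costs more than reading a pre-merged table, which is why caching the resolved (text, language) pair is suggested there.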


Fwd: Language detection for Special:Search queries

by Federico Leva (Nemo)

-------- Forwarded message --------
Subject: [discovery] Better search results on wiki via TextCat
Date: Tue, 19 Jul 2016 19:42:27 -0600
From: Deborah Tankersley
To: A public mailing list about Wikimedia Search and Discovery projects <discovery(a)lists.wikimedia.org>

We're happy to announce that after numerous tests and analyses[1] and a fully operational demo[2], the Discovery Team is ready to release TextCat[3] into production on wiki.

What is TextCat? It detects the language that the search query was written in, which allows us to look for results on a different wiki. TextCat is a language detection library based on n-grams[4].

During a search, TextCat will only kick in when the following three things occur:

1. fewer than 3 results are returned from the query on the current wiki
2. language detection is successful (meaning that TextCat is reasonably certain what language the query is in, and that it is different from the language of the current wiki)
3. the other wiki (in the detected language) has results

Our analysis of the A/B test[5] (for English, French, Spanish, Italian and German Wikipedias) showed that:

"...The test groups not only had a substantially lower zero results rate (57% in control group vs 46% in the two test groups), but they had a higher clickthrough rate (44% in the control group vs 49-50% in the two test groups), indicating that we may be providing users with relevant results that they would not have gotten otherwise."

This update will be scheduled for production release during the week of July 25, 2016 on the following Wikipedias:

* English [6]
* German [7]
* Spanish [8]
* Italian [9]
* French [10]

TextCat will then be added to this next group of Wikipedias at a later date:

* Portuguese [11]
* Russian [12]
* Japanese [13]

This is a huge step forward in creating a search mechanism that is able to detect - with a high level of accuracy - the language that was used and produce results in that language.

Another forward-looking aspect of TextCat is investigating a confidence measuring algorithm[14], to ensure that the language detection results are the best they can be.

We will also be doing more[15] A/B tests using TextCat on non-Wikipedia sites, such as Wikibooks and Wikivoyage. These new tests will give us insight into whether applying the same language detection configuration across projects would be helpful.

Please let us know if you have any questions or concerns on the TextCat discussion page[16]. Also, for screenshots of what this update will look like, please see this one[17] showing an existing search typed in on enwiki in Russian "первым экспериментом" and this one[18] showing what it will look like once TextCat is in production on enwiki.

Thanks!

[1] https://phabricator.wikimedia.org/T118278
[2] https://tools.wmflabs.org/textcatdemo/
[3] https://www.mediawiki.org/wiki/TextCat
[4] https://en.wikipedia.org/wiki/N-gram
[5] https://commons.wikimedia.org/wiki/File:Report_on_Cirrus_Search_TextCat_AB_…
[6] https://en.wikipedia.org/
[7] https://de.wikipedia.org/
[8] https://es.wikipedia.org/
[9] https://it.wikipedia.org/
[10] https://fr.wikipedia.org/
[11] https://pt.wikipedia.org/
[12] https://ru.wikipedia.org/
[13] https://ja.wikipedia.org/
[14] https://phabricator.wikimedia.org/T140289
[15] https://phabricator.wikimedia.org/T140292
[16] https://www.mediawiki.org/wiki/Talk:TextCat
[17] https://commons.wikimedia.org/wiki/File:Existing-search_no-textcat.png
[18] https://commons.wikimedia.org/wiki/File:New-search_with-textcat.png

--
Deb Tankersley
Product Manager, Discovery
IRC: debt
Wikimedia Foundation
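The three conditions under which TextCat kicks in, as listed in the forwarded announcement, can be read as a short decision procedure. The Python sketch below is only an illustration of that reading, not the CirrusSearch implementation: the function name `cross_wiki_results`, the callable types `Search` and `Detect`, and the shape of the return value are all assumptions. The detector is passed in as a plain function that returns a language code, or None when it is not reasonably certain.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callable shapes, chosen for this example only.
Search = Callable[[str, str], List[str]]  # (wiki language, query) -> result titles
Detect = Callable[[str], Optional[str]]   # query -> language code, None if unsure


def cross_wiki_results(
    query: str,
    current_lang: str,
    search: Search,
    detect: Detect,
    min_results: int = 3,
) -> Tuple[List[str], Optional[str], List[str]]:
    """Return (local results, detected language or None, results from that wiki)."""
    local = search(current_lang, query)

    # Condition 1: only act when fewer than `min_results` local results exist.
    if len(local) >= min_results:
        return local, None, []

    # Condition 2: detection must succeed and differ from the current wiki.
    detected = detect(query)
    if detected is None or detected == current_lang:
        return local, None, []

    # Condition 3: the wiki in the detected language must have results.
    other = search(detected, query)
    if not other:
        return local, None, []

    return local, detected, other
```

For instance, with a stub `search` that finds nothing on the English wiki and one page on the Russian wiki, and a stub `detect` that returns "ru", the call yields an empty local list, "ru", and the Russian result list; if `detect` returns None, only the local results come back.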

