Re: [Mediawiki-i18n] Internationalizing project names

16 Sep 2015

Hoi,
I do not really understand the purpose of finding the articles in other
languages. Arguably they are all known in Wikidata, including
aliases. Magnus has already built a functional search extension that is
implemented on many Wikipedias. It will find data for you even when it is
only available in Wikidata, and it has the option to show this in an
informative way by means of the "Reasonator".

There is no need to train the search on languages. Arguably such training is
not available for most of the languages Wikimedia supports.
Thanks,
     GerardM

On 16 September 2015 at 17:59, Erik Bernhardson <ebernhardson(a)wikimedia.org>
wrote:

...
  On Tue, Sep 15, 2015 at 11:13 PM, Gerard Meijssen
 <gerard.meijssen(a)gmail.com> wrote:

  One question, when you search for "Ревест-Сен-Мартен", why did you not
 consider every language that uses the Cyrillic script? It is as likely to
 find something in Serbian, Macedonian, Belarusian etc ...
 Thanks,
      GerardM

  The rest of the discussion is happening on the phab ticket, but I'll
 answer this here. We are using a language detection algorithm that has
 been trained against tweets. Tweets are not, on average, as short as the
 searches whose language we are detecting, but it does an OK job. Trey did a
 great job putting together an analysis[1] of this language detection
 algorithm. We will also be using his work there to evaluate other language
 detection methods and perhaps change what we are using in the future.

 So the short of it is, we chose Russian instead of Serbian because the
 machine learning algorithm said so.

 [1]
 https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Detection_E…

 _______________________________________________
 Mediawiki-i18n mailing list
 Mediawiki-i18n(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/mediawiki-i18n
