Re: [Wikitech-l] Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

22 May 2014


On 05/22/2014 05:41 PM, Petr Bena wrote:
...
> I was looking for a free (possibly open source) provider of automatic
> translations for my open source application I am working on and quite
> had troubles finding some. Then I realized we have a project called
> "wiktionary" which could possibly (I was assuming it's open
> dictionary) help me here, but I was quite disappointed as I couldn't
> find any simple way to perform simple queries like:

There are several open-source machine translation projects.
They are either rule-based or statistics-based. One of the
rule-based projects is Apertium.

When you start from zero, a rule-based system becomes
useful quite fast, especially if the two languages are
similar. A statistics-based system (such as Google
Translate) requires enormous amounts of data to become
useful.

Either way, this is not something you can start as a
subproject within Wiktionary, or even as a separate WMF
project. It's a very large task.

One naive approach is to train a statistics-based
machine translator (SMT) on the European Union's
freely available parallel text corpus. When you try
to translate Finnish "terve" (which means "hello!")
into English with such a system, it will say "health",
since the same word also means health, and EU
texts only talk about healthcare, never "hello".
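
To make the "terve" problem concrete, here is a minimal sketch in Python of what a count-based translation table does with a biased corpus. It is only an illustration: the word pairs and counts are invented, the names (eu_like_pairs, chat_like_pairs, build_table, translate) are hypothetical, and a real SMT system learns word alignments from whole sentence pairs rather than being handed them.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made word alignments standing in for a parallel corpus.
# The counts are invented for illustration, not taken from any real corpus.
eu_like_pairs = [("terve", "health")] * 3 + [("laki", "law")]
chat_like_pairs = [("terve", "hello")] * 5


def build_table(pairs):
    """Count how often each source word was aligned with each target word."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def translate(word, table):
    """Return the most frequently aligned target word, or None if unseen."""
    if word not in table:
        return None
    return table[word].most_common(1)[0][0]


eu_table = build_table(eu_like_pairs)
mixed_table = build_table(eu_like_pairs + chat_like_pairs)

print(translate("terve", eu_table))     # health
print(translate("terve", mixed_table))  # hello
```

With only the EU-like pairs the table has never seen "terve" used as a greeting, so it can only answer "health". The answer changes only once conversational text is added, which is why the choice of training corpus matters so much.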
-- 
   Lars Aronsson (lars@aronsson.se)
   Aronsson Datateknik - http://aronsson.se
