Re: [Wikitech-l] Do we have any data in wikidata / wiktionary that could be used for mechanic translations?

22 May 2014

Petr, if you were to take a rule-based approach to what you've
outlined above, using the existing Wikidata interlinguality,
which I think is built around the 'item with a label' (think of a Wikipedia
encyclopedia article; is this correct?), and building on Wiktionary, could
one 'reduce' Wikidata's interlinguality from an 'item' to a 'word' (while also
anticipating voice, smartphones, and extensibility / scalability to all
7,106+ languages, for example)? What else would be needed, and
what would some of the initial challenges be in beginning this way?
Cheers,
Scott
(I write the above in the context of developing the wiki-based, CC MIT
OCW-centric WUaS for free online university degrees, which plans to be in
all 7,106+ languages, with languages
http://worlduniversity.wikia.com/wiki/Languages as schools, and to develop a
universal translator -
http://worlduniversity.wikia.com/wiki/WUaS_Universal_Translator - as well.)
On Thu, May 22, 2014 at 9:03 AM, Lars Aronsson lars@aronsson.se wrote:
...
On 05/22/2014 05:41 PM, Petr Bena wrote:
...
I was looking for a free (possibly open source) provider of automatic
translations for an open source application I am working on, and had
quite some trouble finding one. Then I realized we have a project called
"wiktionary" which could possibly (I was assuming it's an open
dictionary) help me here, but I was quite disappointed as I couldn't
find any simple way to perform simple queries like:
There are several open-source machine translation projects.
They are either rule-based or statistics-based. One of the
rule-based projects is Apertium.
When you start from zero, building a rule-based system
gives you a useful system quite fast, especially if the
two languages are similar. A statistics-based system (such
as Google Translate) requires enormous amounts of
data to become useful.
It's not something that you can start as a subproject
within Wiktionary, not even as a separate WMF project.
It's a very large task.
One naive approach is to base a statistics-based
machine translator (SMT) on the European Union's
freely available parallel text corpus. When you try
to translate Finnish "terve" (which means: hello!)
into English in such a system, it will say "health",
since the same word also means health, and EU
texts only talk about healthcare, never "hello".
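A minimal sketch of the "terve" effect described above, in Python.
Everything in it is an assumption made for illustration: the sentence
pairs are invented toy data (not text from the EU corpus), the names
train and candidates are hypothetical, and Dice-scored co-occurrence
counting stands in for the alignment model of a real SMT system. In this
toy data the in-domain sense surfaces as "healthy"; the point is only
that a sense absent from the training text can never be proposed.

```python
from collections import Counter
from itertools import product

# Invented toy sentence pairs (Finnish, English); NOT real EU corpus text.
HEALTHCARE_ONLY = [
    ("terve lapsi", "healthy child"),
    ("terve väestö", "healthy population"),
    ("sairas lapsi", "sick child"),
    ("terve elämä", "healthy life"),
]
# Hypothetical general-domain pairs in which "terve" is used as a greeting.
GREETINGS = [
    ("terve", "hello"),
    ("terve ystävä", "hello friend"),
]


def train(pairs):
    """Count, per sentence pair, how often words occur and co-occur."""
    src_count, tgt_count, cooc = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        cooc.update(product(s_words, t_words))
    return src_count, tgt_count, cooc


def candidates(word, model):
    """Rank target words for `word` by Dice coefficient, best first."""
    src_count, tgt_count, cooc = model
    scores = {
        t: 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in cooc.items()
        if s == word
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


narrow = candidates("terve", train(HEALTHCARE_ONLY))
mixed = candidates("terve", train(HEALTHCARE_ONLY + GREETINGS))

print("healthcare-only corpus:", narrow)
print("with greeting pairs:   ", mixed)
```

Trained on the healthcare-only pairs, the ranking for "terve" contains
no "hello" at all; the greeting only becomes a candidate once pairs that
actually use it as a greeting are added to the training data.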
--
  Lars Aronsson (lars@aronsson.se)
  Aronsson Datateknik - http://aronsson.se

Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
-- 
http://scottmacleod.com/worlduniversityandschool.htm

