Re: [Wikipedia-l] The use of Wikipedia extracted wordlists in GPL machine translation systems

2 Feb 2008

Hoi,
The Apertium software needs information in an unambiguous way. This is to
ensure that the software is able to run with the data. The notion that the
information needed by Apertium is not of relevance in other environments is
simply wrong. The information is of use outside of Apertium and as a
consequence the choice of the GPL license is unfortunate. You concentrate
on Wikipedia for now, but you indicate that you are considering using the
Wiktionary data as well.

Where you state that Apertium needs information in a very tightly controlled
way, is this what you copyright? Or in other words, do you copyright the
information in order to control this specific type of application? If not,
what is the objective of choosing the GPL for data?
Thanks,
     GerardM

On Feb 2, 2008 9:38 PM, Francis Tyers <spectre(a)ivixor.net> wrote:

...
  On Sat, 02-02-2008 at 12:10 -0800, Ray Saintonge wrote:
  Francis Tyers wrote:
  I work on machine translation software,¹ focussing on lesser-used and
 under-resourced languages.² One of the things that is needed for our
 software is bilingual dictionaries. A usable way of getting bilingual
 dictionaries is to harvest Wikipedia interwiki links.³
  While they are helpful, it would be a mistake to consider these as fully
 reliable.  The disambiguation policies of the separate projects are also
 a factor to consider. 
 Needless to say, I've done an analysis of how useful this is before
 mentioning it. I can send you the results if you would be interested.

  Now, I've been told that interwiki links do not have the level of
 originality required for copyright, many of them being created by bots.
 I'm not sure that this is the case, as some of them are done by people,
 and choosing the correct article takes at least some level of work.
 Besides, this would be a cop-out: if we wanted, for example, to
 sense-disambiguate the extracted terms using the first paragraph of the
 article, this would still be a licence violation.
  I would question the copyrightability of any dictionary entry on the
 basis of the merger principle.  We copyright forms of expression rather
 than ideas.  If the idea is indistinguishable from the form there is a
 strong likelihood that it is not copyrightable.  A dictionary is not
 reliable if it seeks to inject originality in its definition.  Seeking
 new ways to define words means that we encourage definitions that may
 deviate from the original intention of the words.  What is copyrightable
 in a dictionary then is more in the level of global selection and
 presentation. 
 This is what I have also been led to believe. But when you're in the
 habit of commercially distributing stuff -- especially free software
 that everyone can see inside -- you like to be sure :)

 > So, is there any way to resolve this? I understand that it is probably
 > not high on anyone's list of priorities. On the other hand, I understand
 > that the FSF is considering updating the GFDL to make it compatible with
 > the Creative Commons CC-BY-SA licence.

 Would it also be possible at the same time to add some kind of clause
 making GFDL content usable in GPL licensed linguistic data for machine
 translation systems?
  What either of those licences says is not within the control of any
 Wikimedia project. Perhaps you should be discussing this with the FSF.
 I was intending to do that after I received replies back from here. I
 understand that the WMF/Wikipedia has some clout with respect to
 licensing at the FSF, for example:

 http://wikimediafoundation.org/wiki/Resolution:License_update

 Of course moving to CC-BY-SA won't solve the GPL compatibility problem.

 Fran

 _______________________________________________
 Wikipedia-l mailing list
 Wikipedia-l(a)lists.wikimedia.org
 http://lists.wikimedia.org/mailman/listinfo/wikipedia-l
