Hoi,
The Apertium software needs information in an unambiguous way. This is
to ensure that the software is able to run with the data. The notion
that the information needed by Apertium is not of relevance in other
environments is simply wrong. The information is of use outside of
Apertium, and as a consequence the choice of the GPL license is
unfortunate. You concentrate for now on Wikipedia, but you indicate
that you are considering using the Wiktionary data as well.
The choice of the GPL licence is perfect for including machine
translation in other free software, the overwhelming majority of which
is licensed under the GPL.
The fact that our linguistic data can be used separately is an aside.
And as a note, it can be re-used for software like grammar checkers,
spell-checkers, etc. which are under the GPL. The question is really why
_not_ use the GPL.
Where you state that Apertium needs information in a very tightly
controlled way, is this what you copyright? Or in other words, do you
copyright the information in order to control this specific type of
application? If not, what is the objective of choosing the GPL for
data?
To the other list members: yes this is off-topic, so I'll try and keep
it short.
The objective of choosing GPL for the data is:
* to make it compatible with the engine/other tools in case anything
needs to be moved between the packages,
* to make it unambiguously able to be included in Debian,
* to make it compatible with other lexical resources that are GPL (of
which there are many),
* because the transfer rules and scripts are copyrightable works, as are
the rules for morphological analysis. As I mentioned in the previous
email it is not possible to decouple the two. If you want further
information as to the originality and copyright status of the data,
please consider looking at one of the packages,
* to ensure that if people take one of our original language pairs the
community has the guarantees of the GPL that changes and improvements
will be released under the same licence, whether this be increased
vocabulary, better transfer rules, a special program to deal with a
language feature etc.
Fran
Thanks,
GerardM
On Feb 2, 2008 9:38 PM, Francis Tyers <spectre(a)ivixor.net> wrote:
On Sat, 02-02-2008 at 12:10 -0800, Ray Saintonge wrote:
Francis Tyers wrote:
> I work on machine translation software,¹ focussing on lesser-used and
> under-resourced languages.² One of the things that is needed for our
> software is bilingual dictionaries. A usable way of getting bilingual
> dictionaries is to harvest Wikipedia interwiki links.³
While they are helpful, it would be a mistake to consider these as fully
reliable. The disambiguation policies of the separate projects are also
a factor to consider.
Needless to say I've done an analysis of how useful this is before
mentioning it. I can send you the results if you would be interested.
> Now, I've been told that interwiki links do not have the level of
> originality required for copyright, many of them being created by bot.
> I'm not sure that this is the case, as some of them are done by people
> and choosing the correct article takes at least some level of work.
> Besides, this would be a cop-out: if we, for example, wanted to
> sense-disambiguate the terms extracted using the first paragraph of the
> article, this would still be a licence violation.
I would question the copyrightability of any dictionary entry on the
basis of the merger principle. We copyright forms of expression rather
than ideas. If the idea is indistinguishable from the form there is a
strong likelihood that it is not copyrightable. A dictionary is not
reliable if it seeks to inject originality in its definition. Seeking
new ways to define words means that we encourage definitions that may
deviate from the original intention of the words. What is copyrightable
in a dictionary then is more in the level of global selection and
presentation.
This is what I also have been led to believe. But when you're in the
habit of commercially distributing stuff -- especially free software
that everyone can see inside -- you like to be sure :)
> So, is there any way to resolve this? I understand that it is probably
> not high on anyone's list of priorities. On the other hand, I understand
> that the FSF is considering updating the GFDL to make it compatible with
> the Creative Commons CC-BY-SA licence.
>
> Would it also be possible at the same time to add some kind of clause
> making GFDL content usable in GPL licensed linguistic data for machine
> translation systems?
What either of those licences says is not within the control of any
Wikimedia project. Perhaps you should be discussing this with FSF.
I was intending to do that after I received replies back from here. I
understand that the WMF/Wikipedia has some clout with respect to
licensing at the FSF, for example:
http://wikimediafoundation.org/wiki/Resolution:License_update
Of course moving to CC-BY-SA won't solve the GPL compatibility problem.
Fran
_______________________________________________
Wikipedia-l mailing list
Wikipedia-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikipedia-l