On Sat, 02-02-2008 at 22:13 +0100, Gerard Meijssen wrote:
Hi, The Apertium software needs information in an unambiguous form, to ensure that the software is able to run with the data. The notion that the information needed by Apertium is of no relevance in other environments is simply wrong. The information is of use outside of Apertium, and as a consequence the choice of the GPL license is unfortunate. For now you concentrate on Wikipedia, but you indicate that you are considering using the Wiktionary data as well.
The choice of the GPL licence is perfect for including machine translation in other free software, the overwhelming majority of which is licensed under the GPL.
The fact that our linguistic data can be used separately is an aside. As a note, it can also be reused in software such as grammar checkers and spell-checkers, which are under the GPL. The question is really why _not_ use the GPL.
You state that Apertium needs information in a very tightly controlled form; is this what you copyright? In other words, do you copyright the information in order to control this specific type of application? If not, what is the objective of choosing the GPL for data?
To the other list members: yes this is off-topic, so I'll try and keep it short.
The objective of choosing GPL for the data is:
* to make it compatible with the engine/other tools in case anything needs to be moved between the packages,
* to make it unambiguously able to be included in Debian,
* to make it compatible with other lexical resources that are GPL (of which there are many),
* because the transfer rules and scripts are copyrightable works, as are the rules for morphological analysis. As I mentioned in the previous email, it is not possible to decouple the two. If you want further information as to the originality and copyright status of the data, please consider looking at one of the packages,
* to ensure that if people take one of our original language pairs, the community has the guarantees of the GPL that changes and improvements will be released under the same licence, whether this be increased vocabulary, better transfer rules, a special program to deal with a language feature, etc.
Fran
Thanks, GerardM
On Feb 2, 2008 9:38 PM, Francis Tyers spectre@ivixor.net wrote:
On Sat, 02-02-2008 at 12:10 -0800, Ray Saintonge wrote:
> Francis Tyers wrote:
> > I work on machine translation software,¹ focussing on lesser-used and
> > under-resourced languages.² One of the things that is needed for our
> > software is bilingual dictionaries. A usable way of getting bilingual
> > dictionaries is to harvest Wikipedia interwiki links.³
> >
> While they are helpful, it would be a mistake to consider these as fully
> reliable. The disambiguation policies of the separate projects are also
> a factor to consider.
Needless to say, I analysed how useful this is before mentioning it. I can send you the results if you are interested.

> > Now, I've been told that interwiki links do not have the level of
> > originality required for copyright, many of them being created by bot.
> > I'm not sure that this is the case, as some of them are made by people,
> > and choosing the correct article takes at least some level of work.
> > Besides, this would be a cop-out: if we, for example, wanted to
> > sense-disambiguate the terms extracted using the first paragraph of the
> > article, this would still be a licence violation.
> >
> I would question the copyrightability of any dictionary entry on the
> basis of the merger principle. We copyright forms of expression rather
> than ideas. If the idea is indistinguishable from the form, there is a
> strong likelihood that it is not copyrightable. A dictionary is not
> reliable if it seeks to inject originality into its definitions. Seeking
> new ways to define words means that we encourage definitions that may
> deviate from the original intention of the words. What is copyrightable
> in a dictionary, then, is more at the level of global selection and
> presentation.

This is what I have also been led to believe. But when you're in the habit of commercially distributing stuff -- especially free software that everyone can see inside -- you like to be sure :)

> > So, is there any way to resolve this? I understand that it is probably
> > not high on anyone's list of priorities. On the other hand, I understand
> > that the FSF is considering updating the GFDL to make it compatible with
> > the Creative Commons CC-BY-SA licence.
> >
> > Would it also be possible at the same time to add some kind of clause
> > making GFDL content usable in GPL-licensed linguistic data for machine
> > translation systems?
> >
> What either of those licences says is not within the control of any
> Wikimedia project. Perhaps you should be discussing this with the FSF.
I was intending to do that after I had received replies from here. I understand that the WMF/Wikipedia has some clout with the FSF with respect to licensing, for example:
http://wikimediafoundation.org/wiki/Resolution:License_update
Of course, moving to CC-BY-SA won't solve the GPL compatibility problem.

Fran
_______________________________________________
Wikipedia-l mailing list
Wikipedia-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikipedia-l