[Foundation-l] One Wikipedia Per Person (regarding the distribution of and the ability to read Wikipedia)

Gerard Meijssen gerard.meijssen at gmail.com
Sun May 31 08:44:41 UTC 2009


Hoi,
The notion that this black box needs to use text that is licensed under
CC-by-sa is folly. Data mining strips the meaning from the text it gathers.
Consequently the resulting data can be considered a completely and utterly
separate work. Using text as the basis of a corpus is essentially less
intrusive than using the same text for "search engine" purposes.

I have never argued for the WMF to involve itself in machine translation.
What I do argue is that the WMF might partner with organisations that are
involved in machine translation. It is not just Google that comes to mind;
Apertium is another project, with a different approach that is effective
for certain language combinations.

The legalities and practicalities of language technology are quite distinct
from our standard considerations.
Thanks,
      GerardM

2009/5/31 Brian <Brian.Mingus at colorado.edu>

> Proprietary algorithms aren't what make their system better - it's that
> they have a larger corpus. Google has published a trillion-token dataset
> for machine translation researchers, but it's presumably just a subset of
> what they now have. The data that makes their system so good is already
> publicly available, but it is not (yet) within the scope of the WMF to
> harvest all copyrighted information in order to increase the performance
> of already published machine translation algorithms.
>
> It would cost the WMF dearly in resources to build such a system themselves
> based on published research. In other words, as long as the output of the
> black box is CC-BY-SA the other factors aren't very important.
>
> In my mind if you consider using a corporation's semi-proprietary
> translation engine to be a violation of the WMF's principles then accepting
> visitors that come from Google in the first place would be an analogous
> violation. We have no idea how the search engine that is the single largest
> source of visitors to Wikipedia works, and yet we accept them graciously.
>
> On Sun, May 31, 2009 at 1:45 AM, Gerard Meijssen
> <gerard.meijssen at gmail.com> wrote:
>
> > Hoi,
> > Currently the translation engine by Google works for some twenty
> > languages. We have Wikipedias in over 250 languages and we localise in
> > over 300. If we are to collaborate with Google on this, we should partner
> > in the building of translation engines for our other languages. We could
> > and we should consider this if the software were to be open source.
> > Thanks,
> >      GerardM
> >
> > 2009/5/31 Foxy Loxy <foxyloxy.wikimedia at gmail.com>
> >
> > > I would guess a partnership with Google would be a good idea because:
> > > 1) They are the best (according to Brian) and
> > > 2) If we were to go through with this proposal we'd want the
> > > translation technology now, not in X years when the technology catches
> > > up with Google, if at all.
> > >
> > > And with many OSS/free projects, the X could be insanely high.
> > >
> > > On Sunday, 31 May 2009 2:50 pm, Fajro wrote:
> > > > And why partner with Google? There are Free alternatives in
> > > > development:
> > > >
> > > > http://www.apertium.org/
> > > >
> > > > http://wiki.apertium.org/wiki/Main_Page
> > > >
> > > > --
> > > > △ ℱajro △
> > >
> > > --
> > > fl
> > > <http://en.wikipedia.org/wiki/user_talk:fl>
> > > _______________________________________________
> > > foundation-l mailing list
> > > foundation-l at lists.wikimedia.org
> > > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
> > >
> >
>

