Hi,
the first sunrise period for .eu domain registration is going to
begin soon, on 7 December 2005. During this period, only domain names
which are registered EU Community or national trademarks will be
registered.
I'm not sure about the status of Wikimedia trademarks, but I guess at
least Wikipedia and Wikimedia qualify. IMO "we" should apply at least
for wikipedia.eu and wikimedia.eu.
Because of trademark issues and EU regulations concerning who can
apply for .eu, I'm afraid it will be a bit complicated. I hope someone
from the Foundation can take care of it.
(If the process is already under way, sorry :-)
Jan Kulveit ([[User:Wikimol]])
| Date: Tue, 29 Aug 2006 09:36:30 -0400
| From: Anthony <wikilegal(a)inbox.org>
| Subject: Re: [Foundation-l] Celebrity pictures
| To: "Wikimedia Foundation Mailing List" <foundation-l(a)wikimedia.org>
| Message-ID:
| <71cd4dd90608290636p645474e7m1f132123a17319e9(a)mail.gmail.com>
| Content-Type: text/plain; charset=ISO-8859-1; format=flowed
|
| On 8/29/06, Ray Saintonge <saintonge(a)telus.net> wrote:
| > Anthony wrote:
| > >On 8/28/06, Ray Saintonge <saintonge(a)telus.net> wrote:
| > >>Perhaps more significant than whether anyone has lost is whether any
| > >>such case has ever been filed. Given that they are distributed for the
| > >>specific purpose of publicity there could be an implicit permission.
| > >>
| > >>
| > >If you're using the image for the purposes of promoting the person.
| > >If, on the other hand, you're using the image to sell an encyclopedia
| > >article which portrays the person in a way which they don't want to be
| > >portrayed, then there probably isn't implicit permission.
| > >
| > I don't know if it's to "sell" an encyclopedia. Lindsay Lohan would
| > need to think she's pretty special if she believes a picture of her will
| > make all the difference in encyclopedia sales. Is she as self-absorbed
| > as Paris Hilton? Our use is transformative, and it in no way adversely
| > affects the company's sales. It would even be interesting to hear the
| > companies comment on the function of publicity shots.
| >
| I was talking about reuse. Specifically, someone who was selling
| print encyclopedias with the current Lindsay Lohan article in it. I
| didn't mean to imply that the selling point was the picture, but
| merely that the encyclopedia was being sold.
|
| > >Maybe I'm overly paranoid, but even here in the US where we have some
| > >very strong fair use and first amendment rights, I still wouldn't feel
| > >comfortable selling an encyclopedia with the current [[Lindsay Lohan]]
| > >article in it, without first receiving permission from the copyright
| > >holders of the images.
| > >(http://en.wikipedia.org/w/index.php?title=Lindsay_Lohan&oldid=72480012
| > >in case it changes before this is read)
| > >
| > This may be a problem for the print version, and specific permissions
| > should probably be sought when we get that far. For the on-line
| > version, however, I have no problem with an active campaign to replace
| > the fair use images with "free" ones. It's clear that I'm more risk
| > tolerant than you, but that doesn't mean there's such a wide gap between
| > our views.
| >
| I don't think it makes sense to have such significant differences
| between the print version and the online version. Other than that, I
| agree with you though. I wouldn't have a problem distributing the
| current article online. In fact, I have a website where I'm doing it.
|
| Jimbo has stated, long in the past, that he doesn't want the print
| version to be a fork of the online version. Maybe he's changed his
| mind, but if not I think you have to consider the print and online
| versions to be the same thing.
|
| > >Frankly I think that case could probably be won by the museum on
| > >appeal, if they spent enough money fighting it.
| > >
| > Yeah, Dillinger has been dead since 1934.
| >
| In Indiana the right to publicity persists after death, though.
|
| > >Besides, there are
| > >always going to be crazy jurisdictions (like Indiana, apparently) with
| > >laws so out of touch with reasonableness that we just can't follow
| > >them.
| > >
| > Developing policies to account for such extremes is playing to the
| > least common denominator.
| >
| Absolutely. I agree. But at the same time, US fair use is an extreme
| too, just on the other end of the spectrum.
|
| > >As for relying on the copyright holder of the image finding the
| > >Wikipedia article "respectful", well, I just think that's a horrible
| > >thing for us to even have to consider. Would Lindsay Lohan (*) object
| > >to our portrayal of her in "Media spotlight"? I don't know, and I
| > >don't care.
| > >
| > There's also the question of who owns the copyright. I suspect it's the
| > studio who sends out fan pics to admirers.
| >
| I would think, with a publicity photo, that it'd be the publicist.
|
| Anthony
|
Might the NY Times test also apply here? Celebrities, being public
figures, should expect to be viewed in public, and a publicity photo
exists for exactly that purpose: to put the person's photo out into
the public. Besides the strict fair use argument (there is no loss to
their marketplace from putting the celebrity photo out), one could
also argue that a plain head shot (one that does not involve any
creative positioning, etc.) really belongs to the celebrity (and they
usually own those photos under contractual work-made-for-hire
agreements anyway), not to a publicist or other third party (though
photos made for specific publications are sometimes copyrighted by
that publication as part of a story being done for it). A regular
publicity photo might support a claim only by the celebrity, and in
that case one might even argue that under the NY Times test, as long
as actual malice does not apply to the use of the photo, the photo
can be reproduced anywhere else. (I haven't done any caselaw research
on this, so I cannot state that this argument would prevail, but it
seems a reasonable one to make now.)
This question also opens up all the moral rights questions associated
with altering someone's work. Obviously, if some punk rocker took a
GFDL-released photo (with a right of publicity implicitly released)
from someone's profile on WP and then altered it to make them look
"evil", that could be considered defamatory (either as a breach of
privacy or perhaps some other tort, if not actual defamation), and so
one could argue that the GFDL and CC licenses never include permission
for such transformations of an image; even a public domain image could
be the basis of a tortious transformation. If the photo were just
being used in another encyclopedia article, how could the subject
argue that an accurate image of their face damages them in any way
(including under copyright law), when that is the reason they released
the photo in the first place? Having the photo used as a thumbnail in
an encyclopedia simply enhances their celebrity status; it does not
cause them to lose money.
It is the other transformative uses that need to be looked at, I think.
Of course, if the GFDL had been drafted by a Canadian open source
foundation, the whole bundle of moral rights protections would
definitely apply, besides the right of publicity issues. In the US it
is not so clear: even though the US signed the Berne Convention, it
has only limited moral rights protections under the [[Visual Artists
Rights Act]], which apply to limited types of "works of visual art" as
defined in Title 17 USC sec. 101 and do not cover publicity photos.
alex756 (IAAL)
I have made another machine translation run. I removed particle
insertion and the erroneous Swahili lexicons identified by Martin
Benjamin, and recompiled the Swahili thesaurus based solely upon the
Kamusi Swahili lexicons, which Martin states are only partially
completed and possibly have some ambiguities. Future runs of this
project will be posted and announced after application of the grammar
rules and the full conjugation and sentence decomposition and
reconstruction rule sets based upon Dr. Benjamin's parsing rules,
which may be a month or two from now, after more work is done on the
grammar parser for this language. One other challenge is lexical
drift toward Arabic: it was explained to me that Swahili and many
other African languages have drifted to incorporate Arabic-derived
words, which may require overlapping rule sets to machine translate
properly.
I have activated the English link grammar parser for this second run
and have begun using word pairing against the Kamusi lexicons, which
are not yet set up to fully handle these cases (but are well on their
way to this goal). The Cherokee language (and most Native American
languages) produces words which are complete, self-contained
morphemes; word meanings are typically not split across word pairs,
as appears to be the case in Swahili. The Cherokee parsers and
lexicons are a lot further along, having been in development by our
linguists for several years for this precise application (in
Cherokee, each complex verb is in fact an entire self-contained
sentence of sorts, as are some nouns). As Martin points out, this
language has a lot more work to go before it reaches the point the
machine translator for Native American languages has already reached,
with comprehensive lexicons and grammar rule sets for machine
translation. Nonetheless, the tremendous potential Wikipedia machine
translation holds for African languages is compelling enough for the
Wolf Mountain Group to approve funding to move this effort forward,
along with any other interested African languages, in support of the
Wikimedia Foundation's projects and goals for African communities.
I still anticipate we can get to 90% by the end of autumn. This
project will remain under development, and regular updates will be
posted to the machine translations page set up by Sabine on Meta for
African languages. These first runs were examples to illustrate the
power of WikiTrans to rapidly create the whole of Wikipedia almost
overnight in another language (provided the lexicons and rule sets
are complete and accurate enough for the translator to rely upon).
The African languages project is very useful for allowing further
abstractions to be instrumented in WikiTrans to deal with a multitude
of languages for all of Wikimedia's projects, which is the ultimate
goal.
The real value here lies in the grammar and parsing rule sets and the
word pairing logic for each language and dialect. Over time,
WikiTrans will develop a large body of these rule sets and lexicons
for all interested languages we target. Rule sets may or may not be
published, depending on the project and the interests of the
contributors. French, Spanish, German, Dine, Italian, and other
popular and pervasive language rule sets will certainly be published
sometime this fall, so folks interested in porting a language to
WikiTrans can do so by writing rule sets and lexicons and submitting
them to the project for test runs.
The latest run for Swahili is at:
http://sw.wikigadugi.org
The latest lexicons, thesaurus, and XML dumps are at:
ftp://ftp.wikigadugi.org/africa
Jeff
>>
>>
>> In the past we have accepted codes to be used as "language" codes which
>> were non-existent and have had as a result that we are not in compliance
>> with the rules of accepted use for the ISO-639 codes. When codes for new
>> languages are used that are not consistent with the existing ISO-639
>> codes (all two and three character codes) a language should not be
>> accepted at all.
>>
We can handle ISO-639 codes; it's no problem to assign the correct code if it
exists. In my opinion, even languages without an ISO-639 code should be
accepted; however, this shouldn't be controlled by us. The decision should
be made in the New language requests vote.
>
> According to your rules, anyone (with help of
> a few friends or of a few sockpuppets) can re-open the Zorglub language
> (oldbies will understand which language is concerned).
> So, your proposal needs to mention the issue of constructed languages.
The policy is just a proposal at the moment; however, we'll take this into
account. There is a Quenya language test in Incubator; I suppose we should
delete it?
> Besides, I see you wrote "The Foundation will also have to approve the
> domain". Errrrrrrr. I'd prefer we avoid such bottleneck. How about
> something like "if at least 20 votes with a very large majority", no
> approval needed. If less votes or less obvious support, then, the
> Foundation or the spc must approve before creation ?
I removed the approval part; I think it remained there from the time when the
policy was imported to Meta and reworded by Daniel. As for your 20-votes
suggestion - unfortunately there are often cases of flash voting, where 20-30
voters can easily appear out of nowhere and declare their support. We can
counter this somewhat by requiring all voters to have accounts, but there are
still problems with that. I think this has yet to be decided somehow. We
should set a percentage range of approval, like with RfAs on the English
Wikipedia; e.g. more than 75% support gets approved automatically, between 50%
and 75% needs approval by the Foundation/SPcom, and less than 50% fails to get
a wiki established.
Michal Zlatkovsky ([[incubator:User:Timichal]])
Hello all,
thank you for your help and interest, as usual. We have now closed
the acceptance of candidates, and I hope we can release the complete
list of candidates soon. This election is getting wide attention, and
we now have close to 20 candidates. I'm really excited about that.
Quicklist of candidate statement by language:
http://meta.wikimedia.org/wiki/Election_translations_2006/En#Candidates
Some of those candidates came in late, but I hope the Wikimedia
community will pay impartial attention to all of them. All statements
should be read closely, ideally in the language most convenient for
each reader, that is, each voter. Regretfully, we have to admit that
not all voters can read those statements in the language they are
most familiar with.
But luckily and thankfully, with the help of many eager translators,
speakers of some languages will be able to read each of those
statements in their own language - or not. It depends on our
volunteer staff, and if you are multilingual, you can help. Your
translation will help your friends in your community and help assure
the neutrality and impartiality of this coming election. I think no
one will be happy if some candidates have their statements in many
languages and others in only a few, or if the speakers of some
language can read what a certain candidate thinks and promises while
others must rely on machine translation or on a language version that
is entirely obscure to them. I want to pursue equal opportunity to
the highest extent in this election. Not only because I am serving as
an officer for it, but also because I believe in the equal rights of
us Wikimedia editors and trust you all as collaborators toward the
same goal: to provide free knowledge in every portion of this planet,
in every language. And voting for our community representatives
should be one of the most significant parts of pursuing this goal, in
my opinion. And for that, we need your help - you German, Spanish,
Dutch, Italian, Polish ... and many other translators.
Again, we need your help to give all our voters enough information to
take this election as seriously as possible, and as impartially as
possible. You can help us - us, the Wikimedia community - to assure
its global character, its equality, and free access to one particular
kind of knowledge: what those candidates think about our mission. So,
take a look at our workspace and consider what you can do!
Quicklist of candidate statement by language:
http://meta.wikimedia.org/wiki/Election_translations_2006/En#Candidates
Thank you for your attention, and see you later on Meta.
--
Kizu Naoko
Wikiquote: http://wikiquote.org
* vivemus, mea Lesbia, amemus *
Jeff,
I applaud you for your initiative - your effort is impressive, albeit
unreadable. I'll give my feedback in this post, and then suggest we
take the discussion of the specifics of Swahili translation off-list
(and welcome others who want to keep track of this thread to email us to
stay in the cc loop). The last 2 or 3 paragraphs of this post do speak
to the wider discussion list, so other readers might wish to SKIP TOWARD
THE BOTTOM.
The first problem derives from your sources. The first source, "public
swahili lexicon," is a useless set of about 1000 nouns, adjectives, and
conjunctions, essentially a tourist vocabulary without any verbs. I
would be surprised if that list gave any pairings that weren't also in
the other lists. The third source, "rogets thesaurus in swahili," is
one I would like to know more about, but is not useful for machine
translation purposes in the configuration you've set up - for example,
scroll down to line 51382, and look at the following 100-odd pairs for
"idhini" in no particular order, with no way to distinguish among parts
of speech, shades of meaning, relative frequency, etc. However, I was
heartened to see line 45405 and following; I'm sure that if any
wikipedia entries need to be translated that include "assify,"
"torpedinous," or "macht nichts," this thesaurus will prove quite handy.
It looks like someone started with a smallish Swahili-English
wordlist, plugged that into an English thesaurus, and extrapolated
dozens of additional English equivalents per word, yielding an
intriguing but lexicographically suspect set of equivalencies.
Which leads us to the Kamusi Project as a source. I will be the first
to say that the Kamusi is a pretty good Swahili dictionary that will one
day be a great Swahili dictionary, but at the moment contains
significant weaknesses that prevent it from being a reliable source for
machine translation. The first issue is the quality of the data. The
initial data were manually input from an existing print dictionary to
which we were granted copyright permission. Unfortunately, the students
entering the data, before we programmed the Edit Engine, introduced a
lot of errors. I am currently in the process of going through the
database entry by entry, fixing those errors and adding in new heaps of
data, including information for many data fields that we hadn't
introduced during the initial data entry phase. This is an incredibly
time consuming, research-intensive task, and I don't foresee having a
Swahili->English dictionary that I am really happy with for another
couple of years (at best - the thesaurus above, and my wife, would
describe our current funding situation as "pauperized").
The Kamusi lexicon is much better as a Swa->Eng source than as an
English->Swahili dictionary, because that is the direction in which
we've input most of the initial data. The magic of databases makes it
possible to have our data available bi-directionally, but the E->S
version of the Kamusi needs its own careful review. That review can
only come after the S->E data are thoroughly updated. Most especially,
precious few E->S entries have been arranged with the Grouping Tool (
http://research.yale.edu/swahili/serve_pages/groupingtool_en.php ), so
most entries appear in an arbitrary order that does not account for
homographs, differing senses, frequency, etc. So, it would be premature
to use the E->S Kamusi lexicon as a platform for machine translation,
even though we do intend to get there.
When the data are ready for machine use, the program would also need to
check the four "alternate spellings" fields, to pick up all the color v.
colour issues that occur in both English and Swahili. Also, I would
think that you would want to keep part of speech info associated with
each line, which would make it much easier to employ grammar rules. A
grammar hint: in Swahili, the adjective always comes *after* the noun
that it modifies, except for the words "kila," "nusu," and "robo", and a
few other cases, including the numbers preceding "elfu" for thousands
between 11,000 and 99,000.
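To make that rule concrete, here is a minimal sketch in Python (the
function name and word lists are my own invention, purely for
illustration) of how a rule-based translator might order a Swahili
noun phrase:

    # Sketch: Swahili adjectives normally follow the noun they modify;
    # "kila", "nusu", and "robo" are among the exceptions that precede
    # it. The exception list here is only the partial one given above.
    PRE_NOMINAL = {"kila", "nusu", "robo"}

    def order_noun_phrase(modifier, noun):
        """Return the words of a Swahili noun phrase in surface order."""
        if modifier in PRE_NOMINAL:
            return [modifier, noun]    # e.g. "kila mtu" (every person)
        return [noun, modifier]        # e.g. "mtu mzuri" (good person)

A real translator would, of course, also need the part-of-speech data
mentioned above to know which word is the modifier in the first place.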
Another hint: Swahili does not use articles, so you need to get rid of
most attempts at translations of a/ an/ the. When an article is
absolutely necessary (which a computer would have a difficult time
predicting), Swahili uses variations of "one" for a/ an, and "that" for
the. Just getting rid of the articles in your articles would be a 100%
improvement (bringing them up to 2% readable).
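Purely as an illustration, that rule is a one-line filter over the
token stream (again a sketch with an invented function name, ignoring
the harder question of when to substitute a form of "one" or "that"):

    # Sketch: drop English articles before lexicon lookup,
    # since Swahili has no articles.
    ARTICLES = {"a", "an", "the"}

    def strip_articles(tokens):
        return [t for t in tokens if t.lower() not in ARTICLES]

    # strip_articles(["the", "farmer", "plants", "a", "seed"])
    # -> ["farmer", "plants", "seed"]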
Ok, now assume we have good data, with a good way of predicting which
words were appropriate in which circumstances (something that will
eventually be aided by the work now being done toward building a central
OmegaT database), and a good set of grammar rules. You would still need
to deal with the agglutinative Swahili verb in all its glory. The
Kamusi Project has a good parser embedded in our Swahili->English
search, which disentangles the front end of any conjugated Swahili verb
according to an analysis of every grammatical rule in the language. (We
have a similar analysis completed and written in pseudo-code for the
back end, the verbal extensions, but ran out of money and had to lay off
our programmer before we could code it into the search engine.) Even
taking advantage of our parser, your translating software would need to
go the other way, building Swahili verbs from conjugated English verbs.
You would need to account for the noun classes of each noun that is
referred to in the verb (as many as three different nouns, each of which
is either one of four different conversational participants or belongs
to one of 16 different noun classes), which involves trivial calls to
our database once you've identified the appropriate elements in the
English sentence and chosen the relevant nouns - the "class" field is
the key here. The real problem comes from conjugated English verbs.
You need some way of knowing that "catches/ caught" relate to "catch,"
which would involve a database of English verbs and their irregular
forms, and then you would need to map the various movable elements of
the English sentence to the appropriate fixed points of the Swahili
verb. Not an impossible task to achieve to 90% over time, but not
nearly as straightforward as you are hoping.
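To make the "catches/caught" point concrete, here is a toy sketch of
the kind of lookup that would have to happen before the Swahili verb
could even be assembled; the table and feature names are invented for
illustration, and a real system would need a full database of English
irregular verbs:

    # Sketch: map an English surface verb back to its lemma plus rough
    # tense/agreement features via a hand-built irregular table, with
    # a crude suffix-stripping fallback for regular verbs.
    IRREGULAR = {
        "caught":  ("catch", {"tense": "past"}),
        "catches": ("catch", {"tense": "present", "person": "3sg"}),
    }

    def analyze_verb(form):
        if form in IRREGULAR:
            return IRREGULAR[form]
        if form.endswith("ed"):
            return (form[:-2], {"tense": "past"})
        if form.endswith("s"):
            return (form[:-1], {"tense": "present", "person": "3sg"})
        return (form, {"tense": "present"})

The lemma then keys the lexicon lookup, and the features help select
the tense marker and subject prefix of the Swahili verb, alongside the
noun-class information from the "class" field.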
Of course, this is all for Swahili, for which we have a pretty good
initial lexicon en route to becoming excellent, a complete description
of grammatical rules, and an accepted, unicoded orthography. Most other
African languages, even those spoken by millions of people, are missing
some or all of those elements in digital form. So, even if you could
get pretty good machine translation of Wikipedia for Swahili, you would
still be a long, long way away from rolling with other languages.
And we still haven't dealt with content. What's to say that content
that is appropriate for the English Wikipedia is appropriate for the
Swahili Wikipedia? For example, the entry for Agriculture. It begins
by discussing the derivation of the word "agriculture," which is of
course irrelevant for Swahili. Then it carries an unacknowledged POV
about modern agriculture (as though the vast numbers of Africans who
earn their livings with hand hoes are pre-modern museum relics, and
let's not even click on the link to "subsistence farming" that talks
about "life outside of modern society"), and essentially ignores all of
the issues of raising crops on small farms that would be of immediate
interest to an African farmer logging in from an internet kiosk.
(Comment to those who fear paternalism in this endeavor: the people I
live and work among in Tanzania express a huge interest in having access
to this sort of information, although they are not in a position to
contribute to the development of the resource.) So, an African farmer
trying to combat an insect infestation on her farm would find a
translation of an English "agriculture" article that focuses on
technology-intensive farming to be much less useful than an article
started almost from scratch that addressed farming in the context of
speakers of that language. It just happens that "agriculture" was the
second article I clicked to by following links from the initial article
on the pseudo-Swahili test site - what similar issues would arise on the
fourth article, or the tenth, or the 997,032nd?
There's also the issue that a great many of the current English
Wikipedia articles are works in progress, of varying quality. Would you
do a one time machine translation of the current Wikipedia, and ignore
all future edits? Translate only "stable versions"? Re-translate
articles every time there is a change? Re-translate every time the
Kamusi Project data is updated (hundreds of times a week)? Have the
machine overwrite manual edits that someone did to machine translations,
when the English version changes? Do this for dozens of African
languages, and hundreds of languages around the world?
I don't want to dismiss the entire endeavor, although I've been working
on these issues for long enough to be sure that the undertaking is much
more complicated than you're estimating. Here's where I think your
translation project might prove useful: if a speaker of, for example,
Swahili went searching for an entry that didn't already exist in the
Swahili Wikipedia, an application could build a version of that page
on-the-fly from the English version that is current at that moment. The
Wikipedia user could then either (a) glean whatever information she
could from the article and move on, (b) laugh uproariously, or (c) go
into edit mode, work to turn the machine translation into something
readable in Swahili, and save that version - which would then become the
baseline page for that entry in that language, from which future edits
could take off. In this way, you would get the best of both worlds -
good articles written in the actual language whenever possible, and
fingertip access to rough machine translations from English when
articles are not initially available in the target language.
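In pseudo-code terms, the fallback I have in mind looks roughly like
this (a rough, self-contained sketch: the dictionaries stand in for
the two wikis, the title mapping between languages is glossed over,
and machine_translate is a placeholder for the actual translator):

    # Sketch of the on-the-fly fallback described above.
    sw_pages = {}      # human-edited Swahili baseline pages

    def machine_translate(text):
        return "[machine draft] " + text   # placeholder translator

    def fetch_article(title, en_pages):
        if title in sw_pages:
            return sw_pages[title]         # a human baseline exists
        return machine_translate(en_pages.get(title, ""))

    def save_edit(title, text):
        sw_pages[title] = text   # an edited draft becomes the baseline

The key property is that machine output is never stored until a human
has touched it, so manual edits are never overwritten by
re-translation.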
Over at Incubator we've been deciding our policy with regard to
starting new language (of an existing project) tests, and new project
tests.
We've come up with this:
- New languages can create a test on Incubator quite easily, needing
only to get a few people who will help.
(http://incubator.wikimedia.org/wiki/I:NTR)
- New languages can create a full wiki using an approval process on
Incubator, and it will be made (or not) after consensus has been
reached. (http://incubator.wikimedia.org/wiki/I:NLR)
- New projects will need approval from the Foundation to have a test
made, and then need further approval to make a full wiki.
Is this acceptable? We are also not sure exactly what the Foundation
needs to approve; the views we received previously seemed slightly
contradictory on this matter.
Thank you for your replies,
Dbmag9 (http://incubator.wikimedia.org/wiki/User:Dbmag9)
-
Dear all,
Thank you all for your comments. I agree with what Timichal has said
for the most part. I think that there are still problems with the
voting process, some of which can never be solved.
A percentage range of approval is a good idea, although it is again
vulnerable. Now we need to find a foolproof method of notifying the
Foundation about what's going on at Incubator :).
Thank you again for your thoughts.
-Dbmag9 (http://incubator.wikimedia.org/wiki/User:Dbmag9)
I write to advise the community that we have successfully brought in
a bookkeeper, Tricia Hoffman, to work with Michael Davis on a
part-time basis. Trish was referred to us by our audit firm and has
more than ten years of complex, full-cycle accounting experience.
During an initial transition phase, Michael will be working with Trish
as she learns the accounts and our internal processes. Following the
certification and publication of our audited financial statements, we
will resume publication of our financials, on what I anticipate will be
a monthly basis. Trish has agreed to provide professional support for
our accounting functions on a contract basis. After we have worked
together for a while, we will reassess based on the amount of time that
is appropriate for the tasks required.
mav has had an extraordinary influence on WMF and is to be heartily
congratulated for his willingness to jump in and work hard, especially
in the earliest days of the projects. As I have come to appreciate the
work that must be done to keep things going in the office, mav's energy
and initiative become all the more apparent. I'm sure he will continue
to contribute in multiple ways going forward.
Thanks, mav!
-Brad