Please consider this a formal announcement that the public meeting with
existing bidders will be held on September 23 at 15:00 UTC, which is:
8:00 AM in San Francisco
11:00 AM in New York
16:00 in London
23:00 in Taipei
midnight in Tokyo (next day)
and 1:00 AM in Sydney (next day).
The meeting will take place on IRC, on the freenode network, in the
channel #wikimania2008. For instructions on how to obtain an IRC client
or how to connect to the Wikipedia IRC channels, see
<http://en.wikipedia.org/wiki/Wikipedia:IRC_channels>.
--
Cary Bass
Volunteer Coordinator
Wikimedia Foundation, Inc.
Phone: 727.231.0101
Fax: 727.258.0207
E-Mail: cbass(a)wikimedia.org
Hi there,
regretfully I have found some questionable speedy deletions of images on
meta related to the Wikimania 2008 bidding.
To meta admins:
Please be careful before speedying images as "not relevant". We have
seen images that actually were relevant to meta get speedied ... yes,
images uploaded by the Wikimania bid teams. Please keep the six bidding
cities in mind and do not overlook the relevance of some of their images.
Also, please be aware that the bidders are not necessarily meta regulars
and may not be familiar with meta policies or its lingua franca, English.
To bidders:
please be aware that:
* meta accepts only images under {{GFDL}} or in the {{PD}} (public
domain). Do not forget to tag your images. No fair use! Thanks.
* not every meta admin will know what is going on with your bid
(meta is a big place), so a shot of a prospective party space might be
considered "not relevant to meta". Or dorms. Or Internet cafes. Or the
sightseeing spots in your city. It may help your images survive if you
add "this image is used for [[Wikimania 2008]] bidding" to their
descriptions. Even better, put the image on your bidding page as soon
as you upload it.
Anyway ... thank you for all your efforts, folks. I'm looking
forward to next week's meeting.
See you later,
--
KIZU Naoko
Wikiquote: http://wikiquote.org
* habent enim emolumentum in labore suo *
If a Chinese or Iranian university offered to sign a confidentiality
agreement, would you accept it? Or an institute in another country with
which they exchange students?
I still remember the talk at Berlin, 21C3, Dec 2004, where inside
information was given about the draconian measures China has taken to
keep its citizens under control. According to the talk they have 30,000
IT personnel working on patrolling their electronic borders (an estimate
by 'Reporters Without Borders'), and the best (US) equipment, loads of
it. Those guys would love to parse these data.
I am not questioning the integrity of current applicants at all. I do have
doubts about where the data will ultimately end up, if gradually tens of
institutions carry our viewer data on their portables, or in 2009 on 1 TB
memory sticks :)
Pakistan got the blueprints for ultracentrifuges for producing nuclear bombs
through a friendly student exchange project, from a small peaceful country
in Western Europe. Sensitive scientific data tend to travel.
Erik Zachte
> Tim Starling wrote:
> > For a while now, we've been releasing squid log data, stripped of
> > personally identifying information such as IP addresses, to groups at
> > two universities: Vrije Universiteit and the University of
> Minnesota. We
> > now have a request pending from a third group, at Universidad Rey Juan
> > Carlos in Spain. They are asking if they can have the full data stream
> > including IP addresses, and they are prepared to sign a confidentiality
> > agreement to get it.
>
I'm splitting threads for a tangent here. Ray brought up an
interesting subject in the log thread.
On 9/15/07, Ray Saintonge <saintonge(a)telus.net> wrote:
> Trust and signatures are not enough. How will they react if a
> government demands the release of private information? If we determine
> that we will not release it in the absence of a court order, what
> recourse do we have if the acquirers are not willing to resist a
> government order in the courts? In some jurisdictions there may be no
> such right to challenge such an order.
As it stands right now, wide-scale illicit surveillance of reader
activity would not be much of a challenge for a well-funded group such
as a government; all it requires is the ability to intercept the links
which carry the traffic.
Outside of government activity, ISPs and their employees also have
access to this data.
We could substantially mitigate this risk by scaling up our SSL handling
capacity to the point where it can handle a substantial portion of
the traffic coming to our site, and then taking measures to encourage
readers to use it. Then someone wishing to intercept reader activity
would be forced to either compromise reader systems, come to us, or
reveal that they know how to break SSL.
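To make the idea concrete, here is a toy sketch, in Python, of a
TLS-terminating proxy sitting in front of a plain-HTTP backend. This is
purely illustrative, not a proposal for the actual deployment; the port,
backend address and certificate file names are made up.

import socket
import ssl
import threading

BACKEND = ("127.0.0.1", 3128)                 # hypothetical plain-HTTP squid
CERT_FILE, KEY_FILE = "site.crt", "site.key"  # hypothetical certificate files

def forward(src, dst):
    # Copy bytes from src to dst until src closes or either side errors out.
    try:
        while True:
            data = src.recv(8192)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass

def handle(client):
    # Terminate TLS for one client and shuttle plaintext to/from the backend.
    with socket.create_connection(BACKEND) as backend, client:
        threading.Thread(target=forward, args=(client, backend),
                         daemon=True).start()
        forward(backend, client)

def main():
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(CERT_FILE, KEY_FILE)  # handshakes and bulk crypto
                                              # are where the CPU cost lives
    with socket.create_server(("", 8443)) as listener:
        with ctx.wrap_socket(listener, server_side=True) as tls_listener:
            while True:
                try:
                    client, _addr = tls_listener.accept()
                except ssl.SSLError:
                    continue                  # failed handshake; move on
                threading.Thread(target=handle, args=(client,),
                                 daemon=True).start()

if __name__ == "__main__":
    main()

Every byte a reader sends or receives then crosses the wire encrypted,
which is exactly the work that crypto cards or extra CPUs would be
absorbing at scale.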
Scaling up our SSL handling is possible but not without considerable
capital and non-zero operating costs. Squid can act as an SSL
accelerator, but we may need to purchase additional hardware (crypto
cards, more CPUs, etc.) and we would need to deal with potentially
buggy paths in the code. ... but these are technical matters which
belong on another list.
The appropriate question for foundation-l is, should we be spending
some money to do something like this?
Hey all,
Some time ago it was posted onto this list that the Wikimedia mailing
lists had been moved onto lists.wikimedia.org and, if I recall
correctly, we were told that we should stop using -l on the end (of
new lists) as this distinction didn't make sense anymore. When I
created the ComProj mailing list, I just requested comproj@
However, I am noticing that a lot of new lists are being created with
-l. Yes, it is tempting; it is very much in my head to do the same...
but is there a consensus on which way to go? It seems to me that we
should do one thing or the other. Or am I missing something?
Thanks,
Sean
> Regardless of the timing of the expansion (either this year or next)
> and who will be appointed as members (including whether Jimmy will run
> in the election or not), I would add that there is one other issue we
> might take into consideration: the voting system. In the latest
> election, and after it, there was an argument over whether approval
> voting was the best system to choose the representatives of the
> Wikimedia community in our current circumstances. The discussion seems
> to have been left without a conclusion. It was clear that one month
> before the election would be too late to pick it up, so I expect the
> community to consider the method of vote counting regardless of who
> they'll vote for.
> Cheers,
> --
> KIZU Naoko
See also <http://xrl.us/56rt> ([[m:Requests for comments/Board
Election 2007#Was approval voting the best choice for this election?
If not, why not? What substitute would you suggest?]]).
--
User:Jeandré du Toit
Silly me, I forgot to replace the subject properly.
On 9/11/07, Stephen Bain <stephen.bain(a)gmail.com> wrote:
> I don't know whether the Board wants community input on this or not,
> but I suspect there will be community members who would like to give
> their input anyway.
>
> From the "Board meeting planned in october" thread:
>
> On 9/10/07, Florence Devouard <anthere(a)anthere.org> wrote:
> >
> > During the board meeting, there should be discussions over whether to
> > expand the board to 9, or keep it for now at 7. A couple of names are
> > currently floating around.
> > There may be a change in the terms of the appointed members.
>
> Based on the board expansion resolution of December last year [1], I
> would have expected that the Board would be expanded to 9 in July next
> year, with three more elected seats to be up for election at that
> time.
>
> --
> [1] http://wikimediafoundation.org/wiki/Resolution:Board_expansion
>
> --
> Stephen Bain
> stephen.bain(a)gmail.com
>
--
Stephen Bain
stephen.bain(a)gmail.com
The Wikimedia projects should switch from the GFDL to the CC-BY-SA
license.
Why to switch
=============
When we started, the CC-BY-SA didn't exist and GFDL was the only
available license that expressed the "free-to-use-and-modify-but-
creators-need-to-be-attributed-and-the-license-cannot-be-changed"
idea for textual materials. Since then, we have largely ignored
the more arcane features of the GFDL, essentially telling our users "If
you keep the license and provide a link back to the original, you're
welcome to use our materials." In other words, we have always used GFDL
as if it were CC-BY-SA. This practice is unfair for two reasons:
* People who want to use our content have to trust that we won't
enforce the more arcane features of the GFDL in the future, such as the
requirement to change the article's title or to explicitly list at least
five principal authors.
* Contributors to Wikimedia projects have to trust that no one will
exploit the GFDL in the future and encumber their materials with
non-changeable text ("invariant sections").
By contrast, the CC-BY-SA license has the following advantages:
* It is simple and fits our precise requirements.
* It is promoted, maintained and translated by an active
organization, Creative Commons.
* It is better known and more widely used than the GFDL, at least
outside of Wikimedia projects, increasing the potential for re-use and
collaboration.
We should do the right thing, bring theory and practice into alignment,
and switch to the CC-BY-SA license once and for all.
How to switch
=============
Here's the plan: we issue a press release and post a prominent website
banner, saying that from some specified date on, the current and all
future versions of all materials on Wikimedia servers will be considered
released under CC-BY-SA. Any content creator who does not agree with
this change is invited to have their materials removed before that
date.
I don't see how any good-faith contributor who has researched the
licenses could disagree with this change and prefer GFDL over CC-BY-SA.
A small group of disgruntled former contributors will probably use the
occasion to get their material wiped from our servers, and I don't see
anything wrong with that. Some trolls will attempt to game the system,
but we can deal with that.
All materials in the history up to the specified deadline should
probably remain available under GFDL only; this makes it easier to deal
with the material of contributors who disagree with the change. And we
need to find some way to deal with discussion and policy pages.
I realize that this opt-out procedure is not perfectly clean from a
legalistic standpoint, but neither is our current distortion of the
GFDL. If we look at it pragmatically, considering what YouTube and the
Internet Archive can get away with, there doesn't seem to be any
appreciable danger that we could be successfully sued over this matter;
the number of true copyright violations that appear on Wikipedia every
day is a much bigger cause for concern. And ethically speaking, there's
nothing wrong with the opt-out approach since the two licenses are, in
essence and intent, identical.
--Axel
---Forwarded on request of the sender, who is not subscribed to the list----
tstarling said:
> I have posted to our public mailing list "foundation-l" about your
> project and your request for private data. They have the following
> questions:
> * How many people would require access to the data?
> * What is your research goal? Is it technical or sociological study?
> * How long would you need to collect data for, and how long would you
> store it?
> You can reply to me, or you can read and reply to the thread itself:
Hi,
I've been reading the thread, and I'll try to address your general
concerns and your specific questions. Since I'm not a subscriber to
foundation-l, please CC me on your answers. [Sorry for the long post.
If you want a summary, and just the answer to those specific questions,
go straight to the end]
First of all, some background. We at GSyC/LibreSoft have been
researching the libre (free, open source) software development community
for years. We focus mainly on public data, such as CVS/SVN repositories
or mailing lists. With that (massive) information, we try to improve the
understanding of how libre software is developed and maintained.
In this area, we have on a few occasions used non-public data, in cases
where it was not publicly distributed for privacy-related reasons. For
instance, that applies to some private data of SourceForge users
(distributed to academic users by the University of Notre Dame [1]), or
to the actual archives of mailing lists as kept by some projects (in
many cases, public versions of the archives do not include real email
addresses because of spam-related issues). In these cases, having access
to that specific information allowed us to research some aspects (such
as the geographical origin of developers and participants in libre
software communities) which would have been impossible otherwise.
About two years ago, we found that Wikipedia was an interesting case,
from a research point of view, for many reasons. Felipe Ortega, one of
our PhD students, started to explore that way by building the WikiXRay
tool [2], and using it to perform several studies using Wikipedia dumps
as source data.
Now, we're exploring a new line which has more to do with the system
that provides the Wikipedia service. The long-term goal is to understand
it, to profile it, and to find ways to improve it. From an academic
point of view, the Wikimedia system is real gold: one of the top
Internet sites, with almost all the information (content,
architecture, etc.) available for inspection. Both from a pedagogical
and from a research viewpoint, it is rather interesting.
When we (that's mainly Felipe Ortega, Antonio Reinoso, in CC, and
Gregorio Robles) started to consider the Wikimedia system from an
architectural and networking point of view, one of the first issues
raised was the desirability of having access to reliable statistics
about its behavior. Antonio contacted Tim about that, and it seems that
the easiest data to provide was the sampled dumps of Squid logs
that we're now talking about.
This is all for context. Now, before getting into the details of
privacy-related information, let me also say that we would like to work
with you to find the most appropriate way of getting as much
non-privacy-related information, suitable for research, as you may
consider reasonable. And of course, to find ways of making it available
to the research community as a whole. Thanks to your ideals of
transparency and sharing of knowledge, and to the technical relevance of
the site, with time the Wikimedia system could become one of the
canonical case studies for the research community, and we would like to
help make that happen.
For instance, by instrumenting (or maybe just logging) the Wikimedia
software in the proper way, we could profile different kinds of
requests (from the Squid front end to the database), identify
bottlenecks, measure delays and bandwidths at different steps in the
interactions, etc.
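Just to illustrate what I mean by instrumenting (this is an invented
example, not MediaWiki or Squid code; the stage names and the sleeps
are stand-ins):

import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("profiling")

@contextmanager
def stage(name):
    # Log how long the wrapped block of work takes, in milliseconds.
    start = time.monotonic()
    try:
        yield
    finally:
        log.info("%s took %.1f ms", name, (time.monotonic() - start) * 1000)

def handle_request(title):
    with stage("cache lookup"):
        time.sleep(0.002)   # stand-in for the front-end cache check
    with stage("database query"):
        time.sleep(0.010)   # stand-in for fetching the article text
    with stage("rendering"):
        time.sleep(0.005)   # stand-in for turning wikitext into HTML

handle_request("Example article")

With timings like these collected per request type, bottlenecks and
unusually slow paths become visible directly from the logs.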
This said, we're of course ready to respect your policies. If for some
reason you prefer to only provide that data yourselves, or to strip it
of this or that piece of information (for privacy or other reasons),
that's ok. What I would like is to identify the information you could
provide which is useful for research, while keeping you happy, not
harming performance, respecting your policies, etc.
With respect to privacy information, I fully understand your concerns,
and I'm also familiar with them, because of our previous work with the
libre software community. In fact, after some years of experience, we've
found that in many cases the best thing is to work jointly with projects
to identify which information can be made available, and how, maybe
under different conditions, to specific research groups or to the public
at large. I would like to do the same with Wikimedia, if possible.
Now, your specific questions (I understand that they refer to
information that could be used with some ease to track individual
identities).
> * How many people would require access to the data?
As few as possible. To start with, just Antonio and me, and probably
other researchers in my group. However, maybe this could be used as a
test case to define some conditions that could be offered in the future
to other research groups. To be honest, I would not like to be the only
group with access to such data, since any study we make on it would not
be reproducible by others, and therefore it could hardly be called
research...
> * What is your research goal? Is it technical or sociological study?
Both. In the specific case of IP addresses, I would like to use them
mainly for geotargeting, which would allow for several interesting
studies. For instance, on the "sociological" side, it would be nice to
know the share of different countries for certain language editions of
Wikipedia (both in edits and reads): consider the case of English,
Spanish or Chinese, which surely present different patterns. But it
can also be used to understand how proxies are dealing with requests
from different mega-carriers. Or to identify crawlers and the like.
Of course, there are in some cases alternative ways of doing this kind
of research, but in most cases having IP addresses is the most direct
way, or the most reliable.
For now, we're not interested in individual patterns, and that's why
1/1,000 and even smaller samples are more than enough, if they are
reasonably unbiased.
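As a purely illustrative sketch of the kind of analysis we have in mind
(the log line format, the field positions and the lookup_country()
helper are assumptions of mine, not Wikimedia's actual format or any
agreed interface):

import random
from collections import Counter

SAMPLE_RATE = 1 / 1000

def lookup_country(ip):
    # Hypothetical GeoIP helper; a real study would query a GeoIP database.
    return "unknown"

def country_shares(log_lines, project="es.wikipedia.org"):
    # Estimate the per-country share of requests to one language edition
    # from a 1/1,000 sample of log lines.
    counts = Counter()
    for line in log_lines:
        if random.random() > SAMPLE_RATE:   # keep roughly one line in 1,000
            continue
        fields = line.split()
        if len(fields) < 3:
            continue                        # skip malformed lines
        ip, url = fields[0], fields[2]      # assumed field positions
        if project in url:
            counts[lookup_country(ip)] += 1
    total = sum(counts.values()) or 1
    return {country: n / total for country, n in counts.items()}

Nothing individual is kept; only aggregated shares per country come out
at the end.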
> * How long would you need to collect data for, and how long would you
> store it?
Ideally, we would like to do it continuously over time, since the
dynamic evolution is quite interesting. But we're of course ready to
impose time limits if needed.
In summary, we are very thankful to the Wikimedia community for
providing as much information as you are providing now. We hope to go on
using it to better understand Wikipedia, Wikimedia systems, etc. But we
would also like to work with you to identify other sources of
information which are currently not provided, but maybe could be
without harming Wikimedia or its users, and which would be of great
interest to the research community. And we would like to do all this in
a way that other research groups may also benefit from the data.
As somebody said in the previous thread, most of this can be done
either:
(1) by providing the data to researchers, or
(2) by asking researchers to write scripts or the like that run at
Wikimedia facilities, producing output that would be sent to the
researchers without actually delivering the source data.
We would prefer (1) because it depends less on the resources that
Wikimedia may have for implementing (2), because maybe (2) won't scale
if many groups start using the data, and because (2) makes review and
reproducibility of research more difficult. But if you prefer (2) (or
prefer it in some specific cases, such as the IP addresses of client
machines), let's see how we can implement it.
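To make option (2) a bit more concrete, here is a minimal sketch of the
shape such a script could take (again, the whitespace-separated log
format and field positions are assumptions): it would run on Wikimedia's
machines, read the sampled log, and print only aggregate counts per
requested hostname, so raw IP addresses never leave your servers.

import sys
from collections import Counter
from urllib.parse import urlparse

def main():
    counts = Counter()
    for line in sys.stdin:
        fields = line.split()
        if len(fields) < 3:
            continue                       # skip malformed lines
        url = fields[2]                    # assumed position of the request URL
        host = urlparse(url).hostname or "unknown"
        counts[host] += 1                  # IPs are read but never written out
    for host, n in counts.most_common():
        print(f"{host}\t{n}")              # only aggregates cross the boundary

if __name__ == "__main__":
    main()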
Again, sorry for the long message, and thanks for reading up to here.
I'd be happy to answer any comment you may have.
Saludos,
Jesus.
[1] http://www.nd.edu/~oss/Data/data.html
[2] http://meta.wikimedia.org/wiki/WikiXRay
Hoi,
The University of Amsterdam (UvA) is getting log information that is
thoroughly anonymised, to the point where it is not as useful as it
should be. The UvA is working on what they call a "peer to peer
Wikipedia". Their interest in the data is not in the specific IP address
of a requester for information; their interest is in where a request is
coming from. The point is that it is best, fastest and cheapest when
information is available from a peer that is close by.
When you consider that there is a wikipedia.ky, a project outside of
the WMF whose justification is that it is expensive to get information
from outside the country, you will appreciate that a cache in Kyrgyzstan
would make this project's reason for being disappear. A peer to peer
Wikipedia allows for having peers in all parts of the world, and the
information would as a result be potentially locally available in
countries like Kyrgyzstan, but also in Africa and China.
In order to build this system it is necessary to understand how the need
for information develops. To build efficient routines that bring the
information in sufficient quantity to the caches that are local to the
requesters, and in order to ensure that data will be persistently
available, it is necessary for the UvA to have geographically relevant
information about requests to the WMF servers.
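To illustrate the idea (my own toy example; the peer list and the
neighbour table are invented), country-level request data is all that is
needed to decide which peer cache should serve a request:

PEERS = {
    "NL": "peer.amsterdam.example",
    "KG": "peer.bishkek.example",
    "ZA": "peer.johannesburg.example",
}

# Hypothetical fallback table: which peer country is "close" to a
# requester's country when there is no peer in that country itself.
NEAREST = {"KZ": "KG", "DE": "NL", "BW": "ZA"}

def choose_peer(request_country, default="datacenter.example"):
    # Prefer a peer in the requester's own country, then a neighbouring
    # one, and fall back to the central servers otherwise.
    if request_country in PEERS:
        return PEERS[request_country]
    neighbour = NEAREST.get(request_country)
    return PEERS.get(neighbour, default)

print(choose_peer("KZ"))   # -> peer.bishkek.example
print(choose_peer("JP"))   # -> datacenter.example

Country-level origins of requests are exactly the "geographically
relevant information" needed to populate tables like these.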
The UvA is one of the top universities in this field. In Andrew
Tanenbaum they have one of the leading thinkers and architects on
computer and Internet architecture on their staff. It is for this reason
that I am again, and this time publicly, asking for the UvA to have this
information.
With a peer to peer Wikipedia infrastructure the need for funding of the
Wikimedia Foundation would be significantly reduced. It may, however, be
two years down the road before this project is finished.
Thanks,
GerardM