Please consider this a formal announcement that the public meeting with
existing bidders will be held on September 23 at 15:00 UTC, which is:
8:00 AM in San Francisco
11:00 AM in New York
16:00 in London
23:00 in Taipei
midnight in Tokyo (next day)
and 1:00 AM in Sydney (next day).
The meeting will take place on IRC, on the freenode network, in the
channel #wikimania2008. For instructions on how to obtain an IRC client
or how to connect to the Wikipedia IRC channels, see
<http://en.wikipedia.org/wiki/Wikipedia:IRC_channels>.
--
Cary Bass
Volunteer Coordinator
Wikimedia Foundation, Inc.
Phone: 727.231.0101
Fax: 727.258.0207
E-Mail: cbass(a)wikimedia.org
Hi there,
regretfully I have found some questionable speedy deletions of images on
meta related to the Wikimania 2008 bidding.
To meta admins:
Please be careful before speedying images as "not relevant". We have
seen images that actually were relevant to meta get speedied ... yes,
images uploaded by the Wikimania bid teams. Please keep the six bidding
cities in mind and do not overlook the relevance of some of their images.
Also, please be aware that the bidders are not necessarily meta regulars
and may not be familiar with meta policies or its lingua franca, English.
To bidders:
please be aware that:
* meta accepts only images under {{GFDL}} or in the {{PD}} (public
domain). Do not forget to tag your images. No fair use! Thanks.
* not every meta admin will know what is going on with your bid
(meta is a big place), so a shot of a prospective party space might be
considered "not relevant to meta". Or dorms. Or Internet cafes. Or the
sightseeing spots in your city. It may help your images survive if you
add "this image is used for [[Wikimania 2008]] bidding" to their
descriptions. Even better, put the image on your bidding page as soon
as you upload it.
Anyway ... thank you for all your efforts, folks. I'm looking
forward to next week's meeting.
See you later,
--
KIZU Naoko
Wikiquote: http://wikiquote.org
* habent enim emolumentum in labore suo *
If a Chinese or Iranian university offered to sign a confidentiality
agreement, would you accept it? Or an institute in another country with
which they exchange students?
I still remember the talk at Berlin, 21C3, Dec 2004, where inside
information was given about the draconian measures China has taken to
keep its citizens under control. According to the talk they have 30,000
IT personnel working on patrolling their electronic borders (an estimate
by 'Reporters Without Borders'), and the best (US) equipment, loads of
it. Those guys would love to parse these data.
I am not questioning the integrity of current applicants at all. I do have
doubts about where the data will ultimately end up, if gradually tens of
institutions carry our viewer data on their portables, or in 2009 on 1 TB
memory sticks :)
Pakistan got the blueprints for ultracentrifuges for producing nuclear bombs
through a friendly student exchange project, from a small peaceful country
in Western Europe. Sensitive scientific data tend to travel.
Erik Zachte
> Tim Starling wrote:
> > For a while now, we've been releasing squid log data, stripped of
> > personally identifying information such as IP addresses, to groups at
> > two universities: Vrije Universiteit and the University of
> Minnesota. We
> > now have a request pending from a third group, at Universidad Rey Juan
> > Carlos in Spain. They are asking if they can have the full data stream
> > including IP addresses, and they are prepared to sign a confidentiality
> > agreement to get it.
>
I'm splitting threads for a tangent here. Ray brought up an
interesting subject in the log thread.
On 9/15/07, Ray Saintonge <saintonge(a)telus.net> wrote:
> Trust and signatures are not enough. How will they react if a
> government demands the release of private information? If we determine
> that we will not release it in the absence of a court order, what
> recourse do we have if the acquirers are not willing to resist a
> government order in the courts? In some jurisdictions there may be no
> such right to challenge such an order.
As it stands right now, wide-scale illicit surveillance of reader
activity would not be much of a challenge for a well-funded group such
as a government; all it requires is the ability to intercept the links
which carry the traffic.
Outside of government activity, ISPs and their employees also have
access to this data.
We could substantially mitigate this risk by scaling up our SSL handling
capacity to the point where it can handle a substantial portion of
the traffic coming to our site, and then taking measures to encourage
readers to use it. Then someone wishing to intercept reader activity
would be forced to either compromise reader systems, come to us, or
reveal that they know how to break SSL.
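To make the idea concrete, here is a toy sketch, in Python, of a
TLS-terminating proxy sitting in front of a plain-HTTP backend. This is
purely illustrative, not a proposal for the actual deployment; the port,
backend address and certificate file names are made up.

import socket
import ssl
import threading

BACKEND = ("127.0.0.1", 3128)                 # hypothetical plain-HTTP squid
CERT_FILE, KEY_FILE = "site.crt", "site.key"  # hypothetical certificate files

def forward(src, dst):
    # Copy bytes from src to dst until src closes or either side errors out.
    try:
        while True:
            data = src.recv(8192)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass

def handle(client):
    # Terminate TLS for one client and shuttle plaintext to/from the backend.
    with socket.create_connection(BACKEND) as backend, client:
        threading.Thread(target=forward, args=(client, backend),
                         daemon=True).start()
        forward(backend, client)

def main():
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(CERT_FILE, KEY_FILE)  # handshakes and bulk crypto
                                              # are where the CPU cost lives
    with socket.create_server(("", 8443)) as listener:
        with ctx.wrap_socket(listener, server_side=True) as tls_listener:
            while True:
                try:
                    client, _addr = tls_listener.accept()
                except ssl.SSLError:
                    continue                  # failed handshake; move on
                threading.Thread(target=handle, args=(client,),
                                 daemon=True).start()

if __name__ == "__main__":
    main()

Every byte a reader sends or receives then crosses the wire encrypted,
which is exactly the work that crypto cards or extra CPUs would be
absorbing at scale.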
Scaling up our SSL handling is possible but not without considerable
capital and non-zero operating costs. Squid can act as an SSL
accelerator, but we may need to purchase additional hardware (crypto
cards, more CPUs, etc.) and we would need to deal with potentially
buggy paths in the code. ... but these are technical matters which
belong on another list.
The appropriate question for foundation-l is, should we be spending
some money to do something like this?
Hey all,
Some time ago it was posted onto this list that the Wikimedia mailing
lists had been moved onto lists.wikimedia.org and, if I recall
correctly, we were told that we should stop using -l on the end (of
new lists) as this distinction didn't make sense anymore. When I
created the ComProj mailing list, I just requested comproj@
However, I am noticing that a lot of new lists are being created with
-l. Yes, it is tempting; it is very much in my head to do the same...
but is there a consensus on which way to go? It seems to me that we
should do one thing or the other. Or am I missing something?
Thanks,
Sean
> Regardless of the timing of the expansion (either this year or next)
> and who will be appointed as members (including whether Jimmy will run
> in the election or not), I would add that there is one other issue we
> might take into consideration: the voting system. In the latest
> election, and after it, there was an argument over whether approval
> voting was the best system to choose the representatives of the
> Wikimedia community in our current circumstances. The discussion seems
> to have been left without a conclusion. It was clear that one month
> before the election would be too late to pick it up, so I expect the
> community to consider the method of vote counting regardless of who
> they'll vote for.
> Cheers,
> --
> KIZU Naoko
See also <http://xrl.us/56rt> ([[m:Requests for comments/Board
Election 2007#Was approval voting the best choice for this election?
If not, why not? What substitute would you suggest?]]).
--
User:Jeandré du Toit
Silly me, I forgot to replace the subject properly.
On 9/11/07, Stephen Bain <stephen.bain(a)gmail.com> wrote:
> I don't know whether the Board wants community input on this or not,
> but I suspect there will be community members who would like to give
> their input anyway.
>
> From the "Board meeting planned in october" thread:
>
> On 9/10/07, Florence Devouard <anthere(a)anthere.org> wrote:
> >
> > During the board meeting, there should be discussions over whether to
> > expand the board to 9, or keep it for now at 7. A couple of names are
> > currently floating around.
> > There may be a change in the terms of the appointed members.
>
> Based on the board expansion resolution of December last year [1], I
> would have expected that the Board would be expanded to 9 in July next
> year, with three more elected seats to be up for election at that
> time.
>
> --
> [1] http://wikimediafoundation.org/wiki/Resolution:Board_expansion
>
> --
> Stephen Bain
> stephen.bain(a)gmail.com
>
--
Stephen Bain
stephen.bain(a)gmail.com
The Wikimedia projects should switch from the GFDL to the CC-BY-SA
license.
Why to switch
=============
When we started, the CC-BY-SA didn't exist and GFDL was the only
available license that expressed the "free-to-use-and-modify-but-
creators-need-to-be-attributed-and-the-license-cannot-be-changed"
idea for textual materials. Since then, we have largely ignored
the more arcane features of the GFDL, essentially telling our users "If
you keep the license and provide a link back to the original, you're
welcome to use our materials." In other words, we have always used GFDL
as if it were CC-BY-SA. This practice is unfair for two reasons:
* People who want to use our content have to trust that we won't
enforce the more arcane features of the GFDL in the future, such as the
requirement to change the article's title or to explicitly list at least
five principal authors.
* Contributors to Wikimedia projects have to trust that no one will
exploit the GFDL in the future and encumber their materials with
non-changeable text ("invariant sections").
By contrast, the CC-BY-SA license has the following advantages:
* It is simple and fits our precise requirements.
* It is promoted, maintained and translated by an active
organization, Creative Commons.
* It is better known and more widely used than the GFDL, at least
outside of Wikimedia projects, increasing the potential for re-use and
collaboration.
We should do the right thing, bring theory and practice into alignment,
and switch to the CC-BY-SA license once and for all.
How to switch
=============
Here's the plan: we issue a press release and post a prominent website
banner, saying that from some specified date on, the current and all
future versions of all materials on Wikimedia servers will be considered
released under CC-BY-SA. Any content creator who does not agree with
this change is invited to have their materials removed before that
date.
I don't see how any good-faith contributor who has researched the
licenses could disagree with this change and prefer GFDL over CC-BY-SA.
A small group of disgruntled former contributors will probably use the
occasion to get their material wiped from our servers, and I don't see
anything wrong with that. Some trolls will attempt to game the system,
but we can deal with that.
All materials in the history up to the specified deadline should
probably remain available under GFDL only; this makes it easier to deal
with the material of contributors who disagree with the change. And we
need to find some way to deal with discussion and policy pages.
I realize that this opt-out procedure is not perfectly clean from a
legalistic standpoint, but neither is our current distortion of the
GFDL. If we look at it pragmatically, considering what YouTube and the
Internet Archive can get away with, there doesn't seem to be any
appreciable danger that we could be successfully sued over this matter;
the number of true copyright violations that appear on Wikipedia every
day is a much bigger cause for concern. And ethically speaking, there's
nothing wrong with the opt-out approach since the two licenses are, in
essence and intent, identical.
--Axel
---Forwarded on request of the sender, who is not subscribed to the list----
tstarling said:
> I have posted to our public mailing list "foundation-l" about your
> project and your request for private data. They have the following
> questions:
> * How many people would require access to the data?
> * What is your research goal? Is it technical or sociological study?
> * How long would you need to collect data for, and how long would you
> store it?
> You can reply to me, or you can read and reply to the thread itself:
Hi,
I've been reading the thread, and I'll try to address your general
concerns and your specific questions. Since I'm not a subscriber to
foundation-l, please CC me on your answers. [Sorry for the long post.
If you want a summary, and just the answer to those specific questions,
go straight to the end]
First of all, some background. We at GSyC/LibreSoft have been
researching the libre (free, open source) software development community
for years. We focus mainly on public data, such as CVS/SVN repositories
or mailing lists. With that (massive) information, we try to improve the
understanding of how libre software is developed and maintained.
In this area, we have on a few occasions used non-public data, in cases
where it was not publicly distributed for privacy-related reasons. For
instance, that applies to some private data of SourceForge users
(distributed to academic users by the University of Notre Dame [1]), or
to the actual archives of mailing lists as kept by some projects (in
many cases, public versions of the archives do not include real email
addresses because of spam-related issues). In these cases, having access
to that specific information allowed us to research some aspects (such
as the geographical origin of developers and participants in libre
software communities) which would have been impossible otherwise.
About two years ago, we found that Wikipedia was an interesting case,
from a research point of view, for many reasons. Felipe Ortega, one of
our PhD students, started to explore that way by building the WikiXRay
tool [2], and using it to perform several studies using Wikipedia dumps
as source data.
Now, we're exploring a new line which has more to do with the system
that provides the Wikipedia service. The long-term goal is to understand
it, to profile it, and to find ways to improve it. From an academic
point of view, the Wikimedia system is real gold: one of the top
Internet sites, with almost all the information (content,
architecture, etc.) available for inspection. Both from a pedagogical
and from a research viewpoint, it is rather interesting.
When we (that's mainly Felipe Ortega, Antonio Reinoso, in CC, and
Gregorio Robles) started to consider the Wikimedia system from an
architectural and networking point of view, one of the first issues
raised was the desirability of having access to reliable statistics
about its behavior. Antonio contacted Tim about that, and it seems that
the easiest data to provide was the sampled dumps of Squid logs
that we're now talking about.
This is all for context. Now, before getting into the details of
privacy-related information, let me also say that we would like to work
with you to find the most appropriate way of getting as much
non-privacy-related information, suitable for research, as you may
consider reasonable. And of course, to find ways of making it available
to the research community as a whole. Thanks to your ideals of
transparency and sharing of knowledge, and to the technical relevance of
the site, with time the Wikimedia system could become one of the
canonical case studies for the research community, and we would like to
help make that happen.
For instance, by instrumenting (or maybe just logging) the Wikimedia
software in the proper way, we could profile different kinds of
requests (from the Squid front end to the database), identify
bottlenecks, measure delays and bandwidths at different steps in the
interactions, etc.
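Just to illustrate what I mean by instrumenting (this is an invented
example, not MediaWiki or Squid code; the stage names and the sleeps
are stand-ins):

import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("profiling")

@contextmanager
def stage(name):
    # Log how long the wrapped block of work takes, in milliseconds.
    start = time.monotonic()
    try:
        yield
    finally:
        log.info("%s took %.1f ms", name, (time.monotonic() - start) * 1000)

def handle_request(title):
    with stage("cache lookup"):
        time.sleep(0.002)   # stand-in for the front-end cache check
    with stage("database query"):
        time.sleep(0.010)   # stand-in for fetching the article text
    with stage("rendering"):
        time.sleep(0.005)   # stand-in for turning wikitext into HTML

handle_request("Example article")

With timings like these collected per request type, bottlenecks and
unusually slow paths become visible directly from the logs.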
This said, we're of course ready to respect your policies. If for some
reason you prefer to only provide that data yourselves, or to strip it
of this or that piece of information (for privacy or other reasons),
that's ok. What I would like is to identify the information you could
provide which is useful for research, while keeping you happy, not
harming performance, respecting your policies, etc.
With respect to privacy information, I fully understand your concerns,
and I'm also familiar with them, because of our previous work with the
libre software community. In fact, after some years of experience, we've
found that in many cases the best thing is to work jointly with projects
to identify which information can be made available, and how, maybe
under different conditions, to specific research groups or to the public
at large. I would like to do the same with Wikimedia, if possible.
Now, your specific questions (I understand that they refer to
information that could be used with some ease to track individual
identities).
> * How many people would require access to the data?
As few as possible. To start with, just Antonio and me, and probably
other researchers in my group. However, maybe this could be used as a
test case to define some conditions that could be offered in the future
to other research groups. To be honest, I would not like to be the only
group with access to such data, since any study we make on it would not
be reproducible by others, and therefore it could hardly be called
research...
> * What is your research goal? Is it technical or sociological study?
Both. In the specific case of IP addresses, I would like to use them
mainly for geotargeting, which would allow for several interesting
studies. For instance, on the "sociological" side, it would be nice to
know the share of different countries for certain language editions of
Wikipedia (both in edits and reads): consider the case of English,
Spanish or Chinese, which surely present different patterns. But it
can also be used to understand how proxies are dealing with requests
from different mega-carriers. Or to identify crawlers and the like.
Of course, there are in some cases alternative ways of doing this kind
of research, but in most cases having IP addresses is the most direct
way, or the most reliable.
For now, we're not interested in individual patterns, and that's why
1/1,000 and even smaller samples are more than enough, if they are
reasonably unbiased.
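As a purely illustrative sketch of the kind of analysis we have in mind
(the log line format, the field positions and the lookup_country()
helper are assumptions of mine, not Wikimedia's actual format or any
agreed interface):

import random
from collections import Counter

SAMPLE_RATE = 1 / 1000

def lookup_country(ip):
    # Hypothetical GeoIP helper; a real study would query a GeoIP database.
    return "unknown"

def country_shares(log_lines, project="es.wikipedia.org"):
    # Estimate the per-country share of requests to one language edition
    # from a 1/1,000 sample of log lines.
    counts = Counter()
    for line in log_lines:
        if random.random() > SAMPLE_RATE:   # keep roughly one line in 1,000
            continue
        fields = line.split()
        if len(fields) < 3:
            continue                        # skip malformed lines
        ip, url = fields[0], fields[2]      # assumed field positions
        if project in url:
            counts[lookup_country(ip)] += 1
    total = sum(counts.values()) or 1
    return {country: n / total for country, n in counts.items()}

Nothing individual is kept; only aggregated shares per country come out
at the end.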
> * How long would you need to collect data for, and how long would you
> store it?
Ideally, we would like to do it continuously over time, since the
dynamic evolution is quite interesting. But we're of course ready to
impose time limits if needed.
In summary, we are very thankful to the Wikimedia community for
providing as much information as you are providing now. We hope to go on
using it to better understand Wikipedia, Wikimedia systems, etc. But we
would also like to work with you to identify other sources of
information which are currently not provided, but maybe could be
without harming Wikimedia or its users, and which would be of great
interest to the research community. And we would like to do all this in
a way that other research groups may also benefit from the data.
As somebody said in the previous thread, most of this can be done
either:
(1) by providing the data to researchers, or
(2) by asking researchers to write scripts or the like that run at
Wikimedia facilities, producing output that would be sent to the
researchers without actually delivering the source data.
We would prefer (1) because it depends less on the resources that
Wikimedia may have for implementing (2), because maybe (2) won't scale
if many groups start using the data, and because (2) makes review and
reproducibility of research more difficult. But if you prefer (2) (or
prefer it in some specific cases, such as the IP addresses of client
machines), let's see how we can implement it.
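To make option (2) a bit more concrete, here is a minimal sketch of the
shape such a script could take (again, the whitespace-separated log
format and field positions are assumptions): it would run on Wikimedia's
machines, read the sampled log, and print only aggregate counts per
requested hostname, so raw IP addresses never leave your servers.

import sys
from collections import Counter
from urllib.parse import urlparse

def main():
    counts = Counter()
    for line in sys.stdin:
        fields = line.split()
        if len(fields) < 3:
            continue                       # skip malformed lines
        url = fields[2]                    # assumed position of the request URL
        host = urlparse(url).hostname or "unknown"
        counts[host] += 1                  # IPs are read but never written out
    for host, n in counts.most_common():
        print(f"{host}\t{n}")              # only aggregates cross the boundary

if __name__ == "__main__":
    main()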
Again, sorry for the long message, and thanks for reading up to here.
I'd be happy to answer any comment you may have.
Saludos,
Jesus.
[1] http://www.nd.edu/~oss/Data/data.html
[2] http://meta.wikimedia.org/wiki/WikiXRay
Hoi,
The University of Amsterdam (UvA) is getting log information that is
thoroughly anonymised, to the point where it is not as useful as it
should be. The UvA is working on what they call a "peer to peer
Wikipedia". Their interest in the data is not in the specific IP address
of a requester for information; their interest is in where a request is
coming from. The point is that it is best, fastest and cheapest when
information is available from a peer that is close by.
When you consider that there is a wikipedia.ky, a project outside of
the WMF whose justification is that it is expensive to get information
from outside the country, you will appreciate that a cache in Kyrgyzstan
would make this project's reason for being disappear. A peer to peer
Wikipedia allows for having peers in all parts of the world, and the
information would as a result be potentially locally available in
countries like Kyrgyzstan, but also in Africa and China.
In order to build this system it is necessary to understand how the need
for information develops. To build efficient routines that bring the
information in sufficient quantity to the caches that are local to the
requesters, and in order to ensure that data will be persistently
available, it is necessary for the UvA to have geographically relevant
information about requests to the WMF servers.
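To illustrate the idea (my own toy example; the peer list and the
neighbour table are invented), country-level request data is all that is
needed to decide which peer cache should serve a request:

PEERS = {
    "NL": "peer.amsterdam.example",
    "KG": "peer.bishkek.example",
    "ZA": "peer.johannesburg.example",
}

# Hypothetical fallback table: which peer country is "close" to a
# requester's country when there is no peer in that country itself.
NEAREST = {"KZ": "KG", "DE": "NL", "BW": "ZA"}

def choose_peer(request_country, default="datacenter.example"):
    # Prefer a peer in the requester's own country, then a neighbouring
    # one, and fall back to the central servers otherwise.
    if request_country in PEERS:
        return PEERS[request_country]
    neighbour = NEAREST.get(request_country)
    return PEERS.get(neighbour, default)

print(choose_peer("KZ"))   # -> peer.bishkek.example
print(choose_peer("JP"))   # -> datacenter.example

Country-level origins of requests are exactly the "geographically
relevant information" needed to populate tables like these.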
The UvA is one of the top universities in this field. In Andrew
Tanenbaum they have one of the leading thinkers and architects on
computer and Internet architecture on their staff. It is for this reason
that I am again, and this time publicly, asking for the UvA to have this
information.
With a peer to peer Wikipedia infrastructure the need for funding of the
Wikimedia Foundation would be significantly reduced. It may, however, be
two years down the road before this project is finished.
Thanks,
GerardM