For a while now, we've been releasing squid log data, stripped of personally identifying information such as IP addresses, to groups at two universities: Vrije Universiteit and the University of Minnesota. We now have a request pending from a third group, at Universidad Rey Juan Carlos in Spain. They are asking if they can have the full data stream including IP addresses, and they are prepared to sign a confidentiality agreement to get it.
I'm leaning towards letting them have it. Via the confidentiality agreement, we can avoid the most likely abuse scenarios, such as release of individual user profiles. Currently we let toolserver users process similar data, assisted by Wikipedia administrators who put web bugs on the site. They use it to produce the WikiCharts report. Are we to tell prospective research groups to use the toolserver, rather than their own substantial hardware, for analysis of Wikipedia traffic patterns?
I'm not sure if this would be allowed under the privacy policy, which does mention statistics but doesn't say who is making them. Maybe the use of web bugs by administrators is already against the privacy policy. In any case, I think the question would benefit from community discussion, which is why I am posting it here.
-- Tim Starling
On 14/09/2007, Tim Starling tstarling@wikimedia.org wrote:
In any case, I think the question would benefit from community discussion, which is why I am posting it here.
It might be helpful (to prevent uninformed ramblings) if we could have a draft of the proposed confidentiality agreement, or at least a rough bulletpoint of what it would cover. Unless that's confidential ;-)
I assume the data processing and handling would be done in Spain? It's certainly much less of a legal headache to shift the data to Europe rather than from Europe...
Andrew Gray wrote:
It might be helpful (to prevent uninformed ramblings) if we could have a draft of the proposed confidentiality agreement, or at least a rough bulletpoint of what it would cover. Unless that's confidential ;-)
I assume the data processing and handling would be done in Spain? It's certainly much less of a legal headache to shift the data to Europe rather than from Europe...
I'm with Andrew here; it depends on the terms and the scope of access within their community.
I'm also interested in what their research goal is. Is it technical or sociological study?
And - to try and not be uninformed - do we need to have a data disclosure policy? The edit history is a very valuable research dataset, should we have a "for academic research, under an appropriate non-disclosure agreement" as part of the privacy policy?
Brian McNeil
Andrew Gray wrote:
On 14/09/2007, Tim Starling tstarling@wikimedia.org wrote:
In any case, I think the question would benefit from community discussion, which is why I am posting it here.
It might be helpful (to prevent uninformed ramblings) if we could have a draft of the proposed confidentiality agreement, or at least a rough bulletpoint of what it would cover. Unless that's confidential ;-)
It hasn't been written yet.
I assume the data processing and handling would be done in Spain? It's certainly much less of a legal headache to shift the data to Europe rather than from Europe...
Yes.
-- Tim Starling
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
Andrew Gray wrote:
On 14/09/2007, Tim Starling tstarling@wikimedia.org wrote:
In any case, I think the question would benefit from community discussion, which is why I am posting it here.
It might be helpful (to prevent uninformed ramblings) if we could have a draft of the proposed confidentiality agreement, or at least a rough bulletpoint of what it would cover. Unless that's confidential ;-)
It hasn't been written yet.
I assume the data processing and handling would be done in Spain? It's certainly much less of a legal headache to shift the data to Europe rather than from Europe...
Yes.
-- Tim Starling
foundation-l mailing list foundation-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/foundation-l
I'd be all for that. Helping out academic enlightenment is one of the goals of, well, every wikimedia project i know, this is simply another aspect of that. But, really, i've never been one to really care about my own privacy, i don't much care what people know about me and whether or not they like it. There are plenty out there who do, however. Making a prelim of the confidentiality agreement and publishing it might help them. It's already got my support though.
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
For a while now, we've been releasing squid log data,
Is there a public url for accessing that data?
Mathias
And just two questions: do they need the actual IP address, or would a distinct number that merely distinguishes different IP addresses be sufficient?
When you say stripped of personally identifying information, does this include information such as search queries to our site that might, to a certain degree, be used to identify persons? People digging into the AOL data did not need IP addresses to identify individual people.
Mathias Schindler wrote:
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
For a while now, we've been releasing squid log data,
Is there a public url for accessing that data?
What data? You mean information about the project? The data itself is only available as a UDP stream, there's no URL.
And just two questions: do they need the actual IP address, or would a distinct number that merely distinguishes different IP addresses be sufficient?
I wouldn't recommend using a hashed IP address to anyone involved in academic work. I've worked in the academic sector, I know how important it is for data to be above any criticism. Any data using unique IP addresses as an estimate of individual user population would be severely skewed by proxies and NAT.
When you say stripped of personally identifying information, does this include information such as search queries to our site that might, to a certain degree, be used to identify persons? People digging into the AOL data did not need IP addresses to identify individual people.
Yes it includes search queries, user page queries, etc., but they're all mixed in together in a homogeneous stream. There is no referrer data or user agent data. So there is no way to correlate requests.
Also, we are only sending them 1 in every 10 requests. You can't tell much about a person from one tenth of their requests, uniformly mixed in with requests from 100 million other people.
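The 1-in-10 sampling described above can be sketched as a uniform random filter. This is a hypothetical helper for illustration, not the actual squid/UDP pipeline:

```python
import random

def sample_requests(requests, rate=10):
    """Keep roughly 1 in `rate` requests, chosen uniformly at random,
    so no individual's traffic is systematically over-represented."""
    return [r for r in requests if random.randrange(rate) == 0]

random.seed(42)  # deterministic for the example
sampled = sample_requests(list(range(100_000)))
```

Because each request is dropped independently, any one person's requests end up thinly and evenly diluted in the output stream, which is the property Tim is relying on.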
-- Tim Starling
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
I wouldn't recommend using a hashed IP address to anyone involved in academic work. I've worked in the academic sector, I know how important it is for data to be above any criticism. Any data using unique IP addresses as an estimate of individual user population would be severely skewed by proxies and NAT.
Perhaps in order to prevent potentially violating our own privacy policy, we can meet the researchers half-way. If we can find out the reason they need IP addresses we can craft the data we send them to satisfy their request. For example:
a) they could just need the unique addresses to link together browsing patterns, but not care for them to be IP addresses. We could convert the addresses into a unique number (or a salted hash) and send them the data.
b) they could be looking for network topology information; we could give them the first two or three octets of the IP address.
c) they could be looking for geographical distribution of queries; we could do the geo-lookup of addresses and give them coordinate resolution for each address instead of the address itself.
Obviously, a, b and c are all still somewhat contentious, but probably less so than just giving them raw IP addresses, and could be a good compromise.
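Options (a) and (b) could be sketched as below. The salt value and function names are made up, and, as later replies in this thread note, a salted hash is pseudonymization rather than anonymization, so this is illustrative only:

```python
import hashlib

SALT = b"example-salt"  # hypothetical; a real deployment would keep this secret

def pseudonymize_ip(ip: str) -> str:
    """Option (a): replace each IP with a salted hash so browsing
    patterns can still be linked without exposing the address."""
    return hashlib.sha256(SALT + ip.encode()).hexdigest()[:16]

def truncate_ip(ip: str, keep: int = 2) -> str:
    """Option (b): keep only the first `keep` octets of an IPv4
    address, zeroing the rest."""
    parts = ip.split(".")
    return ".".join(parts[:keep] + ["0"] * (4 - keep))
```

Option (c) would require an external GeoIP database, so it is omitted here.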
-ilya
On 9/14/07, Ilya Haykinson haykinson@gmail.com wrote:
If we can find out the reason they need IP addresses we can craft the data we send them to satisfy their request. For example:
Two years ago*, when we didn't actually have the data to release, I proposed a two pronged approach, restated here:
(1) Make as much of the non-private data public as we safely can; this maximizes the public value of this data and avoids the harm of picking favorites by sharing valuable data (commercially as well as academically valuable) with only certain groups. Plus it scales much better.
(2) Offer to run reasonable aggregation scripts for those who can describe a need for access to data we protect. For example, if they wanted to analyze article views vs country of origin the script could look up the countries and only disclose that.
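Such an aggregation script might look like the sketch below, where `geolocate` stands in for whatever IP-to-country lookup would actually be used (all names here are hypothetical):

```python
from collections import Counter

def views_by_country(log_entries, geolocate):
    """Count requests per country and return only the aggregate
    totals, so raw IP addresses never leave the server."""
    counts = Counter()
    for entry in log_entries:
        counts[geolocate(entry["ip"])] += 1
    return dict(counts)
```

The point of the design is that the privacy-sensitive field exists only inside the loop; the researcher receives nothing but per-country counts.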
If the needs of a researcher can't be met by data scrubbed with a custom aggregator, then I must question the usefulness of their research: If it's not possible to convert the research data into an aggregate result which has no privacy problems then the underlying data driving their research would be unpublishable, unrepeatable, and unverifiable.
Keep in mind that well over 99% of the people potentially impacted by this aren't our "community", they aren't people who have already agreed to lose a little privacy by making public edits... they are just readers.
It is my understanding that public libraries do not generally disclose detailed use records like this for outside research. Google and the other search engines fought in court to avoid providing the US government search log data.
I'm also disappointed with the standard of care provided by some other academic Wikipedia data researchers in recent memory.
So long as there exist *reasonable alternatives* I'm having a hard time seeing the justification for this proposed disclosure.
*For some reason our own archive of this thread seem to be missing. I found a third party copy: http://www.archivum.info/wikipedia-l@wikimedia.org/2005-08/msg00049.html
On 0, Gregory Maxwell gmaxwell@gmail.com scribbled:
On 9/14/07, Ilya Haykinson haykinson@gmail.com wrote:
If we can find out the reason they need IP addresses we can craft the data we send them to satisfy their request. For example:
Two years ago*, when we didn't actually have the data to release, I proposed a two pronged approach, restated here:
(1) Make as much of the non-private data public as we safely can; this maximizes the public value of this data and avoids the harm of picking favorites by sharing valuable data (commercially as well as academically valuable) with only certain groups. Plus it scales much better.
....
In a very strong sense, we can 'safely' make no data available. I went and did a little research (shucks, now I'm feeling like ArmedBlowfish). Entirely apart from obvious attacks using this data, like [[traffic analysis]] and all the various attacks Tor and remailer systems try to protect against, just the database alone is enough to compromise identities and reveal valuable information - even if you pseudonymize and remove data, and even if you insert dummy (but statistically valid, so it doesn't wreck analyses) data.
The obvious example to prove this would be the leak of AOL search queries, but there's an even better example. It turns out that Iceland has a very large and very well known national DNA database with which is associated a large quantity of metadata concerning family trees and what not (a somewhat amusing aside - a professor of mine once described her visits to Icelander-dominated parties; apparently when Icelanders have nothing better to chat about, or nothing particular in common, they simply go over their genealogies and figure out how they are related). Eventually [[Decode Genetics]]'s database was killed out of privacy concerns (http://observer.guardian.co.uk/international/story/0,6903,1217842,00.html etc.).
This is interesting, yes, but for us the interesting thing is that efforts were made to anonymize/scrub the data before use. Keeping in mind that the techniques were more advanced than the ones I've seen suggested here, the efforts failed. Inferences could be made from the data that broke the security quite easily. I found one particularly interesting paper on the topic; I quote from the abstract:
"Results: While susceptibility varies, we find that each of the protection methods studied is deficient in their protection against re-identification. In certain instances the protection schema itself, such as singly-encrypted pseudonymization, can be leveraged to compromise privacy even further than simple de-identification permits. In order to facilitate the future development of privacy protection methods, we provide a susceptibility comparison of the methods."
"Conclusion: This work illustrates the danger of blindly adopting identity protection methods for genomic data. Future methods must account for inferences that can be leaked from the data itself and the environment into which the data is being released in order to provide guarantees of privacy. While the protection methods reviewed in this paper provide a base for future protection strategies, our analyses provide guideposts for the development of provable privacy protecting methods."
("Why Pseudonyms Don’t Anonymize: A Computational Re-identification Analysis of Genomic Data Privacy Protection Systems"; http://privacy.cs.cmu.edu/dataprivacy/projects/linkage/lidap-wp19.pdf.)
-- gwern contacts Unix Force SUR Flame analysis bank Gamma CBNRC passwd
On 9/15/07, Gwern Branwen gwern0@gmail.com wrote:
In a very strong sense, we can 'safely' make no data available.
This is a counter-productive over-statement. It is only true in the same sort of useless sense that many dramatic maxims are true in...
I would not characterize it as such had you made any effort to concretely connect the background material, interesting as it will be to those who haven't seen it, to some aspect of our actual situation.
Gregory Maxwell wrote:
On 9/15/07, Gwern Branwen gwern0@gmail.com wrote:
In a very strong sense, we can 'safely' make no data available.
This is a counter-productive over-statement. It is only true in the same sort of useless sense that many dramatic maxims are true in...
I would not characterize it as such had you made any effort to concretely connect the background material, interesting as it will be to those who haven't seen it, to some aspect of our actual situation.
Please keep the discussion civil, Greg.
-- Tim Starling
On 9/15/07, Tim Starling tstarling@wikimedia.org wrote:
Gregory Maxwell wrote:
On 9/15/07, Gwern Branwen gwern0@gmail.com wrote:
In a very strong sense, we can 'safely' make no data available.
This is a counter-productive over-statement. It is only true in the same sort of useless sense that many dramatic maxims are true in...
[snip]
Please keep the discussion civil, Greg.
Tim, Your public admonishment to maintain civility is no less a breach of civility than my disagreement and admonishment to maintain a focus on our situation rather than spooky problems elsewhere.
My apologies to Gwern if my tone was received as excessively harsh, for that was not my intention. I do think the background references would be useful to others, so thank you for that.
On 2007.09.15 13:16:18 -0400, Gregory Maxwell gmaxwell@gmail.com scribbled 19 lines:
On 9/15/07, Tim Starling tstarling@wikimedia.org wrote:
Gregory Maxwell wrote:
On 9/15/07, Gwern Branwen gwern0@gmail.com wrote:
In a very strong sense, we can 'safely' make no data available.
This is a counter-productive over-statement. It is only true in the same sort of useless sense that many dramatic maxims are true in...
[snip]
Please keep the discussion civil, Greg.
Tim, Your public admonishment to maintain civility is no less a breach of civility than my disagreement and admonishment to maintain a focus on our situation rather than spooky problems elsewhere.
My apologies to Gwern if my tone was received as excessively harsh, for that was not my intention. I do think the background references would be useful to others, so thank you for that.
No, I wasn't bothered - I've been online long enough that my think skin disappeared a long time ago. I was more bothered that anyone thought I was wrong. :)
-- gwern 310 explicit UXO Merlin card CIA-DST TDYC AFSPC DDIS basement
On 2007.10.07 11:32:16 -0700, Ray Saintonge saintonge@telus.net scribbled 9 lines:
Gwern Branwen wrote:
No, I wasn't bothered - I've been online long enough that my think skin disappeared a long time ago. I was more bothered that anyone thought I was wrong. :)
It seems to me that a "think skin" is a reasonable path between "thick" and "thin". :-)
Ec
One flame, two flame, or, Think blue, count two, etc.?
But this just goes to show that there's always a Golden Mean, eh?
-- gwern NSIRL SASR SEAL MEU/SOCPSAC SURVIAC Meade KLM AKR data-haven 20755
On 2007.09.15 01:38:00 -0400, Gregory Maxwell gmaxwell@gmail.com scribbled 11 lines:
On 9/15/07, Gwern Branwen gwern0@gmail.com wrote:
In a very strong sense, we can 'safely' make no data available.
This is a counter-productive over-statement. It is only true in the same sort of useless sense that many dramatic maxims are true in...
Dramatic maxims are useful for shock value, which is what is needed here since people seem to be thinking that we can release vast amounts of data and not worry about abuses at all. This attitude shocks me a little, since almost by definition this subject involves releasing even more data than usual, and we've already seen abuses of public data. Not to mention that you *can't* trust researchers to keep it confidential, any more than you could anyone else. (Remember the AOL thing? It was one of their researchers who released it.)
Every bit of data reduces privacy and anonymity; this is a fact of life akin to one-time pads being unbreakable, or lossless compression being unable to compress some strings, or collisions for hashes shorter than the input...
I would not characterize it as such had you made any effort to concretely connect the background material, interesting as it will be to those who haven't seen it, to some aspect of our actual situation.
I assume everyone here is intelligent and doesn't need to have things spelled out in excruciating detail. For example, when I cite a specific Freedom House report, I assume I don't need to link the specific PDF - everyone here knows how to use Google because they've successfully subscribed to this list and are reading it.
When I cite a research paper showing that database inference attacks are powerful enough to defeat pseudonymizing and many other schemes, I don't think I should need to specifically say something rude and blunt; perhaps along the lines of "Oh, and everyone on the list who has suggested that we could just pseudonymize everything or only release parts of IP address - they're all incredibly naive fools with no appreciation for just how hard security is and how much information could be extracted from deceptively little data, and they really should just shut up and go read _Applied Cryptography_, or a bunch of Cryptogram backissues* and never again pontificate on security issues involving real people until they do."
The question here is not whether we can mangle the data so there is no danger of privacy violations. It exists, it will always exist. The question is, can we reduce that danger to below the average every-day risks of using the Internet such that our users won't have any reason to say that our privacy policy is a pack of lies and that the WMF has stabbed them in the back.
Right now, I'm not convinced it's worth it. Has anyone even said what the researchers want it for?
-- gwern SecDef AKR FLAME GEODSS on Blackmednet EODN keebler mines ^X
*or anything, really! I happen to like Bruce Schneier's writings, but there's a lot of security literature that would make the same point.
On 9/16/07, Gwern Branwen gwern0@gmail.com wrote:
On 2007.09.15 01:38:00 -0400, Gregory Maxwell gmaxwell@gmail.com scribbled 11 lines:
On 9/15/07, Gwern Branwen gwern0@gmail.com wrote:
In a very strong sense, we can 'safely' make no data available.
This is a counter-productive over-statement. It is only true in the same sort of useless sense that many dramatic maxims are true in...
Dramatic maxims are useful for shock value, which is what is needed here
We probably have an unresolvable difference in value.
In my view decision making processes need 'shock value' as much as hen-houses need foxes. ...
since people seem to be thinking that we can release vast amounts of data and not worry about abuses at all. This attitude shocks me a little, since almost by definition this subject involves releasing even more data than usual, and we've already seen abuses of public data.
At the beginning of the thread the initial respondents appeared to be under the mistaken impression that we were already liberally releasing effectively identical information.
In later replies the tone has been more negative... to the point where I'm concerned that we may be at risk of discarding the baby with the bathwater.
Not to mention that you *can't* trust researchers to keep it confidential, any more than you could anyone else.
Well, more than "anyone else" perhaps. Certainly it would be better to give the data to 'researchers' than a malicious force, or to someone completely unqualified to handle private data. ... But at the same time it would be better still to minimize disclosure.
Every bit of data reduces privacy and anonymity; this is a fact of life
Technically true, but not useful.
I assume everyone here is intelligent and
Then why resort to shock statements and over-generalizations?
[snip]
The question here is not whether we can mangle the data so there is no danger of privacy violations. It exists, it will always exist. The question is, can we reduce that danger to below the average every-day risks
[snip]
Right now, I'm not convinced it's worth it.
[snip]
I think you are creating a false choice here: The choice when dealing with private data isn't only between "no release at all" and "substantial risk but below the average every day risk".
Even while keeping the pedantic "Every bit of data reduces privacy and anonymity" in mind, there are many types of data extract which pose an exposure level so low that we can fairly classify it as none when speaking English rather than pedantese:
For example, no one sane is going to claim that releasing the daily viewership rates for existent articles, with some quantization, is going to cause a measurable impact to anyone's privacy or anonymity.
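The quantization mentioned above is trivial to implement; a minimal sketch, with an arbitrary step size:

```python
def quantize(count: int, step: int = 100) -> int:
    """Round a daily view count down to the nearest multiple of
    `step`, blurring small differences that might otherwise leak
    information about individual visits."""
    return (count // step) * step
```

For high-traffic articles the rounding error is negligible for research purposes, while very small counts (the privacy-sensitive end of the scale) collapse to zero.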
On Sat, 15 Sep 2007, Gregory Maxwell wrote:
Even while keeping the pedantic "Every bit of data reduces privacy and anonymity" in mind, there are many types of data extract which pose an exposure level so low that we can fairly classify it as none when speaking English rather than pedantese:
For example, no one sane is going to claim that releasing the daily viewership rates for existent articles, with some quantization, is going to cause a measurable impact to anyone's privacy or anonymity.
Fair enough. Also, I would find it useful to have a "please share my data with others" option for those who would like data related to their use of the site to be available to researchers and others. There are some interesting questions which can be answered approximately by looking at full data from a dozen non-randomly selected active editors, and I imagine that there are at least that many regulars who wouldn't mind sharing their usage patterns.
[for instance, the kind of daily and monthly edit cycles that you used to be able to see from the edit-count tool for everyone, and that can still be gathered from publicly available data...]
SJ
On 9/14/07, Ilya Haykinson haykinson@gmail.com wrote:
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
I wouldn't recommend using a hashed IP address to anyone involved in academic work. I've worked in the academic sector, I know how important it is for data to be above any criticism. Any data using unique IP addresses as an estimate of individual user population would be severely skewed by proxies and NAT.
Perhaps in order to prevent potentially violating our own privacy policy, we can meet the researchers half-way.
The best way to avoid violating the privacy policy would be to change it to say exactly what it is you plan on doing, and to not give data from before the policy is changed.
If we can find out the reason they need IP addresses we can craft the data we send them to satisfy their request. For example:
a) they could just need the unique addresses to link together browsing patterns, but not care for them to be IP addresses. We could convert the addresses into a unique number (or a salted hash) and send them the data.
In case anyone's seriously considering this, make sure you've read [[AOL search data scandal]] which should show you why it's completely useless. This is *especially* true with Wikipedia data, where the urls we access constantly reveal who we are (e.g. http://en.wikipedia.org/wiki/User_talk:Whatever).
b) they could be looking for network topology information; we could give them the first two or three octets of the IP address.
Three octets would be almost as bad as a) for the same reasons. Two octets would be better, but less useful too.
c) they could be looking for geographical distribution of queries; we could do the geo-lookup of addresses and give them coordinate resolution for each address instead of the address itself.
If that geo information is limited to country, I guess it wouldn't be too bad.
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
For a while now, we've been releasing squid log data, stripped of personally identifying information such as IP addresses, to groups at two universities: Vrije Universiteit and the University of Minnesota. We now have a request pending from a third group, at Universidad Rey Juan Carlos in Spain. They are asking if they can have the full data stream including IP addresses, and they are prepared to sign a confidentiality agreement to get it.
How long would the log file run for, and how long would the university keep the log?
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
For a while now, we've been releasing squid log data, stripped of personally identifying information such as IP addresses, to groups at two universities: Vrije Universiteit and the University of Minnesota. We now have a request pending from a third group, at Universidad Rey Juan Carlos in Spain. They are asking if they can have the full data stream including IP addresses, and they are prepared to sign a confidentiality agreement to get it.
I'm leaning towards letting them have it. Via the confidentiality agreement, we can avoid the most likely abuse scenarios, such as release of individual user profiles. Currently we let toolserver users process similar data, assisted by Wikipedia administrators who put web bugs on the site. They use it to produce the WikiCharts report. Are we to tell prospective research groups to use the toolserver, rather than their own substantial hardware, for analysis of Wikipedia traffic patterns?
I'm not sure if this would be allowed under the privacy policy, which does mention statistics but doesn't say who is making them. Maybe the use of web bugs by administrators is already against the privacy policy. In any case, I think the question would benefit from community discussion, which is why I am posting it here.
-- Tim Starling
I don't know if we should be letting any outside groups have the IP addresses/data we are supposed to keep private; I'm uncomfortable with that. I'd sooner we have someone here who is already trusted take requests to run queries. (I note that Greg volunteers to do this... and, for that matter, has been asking for access to do just such things in the past.)
I don't think relying on an NDA to keep things private is effective enough to meet our obligations. If we don't trust people to use proper research ethics we shouldn't give them access to anything important in the first place. But mistakes happen, leaks happen, and that you can show somewhere along the way someone signed something that said they wouldn't disclose private data doesn't take back the damage done from mishandling.
The rest of the log data, that isn't private -- I don't see why you should need to be a university group to access it. Is there somewhere to do so publicly, or at least where anyone may make a request?
-Kat
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote: [snip]
They are asking if they can have the full data stream including IP addresses, and they are prepared to sign a confidentiality agreement to get it.
[snip]
Currently we let toolserver users process similar data, assisted by Wikipedia administrators who put web bugs on the site. They use it to produce the WikiCharts report. Are we to tell prospective research groups to use the toolserver, rather than their own substantial hardware, for analysis of Wikipedia traffic patterns?
[snip]
This is simply not true.
The web bug used by Wikicharts uses a URL which gets a custom log format which logs only the most basic data, here is an example entry:
[14/Sep/2007:00:09:36 +0000] "GET /xyz.png?ns=0&title=Honored%20Matres&factor=6000&wiki=enwiki HTTP/1.1"
That is the entirety of the logged data. With the exception of the HTTP version nothing is gathered which is not strictly necessary to produce the top viewed page data, and even that is gathered at a sampling rate low enough to make the usefulness questionable.
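For reference, the fields in an entry like that can be pulled apart with a few lines of Python (a sketch; the real log processing may well differ):

```python
import re
from urllib.parse import parse_qs, urlparse

ENTRY = ('[14/Sep/2007:00:09:36 +0000] '
         '"GET /xyz.png?ns=0&title=Honored%20Matres&factor=6000&wiki=enwiki HTTP/1.1"')

def parse_entry(line: str) -> dict:
    """Extract the query-string fields from a Wikicharts-style log line."""
    match = re.search(r'"GET (\S+) HTTP/[\d.]+"', line)
    query = parse_qs(urlparse(match.group(1)).query)
    return {key: values[0] for key, values in query.items()}
```

As Greg says, the only recoverable fields are the namespace, title, sampling factor, and wiki; there is no IP, referrer, or user-agent data to extract.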
Not that it isn't horribly silly that we're using a JS web-bug and the toolserver for this, because we are already recording much better data while the wikicharts approach is unreliable, low quality, and trivially subject to manipulation. At the time Wikicharts was established there was no Wikimedia logging, and because all of the Wikimedia logging data is kept private even from most of our own 'inside people', Wikicharts continues to use this method for its reporting.
The data we are providing to outsiders is substantially better than the data available to people with @wikimedia.org addresses, including myself.
For the moment I'm going to refrain from making further public comment on this subject because I've not yet read most of the messages and I think consideration is deserved before issuing some harsh criticism. ... but the comment about wikicharts logging is a factual matter which demanded correction.
Tim Starling wrote:
For a while now, we've been releasing squid log data, stripped of personally identifying information such as IP addresses, to groups at two universities: Vrije Universiteit and the University of Minnesota. We now have a request pending from a third group, at Universidad Rey Juan Carlos in Spain. They are asking if they can have the full data stream including IP addresses, and they are prepared to sign a confidentiality agreement to get it.
Why do they need the ips? What is the purpose of the data?
I don't see why personally identifying information could be needed other than to personally identify someone. Given that you say ip's are not to be used as unique ids... Maybe they're going to proxyscan hundreds of ips to find out if they're proxies??
I'd like to see the request reasons :P
PS: The intercepted data would surely be useless but would the data stream with "personally identifying information" be vulnerable to a man-in-the-middle attack?
On 14/09/2007 13:41, Tim Starling wrote: [..snip..]
I'm not sure if this would be allowed under the privacy policy, which does mention statistics but doesn't say who is making them. Maybe the use of web bugs by administrators is already against the privacy policy. In any case, I think the question would benefit from community discussion, which is why I am posting it here.
From http://wikimediafoundation.org/wiki/Privacy_policy#Private_logging: there are six points, mostly about law enforcement and protecting the projects against abuse, followed by "Wikimedia policy does not permit public distribution of such information under any circumstances, except as described above". There is no hint of academic research or NDAs that would allow third parties to access that personal information, not without *explicit* consent from the users.
We have strict rules about how CUs have to handle private data of _editors_ and then we would allow three universities to access data of _any_user_ that access WMF sites? I consider myself mildly paranoid, so this is undoubtedly POV, but I think this idea is crazy.
Some people use static IP addresses, even with personal information attached to whois records; have you ever considered that?
Brownout
ICQ IM: 236537882 MSN IM: brown dot out at hotmail dot com OpenPGP key: 0xCB11EA7E fingerprint = 6706 B72E 0500 EC52 B33D 13B6 FCFA 8BE5 CB11 EA7E
On 9/14/07, Brownout brovvnout@gmail.com wrote:
We have strict rules about how CUs have to handle private data of _editors_ and then we would allow three universities to access data of _any_user_ that access WMF sites? I consider myself mildly paranoid, so this is undoubtedly POV, but I think this idea is crazy.
Some people use static IP addresses, even with personal information attached to whois records; have you ever considered that?
It's a tremendous bit of information. For those people whose identities are in their WP profile, you'd be giving access to everything they ever read. For those people whose identities aren't in their WP profile, you'd be giving location information which might very well be enough to identify them.
What I still don't understand is what period this information would be from. Would it only be a UDP stream of new requests, or would it include old log data? At least if it's only new requests those of us who are "mildly paranoid" can make sure we always access WP through tor.
Anthony wrote:
What I still don't understand is what period this information would be from. Would it only be a UDP stream of new requests, or would it include old log data? At least if it's only new requests those of us who are "mildly paranoid" can make sure we always access WP through tor.
I've been passing on questions from this thread to the researchers, and I'm still waiting for their reply. So I won't answer most of the questions just yet, but I can answer this one.
It's a UDP stream of new requests, they won't get any old data.
There is no old log data for them to have, except at 1/1000 sampling, and even that has gaps in it due to disk-full conditions, and it doesn't go back very far.
-- Tim Starling
Tim Starling wrote:
Anthony wrote:
What I still don't understand is what period this information would be from. Would it only be a UDP stream of new requests, or would it include old log data? At least if it's only new requests those of us who are "mildly paranoid" can make sure we always access WP through tor.
I've been passing on questions from this thread to the researchers, and I'm still waiting for their reply. So I won't answer most of the questions just yet, but I can answer this one.
It's a UDP stream of new requests, they won't get any old data.
There is no old log data for them to have, except at 1/1000 sampling, and even that has gaps in it due to disk-full conditions, and it doesn't go back very far.
A stream of requests of live Wikipedia information is valuable as hell. Imagine Google Zeitgeist, but instead of just getting a once-a-year snapshot of it, you get it every second.
I think this information is too powerful and valuable to pick and choose who gets it. Either find a way to release it with a public API so that anyone can access it (and there are many excellent uses for this kind of data), or don't release it at all. I especially don't like the idea of certain people getting raw data that easily identifies users. It's a total sidestep of the check user policy. I know this isn't what most editors signed up for.
No comment on your other points, but I'd hardly consider a university conducting research under an NDA 'public' disclosure at all, to the point where I don't think this even breaks the privacy policy, although whether or not to do it should be a community decision.
On 9/14/07, Brownout brovvnout@gmail.com wrote:
On 14/09/2007 13:41, Tim Starling wrote: [..snip..]
I'm not sure if this would be allowed on the privacy policy, which does mention statistics, but doesn't say who is making them. Maybe the use of web bugs by administrators is already against the privacy policy. In any case, I think the question would benefit from community discussion, which is why I am posting it here.
There are six points, mostly about law enforcement and protecting the projects against abuse, followed by "Wikimedia policy does not permit public distribution of such information under any circumstances, except as described above". There is no hint of academic research or NDAs that would allow third parties to access that personal information, not without *explicit* consent from the users.
We have strict rules about how CUs have to handle private data of _editors_ and then we would allow three universities to access data of _any_user_ that access WMF sites? I consider myself mildly paranoid, so this is undoubtedly POV, but I think this idea is crazy.
Some people use static IP addresses, even with personal information attached to whois records; have you ever considered that?
Brownout
foundation-l mailing list foundation-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/foundation-l
Brock Weller wrote:
No comment on your other points, but I'd hardly consider a university conducting research under an NDA 'public' disclosure at all, to the point where I don't think this even breaks the privacy policy, although whether or not to do it should be a community decision.
Even if it's not a violation of the privacy policy per se, IMO the privacy policy should still be updated to reflect this use of the data if it's something we agree is a legitimate use of it and are going to be doing on a semi-regular basis. We already list a few things the data is used for, so we can just add another one to the list.
-Mark
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
For a while now, we've been releasing squid log data, stripped of personally identifying information such as IP addresses, to groups at two universities: Vrije Universiteit and the University of Minnesota. We now have a request pending from a third group, at Universidad Rey Juan Carlos in Spain. They are asking if they can have the full data stream including IP addresses, and they are prepared to sign a confidentiality agreement to get it.
"Wikimedia will not sell or share private information, such as email addresses, with third parties, unless you agree to release this information, or it is required by law to release the information." http://wikimediafoundation.org/wiki/Privacy_policy
Under the current policy I would not support it, even if "private information" is somewhat ambiguous: we must err on the side of caution.
I might support a research exemption clause in future versions of the policy _if_ a compelling case can be made that such an exemption is needed, and that no alternative research method would produce results of approximately the same quality. So far no such case has been made.
Whatever we do, it is crucial that we make it clear to our users through our privacy policy what is going on. In that spirit, I would also appreciate it if the privacy policy could be updated to describe the existing agreements with universities, and the work that is being done on the toolserver.
Wikiresearch-l had a roundtable about this at Wikimania two years ago. We reached no conclusion. I would love to pipe this data through my quality classifier, especially combined with the edit histories of the associated users. But do you realize what kind of a double whammy that is? Not only do you have their surfing habits, you've got their editing habits. On one of the largest websites in the world. This data is of unspeakable value not only to researchers, but to spammers, would-be identity thieves and others.
Although having this data is a wet dream of mine, I find it unconscionable to release it, and I feel that whoever was responsible for releasing it has already overstepped their bounds. We already know from the New York Times analyzing AOL's search logs that persons can be identified from search logs, and we know from Microsoft's Non-Disclosure Agreements with universities around the world for portions of the Windows 2000 source code that these NDAs, even to universities, are not effective in stopping the data from being leaked.
Now that the data has already been released, it is imperative that the foundation create an explicit philosophy about data retention policies and the circumstances under which user data may be released. I suggest that it never be released, and that the foundation hire and/or appoint a statistician for analyzing logs in-house. Perhaps this person can act as a liaison in certain, well-defined situations that do not compromise the personal information of anyone beyond what is already available in database dumps. This is the only ethical approach in my opinion.
On 9/15/07, Erik Moeller erik@wikimedia.org wrote:
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
For a while now, we've been releasing squid log data, stripped of personally identifying information such as IP addresses, to groups at two universities: Vrije Universiteit and the University of Minnesota. We now have a request pending from a third group, at Universidad Rey Juan Carlos in Spain. They are asking if they can have the full data stream including IP addresses, and they are prepared to sign a confidentiality agreement to get it.
"Wikimedia will not sell or share private information, such as email addresses, with third parties, unless you agree to release this information, or it is required by law to release the information." http://wikimediafoundation.org/wiki/Privacy_policy
Under the current policy I would not support it, even if "private information" is somewhat ambiguous: we must err on the side of caution.
I might support a research exemption clause in future versions of the policy _if_ a compelling case can be made that such an exemption is needed, and that no alternative research method would produce results of approximately the same quality. So far no such case has been made.
Whatever we do, it is crucial that we make it clear to our users through our privacy policy what is going on. In that spirit, I would also appreciate it if the privacy policy could be updated to describe the existing agreements with universities, and the work that is being done on the toolserver. -- Toward Peace, Love & Progress: Erik
DISCLAIMER: This message does not represent an official position of the Wikimedia Foundation or its Board of Trustees.
On 9/15/07, Brian Brian.Mingus@colorado.edu wrote:
Although having this data is a wet dream of mine, I find it unconscionable to release it, and I feel that whoever was responsible for releasing it has already overstepped their bounds.
Unless I'm misunderstanding the rest of the thread, no one has as yet released data containing personally identifiable information (the statement about toolserver and wikicharts in the original post was in error).
-Kat
The only ethical choice? Are you employing hyperbole, or are you serious? Let's set aside the IP request for now and just look at the two releases made so far, without personal info. You're seriously suggesting that someone inferring from edits that person X, without knowing the identity of that person, likes to edit about kangaroos and trees, but not tree kangaroos, used to create a deeper, better understanding of mankind and the way we function, is the unethical choice? Like I said, leaving out this new request and just going with the two previous releases, it would seem almost criminal to not make the non-personal data available for research.
On 9/15/07, Brian Brian.Mingus@colorado.edu wrote:
Wikiresearch-l had a roundtable about this at Wikimania two years ago. We reached no conclusion. I would love to pipe this data through my quality classifier, especially combined with the edit histories of the associated users. But do you realize what kind of a double whammy that is? Not only do you have their surfing habits, you've got their editing habits. On one of the largest websites in the world. This data is of unspeakable value not only to researchers, but to spammers, would-be identity thieves and others.
Although having this data is a wet dream of mine, I find it unconscionable to release it, and I feel that whoever was responsible for releasing it has already overstepped their bounds. We already know from the New York Times analyzing AOL's search logs that persons can be identified from search logs, and we know from Microsoft's Non-Disclosure Agreements with universities around the world for portions of the Windows 2000 source code that these NDAs, even to universities, are not effective in stopping the data from being leaked.
Now that the data has already been released, it is imperative that the foundation create an explicit philosophy about data retention policies and the circumstances under which user data may be released. I suggest that it never be released, and that the foundation hire and/or appoint a statistician for analyzing logs in-house. Perhaps this person can act as a liaison in certain, well-defined situations that do not compromise the personal information of anyone beyond what is already available in database dumps. This is the only ethical approach in my opinion.
On 9/15/07, Erik Moeller erik@wikimedia.org wrote:
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
For a while now, we've been releasing squid log data, stripped of personally identifying information such as IP addresses, to groups at two universities: Vrije Universiteit and the University of Minnesota. We now have a request pending from a third group, at Universidad Rey Juan Carlos in Spain. They are asking if they can have the full data stream including IP addresses, and they are prepared to sign a confidentiality agreement to get it.
"Wikimedia will not sell or share private information, such as email addresses, with third parties, unless you agree to release this information, or it is required by law to release the information." http://wikimediafoundation.org/wiki/Privacy_policy
Under the current policy I would not support it, even if "private information" is somewhat ambiguous: we must err on the side of caution.
I might support a research exemption clause in future versions of the policy _if_ a compelling case can be made that such an exemption is needed, and that no alternative research method would produce results of approximately the same quality. So far no such case has been made.
Whatever we do, it is crucial that we make it clear to our users through our privacy policy what is going on. In that spirit, I would also appreciate it if the privacy policy could be updated to describe the existing agreements with universities, and the work that is being done on the toolserver. -- Toward Peace, Love & Progress: Erik
Brian wrote:
Although having this data is a wet dream of mine, I find it unconscionable to release it, and I feel that whoever was responsible for releasing it has already overstepped their bounds. We already know from the New York Times analyzing AOL's search logs that persons can be identified from search logs, and we know from Microsoft's Non-Disclosure Agreements with universities around the world for portions of the Windows 2000 source code that these NDAs, even to universities, are not effective in stopping the data from being leaked.
The data that has been released cannot be used to identify individuals. The AOL search data could be used to identify individuals, because searches were tagged with a pseudonymous identifier. There are no such identifiers in the data we are sending out.
For example, a search for a social security number, by itself, tells you nothing about the individual who made it. Was it the owner of the SSN, an employer, or someone going through the owner's rubbish? Or was it a Wikipedian trying to determine if someone's SSN is notable enough to include in an article?
In the unlikely event that someone types their life story into the search box and clicks "go", you still don't know who wrote it, whether it was autobiographical, slander or fantasy.
If you see the pattern of a person's requests to Wikipedia, then you can infer something about them. But you can't do that with the data we are sending.
And finally, note that we are not releasing this data publicly, nor am I suggesting that we should. We are not sending it to anyone who wants it. We are sending it to three research groups at respectable universities.
I can imagine a research group being tempted to republish a code snippet from Windows 2000. I find it hard to imagine that a research group would be tempted to mine 100 billion log lines for some tiny fragment of private data, and then release that data publicly or sell it to spammers.
-- Tim Starling
Tim Starling wrote:
Brian wrote:
Although having this data is a wet dream of mine, I find it unconscionable to release it, and I feel that whoever was responsible for releasing it has already overstepped their bounds. We already know from the New York Times analyzing AOL's search logs that persons can be identified from search logs, and we know from Microsoft's Non-Disclosure Agreements with universities around the world for portions of the Windows 2000 source code that these NDAs, even to universities, are not effective in stopping the data from being leaked.
The data that has been released cannot be used to identify individuals. The AOL search data could be used to identify individuals, because searches were tagged with a pseudonymous identifier. There are no such identifiers in the data we are sending out.
I'm going to assume good faith here and just assume that you simply don't know what the AOL search data was about. The AOL search data was NOT tagged with pseudonymous data (by which I'm assuming you mean usernames). It was tagged with random numbers. The way privacy was compromised in the AOL search data scandal had nothing to do with what the data was labeled as and everything to do with what the data was. One could look at all of the searches made by a given person and clue in on who they were - e.g. by looking for local subjects in their searches, see if they searched for anyone by name (maybe themselves or people they knew), see if they searched for any esoteric subjects, etc.
You would do well to educate yourself on what the AOL search data scandal actually was, because it seems like we may already be making the same mistakes without you realizing it.
For example, a search for a social security number, by itself, tells you nothing about the individual who made it. Was it the owner of the SSN, an employer, or someone going through the man's rubbish? Or was it a Wikipedian trying to determine if someone's SSN is notable enough to include in an article?
Actually, a search for a social security number tells you pretty much everything you need to know and leads directly to infringement of privacy. Many people unmasked in the AOL search data scandal had been searching for personally identifiable information.
In the unlikely event that someone types their life story into the search box and clicks "go", you still don't know who wrote it, whether it was autobiographical, slander or fantasy.
You're being unrealistic here. You're assuming that the person doing the investigating is a complete moron and isn't able to put one and one together. That simply isn't true. In the AOL search data scandal, reporters were able to discover many real life identities using information that was far, far less substantial than a complete life story. Something as simple as a few keyword searches for obscure hobbies and location-specific searches was enough to track some people down. After all, how many Yorkshire Terrier enthusiasts do you think you're going to find in average Small Town, USA?
On 9/15/07, Ben McIlwain cydeweys@gmail.com wrote: [snip]
The AOL search data was NOT tagged with pseudonymous data (by which I'm assuming you mean usernames). It was tagged with random numbers. The way privacy was compromised in the AOL search data scandal had nothing to do with what the data was labeled as and everything to do with what the data was. One could look at all of the searches made by a given person and clue in on who they were - e.g. by looking for local subjects in their searches, see if they searched for anyone by name (maybe themselves or people they knew), see if they searched for any esoteric subjects, etc.
A unique random ID is a pseudonym. The ability to tie multiple searches to the same pseudonym was key; while I could guess the probable identity of a single search in some cases without any pseudonym, it is, as you pointed out, the ability to tie them together which creates trouble.
The point Tim was making was that the data Wikimedia has *previously released* did not include any sort of identifier, pseudonymous or not, and thus doesn't have the same risks.
The data which is *proposed* to be disclosed would include IPs, which act as either a pseudonymous identifier or an outright identifier. I doubt Tim would disagree that there are significant privacy implications in that case. Which is, of course, why he said they were willing to enter into an NDA.
Gregory Maxwell wrote:
On 9/15/07, Ben McIlwain cydeweys@gmail.com wrote: [snip]
The AOL search data was NOT tagged with pseudonymous data (by which I'm assuming you mean usernames). It was tagged with random numbers. The way privacy was compromised in the AOL search data scandal had nothing to do with what the data was labeled as and everything to do with what the data was. One could look at all of the searches made by a given person and clue in on who they were - e.g. by looking for local subjects in their searches, see if they searched for anyone by name (maybe themselves or people they knew), see if they searched for any esoteric subjects, etc.
A unique random ID is a pseudonym. The ability to tie multiple searches to the same pseudonym was key; while I could guess the probable identity of a single search in some cases without any pseudonym, it is, as you pointed out, the ability to tie them together which creates trouble.
What he said.
-- Tim Starling
Erik Moeller wrote:
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
For a while now, we've been releasing squid log data, stripped of personally identifying information such as IP addresses, to groups at two universities: Vrije Universiteit and the University of Minnesota. We now have a request pending from a third group, at Universidad Rey Juan Carlos in Spain. They are asking if they can have the full data stream including IP addresses, and they are prepared to sign a confidentiality agreement to get it.
"Wikimedia will not sell or share private information, such as email addresses, with third parties, unless you agree to release this information, or it is required by law to release the information." http://wikimediafoundation.org/wiki/Privacy_policy
Under the current policy I would not support it, even if "private information" is somewhat ambiguous: we must err on the side of caution.
Yes. The first question is, would providing this data violate the privacy policy, which protects "private information" - often but not always assumed to mean personally-identifiable information. If we consider the squid log data to include potentially personally-identifiable/private information, then we can't release it to a third party. Regardless of how much we trust them, or what they are willing to sign.
If the release does NOT violate the privacy policy, then the question becomes whether it violates existing community standards & practices. I don't know the answer to that. But there has been lots of discussion here, which may suggest there's not a clear consensus view.
IMO we want to help academics, and we share lots of their values, but it is more important that we protect our own community of users/contributors. So we want to err on that side.
I might support a research exemption clause in future versions of the policy _if_ a compelling case can be made that such an exemption is needed, and that no alternative research method would produce results of approximately the same quality. So far no such case has been made.
Yes. Regardless, that would apply on a going-forward basis only; we obviously could not change the terms of use retroactively/non-consensually.
Whatever we do, it is crucial that we make it clear to our users through our privacy policy what is going on. In that spirit, I would also appreciate it if the privacy policy could be updated to describe the existing agreements with universities, and the work that is being done on the toolserver.
Sue Gardner wrote:
Erik Moeller wrote:
On 9/14/07, Tim Starling tstarling@wikimedia.org wrote:
For a while now, we've been releasing squid log data, stripped of personally identifying information such as IP addresses, to groups at two universities: Vrije Universiteit and the University of Minnesota. We now have a request pending from a third group, at Universidad Rey Juan Carlos in Spain. They are asking if they can have the full data stream including IP addresses, and they are prepared to sign a confidentiality agreement to get it.
"Wikimedia will not sell or share private information, such as email addresses, with third parties, unless you agree to release this information, or it is required by law to release the information." http://wikimediafoundation.org/wiki/Privacy_policy
Under the current policy I would not support it, even if "private information" is somewhat ambiguous: we must err on the side of caution.
Yes. The first question is, would providing this data violate the privacy policy, which protects "private information" - often but not always assumed to mean personally-identifiable information. If we consider the squid log data to include potentially personally-identifiable/private information, then we can't release it to a third party. Regardless of how much we trust them, or what they are willing to sign.
Trust and signatures are not enough. How will they react if a government demands the release of private information? If we determine that we will not release it in the absence of a court order, what recourse do we have if the acquirers are not willing to resist a government order in the courts? In some jurisdictions there may be no such right to challenge such an order.
If the release does NOT violate the privacy policy, then the question becomes whether it violates existing community standards & practices. I don't know the answer to that. But there has been lots of discussion here, which may suggest there's not a clear consensus view.
There's at least a consensus insofar as appreciating that there are a lot of concerns about this issue that cannot be easily resolved. The simple fact that Tim sought the advice of this list before barging ahead tells us that even the proponent of these acts has his doubts.
Ec
On 15/09/2007, Ray Saintonge saintonge@telus.net wrote:
Trust and signatures are not enough. How will they react if a government demands the release of private information? If we determine that we will not release it in the absence of a court order, what recourse do we have if the acquirers are not willing to resist a government order in the courts? In some jurisdictions there may be no such right to challenge such an order.
It's going to Spain. The data protection laws (and culture) there are more stringent than those in the US. This sort of handwaving is a little misleading... it's at just as much risk of a government demand, with substantially lower legal protection or right to refuse, on *our* servers!
(The same argument applies to whoever said "and if an Iranian university asked for it?"... that would be a very different question, and we would be quite within our rights to say no if we felt the information would not be appropriately safeguarded from misuse either by the recipient or their government)
The repeated mention of a non-disclosure agreement indicates to me that most everyone involved in this conversation knows, at the most basic level, that we're talking about releasing private data.
Have our readers agreed to release this information? Are they aware that their private browsing habits will be subject to third-party review? Why don't we announce this in the sitenotice, and see what our readers think about it?
The privacy policy is quite explicit: "Wikimedia will not sell or share private information, such as email addresses, with third parties, unless you agree to release this information, or it is required by law to release the information." There is no exception for "but it's really cool," nor "they asked nicely," nor even "they said it's totally still private, signed a contract and everything."
If we alter the privacy policy every time we feel like releasing (or selling) information, we might as well not have a privacy policy at all.
Yes, research is important. Yes, our goal is to spread and increase the sum of human knowledge. But privacy of private data is currently written into policy as being *important*, and I haven't yet seen a compelling reason to change that.
This should not be a casual decision. The information security of our editors and readers should be an utmost priority.
-Luna
On 9/19/07, Luna lunasantin@gmail.com wrote:
Yes, research is important. Yes, our goal is to spread and increase the sum of human knowledge. But privacy of private data is currently written into policy as being *important*, and I haven't yet seen a compelling reason to change that.
This should not be a casual decision. The information security of our editors and readers should be an utmost priority.
Information security is particularly important given that cyberstalking has become an increasing problem on Wikipedia. We're currently hearing about several new cases a month, and some of them have been quite serious, with editors (usually admins) being contacted at their homes, family members threatened with violence, threats to contact employers, and so on.
My understanding is that, with the information people are considering releasing, it would be possible for someone to work out which editor had which IP address, which would be a serious betrayal of trust.
Sarah
On 9/19/07, SlimVirgin slimvirgin@gmail.com wrote:
My understanding is that, with the information people are considering releasing, it would be possible for someone to work out which editor had which IP address, which would be a serious betrayal of trust.
Right; any relatively unique edit (to a given article, without many temporally close-by edits) could be traced from the HTTP operations to the article edit logs and ID the user involved. Repeat for all the users who edit in a given time period... odds are high that this could be used to effectively mass-checkuser the whole site. Given a database dump and the HTTP data stream, one could write a tool to automatically resolve everything pretty easily.
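A minimal sketch of that correlation attack, with entirely made-up requester IDs and timestamps (nothing below comes from real squid data): join a stream of anonymized (requester_id, timestamp) POST records against the public (username, timestamp) edit history, and any temporally isolated edit links the requester ID to the account.

```python
# Sketch (hypothetical data): correlate an anonymized request log with
# the public edit log. A temporally unique edit unambiguously links an
# anonymous requester ID to a username.
from collections import defaultdict

http_log = [            # hypothetical anonymized squid records
    ("req-7f3a", 1190300005),
    ("req-19cc", 1190300212),
    ("req-7f3a", 1190304410),
]
edit_log = [            # public per-article edit history
    ("Alice", 1190300005),
    ("Bob", 1190300212),
    ("Alice", 1190304411),
]

links = defaultdict(set)
for req_id, t_req in http_log:
    # Allow a small clock skew between proxy and database timestamps.
    matches = [u for u, t in edit_log if abs(t - t_req) <= 2]
    if len(matches) == 1:           # temporally unique: unambiguous link
        links[req_id].add(matches[0])

print(dict(links))  # {'req-7f3a': {'Alice'}, 'req-19cc': {'Bob'}}
```

Repeated over a full dump and data stream, each requester ID only needs one such isolated edit to be identified, which is the "mass-checkuser" risk described above.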
While I am generally all for editors being more open about their identities, giving anyone the power to do this is a problem, in my opinion. We restrict this level of data access internally rather strictly; allowing it out in the open, to independent researchers, is potentially very problematic. It worries me, and if it worries me, it will certainly worry those who are more concerned with preserving pseudonymity and privacy. They would likely feel that this is a breach of the explicit or implied privacy policy, and I would tend to agree with them.
Even replacing IPs with unique hashes or other IDs would allow leakage of info; one could extend the theoretical tool above, to find all temporally relatively unique edits by a given unique ID and look in the database dump for any that were done by someone not logged in.
Also vulnerable to a brute force attack. There are only 2^32 possible IPs; that's about 4 billion. Excluding the rather brutally obvious complete IP -> hash lookup table method, it would take only about an hour to search the whole space if your CPU can do a million hashes a second (2,000 usable ops per hash or so). Anyone performing a widespread search would undoubtedly build the table; it's going to be small (64-bit hash -> 32 GB) compared to modern disks (and some people's RAM...).
If you salted each IP with a different salt, that would be effective, but would also require us to generate and store a large secure table of salts (or an IP -> salted-hash forward table). And it still doesn't get around the temporally relatively unique edits comparison method.
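The brute-force point can be sketched in a few lines. The hash scheme, the 64-bit truncation, and the addresses below are illustrative assumptions, and the demo enumerates only a /16 rather than the full 2^32 space so it runs instantly; the same loop scales to every IPv4 address on commodity hardware.

```python
# Sketch (hypothetical scheme): why unsalted hashes of IPv4 addresses
# are reversible by exhaustive search. We "anonymize" one address,
# then recover it by hashing every candidate in a small range.
import hashlib

def ip_hash(ip: str) -> str:
    # Unsalted hash, as in the scheme being criticized;
    # truncated to a 64-bit (16 hex digit) identifier.
    return hashlib.sha256(ip.encode()).hexdigest()[:16]

leaked = ip_hash("10.0.37.129")     # the "anonymized" log entry

found = None
for host in range(2**16):           # scan 10.0.0.0/16 for the demo
    b3, b4 = host >> 8, host & 0xFF
    candidate = f"10.0.{b3}.{b4}"
    if ip_hash(candidate) == leaked:
        found = candidate
        break

print(found)  # recovers the original address: 10.0.37.129
```

A per-IP salt breaks this enumeration (the attacker can no longer precompute hashes), which is exactly the trade-off discussed above: it requires storing a secure salt table and still does nothing against timestamp correlation.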
On 9/19/07, SlimVirgin slimvirgin@gmail.com wrote: [snip]
My understanding is that, with the information people are considering releasing, it would be possible for someone to work out which editor had which IP address, which would be a serious betrayal of trust.
Hopefully you can see from my prior posts on this thread that I favor a conservative handling of private data and you won't mistake my point below for an insensitivity to your concerns.
I agree that the log data must not be handled in a way that reduces privacy, but I disagree with the implied claim that there is a high level of privacy for *editors* to begin with.
If editors are betting on the privacy of their IP addresses to avoid harassment or stalkers then they are making a bad bet. I do not want people to be surprised when they discover the privacy they thought they had did not really exist.
There are many ways a user's IP can be leaked. For example, whenever you follow a link to an external site, your address is leaked to that site. Any administrator can inject CSS or JS into your personal or the site-wide files, which could cause your browser to connect to another site and give away your address. Your use of email along with your account can reveal your address. We have a great many checkusers, and while they are trustworthy, their machines or accounts could become compromised. Checkuser data is sent unencrypted to checkusers across the Internet. ... And it's very, very easy to accidentally edit while logged out, especially when you cross over to one of our other wikis like Commons or Meta.
The protections provided today are not bad. But they are not very good because very good protection would be someplace between highly inconvenient and impossible.
Only the most paranoid and inconvenience-tolerant people have a fighting chance of keeping their identity totally secret during a long editing career.
Most people simply lack the foresight (few expect stalkers the day they make their first edit), technical expertise, and patience required to strongly protect their anonymity while editing.
Providing privacy strong enough to stop a stalker for people who are indirectly spewing out large amounts of information about themselves in the form of edits is just a really hard problem which I don't have a solution for...
On 9/19/07, Gregory Maxwell gmaxwell@gmail.com wrote:
On 9/19/07, SlimVirgin slimvirgin@gmail.com wrote: [snip]
My understanding is that, with the information people are considering releasing, it would be possible for someone to work out which editor had which IP address, which would be a serious betrayal of trust.
Hopefully you can see from my prior posts on this thread that I favor a conservative handling of private data and you won't mistake my point below for an insensitivity to your concerns.
I agree that the log data must not be handled in a way that reduces privacy, but I disagree with the implied claim that there is a high level of privacy for *editors* to begin with.
If editors are betting on the privacy of their IP addresses to avoid harassment or stalkers then they are making a bad bet. I do not want people to be surprised when they discover the privacy they thought they had did not really exist.
There are many ways a user's IP can be leaked. For example, whenever you follow a link to an external site, your address is leaked to that site. Any administrator can inject CSS or JS into your personal or the site-wide files, which could cause your browser to connect to another site and give away your address. Your use of email along with your account can reveal your address. We have a great many checkusers, and while they are trustworthy, their machines or accounts could become compromised. Checkuser data is sent unencrypted to checkusers across the Internet. ... And it's very, very easy to accidentally edit while logged out, especially when you cross over to one of our other wikis like Commons or Meta.
Yes, I agree that protecting IP address is hard. Just as an example, we have one stalker (and I'm using the word advisedly) who posts links on people's talk pages to what appears to be Wikipedia articles, purportedly asking for advice, but in fact diverting that user to the stalker's own website, so he can pick up the IP. He's also sent e-mails with disguised links that divert people to a blog he has access to.
The concerns of people being harassed are partly to do with not wanting people to know where we edit from, but also to do with fears that the more determined stalkers could get into the user's computer if they knew the exact IP, which is a more serious invasion than knowing you live in New York or wherever.
The protections provided today are not bad. But they are not very good because very good protection would be someplace between highly inconvenient and impossible.
Only the most paranoid and inconvenience-tolerant people have a fighting chance of keeping their identity totally secret during a long editing career.
Most people simply lack the foresight (few expect stalkers the day they make their first edit), technical expertise, and patience required to strongly protect their anonymity while editing.
Providing privacy strong enough to stop a stalker for people who are indirectly spewing out large amounts of information about themselves in the form of edits is just a really hard problem which I don't have a solution for...
I agree with you. It's very tricky.
The only workable solution I can see is to make it less likely that stalkers will want to target particular admins. One way to do that would be to set up anonymous admin accounts that multiple admins could use. So for example, if a difficult user needs to be blocked, any admin could access the joint admin account to make the block. The user would only see that User:Admin1 had blocked him. Only trusted people would have access to which admin had made a block with User:Admin1 at time T.
I know it would complicate things, and it might make admin abuse a little more likely. And we'd still have the problem of potential leaks, so it wouldn't be foolproof by any means.
Sarah
Without asking for details, how is this accomplished? I can only see it if he emails, and one replies, or if he tells you about a web site, and you click on it.
Yes, I agree that protecting IP address is hard. Just as an example, we have one stalker (and I'm using the word advisedly) who posts links on people's talk pages to what appears to be Wikipedia articles, purportedly asking for advice, but in fact diverting that user to the stalker's own website, so he can pick up the IP. He's also sent e-mails with disguised links that divert people to a blog he has access to. --
David Goodman, Ph.D, M.L.S.
On 9/20/07, David Goodman dgoodmanny@gmail.com wrote:
Without asking for details, how is this accomplished? I can only see it if he emails, and one replies, or if he tells you about a web site, and you click on it.
Yes, it works only if you click on the link. But the links are disguised to look like something else -- a Wikipedia page, for example.
Sarah
SlimVirgin wrote:
On 9/20/07, David Goodman wrote:
Without asking for details, how is this accomplished? I can only see it if he emails, and one replies, or if he tells you about a web site, and you click on it.
Yes, it works only if you click on the link. But the links are disguised to look like something else -- a Wikipedia page, for example.
Bank swindles do a lot of that too. I have no problem with removing deceptive links that go somewhere different from where they purport to go.
To some extent people also need to practise a little safety of their own, as with not accepting all cookies.
Ec
On 20/09/2007, Ray Saintonge saintonge@telus.net wrote:
Bank swindles do a lot of that too. I have no problem with removing deceptive links that go somewhere different from where they purport to go.
To some extent people also need to practise a little safety of their own, as with not accepting all cookies.
I'm all for safe web surfing, but that doesn't include not clicking on links. Not entering your bank account details afterwards, sure, but clicking on the link is usually pretty safe.
On 9/20/07, Thomas Dalton thomas.dalton@gmail.com wrote:
I'm all for safe web surfing, but that doesn't include not clicking on links. Not entering your bank account details afterwards, sure, but clicking on the link is usually pretty safe.
So what's your solution?
I say if you don't want anyone to find out your IP address, use Tor; that's what it was built for. Then if you accidentally click on a link and wind up at an attack site, that site's administrator has no way to figure out your IP.
That only solves the problem of leaking your IP address, of course. The larger problem of being pseudonymous is essentially unsolved. With enough effort just about any pseudonym can be cracked.
I say if you don't want anyone to find out your IP address, use Tor; that's what it was built for. Then if you accidentally click on a link and wind up at an attack site, that site's administrator has no way to figure out your IP.
If you're worried about someone finding out your IP address, then yes, Tor is probably your best bet.
On 20/09/2007, SlimVirgin slimvirgin@gmail.com wrote:
The only workable solution I can see is to make it less likely that stalkers will want to target particular admins.
It's tricky. The problem is that a lot of the people who get blocked are blocked because they're arseholes or nutters. They will take a block for whatever reason as an unacceptable assault against (a) their ego or (b) the REVEALED TRUTH (in capitals). This then gives them an exciting new holy mission in life.
One way to do that would be to set up anonymous admin accounts that multiple admins could use. So for example, if a difficult user needs to be blocked, any admin could access the joint admin account to make the block. The user would only see that User:Admin1 had blocked him. Only trusted people would have access to which admin had made a block with User:Admin1 at time T. I know it would complicate things, and it might make admin abuse a little more likely. And we'd still have the problem of potential leaks, so it wouldn't be foolproof by any means.
Crikey, I'm trying to imagine how paranoid people would get with that in place compared to now. It strikes me as disastrous public relations to remove any accountability or traceability. If you think the paranoids are bad now ...
A solution to an edge case that breaks the normal case is unlikely to gain traction.
- d.
There are plenty of admins that happily make their real identity public knowledge and apparently aren't so afraid of "stalkers" that they're unwilling to block people. There's probably at least one of them online 24 hours a day. Get one of them to make the block.
If there are some gaps in that 24 hour coverage, hire someone to fill in those gaps. Pay them enough that they can buy a PO box, an alarm system for their house, etc. How does society handle having judges and police and presidents and soldiers and other figures who have to make and enforce decisions that rile up a few nutters? Not by making them unaccountable for their actions. If Wikipedia is a serious project creating a real benefit to society, why shouldn't it do the same thing? Being part of the wikipolice is surely less dangerous than being part of the real police.
foundation-l mailing list foundation-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/foundation-l
On 20/09/2007, Anthony wikimail@inbox.org wrote:
There are plenty of admins that happily make their real identity public knowledge and apparently aren't so afraid of "stalkers" that they're unwilling to block people. There's probably at least one of them online 24 hours a day. Get one of them to make the block.
I agree, that ought to be enough in most situations. It would be good to have something to fall back on if we end up needing to block someone known to be dangerous, though.
If there are some gaps in that 24 hour coverage, hire someone to fill in those gaps. Pay them enough that they can buy a PO box, an alarm system for their house, etc. How does society handle having judges and police and presidents and soldiers and other figures who have to make and enforce decisions that rile up a few nutters? Not by making them unaccountable for their actions. If Wikipedia is a serious project creating a real benefit to society, why shouldn't it do the same thing? Being part of the wikipolice is surely less dangerous than being part of the real police.
Presidents have bodyguards. Judges generally have police escorts if they need them. Police and soldiers are trained and equipped to defend themselves. Giving Wikipedia admins personal protection would be taking things a little too far, IMHO ;).
On 9/20/07, Thomas Dalton thomas.dalton@gmail.com wrote:
On 20/09/2007, Anthony wikimail@inbox.org wrote:
There are plenty of admins that happily make their real identity public knowledge and apparently aren't so afraid of "stalkers" that they're unwilling to block people. There's probably at least one of them online 24 hours a day. Get one of them to make the block.
I agree, that ought to be enough in most situations. It would be good to have something to fall back on if we end up needing to block someone known to be dangerous, though.
If someone is known to be dangerous, shouldn't we be calling the police? How would having a pool account help matters? The dangerous person would just go after everyone in the pool, or whoever set up the pool, or Jimbo, or the board members (many of whose home addresses are easily found).
Maybe Jimbo would be willing to make the block in those high profile cases. I doubt his doing so would bring him any more attention from stalkers than he already has.
If there are some gaps in that 24 hour coverage, hire someone to fill in those gaps. Pay them enough that they can buy a PO box, an alarm system for their house, etc. How does society handle having judges and police and presidents and soldiers and other figures who have to make and enforce decisions that rile up a few nutters? Not by making them unaccountable for their actions. If Wikipedia is a serious project creating a real benefit to society, why shouldn't it do the same thing? Being part of the wikipolice is surely less dangerous than being part of the real police.
Presidents have bodyguards. Judges generally have police escorts if they need them. Police and soldiers are trained and equipped to defend themselves. Giving Wikipedia admins personal protection would be taking things a little too far, IMHO ;).
For volunteers, yes. But if being an admin is so dangerous that enough people aren't volunteering, hiring one or two people to essentially be paid admins would be a possibility. Creating a world in which every single person can share freely in the sum of human knowledge is a big real world task which has costs and risks involved in it.
Personally I think there are probably enough volunteers right now to cover the task, and hiring someone would be overkill. The solution is as I said a month or so ago: if you're not willing to deal with stalkers, don't be an admin, or at least don't be an admin who performs controversial actions. But if the choice is between taking away admin accountability (as suggested by Sarah) and hiring a few bodyguards, I think the latter is a much better choice.
Am I dreaming, or have I wandered into some alternate universe, or has the whole world gone insane, or what?
You're talking about a bunch of nerds who edit text on some website here, not judges, police, soldiers... please for god's sake have some sense of perspective.
Being an administrator is not "dangerous". There are more than enough people volunteering to help (if you don't think so, stop turning down RfAs for stupid reasons). If someone's too chicken to issue a block, well whoopee-do, that's their problem. Chances are the block wasn't warranted anyway.
Administrators do not need body guards, alarm systems in their house or indeed anything other than the common sense not to post their credit card details online. Nobody has ever been murdered in their sleep because they banned someone on the Star Wars forum they moderate. We're talking about the same kind of thing here.
Having a lovely time back in reality,
-Gurch
Stalkers are not a nice thing to deal with, and I think you're underplaying the seriousness of the issue. There may be no serious risk of physical threat, and thus your comments about bodyguards are appropriate. However, you're not taking into account what it is like when you get death threats by email and snail mail.
About six years ago I had to deal with the divorce of my current partner when she left her husband after he assaulted her. I was made aware through a third party that he had obtained a shotgun and was looking for me. In the end, thanks to my employing good legal representation and hiding my location, he decided to turn the aforementioned weapon on himself.
Don't joke about stalking, it happens.
Brian.
-----Original Message----- From: foundation-l-bounces@lists.wikimedia.org [mailto:foundation-l-bounces@lists.wikimedia.org] On Behalf Of Matthew Britton Sent: 20 September 2007 16:27 To: Wikimedia Foundation Mailing List Subject: Re: [Foundation-l] Release of squid log data
Administrators do not need body guards, alarm systems in their house or indeed anything other than the common sense not to post their credit card details online. Nobody has ever been murdered in their sleep because they banned someone on the Star Wars forum they moderate. We're talking about the same kind of thing here.
On 9/20/07, Brian McNeil brian.mcneil@wikinewsie.org wrote:
Stalkers are not a nice thing to deal with, and I think you're underplaying the seriousness of the issue. There may be no serious risk of physical threat, and thus your comments about bodyguards are appropriate. However, you're not taking into account what it is like when you get death threats by email and snail mail.
I think the point is that nothing we do is going to stop this from happening. Make an ArbCom role account and now everyone on ArbCom will get the death threats instead of just the people supporting the block.
It's not a problem that Wikipedia can solve, so the best way to deal with it is to let admins know what they're getting into and let people who don't want to be admins still contribute in other ways.
Brian McNeil wrote:
Stalkers are not a nice thing to deal with and I think you're underplaying the seriousness of the issue. They may be no serious risk of physical threat and thus your comments about bodyguards are appropriate. However, you're not taking into account what it is like when you get death threats by email and snail-mail.
About 6 years ago I had to deal with the divorce of my current partner when she left her husband after he assaulted her. I was made aware through a third party that he had obtained a shotgun and was looking for me. In the end - thanks to me employing good legal representation and hiding my location - he decided to turn the aforementioned weapon on himself.
Don't joke about stalking, it happens.
Brian.
Yes, stalking happens. It happens in situations such as that which you describe; this is certainly a problem and such incidents should certainly be taken seriously.
But it doesn't happen because someone was banned from a website.
Anyone can make a death threat online. Actually carrying out said threat is never in the mind of whoever makes it. Such threats are the product of the trolls who were blocked in the first place. This thread has turned into exactly the sort of over-the-top response they are trying to get.
Someone, please... tell me I'm not the only one who can see this?
Still enjoying life in the real world,
-Gurch
-----Original Message----- From: foundation-l-bounces@lists.wikimedia.org [mailto:foundation-l-bounces@lists.wikimedia.org] On Behalf Of Matthew Britton Sent: 20 September 2007 16:27 To: Wikimedia Foundation Mailing List Subject: Re: [Foundation-l] Release of squid log data
Am I dreaming, or have I wandered into some alternate universe, or has the whole world gone insane, or what?
You're talking about a bunch of nerds who edit text on some website here, not judges, police, soldiers... please for god's sake have some sense of perspective.
Being an administrator is not "dangerous". There are more than enough people volunteering to help (if you don't think so, stop turning down RfAs for stupid reasons). If someone's too chicken to issue a block, well whoopee-do, that's their problem. Chances are the block wasn't warranted anyway.
Administrators do not need body guards, alarm systems in their house or indeed anything other than the common sense not to post their credit card details online. Nobody has ever been murdered in their sleep because they banned someone on the Star Wars forum they moderate. We're talking about the same kind of thing here.
Having a lovely time back in reality,
-Gurch
--- Anthony wikimail@inbox.org wrote:
On 9/20/07, Thomas Dalton thomas.dalton@gmail.com wrote:
On 20/09/2007, Anthony wikimail@inbox.org wrote:
There are plenty of admins that happily make their real identity public knowledge and apparently aren't so afraid of "stalkers" that they're unwilling to block people. There's probably at least one of them online 24 hours a day. Get one of them to make the block.
I agree, that ought to be enough in most situations. It would be good to have something to fall back on if we end up needing to block someone known to be dangerous, though.
If someone is known to be dangerous, shouldn't we be calling the police? How would having a pool account help matters? The dangerous person would just go after everyone in the pool, or whoever set up the pool, or Jimbo, or the board members (many of whose home addresses are easily found).
Maybe Jimbo would be willing to make the block in those high profile cases. I doubt his doing so would bring him any more attention from stalkers than he already has.
If there are some gaps in that 24 hour coverage, hire someone to fill in those gaps. Pay them enough that they can buy a PO box, an alarm system for their house, etc. How does society handle having judges and police and presidents and soldiers and other figures who have to make and enforce decisions that rile up a few nutters? Not by making them unaccountable for their actions. If Wikipedia is a serious project creating a real benefit to society, why shouldn't it do the same thing? Being part of the wikipolice is surely less dangerous than being part of the real police.

Presidents have bodyguards. Judges generally have police escorts if they need them. Police and soldiers are trained and equipped to defend themselves. Giving Wikipedia admins personal protection would be taking things a little too far, IMHO ;).
For volunteers, yes. But if being an admin is so dangerous that not enough people are volunteering, hiring one or two people to essentially be paid admins would be a possibility. Creating a world in which every single person can share freely in the sum of human knowledge is a big real-world task, and it has costs and risks involved.
Personally I think there are probably enough volunteers right now to cover the task, and hiring someone would be overkill. The solution is as I said a month or so ago: if you're not willing to deal with stalkers, don't be an admin, or at least don't be an admin who performs controversial actions. But if the choice is between taking away admin accountability (as suggested by Sarah) and hiring a few bodyguards, I think the latter is a much better choice.
On 9/20/07, Matthew Britton matthew.britton@btinternet.com wrote:
Brian McNeil wrote:
Stalkers are not a nice thing to deal with and I think you're underplaying the seriousness of the issue. There may be no serious risk of physical threat, and thus your comments about bodyguards are appropriate. However, you're not taking into account what it is like when you get death threats by email and snail-mail.
About 6 years ago I had to deal with the divorce of my current partner when she left her husband after he assaulted her. I was made aware through a third party that he had obtained a shotgun and was looking for me. In the end - thanks to me employing good legal representation and hiding my location - he decided to turn the aforementioned weapon on himself.
Don't joke about stalking, it happens.
Brian.
Yes, stalking happens. It happens in situations such as that which you describe; this is certainly a problem and such incidents should certainly be taken seriously.
But it doesn't happen because someone was banned from a website.
I don't have time to respond in full to this, but I'm afraid it very much does happen because people are blocked or banned, and some of it has been quite serious. Nothing rising to the level of being pursued with a weapon, but very upsetting threats of violence to family members, and attempts to destroy people's careers and reputations.
Sarah
SlimVirgin wrote:
On 9/20/07, Matthew Britton matthew.britton@btinternet.com wrote:
Brian McNeil wrote:
Stalkers are not a nice thing to deal with and I think you're underplaying the seriousness of the issue. There may be no serious risk of physical threat, and thus your comments about bodyguards are appropriate. However, you're not taking into account what it is like when you get death threats by email and snail-mail.
About 6 years ago I had to deal with the divorce of my current partner when she left her husband after he assaulted her. I was made aware through a third party that he had obtained a shotgun and was looking for me. In the end - thanks to me employing good legal representation and hiding my location - he decided to turn the aforementioned weapon on himself.
Don't joke about stalking, it happens.
Brian.
Yes, stalking happens. It happens in situations such as that which you describe; this is certainly a problem and such incidents should certainly be taken seriously.
But it doesn't happen because someone was banned from a website.
I don't have time to respond in full to this, but I'm afraid it very much does happen because people are blocked or banned, and some of it has been quite serious. Nothing rising to the level of being pursued with a weapon, but very upsetting threats of violence to family members, and attempts to destroy people's careers and reputations.
Sarah
There is stalking, and then there is trolling with intent to cause the maximum amount of fuss and emotional distress.
"I'm going to kill your family" is, in terms of emotional distress per word, about as efficient as you can get. It is therefore hardly surprising that it is a common trolling tactic.
One or two unfortunate cases aside (which I cannot help but feel were not entirely unprovoked) it is this latter issue, not stalking per se, which Wikipedia is experiencing.
Furthermore, the issue isn't limited to administrators -- it's perfectly possible to make enemies on-line without having any such extra abilities (though one could argue it helps) -- so discussion of the problem as though it is a phenomenon unique to administrators, as has been happening in this thread, is hardly useful.
-Gurch
On 9/20/07, Matthew Britton matthew.britton@btinternet.com wrote:
SlimVirgin wrote:
On 9/20/07, Matthew Britton matthew.britton@btinternet.com wrote:
Brian McNeil wrote:
Stalkers are not a nice thing to deal with and I think you're underplaying the seriousness of the issue. There may be no serious risk of physical threat, and thus your comments about bodyguards are appropriate. However, you're not taking into account what it is like when you get death threats by email and snail-mail.
About 6 years ago I had to deal with the divorce of my current partner when she left her husband after he assaulted her. I was made aware through a third party that he had obtained a shotgun and was looking for me. In the end - thanks to me employing good legal representation and hiding my location - he decided to turn the aforementioned weapon on himself.
Don't joke about stalking, it happens.
Brian.
Yes, stalking happens. It happens in situations such as that which you describe; this is certainly a problem and such incidents should certainly be taken seriously.
But it doesn't happen because someone was banned from a website.
I don't have time to respond in full to this, but I'm afraid it very much does happen because people are blocked or banned, and some of it has been quite serious. Nothing rising to the level of being pursued with a weapon, but very upsetting threats of violence to family members, and attempts to destroy people's careers and reputations.
Sarah
There is stalking, and then there is trolling with intent to cause the maximum amount of fuss and emotional distress.
"I'm going to kill your family" is, in terms of emotional distress per word, about as efficient as you can get. It is therefore hardly surprising that it is a common trolling tactic.
One or two unfortunate cases aside (which I cannot help but feel were not entirely unprovoked) it is this latter issue, not stalking per se, which Wikipedia is experiencing.
Furthermore, the issue isn't limited to administrators -- it's perfectly possible to make enemies on-line without having any such extra abilities (though one could argue it helps) -- so discussion of the problem as though it is a phenomenon unique to administrators, as has been happening in this thread, is hardly useful.
-Gurch
While I agree that 90-something % of online "death threats" and the like amount to extreme cases of trolling (people with no real-world intent, mindset, or practical opportunity to actually commit violence against the threatened target), there is a small fringe of genuinely credible threats of violence, and some are followed through on.
Enough of them are real credible threats that it's not unreasonable to treat them, categorically, as a legitimate risk.
A focusing phenomenon has been noticed whereby both trolling and legitimate threats are disproportionately directed at people perceived to be in positions of authority - newsgroup moderators, AOL forum moderators, ISP staff, and probably Wikipedia administrators.
That said, I don't agree with going into a bunker mentality about this. If you go into hiding afterwards, the bad guy / troll won.
On 20/09/2007, Matthew Britton matthew.britton@btinternet.com wrote:
Yes, stalking happens. It happens in situations such as that which you describe; this is certainly a problem and such incidents should certainly be taken seriously. But it doesn't happen because someone was banned from a website.
Um, in the case of Wikipedia you're factually incorrect. The hard stalking work of Judd Bagley of overstock.com is a counterexample. Stalking on a corporate budget no less!
- d.
On 9/20/07, David Gerard dgerard@gmail.com wrote:
On 20/09/2007, Matthew Britton matthew.britton@btinternet.com wrote:
Yes, stalking happens. It happens in situations such as that which you describe; this is certainly a problem and such incidents should certainly be taken seriously. But it doesn't happen because someone was banned from a website.
Um, in the case of Wikipedia you're factually incorrect. The hard stalking work of Judd Bagley of overstock.com is a counterexample. Stalking on a corporate budget no less!
So call the cops. You know who he is. You know what jurisdiction he lives in. Stalking is illegal. So if he really is stalking you, call the cops.
Please note that I have no evidence that Judd actually has stalked anybody.
On 9/20/07, David Gerard dgerard@gmail.com wrote:
Um, in the case of Wikipedia you're factually incorrect. The hard stalking work of Judd Bagley of overstock.com is a counterexample. Stalking on a corporate budget no less!
When did investigation and fact based criticism become synonymous with stalking? I missed that memo.
Or is it only stalking when it's someone "we" dislike investigating someone "we" like, and protected free speech the other way around? (like the extensive research that some of the "anti-stalkers" put into Daniel Brandt these last few years)
I'm not saying that I agree with the allegations, but to call it stalking when someone investigates something which they reasonably believe to be misconduct just seems wrong to me.
It's possible for perfectly reasonable people to believe completely stupid things. We see it all the time. We should respond to their concerns with respect, and with dispassionate facts. If they are reasonable the disagreement will be easily resolved, and if they are unreasonable their continued aggression towards a reasonable and respectful response will discredit them in ways that no amount of censorship could ever hope to achieve.
On 20/09/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
When did investigation and fact based criticism become synonymous with stalking? I missed that memo.
Or is it only stalking when it's someone "we" dislike investigating someone "we" like, and protected free speech the other way around? (like the extensive research that some of the "anti-stalkers" put into Daniel Brandt these last few years)
I'm not saying that I agree with the allegations, but to call it stalking when someone investigates something which they reasonably believe to be misconduct just seems wrong to me.
His activities in this regard (though not in relation to Wikipedia editors) have made serious papers. I fear you're talking out your arse on this one.
- d.
On 9/20/07, David Gerard dgerard@gmail.com wrote:
On 20/09/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
When did investigation and fact based criticism become synonymous with stalking? I missed that memo.
Or is it only stalking when it's someone "we" dislike investigating someone "we" like, and protected free speech the other way around? (like the extensive research that some of the "anti-stalkers" put into Daniel Brandt these last few years)
I'm not saying that I agree with the allegations, but to call it stalking when someone investigates something which they reasonably believe to be misconduct just seems wrong to me.
His activities in this regard (though not in relation to Wikipedia editors) have made serious papers. I fear you're talking out your arse on this one.
The closest I can find is an accusation by [good guy number 1], who ironically uses the same exact law (Section 113 of the "Violence Against Women and Department of Justice Reauthorization Act") that [bad guy number 1] accuses Wikipedia admins of violating. And [good guy number 1]'s accusation in itself rests upon "outing" [bad guy number 2].
If that law really is to be read as broadly as [bad guy number 1] and [good guy number 1] want it read (that "annoying" people anonymously is illegal), then [good guy number 1], [good gal number 1], and [bad guy number 2] are all guilty of violating it.
On 9/20/07, Matthew Britton matthew.britton@btinternet.com wrote:
...This thread has turned into exactly the sort of over-the-top response they are trying to get.
Someone, please... tell me I'm not the only one who can see this?
Still enjoying life in the real world,
-Gurch
As a neutral observer, it looks like a bunker mentality to dispose of WP:Dispute Resolution policy, abandon any pretense to so-called "Accountability", and purge dissenting voices as terrorists making death threats.
Rob Smith
Rob Smith wrote:
On 9/20/07, Matthew Britton matthew.britton@btinternet.com wrote:
...This thread has turned into exactly the sort of over-the-top response they are trying to get.
Someone, please... tell me I'm not the only one who can see this?
Still enjoying life in the real world,
-Gurch
As a neutral observer, it looks like a bunker mentality to dispose of WP:Dispute Resolution policy, abandon any pretense to so-called "Accountability", and purge dissenting voices as terrorists making death threats.
Rob Smith
What? Next you'll be telling me they want to expand the external links policy so they can ban anyone who links to a site they don't like.
Oh wait...
-Gurch
Matthew Britton wrote:
Yes, stalking happens. It happens in situations such as that which you describe; this is certainly a problem and such incidents should certainly be taken seriously.
But it doesn't happen because someone was banned from a website.
Anyone can make a death threat online. Actually carrying out such a threat is almost never in the mind of whoever makes it. Such threats are the product of the trolls who were blocked in the first place. This thread has turned into exactly the sort of over-the-top response they are trying to get.
Someone, please... tell me I'm not the only one who can see this?
You're not the only one Gurch. I think people are taking the idle threats of idiot teenagers way too seriously.
Brian McNeil wrote:
Stalkers are not a nice thing to deal with and I think you're underplaying the seriousness of the issue. There may be no serious risk of physical threat, and thus your comments about bodyguards are appropriate. However, you're not taking into account what it is like when you get death threats by email and snail-mail.
I don't think it's fair to imply that those of us who oppose specific "anti-stalking" measures are somehow unfamiliar with the issue. Many of us have experienced crazy people going on crusades and sending out flurries of crazy and often threatening messages and/or phone calls. I'm just more jaded about it, and consider it part of the cost of doing business, so to speak, when dealing with electronic communications systems. That sort of thing has been a part of the internet for as long as I can remember, and the internet hasn't fallen apart because of it, so people muddle through as always. Nobody came up with a way to solve the problem on Usenet in the mid-1990s without making other things worse, and I don't think you're going to come up with a magic solution today on Wikipedia without breaking something else either.
-Mark
Anthony wrote:
There are plenty of admins that happily make their real identity public knowledge and apparently aren't so afraid of "stalkers" that they're unwilling to block people. There's probably at least one of them online 24 hours a day. Get one of them to make the block.
If there are some gaps in that 24 hour coverage, hire someone to fill in those gaps. Pay them enough that they can buy a PO box, an alarm system for their house, etc.
Until I got to this bit I was going to say, "Oh please, if it's that much of a problem, just give me my sysop bit back and I'll block anyone you tell me to." It's only clicking a button on a website, after all.
But if you're offering *money* for my services, perhaps I should hold out for a better deal. I could use a new alarm system...
-Gurch
On 9/20/07, Matthew Britton matthew.britton@btinternet.com wrote:
Anthony wrote:
There are plenty of admins that happily make their real identity public knowledge and apparently aren't so afraid of "stalkers" that they're unwilling to block people. There's probably at least one of them online 24 hours a day. Get one of them to make the block.
If there are some gaps in that 24 hour coverage, hire someone to fill in those gaps. Pay them enough that they can buy a PO box, an alarm system for their house, etc.
Until I got to this bit I was going to say, "Oh please, if it's that much of a problem, just give me my sysop bit back and I'll block anyone you tell me to." It's only clicking a button on a website, after all.
But if you're offering *money* for my services, perhaps I should hold out for a better deal. I could use a new alarm system...
Holding out will only work if you can convince all the other admins to hold out too. Maybe y'all should unionize...
Anthony wrote:
On 9/20/07, Matthew Britton matthew.britton@btinternet.com wrote:
Anthony wrote:
There are plenty of admins that happily make their real identity public knowledge and apparently aren't so afraid of "stalkers" that they're unwilling to block people. There's probably at least one of them online 24 hours a day. Get one of them to make the block.
If there are some gaps in that 24 hour coverage, hire someone to fill in those gaps. Pay them enough that they can buy a PO box, an alarm system for their house, etc.
Until I got to this bit I was going to say, "Oh please, if it's that much of a problem, just give me my sysop bit back and I'll block anyone you tell me to." It's only clicking a button on a website, after all.
But if you're offering *money* for my services, perhaps I should hold out for a better deal. I could use a new alarm system...
Holding out will only work if you can convince all the other admins to hold out too. Maybe y'all should unionize...
We demand a 10% raise!
Wait, that's still nothing...
-Gurch
On 9/19/07, SlimVirgin slimvirgin@gmail.com wrote:
Yes, I agree that protecting IP address is hard.
Not for admins. Just use Tor.
Anthony wrote:
On 9/19/07, SlimVirgin slimvirgin@gmail.com wrote:
Yes, I agree that protecting IP address is hard.
Not for admins. Just use Tor.
It's very easy to say "just use Tor". But have you actually done so? I bet I have more Tor experience than 99% of the people on this list -- I semi-regularly use it for web browsing and I've even written up some GNU/Linux applications designed to interface through Tor on the command line. And my simple conclusion is this: Tor is slow. Really really slow. It turns a 100ms page load into a page load that takes many seconds, *if* it doesn't time out. Using Tor makes the web browsing experience significantly worse, and only makes sense to use when security is really in question. Wikipedia should not be a site whose security is so risky that we have to recommend our admins go through the agony of trying to do all of their Wikipedia work through Tor.
And by the way, remember that all unencrypted web traffic ends up unencrypted at the Tor exit node, and can be (and sometimes is) sniffed by unscrupulous folks. If you are using Tor you *must* make sure to use only the secure Wikimedia https proxy. Even that is difficult though, because you'll end up clicking a link that takes you to insecure http pages (such as diff links), and before you can blink, your admin cookie has gone across the web unencrypted. As far as I can see there is no fool-proof way of using Tor with Wikipedia, except for maybe blocking unencrypted http Wikipedia at a firewall level.
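One way round the insecure-diff-link trap described above would be to rewrite plain-http Wikipedia URLs to the secure gateway before following them. A minimal sketch, assuming the secure.wikimedia.org path mapping in use at the time (treat the exact scheme as an assumption):

```python
from urllib.parse import urlparse

def to_secure(url):
    """Rewrite an http en.wikipedia.org URL to the secure gateway.
    Assumed mapping: https://secure.wikimedia.org/wikipedia/en/<path>."""
    p = urlparse(url)
    if p.scheme == "http" and p.netloc == "en.wikipedia.org":
        return "https://secure.wikimedia.org/wikipedia/en" + p.path + (
            "?" + p.query if p.query else "")
    return url  # already secure, or not a Wikipedia link: leave untouched

print(to_secure("http://en.wikipedia.org/w/index.php?diff=12345"))
```

A browser extension or local rewriting proxy doing this would keep the login cookie from ever travelling over plain http, without needing firewall rules.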
On 9/20/07, Ben McIlwain cydeweys@gmail.com wrote:
Anthony wrote:
On 9/19/07, SlimVirgin slimvirgin@gmail.com wrote:
Yes, I agree that protecting IP address is hard.
Not for admins. Just use Tor.
It's very easy to say "just use Tor". But have you actually done so?
Umm, yeah.
I bet I have more Tor experience than 99% of the people on this list -- I semi-regularly use it for web browsing and I've even written up some GNU/Linux applications designed to interface through Tor on the command line.
I edit Wikipedia through Tor all the time. I even set up a script which compares the list of tor exit nodes against the list of blocked Wikipedia IPs and tells Tor to use only exit nodes which allow editing, thus avoiding blocking.
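The comparison described here can be sketched in a few lines. The IPs below are invented sample data; in practice both lists would be fetched from live sources (the exit-node list from the Tor directory, the blocked IPs from the wiki):

```python
# Sketch of filtering Tor exit nodes against a wiki block list.
# Inputs are plain IP strings; the data here is example-only.

def usable_exits(exit_ips, blocked_ips):
    """Return exit-node IPs that are not on the block list."""
    return sorted(set(exit_ips) - set(blocked_ips))

def torrc_fragment(ips):
    """torrc lines pinning Tor to the allowed exits only."""
    return "ExitNodes " + ",".join(ips) + "\nStrictNodes 1\n"

exits = ["192.0.2.1", "192.0.2.2", "198.51.100.7"]  # sample exit nodes
blocked = ["192.0.2.2"]                             # sample block list
print(torrc_fragment(usable_exits(exits, blocked)))
```

ExitNodes and StrictNodes are real torrc options; the fragment would need regenerating as exit nodes and blocks change, which is presumably why a script is needed at all.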
And my simple conclusion is this: Tor is slow. Really really slow. It turns a 100ms page load into a page load that takes many seconds, *if* it doesn't time out.
Do you have the latest version? I'm getting fairly consistent page loads of less than a second right now. Maybe it's because of the exit node thing. But it seems to me like you must not have the latest version.
Using Tor makes the web browsing experience significantly worse, and only makes sense to use when security is really in question.
Well, obviously "security" is a big issue for Sarah.
Wikipedia should not be a site whose security is so risky that we have to recommend our admins go through the agony of trying to do all of their Wikipedia work through Tor.
I wouldn't recommend it to everyone, only to paranoid people like me and Sarah.
And by the way, remember that all unencrypted web traffic ends up unencrypted at the Tor exit node, and can be (and sometimes is) sniffed by unscrupulous folks. If you are using Tor you *must* make sure to use only the secure Wikimedia https proxy.
Of course. This is a good idea for admins to always do anyway.
Even that is difficult though, because you'll end up clicking a link that takes you to insecure http pages (such as diff links), and before you can blink, your admin cookie has gone across the web unencrypted. As far as I can see there is no fool-proof way of using Tor with Wikipedia, except for maybe blocking unencrypted http Wikipedia at a firewall level.
Umm, now I'm going to have to ask you: have you ever actually used Tor? Cookies don't get sent to the insecure pages, and the diff links aren't insecure.
Anthony wrote:
And my simple conclusion is this: Tor is slow. Really really slow. It turns a 100ms page load into a page load that takes many seconds, *if* it doesn't time out.
Do you have the latest version? I'm getting fairly consistent page loads of less than a second right now. Maybe it's because of the exit node thing. But it seems to me like you must not have the latest version.
Yeah, I have the latest version. The speed issues are widely experienced by many, many people. It's not just me. You seem to be lucky.
Even that is difficult though, because you'll end up clicking a link that takes you to insecure http pages (such as diff links), and before you can blink, your admin cookie has gone across the web unencrypted. As far as I can see there is no fool-proof way of using Tor with Wikipedia, except for maybe blocking unencrypted http Wikipedia at a firewall level.
Cookies don't get sent to the insecure pages, and the diff links aren't insecure.
Diff links are insecure. When someone puts a diff link onto a page, the secure proxy does not edit that link to turn it into a secure link. As for the cookies issue, I guess I was confusing myself because I am logged onto en-wiki as well as the secure proxy.
On 2007.09.20 20:15:12 -0400, Ben McIlwain cydeweys@gmail.com scribbled 39 lines: ...
And by the way, remember that all unencrypted web traffic ends up unencrypted at the Tor exit node, and can be (and sometimes is) sniffed by unscrupulous folks. If you are using Tor you *must* make sure to use only the secure Wikimedia https proxy. Even that is difficult though, because you'll end up clicking a link that takes you to insecure http pages (such as diff links), and before you can blink, your admin cookie has gone across the web unencrypted.
...
Is this actually true, though? As I've said before, I edit through secure.wikimedia.org, and I've done so for the past few months. In that time, I've clicked on external links to en.wikipedia.org/wiki/whatever - not internal links to https://secure.wikimedia.org/wikipedia/en/wiki/whatever - and not once have I found myself to be logged in on En.
On 9/20/07, Gwern Branwen gwern0@gmail.com wrote:
On 2007.09.20 20:15:12 -0400, Ben McIlwain cydeweys@gmail.com scribbled 39 lines: ...
And by the way, remember that all unencrypted web traffic ends up unencrypted at the Tor exit node, and can be (and sometimes is) sniffed by unscrupulous folks. If you are using Tor you *must* make sure to use only the secure Wikimedia https proxy. Even that is difficult though, because you'll end up clicking a link that takes you to insecure http pages (such as diff links), and before you can blink, your admin cookie has gone across the web unencrypted.
...
Is this actually true, though? As I've said before, I edit through secure.wikimedia.org, and I've done so for the past few months. In that time, I've clicked on external links to en.wikipedia.org/wiki/whatever - not internal links to https://secure.wikimedia.org/wikipedia/en/wiki/whatever - and not once have I found myself to be logged in on En.
No, it's absolutely untrue. I just verified it. The cookies are properly sent as "secure" cookies, "secure" being a flag which, when set, means not only will the cookies not be sent to en.wikipedia.org, they won't even be sent to http://secure.wikimedia.org/.
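The behaviour verified above comes from the cookie's Secure attribute. A quick illustration of the flag being parsed out of a Set-Cookie header (the cookie name and value below are invented for the example):

```python
from http.cookies import SimpleCookie

# Parse a Set-Cookie header of the kind a secure gateway might emit.
# The cookie name and value are made up for illustration.
header = "enwikiSession=abc123; Path=/; Secure; HttpOnly"
jar = SimpleCookie()
jar.load(header)
morsel = jar["enwikiSession"]

# With Secure set, a conforming client sends the cookie over HTTPS only,
# so it never crosses a plain-http link for an exit-node sniffer to grab.
print(bool(morsel["secure"]))  # True
```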
The only workable solution I can see is to make it less likely that stalkers will want to target particular admins. One way to do that would be to set up anonymous admin accounts that multiple admins could use. So for example, if a difficult user needs to be blocked, any admin could access the joint admin account to make the block. The user would only see that User:Admin1 had blocked him. Only trusted people would have access to which admin had made a block with User:Admin1 at time T.
I know it would complicate things, and it might make admin abuse a little more likely. And we'd still have the problem of potential leaks, so it wouldn't be foolproof by any means.
I would go with an ArbCom role account that can enforce decisions made by ArbCom on their private mailing list. That way there are not any individuals with the power to make untraceable blocks. ArbCom are the most trusted members of the community (they may not have universal support, but they are still trusted with checkuser and oversight), so it makes sense that they be the ones to do this.
On 19/09/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
... it's very very very easy to accidentally edit while logged out, especially when you cross over to one of our other wikis like commons or meta.
Mmm. I have, in the past, edited (unintentionally) logged out from three different home addresses, from work, and from a friend's machine *whose IP resolved to her name*.
(Seriously. Oh, you have to love badly-designed university networks.)
Providing privacy strong enough to stop a stalker for people who are indirectly spewing out large amounts of information about themselves in the form of edits is just a really hard problem which I don't have a solution for...
Mmm. Completely concealing all information which could be an indirect pointer would lead you to spend all your time randomly copyediting articles on things that don't interest you, interacting with no-one, *and nothing else*, which whilst useful assumes a rather high level of dedication to the project :-)
On 9/20/07, Andrew Gray shimgray@gmail.com wrote:
On 19/09/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
... it's very very very easy to accidentally edit while logged out, especially when you cross over to one of our other wikis like commons or meta.
Mmm. I have, in the past, edited (unintentionally) logged out from three different home addresses, from work, and from a friend's machine *whose IP resolved to her name*.
That's another advantage of using Tor. If you accidentally try to edit while logged out, you usually wind up with an "IP is blocked" message!
On 20/09/2007, Anthony wikimail@inbox.org wrote:
That's another advantage of using Tor. If you accidentally try to edit while logged out, you usually wind up with an "IP is blocked" message!
Heh. Maybe we could get around the whole blocking-open-proxies argument by redefining it as "a feature to help our security-conscious users" ;-)
On 9/20/07, Andrew Gray andrew.gray@dunelm.org.uk wrote:
On 20/09/2007, Anthony wikimail@inbox.org wrote:
That's another advantage of using Tor. If you accidentally try to edit while logged out, you usually wind up with an "IP is blocked" message!
Heh. Maybe we could get around the whole blocking-open-proxies argument by redefining it as "a feature to help our security-conscious users" ;-)
Sounds good, just make sure you softblock them.
Actually, now that checkuser is in place maybe it's time to just stop displaying IP addresses of "anon editors" regardless of whether or not they use Tor.
On 20/09/2007, Andrew Gray shimgray@gmail.com wrote:
On 19/09/2007, Gregory Maxwell gmaxwell@gmail.com wrote:
... it's very very very easy to accidentally edit while logged out, especially when you cross over to one of our other wikis like commons or meta.
Mmm. I have, in the past, edited (unintentionally) logged out from three different home addresses, from work, and from a friend's machine *whose IP resolved to her name*.
(Seriously. Oh, you have to love badly-designed university networks.)
If I accidentally edit logged out, you'll find the enwiki user page for my static IP address links to my user page. I understand that other people want to keep their identity secret, but I really don't care. Perhaps I would feel differently if someone starting stalking me, but even then I doubt it. To be honest, not trying to hide probably reduces your chances of being stalked - what's the point in stalking someone who tells you exactly where to find them? (Well, I don't intend to hand out my home address or phone number, but it wouldn't be too difficult for someone to find out my uni timetable, find my picture on the noticeboard in my college and wait for me outside Algebraic Geometry. I would probably notice if they followed me home, though.)