Hi,
recently the report of the KnowPrivacy [1] study - a research project by the School of Information at the University of California, Berkeley - hit the German media [2].
It came to the conclusion that "All of the top 50 websites contained at least one web bug at some point in a one month time period." [3] which includes wikipedia.org.
This is very troubling and irritating for some of our (German) users, who are very sensitive to data privacy topics. So I got in contact with Brian W. Carver (University of California), who connected me to David Cancel, the maintainer of Ghostery, which was used to identify the web bugs. David wrote me today:
The following web bug trackers were reported to us, on the following subdomains:
Google Analytics - vls.wikipedia.org
Doubleclick - hu.wikipedia.org
Both were seen in yesterday's data so they're recent. We don't receive any page level information so that's as much detail as we have. Hope that helps.
I wasn't able to track down the Doubleclick web bug on the Hungarian Wikipedia, but the Google Analytics web bug is integrated into every page of the West Flemish Wikipedia via JavaScript [4].
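For readers who have not seen it: the urchin-style Google Analytics integration of that era, when pasted into a wiki's sitewide JavaScript, looked roughly like the sketch below (the UA account number is a placeholder, and the exact code used on vls.wikipedia.org may have differed). Every page view then fires a request to google-analytics.com carrying the reader's IP address, user agent and referrer.

// Roughly what the urchin-style integration looks like in sitewide JS;
// "UA-0000000-0" is a placeholder account ID, not taken from any wiki.
document.write('<script src="http://www.google-analytics.com/urchin.js" type="text/javascript"><\/script>');
document.write('<script type="text/javascript">_uacct = "UA-0000000-0"; urchinTracker();<\/script>');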
Our privacy policy [5] states "The Wikimedia Foundation may keep raw logs of such transactions [IP and other technical information], but these will not be published or used to track legitimate users." and "As a general principle, the access to, and retention of, personally identifiable data in all projects should be minimal and should be used only internally to serve the well-being of the projects."
I think we should stop the current use of Google Analytics ASAP.
Bye, Tim.
Hi,
I think we should stop the current use of Google Analytics ASAP.
I'm usually a proponent of indefinite bans for people who do this, but there are others who want milder approaches :-) Indeed, this is a violation of our privacy policy and should never be allowed. Thanks for the heads-up.
Do note, hu.wikipedia.org has an external stats aggregator, 'stats.wikipedia.hu', which is hosted on vhost102.sx6.tolna.net - and all our traffic is sent there ( http://hu.wikipedia.org/w/index.php?title=MediaWiki:Lastmodifiedat&oldid... - as well as a few other places )
I removed it from both. Thanks again :)
Domas
Domas Mituzas wrote:
Do note, hu.wikipedia.org has external stats aggregator, 'stats.wikipedia.hu', which is hosted on vhost102.sx6.tolna.net - and all our traffic is sent there ( http://hu.wikipedia.org/w/index.php?title=MediaWiki:Lastmodifiedat&oldid...
- as well as few other places )
One way to fight this would be to offer more detailed visitor statistics to people who need them.
Nikola Smolenski wrote:
Domas Mituzas wrote:
Do note, hu.wikipedia.org has external stats aggregator, 'stats.wikipedia.hu', which is hosted on vhost102.sx6.tolna.net - and all our traffic is sent there ( http://hu.wikipedia.org/w/index.php?title=MediaWiki:Lastmodifiedat&oldid...
- as well as few other places )
One way to fight this would be to offer more detailed visitor statistics to people who need them.
And another, possibly even more effective, one would be to prevent the loading of external resources in the software, except possibly via editors' own custom user JavaScript pages.
-- Neil
Domas Mituzas <midom.lists@...> writes:
Do note, hu.wikipedia.org has external stats aggregator, 'stats.wikipedia.hu', which is hosted on vhost102.sx6.tolna.net - and all our traffic is sent there ( http://hu.wikipedia.org/w/index.php?title=MediaWiki:Lastmodifiedat&oldid...
The stats aggregator for hu.wikipedia.org was set up with community approval, the public results contain no identifiable per-machine information (you can check them here: http://stats.wikipedia.hu/cgi-bin/awstats.pl ), and the records are not used for any other purposes. I think it is well within the bounds of the privacy policy.
As for Doubleclick, that was probably a mistake on KnowPrivacy's part - maybe they misidentified the aggregator (we use awstats) because Doubleclick uses a similar method? If not, I would appreciate it if they could provide more detailed information.
Hi,
2009/6/4 Tisza Gergő gtisza@gmail.com:
As for Doubleclick, that was probably a mistake on KnowPrivacy's part - maybe they misidentified the aggregator (we use awstats) because Doubleclick uses a similar method? If not, I would appreciate it if they could provide more detailed information.
Sad but true, they don't have further information on that. I'll try to reproduce it.
Bye, Tim.
Tim 'avatar' Bartel wrote:
<snip>
Surely this is something which should be possible to block at the MediaWiki level, by suppressing the generation of any HTML that loads any indirect resources (scripts, iframes, images, etc.) whatsoever other than from a clearly defined whitelist of Wikimedia-Foundation-controlled domains?
Doing this should completely stop site admins from adding web bugs.
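For illustration only, here is a rough sketch in JavaScript of the whitelist idea described above - not actual MediaWiki code. The function name and domain list are invented for the example; a real implementation would hook into MediaWiki's output generation rather than run a regular expression over the finished HTML.

// Domains treated as Wikimedia-controlled; site-relative URLs are also allowed.
const ALLOWED = /^(?:\/(?!\/)|(?:https?:)?\/\/(?:[a-z0-9-]+\.)*(?:wikipedia|wikimedia|wikimediafoundation|wiktionary|wikibooks|wikiquote|wikisource|wikinews|wikiversity)\.org(?:\/|$))/i;

function stripExternalResources(html) {
  // Drop any <script>, <img> or <iframe> whose src is not on the whitelist.
  return html.replace(
    /<(script|img|iframe)\b[^>]*\bsrc\s*=\s*["']([^"']+)["'][^>]*>(?:<\/\1\s*>)?/gi,
    (match, tag, src) => (ALLOWED.test(src) ? match : '')
  );
}

// The Google Analytics include is removed, the local script survives:
console.log(stripExternalResources(
  '<script src="http://www.google-analytics.com/urchin.js"></script>' +
  '<script src="/skins-1.5/common/wikibits.js"></script>'
));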
-- Neil
We need tools to track user behavior inside Wikipedia. As it is now we know nearly nothing at all about user behavior, and nearly everyone saying anything about users at Wikipedia is making gross estimates and wild guesses.
User privacy on Wikipedia is close to a public hoax: pages are transferred unencrypted and with user names in clear text. Anyone with access to a public hub is able to intercept and identify users, in addition to _all_ websites that are referenced during an edit on Wikipedia through correlation of logs.
Compared to this the whole previous discussion about the Iranian steward is somewhat strange, if not completely ridiculous.
Get real, the whole system and access to it is completely open!
John
Neil Harris wrote:
<snip>
Forgot a link to an article which describes privacy on Wikipedia very well! ;)
http://en.wikipedia.org/wiki/The_Emperor%27s_New_Clothes
John at Darkstar wrote:
<snip>
John at Darkstar wrote:
We need tools to track user behavior inside Wikipedia. As it is now we know nearly nothing at all about user behavior, and nearly everyone saying anything about users at Wikipedia is making gross estimates and wild guesses.
User privacy on Wikipedia is close to a public hoax: pages are transferred unencrypted and with user names in clear text. Anyone with access to a public hub is able to intercept and identify users, in addition to _all_ websites that are referenced during an edit on Wikipedia through correlation of logs.
Compared to this the whole previous discussion about the Iranian steward is somewhat strange, if not completely ridiculous.
Get real, the whole system and access to it is completely open!
John
As you say, there is no possibility of absolute privacy from anyone with access to the traffic stream, since the Internet was never engineered to give this kind of privacy. Wikipedia is as "completely open" as any other non-https website -- and, even with https, as with any other website with publicly visible transactions, for anyone with access to the traffic stream, simple traffic analysis is generally enough to correlate user identities to IPs. A combination of http and Tor is probably as good as it gets in attempting to avoid this, but even this has its limitations.
But it is simply unreasonable to equate this with no privacy at all. Most possible eavesdroppers do _not_ have access to the entire traffic stream, and those who do have access to traffic generally only have access to part of the traffic stream, and even then, most of them can't be bothered to eavesdrop, or are discouraged from doing so by privacy laws.
Given this, it is quite reasonable to take appropriate technical measures that attempt to keep as much of that remaining privacy as secure as possible.
-- Neil
The interesting thing is "who has an interest in which user's identity". Let's take an example: some organization sets up a site with a honeypot and logs all visitors. Then they correlate that with the RC logs from Wikipedia and check who adds external links back to them. They do not need direct access to Wikipedia logs or the raw traffic.
There is only one valid reason, as I see it, to avoid certain stat engines, and that is to block advertising companies from getting information about the readers. The writers do not have any real anonymity at all.
John
Neil Harris wrote:
<snip>
John at Darkstar wrote:
The interesting thing is "who has an interest in which user's identity". Let's take an example: some organization sets up a site with a honeypot and logs all visitors. Then they correlate that with the RC logs from Wikipedia and check who adds external links back to them. They do not need direct access to Wikipedia logs or the raw traffic.
There is only one valid reason, as I see it, to avoid certain stat engines, and that is to block advertising companies from getting information about the readers. The writers do not have any real anonymity at all.
John
Indeed they could. But even so, they would still have great difficulty in getting more than a small fraction of Wikipedia's readers to both visit the honeypot and make an edit that links to it, and the vast majority of unaffected users will still avoid being bitten by this attack. And even then, they will still only have obtained a mapping between the user's current IP and their Wikipedia account, and will still have to correlate this back to a personal identity, which is often harder than it might seem to be in theory.
The world is a dangerous place, but the fact that privacy and security can never be absolute is not a reason not to make good-faith efforts to preserve as much of both as reasonably possible within the limits of the time and resources available.
Just because a door can be knocked down with a sledgehammer (or a wall demolished with a pneumatic hammer) is not a reason not to have a lock on it, or a door there in the first place.
-- Neil
Neil Harris wrote:
<snip>
The Finnish folk saying has it that locks are there against honest folk, not against thieves.
That is, any lock can be compromised by sufficiently determined pursuers, but it is a significant signal that what is on the other side is not a matter for all passersby.
Yours,
Jussi-Ville Heiskanen
Surely this is something which should be possible to block at the MediaWiki level

Maybe if we set up Google Analytics in the first place (done by the Foundation office) and never used it: the Foundation could set up Analytics for all projects with a super-secure password and never use it. Will this work, or will somebody else still be able to set up Analytics?
Go Freedom! Unionhawk
On Thu, Jun 4, 2009 at 6:01 AM, Neil Harris usenet@tonal.clara.co.uk wrote:
<snip>
Installing Google Analytics, even for our own purposes, is a bad idea. For one, it creates a link to Google that is not necessarily what we want; it would be a big target for people to try and hack, and it presents tempting security risks on Google's end. Not to mention, as far as I know the program is proprietary.
If we're going to do something like this, it should be open source, and it should be something that we can install and monitor internally, without external options. That is, again, assuming we do something like that - which is not a foregone conclusion. I'm not convinced that we need to be tracking user behavior at this point in time, or that the tradeoffs for doing so are worth any benefits, or that doing so is in furtherance of our mission.
-Dan
On Jun 4, 2009, at 11:13 AM, Unionhawk wrote:
<snip>
Dan Rosenthal wrote:
Installing Google Analytics, even for our own purposes, is a bad idea. For one, it creates a link to Google that is not necessarily what we want; it would be a big target for people to try and hack, and it presents tempting security risks on Google's end. Not to mention, as far as I know the program is proprietary.
If we're going to do something like this, it should be open source, and it should be something that we can install and monitor internally, without external options. That is, again, assuming we do something like that - which is not a foregone conclusion. I'm not convinced that we need to be tracking user behavior at this point in time, or that the tradeoffs for doing so are worth any benefits, or that doing so is in furtherance of our mission.
The plain pageview stats are already available. Erik Zachte has been doing some work on other stats. http://stats.wikimedia.org/EN/VisitorsSampledLogRequests.htm
If I were to compile a wishlist of stats things:
1. stats.grok.se data for non-Wikipedia projects
2. A better interface for stats.wikimedia.org - There's a lot of data there, but it can be hard to find and it's not very publicized. The only reason I knew about the link above is because someone pointed it out to me once and I bookmarked it.
3. Pageview stats at http://dammit.lt/wikistats/ in files based on projects. It would be a lot easier for people at the West Flemish Wikipedia to analyze statistics themselves if they didn't have to download tons of data they don't need.
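As a rough sketch of how little tooling item 3 would need on the consumer's side (assuming the hourly pagecount files keep their space-separated "project page_title requests bytes" line format; the file name below is only illustrative), a small filter script could pull out one project's lines:

// Usage: zcat pagecounts-20090604-120000.gz | node filter-project.js vls
// Prints only the lines for the given project code (default: vls).
const readline = require('readline');

const project = process.argv[2] || 'vls';
const rl = readline.createInterface({ input: process.stdin });

rl.on('line', (line) => {
  // Lines look roughly like: "vls Some_Page 13 202130"
  if (line.startsWith(project + ' ')) {
    console.log(line);
  }
});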
Hello,
If I were to compile a wishlist of stats things:
- stats.grok.se data for non-Wikipedia projects
The raw data is available; anyone can build something like that, as long as they have the resources. I've suggested to Henrik that he open-source his software, but it probably suffers from "not nice enough to show" yet.
- Pageview stats at http://dammit.lt/wikistats/ in files based on
projects. It would be a lot easier for people at the West Flemish Wikipedia to analyze statistics themselves if they didn't have to download tons of data they don't need.
I'm considering some kind of API, but I have to rethink the process (though some people want to have more data - like country tagging - instead of less data, hehe ;-), though apparently the people who cry for stats the most are also the ones who are bashing my actions and attacking 'volunteer developers', so...
On the other hand, 'tons of data' is just 50MB an hour. :-)
Cheers, Domas
On Thu, Jun 4, 2009 at 8:35 AM, Dan Rosenthal swatjester@gmail.com wrote:
Installing Google Analytics, even for our own purposes, is a bad idea. For one, it creates a link to google that is not necessarily what we want; it would be a big target for people to try and hack, and it presents tempting security risks on Google's end. Not to mention, as far as I know the program is proprietary.
<snip>
I may be misreading, but I believe Unionhawk's suggestion was to set up -- but not install -- Google Analytics in the hope that simply registering the accounts would block anyone else from creating an Analytics account pointed at Wikipedia. (I don't know if it actually works that way.)
That strikes me as rather too much work though. Better to block the relevant URLs from being inserted in the first place. That could be accomplished in any one of several technical ways.
One idea is the proposal to install the AbuseFilter in a global mode, i.e. rules loaded at Meta that apply everywhere. If that were done (and there are some arguments about whether it is a good idea), then it could be used to block these types of URLs from being installed, even by admins.
-Robert Rohde
One idea is the proposal to install the AbuseFilter in a global mode, i.e. rules loaded at Meta that apply everywhere. If that were done (and there are some arguments about whether it is a good idea), then it could be used to block these types of URLs from being installed, even by admins.
Identifying client-side generated URLs from the server side opens up a whole lot of problems of its own. Basically you need a script that runs in a hostile environment and reports back to a server when a whole series of URLs are injected by code loaded from some sources (MediaWiki space) but not from other sources (user space); still, code loaded from user space through a call to MediaWiki space should be allowed. Add to this that your URL-identifying code has to run after a script has generated the URL and before it does any cleanup. The URL verification can't just say that a URL is hostile, it has to check it somehow, and that leads to reporting of the URL - if the reporting code still executes at that moment. Urk...
John
John at Darkstar wrote:
One idea is the proposal to install the AbuseFilter in a global mode, i.e. rules loaded at Meta that apply everywhere. If that were done (and there are some arguments about whether it is a good idea), then it could be used to block these types of URLs from being installed, even by admins.
Identifying client side generated urls from server side opens up a whole lot of problems of its own. Basically you need a script that runs in a hostile environment and reports back to a server when a whole series of urls are injected from code loaded from some sources (mediawiki-space) but not from other sources user space), still code loaded from user space through call to mediawiki space should be allowed. Add to this that your url identifying code has to run after a script has generated the url and before it do any cleanup. The url verification can't just say that a url is hostile, it has to check it somehow, and that leads to reporting of the url - if the reporting code still executes at that moment. Urk...
Hmm? There's no reason to do anything like that. The AbuseFilter would just prevent sitewide JS pages from being saved with the particular URLs or a particular code block in them. It'll stop the well-meaning but misguided admins. Short of restricting site JS to the point of uselessness, you'll never be able to stop determined abusers.
Hmm? There's no reason to do anything like that. The AbuseFilter would just prevent sitewide JS pages from being saved with the particular URLs or a particular code block in them. It'll stop the well-meaning but misguided admins. Short of restricting site JS to the point of uselessness, you'll never be able to stop determined abusers.
A very typical code fragment to make a stat URL is something like

document.write('<img src="' + server + digest + '">');

- server is some kind of external URL
- digest is just some random garbage to bypass caching

This kind of code exists in so many variants that it is very difficult to say anything about how it may be implemented. Often it will not use a document.write on systems like Wikipedia but will instead use createElement(). Very often someone claims that the definition of "server" will be complete and can be used to identify the external server sufficiently. That is not a valid claim, as many such sites can be referenced for other purposes. Note also that the number of URLs will be huge, as this type of service is very popular, not to mention that anyone who wants to may set up a special stat aggregator on an otherwise unknown domain.
Basically, simple regexps are not sufficient for detecting this kind of code.
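To make that concrete, here is a minimal sketch of the createElement() variant described above (the host name is invented for the example): the complete tracker URL never appears as a literal string in the saved JavaScript, so a pattern match on the page source finds nothing.

// Hypothetical aggregator host, assembled at run time.
var host = ['stats', 'example', 'net'].join('.');
var img = document.createElement('img');
img.width = 1;
img.height = 1;
// Random "digest" appended to bypass caching, as described above.
img.src = 'http://' + host + '/count?r=' + Math.random();
document.body.appendChild(img);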
Otherwise, take a look at Simetrical's earlier post.
John
John at Darkstar wrote:
Hmm? There's no reason to do anything like that. The AbuseFilter would just prevent sitewide JS pages from being saved with the particular URLs or a particular code block in them. It'll stop the well-meaning but misguided admins. Short of restricting site JS to the point of uselessness, you'll never be able to stop determined abusers.
A very typical code fragment to make a stat url is something like
document.write('<img src="' + server + digest + '">');
- server is some kind of external url
- digest is just some random garbage to bypass caching
This kind of code exists in so many variants that it is very difficult to say anything about how it may be implemented. Often it will not use a document.write on systems like Wikipedia but instead use createElement() Very often someone claims that the definition of "server" will be complete and may be used to identify the external server sufficiently. That is not a valid claim as many such sites can be referred for other purposes.
Other purposes that have valid uses loading 3rd party content on a Wikimedia wiki? Like what?
Note also that the number of urls will be huge as this type of service is very popular, not to say that anyone that want may set up a special stat aggregator on an otherwise unknown domain.
Basically, simple regexps are not sufficient for detecting this kind of code.
I don't think I said it would be perfect; the idea isn't to 100% prevent it, just to try to stop the most obvious cases like Google Analytics.
Otherwise, take a look at Simetricals earlier post.
John
Alex wrote:
John at Darkstar wrote:
Hmm? There's no reason to do anything like that. The AbuseFilter would just prevent sitewide JS pages from being saved with the particular URLs or a particular code block in them. It'll stop the well-meaning but misguided admins. Short of restricting site JS to the point of uselessness, you'll never be able to stop determined abusers.
A very typical code fragment to make a stat url is something like
document.write('<img src="' + server + digest + '">');
- server is some kind of external url
- digest is just some random garbage to bypass caching
This kind of code exists in so many variants that it is very difficult to say anything about how it may be implemented. Often it will not use a document.write on systems like Wikipedia but instead use createElement() Very often someone claims that the definition of "server" will be complete and may be used to identify the external server sufficiently. That is not a valid claim as many such sites can be referred for other purposes.
Other purposes that have valid uses loading 3rd party content on a Wikimedia wiki? Like what?
If you don't trust other sites, you also have to accept that you can't trust any kind of «toolserver» where you don't have complete control. That opens up a lot of problems
Note also that the number of urls will be huge as this type of service is very popular, not to say that anyone that want may set up a special stat aggregator on an otherwise unknown domain.
Basically, simple regexps are not sufficient for detecting this kind of code.
I don't think I said it would be perfect, the idea isn't to 100% prevent it, just to try to stop the most obvious cases like Google analytics.
It's not that it won't be perfect; it simply will not work.
John
John at Darkstar wrote:
Alex wrote:
John at Darkstar wrote:
<snip>
Other purposes that have valid uses loading 3rd party content on a Wikimedia wiki? Like what?
If you don't trust other sites, you also have to accept that you can't trust any kind of «toolserver» where you don't have complete control. That opens up a lot of problems
It's not just a matter of trust, it's a matter of use. Why would people be loading content from or linking to servers used to collect website stats in the sitewide JS on a Wikimedia wiki?
Note also that the number of urls will be huge as this type of service is very popular, not to say that anyone that want may set up a special stat aggregator on an otherwise unknown domain.
Basically, simple regexps are not sufficient for detecting this kind of code.
I don't think I said it would be perfect, the idea isn't to 100% prevent it, just to try to stop the most obvious cases like Google analytics.
It's not that it won't be perfect; it simply will not work.
And anything more complex would likely be too complicated and/or too inefficient to be worthwhile.
John
On Fri, Jun 5, 2009 at 2:14 AM, John at Darkstar vacuum@jeb.no wrote:
It's not that it won't be perfect; it simply will not work.
It will in most cases if you don't mind some false positives. False positives would be acceptable if it's just a warning page that the admin could click through. Check for anything that looks like a URL that doesn't go to a Wikimedia domain, and if one is being inserted into MediaWiki:*.js (or MediaWiki:*.css), politely notify the adder that it's against Wikimedia's privacy policy to include content from third-party domains, and anyone who adds it may be desysopped. That would stop well-intended additions, provided the sysop knows English and/or the message can be translated. And every such warning could be logged so that stewards or whoever could keep an eye on that site's CSS/JS for a while to make sure there are no evasion attempts. That would be quite effective.
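As a rough sketch of the check described above (not an actual AbuseFilter rule; the domain list and helper function are invented for illustration), the idea is simply to flag any URL in an edit to a sitewide CSS/JS page that does not point at a Wikimedia-operated domain, accepting some false positives:

// Domains considered Wikimedia-operated for the purpose of the warning.
const WIKIMEDIA = /^(?:https?:)?\/\/(?:[a-z0-9-]+\.)*(?:wikipedia|wikimedia|wikimediafoundation|wiktionary|wikibooks|wikiquote|wikisource|wikinews|wikiversity|mediawiki)\.org(?:[\/:?#]|$)/i;

function suspiciousUrlsIn(addedText) {
  // Anything that looks like an absolute or protocol-relative URL.
  // (Plain "//" comments will occasionally match - an acceptable false positive.)
  const urls = addedText.match(/(?:https?:)?\/\/[^\s"'<>)]+/gi) || [];
  return urls.filter((url) => !WIKIMEDIA.test(url));
}

// An edit to MediaWiki:Common.js adding Google Analytics would trip the warning:
console.log(suspiciousUrlsIn(
  'document.write(\'<script src="http://www.google-analytics.com/urchin.js"></script>\');'
));
// -> [ 'http://www.google-analytics.com/urchin.js' ]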
Not to mention, as far as I know the program is proprietary.
This is an example of what the real problem here is; it's not the security issues but the users' political issues.
I'm not convinced that we need to be tracking user behavior at this point in time, or that the tradeoffs for doing so are worth any benefits, or that doing so is in furtherance of our mission.
One example of a very important solution is to identify missing links between articles. Articles without parents are a special case. Articles without children too. Articles where a recurring problem of a missing link between two articles persists are the general case, but this cannot be solved without referrer logging, or better, logging to an external server after a JS function has identified the logging as necessary. Unfortunately such logging can't be done today, so we must stick to the two less-than-optimal special cases.
John
On Jun 4, 2009, at 11:27 PM, John at Darkstar wrote:
Not to mention, as far as I know the program is proprietary.
This is an example of what the real problem here is; it's not the security issues but the users' political issues.
I fail to see what that has to do with anything. I'm just about as far from the open-source politics as you can get. Proprietary code can't be modified to suit our needs.
On Thu, Jun 4, 2009 at 6:01 AM, Neil Harris usenet@tonal.clara.co.uk wrote:
Surely this is something which should be possible to block at the MediaWiki level, by suppressing the generation of any HTML that loads any indirect resources (scripts, iframes, images, etc.) whatsoever other than from a clearly defined whitelist of Wikimedia-Foundation-controlled domains?
Not possible as long as we allow JS to be added. See [[halting problem]].
On Thu, Jun 4, 2009 at 6:20 AM, John at Darkstar vacuum@jeb.no wrote:
User privacy on Wikipedia is close to a public hoax: pages are transferred unencrypted and with user names in clear text. Anyone with access to a public hub is able to intercept and identify users, in addition to _all_ websites that are referenced during an edit on Wikipedia through correlation of logs.
This only works for getting info on totally random Wikipedia users, who happen to edit using your router. This isn't a serious compromise of privacy for practical purposes due to the resources required to get info on a large number of users, or to target a specific user. Users who are concerned about this, however, can use secure.wikimedia.org.
Note that if you make edits, it should be pretty easy for a MITM to figure out your IP address even if you're using SSL:

1) Watch all traffic going to Wikimedia IP addresses.
2) Guess which traffic streams correspond to edits by looking at the amount of data the client is sending.
3) Correlate suspected edits with RecentChanges over a period of time.

Once they know your IP address, if they're a MITM, they can still figure out what sites you're accessing, just not the exact pages (or exact domain in the case of virtual hosting).
So if you want real privacy against MITMs, you still need to use something like Tor, as usual.
On Thu, Jun 4, 2009 at 12:53 PM, Robert Rohde rarohde@gmail.com wrote:
One idea is the proposal to install the AbuseFilter in a global mode, i.e. rules loaded at Meta that apply everywhere. If that were done (and there are some arguments about whether it is a good idea), then it could be used to block these types of URLs from being installed, even by admins.
No, it wouldn't.
document.write('<script' + ' src="' + 'http://www.go' + 'ogle-an' + 'alytics.com/urc' + 'hin.js" type="text/javascript"></script>');
Obviously more complicated obfuscation is possible. JavaScript is Turing-complete. You can't reliably figure out whether it will output a specific string.
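For instance, continuing the example above, even the host name can be kept out of the source entirely (this sketch only illustrates the point about obfuscation; it is not taken from any wiki):

// The string "google-analytics" never appears literally anywhere in this code.
var h = [103, 111, 111, 103, 108, 101, 45, 97, 110, 97, 108, 121, 116, 105, 99, 115]
  .map(function (c) { return String.fromCharCode(c); })
  .join('') + '.com';
document.write('<script src="http://www.' + h + '/urchin.js" type="text/javascript"><\/script>');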
However, perhaps a default AbuseFilter could be installed telling admins that installing Analytics is a violation of Foundation policy and that they'll get desysopped if they continue. That wouldn't stop them from doing it if they were determined, but it might be able to trigger an alert to get the appropriate parties to make sure they didn't try evading it. Maybe the filter could be installed on Meta and local violations could go to Meta logs so stewards will see? Are global filters possible right now?
At a bare minimum, such a warning would reduce inadvertent errors.
Aryeh Gregor wrote:
<snip>
Has apache/proxy level filtering been considered?
Jon
2009/6/4 Jon scream@nonvocalscream.com:
Has apache/proxy level filtering been considered?
Filtering for what? JavaScript is executed client-side, i.e. after the page has gone through the Apache servers/proxies.
Thomas Dalton wrote:
2009/6/4 Jon scream@nonvocalscream.com:
Has apache/proxy level filtering been considered?
Filtering for what? Javascript is executed client-side, ie. after the page has gone through the apache servers/proxies.
Filtering to remove _all_ JavaScript, other than references to static JavaScript files maintained by MediaWiki's developers.
-- Neil
2009/6/5 Neil Harris usenet@tonal.clara.co.uk:
Thomas Dalton wrote:
2009/6/4 Jon scream@nonvocalscream.com:
Has apache/proxy level filtering been considered?
Filtering for what? Javascript is executed client-side, ie. after the page has gone through the apache servers/proxies.
Filtering to remove _all_ Javascript, other than references to statically maintained Javascript files maintained by Mediawiki's developers.
Well, that's certainly possible, but there are a large number of legitimate and worthwhile uses of custom javascript. Things like Twinkle are done with custom javascript and many members of the community find such tools extremely useful.
Thomas Dalton wrote:
2009/6/5 Neil Harris usenet@tonal.clara.co.uk:
Thomas Dalton wrote:
2009/6/4 Jon scream@nonvocalscream.com:
Has apache/proxy level filtering been considered?
Filtering for what? Javascript is executed client-side, ie. after the page has gone through the apache servers/proxies.
Filtering to remove _all_ Javascript, other than references to statically maintained Javascript files maintained by Mediawiki's developers.
Well, that's certainly possible, but there are a large number of legitimate and worthwhile uses of custom javascript. Things like Twinkle are done with custom javascript and many members of the community find such tools extremely useful.
Which is why Twinkle's code should be hosted by the WMF in the same way as the MediaWiki code, and the Twinkle developers given commit access to that code in the usual way.
JavaScript is software, and should be managed like software, not like wiki content. We don't give every admin commit access to the code repository, nor should we do so for JavaScript.
-- Neil
On Thu, Jun 4, 2009 at 10:44 AM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
On Thu, Jun 4, 2009 at 12:53 PM, Robert Rohde rarohde@gmail.com wrote:
One idea is the proposal to install the AbuseFilter in a global mode, i.e. rules loaded at Meta that apply everywhere. If that were done (and there are some arguments about whether it is a good idea), then it could be used to block these types of URLs from being installed, even by admins.
No, it wouldn't.
document.write('<script' + ' src="' + 'http://www.go' + 'ogle-an' + 'alytics.com/urc' + 'hin.js" type="text/javascript"></script>');
Obviously more complicated obfuscation is possible. JavaScript is Turing-complete. You can't reliably figure out whether it will output a specific string.
However, perhaps a default AbuseFilter could be installed telling admins that installing Analytics is a violation of Foundation policy and that they'll get desysopped if they continue. That wouldn't stop them from doing it if they were determined, but it might be able to trigger an alert to get the appropriate parties to make sure they didn't try evading it. Maybe the filter could be installed on Meta and local violations could go to Meta logs so stewards will see? Are global filters possible right now?
At a bare minimum, such a warning would reduce inadvertent errors.
Yeah, I meant it could detect and block the inadvertent uses by admins who think they are doing something cool / clever. Yes, if someone wants to intentionally ignore the warning and install an obfuscated URL anyway, they still could; however, doing that is probably grounds for summary desysop.
Global filters would run from Meta. Logs are intended to be both global and local. My impression is that global filters have been technically possible since April, but that there is "social" resistance to installing them over questions like: who should control them? when should they be used? how do you ensure that you aren't blocking good edits to project W when confronting vandalism at X, Y, and Z? You should talk to Andrew for more details on current status.
-Robert Rohde
2009/6/4 Robert Rohde rarohde@gmail.com:
On Thu, Jun 4, 2009 at 10:44 AM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
However, perhaps a default AbuseFilter could be installed telling admins that installing Analytics is a violation of Foundation policy and that they'll get desysopped if they continue. That wouldn't stop
Yeah, I meant it could detect and block the inadvertent uses by admins who think they are doing something cool / clever.
Yeah, the actual problem is not malicious admins - it's admins trying to do a good and useful thing in good faith, that just happens to be a massive privacy policy violation.
- d.
David Gerard wrote:
2009/6/4 Robert Rohde rarohde@gmail.com:
Aryeh Gregor wrote:
However, perhaps a default AbuseFilter could be installed telling admins that installing Analytics is a violation of Foundation policy and that they'll get desysopped if they continue. That wouldn't stop
Yeah, I meant it could detect and block the inadvertent uses by admins who think they are doing something cool / clever.
Yeah, the actual problem is not malicious admins - it's admins trying to do a good and useful thing in good faith, that just happens to be a massive privacy policy violation.
That said, talking to them may bear richer fruit than blocking and desysopping by a trigger-happy few.
Ec
On Thu, Jun 4, 2009 at 6:20 AM, John at Darkstar vacuum@jeb.no wrote:
User privacy on Wikipedia is close to a public hoax: pages are transferred unencrypted and with user names in clear text. Anyone with access to a public hub is able to intercept and identify users, in addition to _all_ websites that are referenced during an edit on Wikipedia through correlation of logs.
This only works for getting info on totally random Wikipedia users, who happen to edit using your router. This isn't a serious compromise of privacy for practical purposes due to the resources required to get info on a large number of users, or to target a specific user. Users who are concerned about this, however, can use secure.wikimedia.org.
Either you have privacy for _all_ users or you have none. If you accept lesser privacy for some users, at random, several stat aggregation schemes are possible. The downside is that you have to decide that some users in fact have less privacy from time to time.
So if you want real privacy against MITMs, you still need to use something like Tor, as usual.
Attacks on Tor are way outside the scope of this discussion, but they are possible for this kind of site.
On Thu, Jun 4, 2009 at 12:53 PM, Robert Rohde rarohde@gmail.com wrote:
One idea is the proposal to install the AbuseFilter in a global mode, i.e. rules loaded at Meta that apply everywhere. If that were done (and there are some arguments about whether it is a good idea), then it could be used to block these types of URLs from being installed, even by admins.
No, it wouldn't.
document.write('<script' + ' src="' + 'http://www.go' + 'ogle-an' + 'alytics.com/urc' + 'hin.js" type="text/javascript"></script>');
Obviously more complicated obfuscation is possible. JavaScript is Turing-complete. You can't reliably figure out whether it will output a specific string.
You can run a script to inspect the DOM tree for external URLs and report back if something suspicious is found, but it is highly error-prone and can easily be defeated.
John
On Thu, Jun 4, 2009 at 1:18 AM, Tim 'avatar' Bartel <wikipedia@computerkultur.org> wrote:
Hi,
recently the report of the KnowPrivacy [1] study - a research project by the School of Information from University of California in Berkeley
- hit the German media [2].
The case of vlswiki is troubling as it's a single sysop who is stubbornly adding the analytics bug
(From the history of MediaWiki:Common.js on vls.wikipedia.org:)

- 4 jun 2009 06:54 - Midom - (74 bytes) - (privacy policy violation)
- 25 apr 2009 15:13 - Tbc - (363 bytes) - (cannot find in the policy this is not allowed)
- 9 jul 2008 21:13 - Drini - (74 bytes) - (google analytics is not allowed in global scripts)

http://vls.wikipedia.org/w/index.php?title=MediaWiki:Common.js&action=hi...
What I propose is that re-adding this should result in removal of the sysop bit for misuse of powers. Don't we have a committee that checks privacy violations?
2009/6/4 Pedro Sanchez pdsanchez@gmail.com:
What I propose is that re-adding this should result in removal of the sysop bit for misuse of powers. Don't we have a committee that checks privacy violations?
The Foundation would surely have this power.
- d.
The Ombudsman Commission would likely be that group. Although their focus has traditionally been CheckUser, their purview actually covers any and all violations of the privacy policy, and this is one such case. At this moment, I agree: this sysop shouldn't be one.
-Mike
On Thu, 2009-06-04 at 06:21 -0500, Pedro Sanchez wrote:
On Thu, Jun 4, 2009 at 1:18 AM, Tim 'avatar' Bartel < wikipedia@computerkultur.org> wrote:
Hi,
recently the report of the KnowPrivacy [1] study - a research project by the School of Information from University of California in Berkeley
- hit the German media [2].
The case of vlswiki is troubling as it's a single sysop who is stubbornly adding the analytics bug
- 4 jun 2009 06:54 - Midom (74 bytes) (privacy policy violation)
  http://vls.wikipedia.org/w/index.php?title=MediaWiki:Common.js&oldid=131158
- 25 apr 2009 15:13 - Tbc (363 bytes) (cannot find in the policy this is not allowed)
  http://vls.wikipedia.org/w/index.php?title=MediaWiki:Common.js&oldid=127221
- 9 jul 2008 21:13 - Drini (74 bytes) (google analytics is not allowed in global scripts)
  http://vls.wikipedia.org/w/index.php?title=MediaWiki:Common.js&oldid=100968
http://vls.wikipedia.org/w/index.php?title=MediaWiki:Common.js&action=hi...
What I propose is that re-adding this should result in removal of the sysop bit for misuse of powers. Don't we have a committee that checks privacy violations?
2009/6/4 Tim 'avatar' Bartel wikipedia@computerkultur.org:
I think we should stop the current use of Google Analytics ASAP.
Indeed.
For the record, we've discussed Google Analytics before:
* in July 2007, for pms.wiki - nothing implemented, I think
* in October 2007, for en.wikibooks - implemented but then stopped. At the same time, en.wikinews had implemented it and then stopped it again; the Wikimania07 site also ran it for most of the year and then had it taken out when it was discovered.
* in December 2007, for fi.wiki - implemented but then stopped
* in July 2008, for th.wiki - discovered and removed. A check then found it on vls.wiki and th.wikisource; the discussion doesn't record that these were removed, but checking the sites shows they were.
The vls one is interesting - it was removed by Drini in July, per the foundation-l discussion, and only added back in at the end of April 2009... and there we get this problem.
So, yeah. Pretty solid consensus that this is something to avoid. If we have some "explanatory notes" to go with the privacy policy anywhere, it might be worth explicitly mentioning the use of external logging services and Why Thou Shalt Not.
Web bugs for statistical data are a legitimate want but potentially a horrible privacy violation.
So I asked on wikitech-l, and the obvious answer appears to be to do it internally. Something like http://stats.grok.se/ only more so.
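To make that concrete: the internal approach is basically "count page views in aggregate and keep nothing that identifies a reader". A very rough sketch, assuming a Node-style log processor and a made-up log format (both are illustrative, not how our servers actually log):

    // Aggregate a raw access log into per-page view counts, discarding
    // the identifying fields (IP, user agent) entirely.
    var fs = require('fs');

    var counts = {};
    fs.readFileSync('access.log', 'utf8').split('\n').forEach(function (line) {
        var fields = line.split(' ');   // assumed format: ip date method url ...
        var url = fields[3];
        if (!url) { return; }
        counts[url] = (counts[url] || 0) + 1;   // keep only the aggregate
    });

    fs.writeFileSync('pageviews.json', JSON.stringify(counts, null, 2));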
So - if you want web bug data in a way that fits the privacy policy, please pop over to the wikitech-l thread with technical suggestions and solutions :-)
- d.
David Gerard wrote:
Web bugs for statistical data are a legitimate want but potentially a horrible privacy violation.
So I asked on wikitech-l, and the obvious answer appears to be to do it internally. Something like http://stats.grok.se/ only more so.
So - if you want web bug data in a way that fits the privacy policy, please pop over to the wikitech-l thread with technical suggestions and solutions
Precisely. External web bug trackers should be removed without exception. People who add them innocently, out of an understandable interest in collecting aggregated information that would not violate the privacy policy, should be directed to request and help with internal solutions, kept within appropriate limits to comply with the policy.
--Michael Snow
Michael Snow wrote:
External web bug trackers should be removed without exception. People who add them innocently, out of an understandable interest in collecting aggregated information that would not violate the privacy policy, should be directed to request and help with internal solutions, kept within appropriate limits to comply with the policy.
So how do you propose we enforce this? I'm thinking we need to prevent this from happening in the first place. Analytics like this could pretty much give checkuser powers to anybody!
They have a legitimate purpose, so, if analytics are wanted/needed by the Foundation, they may be implemented by the Foundation. Otherwise, no analytics.
Go Freedom! Unionhawk
On Thu, Jun 4, 2009 at 11:13 AM, Michael Snow wikipedia@verizon.net wrote:
David Gerard wrote:
Web bugs for statistical data are a legitimate want but potentially a horrible privacy violation.
So I asked on wikitech-l, and the obvious answer appears to be to do it internally. Something like http://stats.grok.se/ only more so.
So - if you want web bug data in a way that fits the privacy policy, please pop over to the wikitech-l thread with technical suggestions and solutions
Precisely. External web bug trackers should be removed without exception. People who add them innocently, out of an understandable interest in collecting aggregated information that would not violate the privacy policy, should be directed to request and help with internal solutions, kept within appropriate limits to comply with the policy.
--Michael Snow
2009/6/4 Unionhawk unionhawk.sitemod@gmail.com:
So how do you propose we enforce this? I'm thinking we need to prevent this from happening in the first place. Analytics like this could pretty much give checkuser powers to anybody!
There aren't that many places where this sort of thing could be implemented - would it be too impractical to just regularly run a script to check those for things like Google Analytics links, and remove them with a polite note when found?
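Something along these lines, say -- a periodic checker that pulls each wiki's sitewide JS via action=raw and flags known tracker hostnames (the wiki list and pattern are illustrative, and a plain pattern match is of course defeated by the obfuscation trick mentioned earlier):

    // Fetch MediaWiki:Common.js from each wiki and report any that
    // mention a known external tracker.
    var http = require('http');

    var wikis = ['vls.wikipedia.org', 'hu.wikipedia.org', 'th.wikipedia.org'];
    var trackers = /google-analytics\.com|doubleclick\.net|urchin\.js/i;

    wikis.forEach(function (host) {
        var path = '/w/index.php?title=MediaWiki:Common.js&action=raw';
        http.get({ host: host, path: path }, function (res) {
            var body = '';
            res.on('data', function (chunk) { body += chunk; });
            res.on('end', function () {
                if (trackers.test(body)) {
                    console.log('Possible external tracker on ' + host);
                }
            });
        });
    });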