Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole.
Are there any technical workarounds feasible? If not blocking the loading of external sites entirely (I understand hu:wp uses a web bug that isn't Google), perhaps at least listing the sites somewhere centrally viewable?
- d.
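For illustration, here is a minimal sketch of how the external hosts a page actually pulls resources from could be surfaced for central review, e.g. run as a gadget or bookmarklet against any wiki page. The Wikimedia host pattern and the idea of feeding the result into a central list are assumptions, not existing tooling:

// Minimal sketch (assumed to run in the browser on a wiki page, e.g. as a
// gadget or bookmarklet): list external, non-Wikimedia hosts the page loads
// resources from, i.e. candidate web bugs to report centrally.
function listExternalHosts() {
    var internal = /(^|\.)(wikipedia|wikimedia|wiktionary|wikibooks|wikinews|wikiquote|wikisource|wikiversity|mediawiki)\.org$/i;
    var tags = ['script', 'img', 'iframe', 'link'];
    var seen = {};
    var hosts = [];
    for (var t = 0; t < tags.length; t++) {
        var els = document.getElementsByTagName(tags[t]);
        for (var i = 0; i < els.length; i++) {
            var url = els[i].src || els[i].href || '';
            var m = url.match(/^https?:\/\/([^\/]+)/i);
            if (m && !internal.test(m[1]) && !seen[m[1]]) {
                seen[m[1]] = true;
                hosts.push(m[1]); // e.g. "www.google-analytics.com"
            }
        }
    }
    return hosts;
}

Running something like this against the projects' pages and logging the results would at least make the current set of external loads centrally visible.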
David Gerard wrote:
Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole.
Are there any technical workarounds feasible? If not blocking the loading of external sites entirely (I understand hu:wp uses a web bug that isn't Google), perhaps at least listing the sites somewhere centrally viewable?
Perhaps the solution would be to simply set up our own JS based usage tracker? There are a few options available http://en.wikipedia.org/wiki/List_of_web_analytics_software, and for starters, the backend could run on the toolserver.
Note that anything processing IP addresses will need special approval on the TS.
-- daniel
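As a concrete illustration of what the client side of such a self-hosted tracker might look like, here is a minimal sketch. The collector URL is a placeholder for whatever Wikimedia- or toolserver-hosted endpoint would actually receive (and anonymize) the hits; wgPageName and skin are globals MediaWiki already exposes on every page:

// Minimal sketch of a self-hosted usage "bug". The collector URL below is
// hypothetical, not an existing service.
function reportPageView() {
    var page = window.wgPageName || '';
    var skinName = window.skin || '';
    var img = new Image(1, 1);              // classic 1x1 tracking pixel
    img.src = 'https://collector.example.org/hit.gif' +
        '?page=' + encodeURIComponent(page) +
        '&skin=' + encodeURIComponent(skinName) +
        '&r=' + Math.random();              // cache buster
}
reportPageView();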
2009/6/4 Daniel Kinzler daniel@brightbyte.de:
David Gerard wrote:
Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole. Are there any technical workarounds feasible? If not blocking the
Perhaps the solution would be to simply set up our own JS based usage tracker? There are a few options available http://en.wikipedia.org/wiki/List_of_web_analytics_software, and for starters, the backend could run on the toolserver. Note that anything processing IP addresses will need special approval on the TS.
If putting that on the toolserver passes privacy policy muster, that'd be an excellent solution. Then external site loading can be blocked.
(And if the toolservers won't melt in the process.)
- d.
I suggest keeping the bug on Wikimedia's servers and using a tool which relies on SQL databases. These could be shared with the toolserver, where the "official" version of the analysis tool runs and users can run their own queries (so picking a tool with a good database structure would be nice). With that, toolserver users could set up their own cool tools on that data.
On Thu, Jun 4, 2009 at 4:34 PM, David Gerard dgerard@gmail.com wrote:
2009/6/4 Daniel Kinzler daniel@brightbyte.de:
David Gerard wrote:
Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole. Are there any technical workarounds feasible? If not blocking the
Perhaps the solution would be to simply set up our own JS based usage tracker? There are a few options available http://en.wikipedia.org/wiki/List_of_web_analytics_software, and for starters, the backend could run on the toolserver. Note that anything processing IP addresses will need special approval on the TS.
If putting that on the toolserver passes privacy policy muster, that'd be an excellent solution. Then external site loading can be blocked.
(And if the toolservers won't melt in the process.)
- d.
2009/6/4 Michael Rosenthal rosenthal3000@googlemail.com:
I suggest keeping the bug on Wikimedia's servers and using a tool which relies on SQL databases. These could be shared with the toolserver, where the "official" version of the analysis tool runs and users can run their own queries (so picking a tool with a good database structure would be nice). With that, toolserver users could set up their own cool tools on that data.
I understand the problem with stats before was that the stats server would melt under the load. Leon's old wikistats page sampled 1:1000. The current stats (on dammit.lt and served up nicely on http://stats.grok.se) are every hit, but I understand (Domas?) that it was quite a bit of work to get the firehose of data in such a form as not to melt the receiving server trying to process it.
OK, then the problem becomes: how to set up something like stats.grok.se feasibly internally for all the other data gathered from a hit? (Modulo stuff that needs to be blanked per privacy policy.)
- d.
On Thu, Jun 4, 2009 at 10:53 AM, David Gerard dgerard@gmail.com wrote:
I understand the problem with stats before was that the stats server would melt under the load. Leon's old wikistats page sampled 1:1000. The current stats (on dammit.lt and served up nicely on http://stats.grok.se) are every hit, but I understand (Domas?) that it was quite a bit of work to get the firehose of data in such a form as not to melt the receiving server trying to process it.
OK, then the problem becomes: how to set up something like stats.grok.se feasibly internally for all the other data gathered from a hit? (Modulo stuff that needs to be blanked per privacy policy.)
What exactly are people looking for that isn't available from stats.grok.se that isn't a privacy concern?
I had assumed that people kept installing these bugs because they wanted source network breakdowns per-article and other clear privacy violations.
On Thu, Jun 4, 2009 at 17:00, Gregory Maxwell gmaxwell@gmail.com wrote:
On Thu, Jun 4, 2009 at 10:53 AM, David Gerard dgerard@gmail.com wrote:
I understand the problem with stats before was that the stats server would melt under the load. Leon's old wikistats page sampled 1:1000. The current stats (on dammit.lt and served up nicely on http://stats.grok.se) are every hit, but I understand (Domas?) that it was quite a bit of work to get the firehose of data in such a form as not to melt the receiving server trying to process it.
OK, then the problem becomes: how to set up something like stats.grok.se feasibly internally for all the other data gathered from a hit? (Modulo stuff that needs to be blanked per privacy policy.)
What exactly are people looking for that isn't available from stats.grok.se that isn't a privacy concern?
I had assumed that people kept installing these bugs because they wanted source network breakdowns per-article and other clear privacy violations.
On top of views/page, I'd be interested in keywords used, entry & exit points, path analysis when people are editing (do they save/leave/try to find help/...), # of edit starts, and # of submitted edits that don't get saved.
henna
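To make the edit-funnel part concrete, here is a sketch of what event beacons for that could look like, reusing the hypothetical collector endpoint sketched earlier in the thread (wgAction and the editform id are values MediaWiki exposes on edit pages):

// Sketch: record edit-funnel events such as "edit started" and "edit
// submitted" via the hypothetical self-hosted collector.
function reportEvent(name) {
    var img = new Image(1, 1);
    img.src = 'https://collector.example.org/event.gif' +
        '?e=' + encodeURIComponent(name) +
        '&page=' + encodeURIComponent(window.wgPageName || '') +
        '&r=' + Math.random();
}

if (window.wgAction === 'edit') {
    reportEvent('edit-start');
    var form = document.getElementById('editform');
    if (form) {
        form.onsubmit = function () {
            reportEvent('edit-submit');
        };
    }
}

Entry/exit points and path analysis would need session-level aggregation on the collector side rather than extra client-side events.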
2009/6/4 Finne Boonen hennar@gmail.com:
On Thu, Jun 4, 2009 at 17:00, Gregory Maxwell gmaxwell@gmail.com wrote:
What exactly are people looking for that isn't available from stats.grok.se that isn't a privacy concern? I had assumed that people kept installing these bugs because they wanted source network breakdowns per-article and other clear privacy violations.
On top of views/page, I'd be interested in keywords used, entry & exit points, path analysis when people are editing (do they save/leave/try to find help/...), # of edit starts, and # of submitted edits that don't get saved.
Path analysis is a big one. All that other stuff, if it won't violate privacy, would be fantastically useful to researchers, internal and external, in ways we won't have even thought of yet, and help us considerably to improve the projects.
(This would have to be given considerable thought from a security/hacker mindset - e.g. even with IPs stripped, listing user pages and user page edits would likely give away an identity. Talk pages may do the same. Those are just off the top of my head; I'm sure someone has already made a list of what could be worked out even with IPs anonymised or even stripped.)
- d.
On Fri, Jun 5, 2009 at 1:00 AM, Gregory Maxwell gmaxwell@gmail.com wrote:
What exactly are people looking for that isn't available from stats.grok.se that isn't a privacy concern?
A good question.
Related questions are: 1) what can't be built into stats.grok.se (or other services built on the same data)? 2) is there anything that really needs to be done by javascript?
Don't forget all of the currently available traffic analysis here:
http://stats.wikimedia.org/EN/VisitorsSampledLogRequests.htm
Michael Rosenthal wrote:
I suggest keeping the bug on Wikimedia's servers and using a tool which relies on SQL databases. These could be shared with the toolserver, where the "official" version of the analysis tool runs and users can run their own queries (so picking a tool with a good database structure would be nice). With that, toolserver users could set up their own cool tools on that data.
Well, the original problem is that Wikipedia has so many page views that writing each one to a database would simply melt that database. We are talking about 50,000 hits per second. This is of course also true for the toolserver.
I was thinking about a solution that uses sampling, or one that would only be applied to specific pages or small projects. We had something similar for the old wikicharts.
-- daniel
Michael Rosenthal wrote:
I suggest keeping the bug on Wikimedia's servers and using a tool which relies on SQL databases. These could be shared with the toolserver, where the "official" version of the analysis tool runs and users can run their own queries (so picking a tool with a good database structure would be nice). With that, toolserver users could set up their own cool tools on that data.
If JavaScript were used to serve the bug, it would be quite easy to load the bug only some small fraction of the time, allowing a fair statistical sample of JS-enabled readers (who should, I hope, be fairly representative of the whole population) to be taken without melting down the servers.
I suspect the fact that most bots and spiders do not interpret Javascript, and would thus be excluded from participating in the traffic survey, could be regarded as an added bonus.
-- Neil
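A minimal sketch of that sampling approach, assuming the hypothetical self-hosted beacon sketched earlier in the thread; the 1:1000 rate is only an illustrative figure, echoing Leon's old wikistats sampling:

// Fire the tracking beacon for only a small random fraction of JS-enabled
// page views, so the collector sees a statistical sample rather than the
// full firehose of hits.
var TRACKING_SAMPLE_RATE = 0.001; // roughly 1 in 1000 page views (illustrative)

if (Math.random() < TRACKING_SAMPLE_RATE) {
    reportPageView(); // the hypothetical beacon sketched above
}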
On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote:
Then external site loading can be blocked.
Why do we need to block loading from all external sites? If there are specific & problematic ones (like google analytics) then why not block those?
-Mike
2009/6/4 Mike.lifeguard mikelifeguard@fastmail.fm:
On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote:
Then external site loading can be blocked.
Why do we need to block loading from all external sites? If there are specific & problematic ones (like google analytics) then why not block those?
Because having the data go outside Wikimedia at all is a privacy policy violation, as I understand it (please correct me if I'm wrong).
- d.
David Gerard wrote:
2009/6/4 Mike.lifeguard mikelifeguard@fastmail.fm:
On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote:
Then external site loading can be blocked.
Why do we need to block loading from all external sites? If there are specific & problematic ones (like google analytics) then why not block those?
Because having the data go outside Wikimedia at all is a privacy policy violation, as I understand it (please correct me if I'm wrong).
I agree with that, *especially* if it's for the purpose of aggregating data about users.
-- daniel
On Thu, Jun 4, 2009 at 11:01 AM, Mike.lifeguard mikelifeguard@fastmail.fm wrote:
On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote:
Then external site loading can be blocked.
Why do we need to block loading from all external sites? If there are specific & problematic ones (like google analytics) then why not block those?
Because:
(1) External loading results in an uncontrolled leak of private reader and editor information to third parties, in contravention of the privacy policy as well as basic ethical operating principles.
(1a) Most external loading script usage will also defeat users' choice of SSL and leak more information about their browsing to their local network. It may also bypass any Wikipedia-specific anonymization proxies they are using to keep their reading habits private.
(2) External loading produces a runtime dependency on third party sites. Some other site goes down and our users experience some kind of loss of service.
(3) The availability of external loading makes Wikimedia a potential source of very significant DDOS attacks, intentional or otherwise.
That's not to say that there aren't reasons to use remote loading, but the potential harms mean that it should probably be a default-deny, permit-by-exception process rather than the other way around.
Thanks, that clarifies matters for me. I wasn't aware of #1, though I guess upon reflection that makes sense.
-Mike
On Thu, 2009-06-04 at 11:07 -0400, Gregory Maxwell wrote:
On Thu, Jun 4, 2009 at 11:01 AM, Mike.lifeguard mikelifeguard@fastmail.fm wrote:
On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote:
Then external site loading can be blocked.
Why do we need to block loading from all external sites? If there are specific & problematic ones (like google analytics) then why not block those?
Because:
(1) External loading results in an uncontrolled leak of private reader and editor information to third parties, in contravention of the privacy policy as well as basic ethical operating principles.
(1a) Most external loading script usage will also defeat users' choice of SSL and leak more information about their browsing to their local network. It may also bypass any Wikipedia-specific anonymization proxies they are using to keep their reading habits private.
(2) External loading produces a runtime dependency on third party sites. Some other site goes down and our users experience some kind of loss of service.
(3) The availability of external loading makes Wikimedia a potential source of very significant DDOS attacks, intentional or otherwise.
That's not to say that there aren't reasons to use remote loading, but the potential harms mean that it should probably be a default-deny, permit-by-exception process rather than the other way around.
2009/6/4 Mike.lifeguard mikelifeguard@fastmail.fm:
On Thu, 2009-06-04 at 15:34 +0100, David Gerard wrote:
Then external site loading can be blocked.
Why do we need to block loading from all external sites? If there are specific & problematic ones (like google analytics) then why not block those?
I can't think of any time when we would need to load anything from an external site, so why not block them completely and eliminate the privacy concern entirely?
How does installing 3rd party analytics software help the WMF accomplish its goals?
On Thu, Jun 4, 2009 at 8:31 AM, Daniel Kinzler daniel@brightbyte.de wrote:
David Gerard wrote:
Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole.
Are there any technical workarounds feasible? If not blocking the loading of external sites entirely (I understand hu:wp uses a web bug that isn't Google), perhaps at least listing the sites somewhere centrally viewable?
Perhaps the solution would be to simply set up our own JS based usage tracker? There are a few options available http://en.wikipedia.org/wiki/List_of_web_analytics_software, and for starters, the backend could run on the toolserver.
Note that anything processing IP addresses will need special approval on the TS.
-- daniel
2009/6/4 Brian Brian.Mingus@colorado.edu:
How does installing 3rd party analytics software help the WMF accomplish its goals?
Detailed analysis of how users actually use the site would be vastly useful in improving the sites' content and usability.
- d.
That's why WMF now has a usability lab.
On Thu, Jun 4, 2009 at 12:34 PM, David Gerard dgerard@gmail.com wrote:
2009/6/4 Brian Brian.Mingus@colorado.edu:
How does installing 3rd party analytics software help the WMF accomplish its goals?
Detailed analysis of how users actually use the site would be vastly useful in improving the sites' content and usability.
- d.
2009/6/4 Brian Brian.Mingus@colorado.edu:
That's why WMF now has a usability lab.
Yep. They'd dive on this stuff with great glee if we can implement it without breaking privacy or melting servers.
- d.
On Thu, Jun 4, 2009 at 10:19 AM, David Gerard dgerard@gmail.com wrote:
Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole.
Are there any technical workarounds feasible? If not blocking the loading of external sites entirely (I understand hu:wp uses a web bug that isn't Google), perhaps at least listing the sites somewhere centrally viewable?
Restrict site-wide JS and raw HTML injection to a smaller subset of users who have been specifically schooled in these issues.
This approach is also compatible with other approaches. It has the advantage of being simple to implement and should produce a considerable reduction in problems regardless of the underlying cause.
Just be glad no one has yet turned English Wikipedia's readers into their own personal DDOS drone network.
2009/6/4 Gregory Maxwell gmaxwell@gmail.com:
Restrict site-wide JS and raw HTML injection to a smaller subset of users who have been specifically schooled in these issues.
Is it feasible to allow admins to use raw HTML as appropriate but not raw JS? Being able to fix MediaWiki: space messages with raw HTML is way too useful on the occasions where it's useful.
- d.
David Gerard wrote:
2009/6/4 Gregory Maxwell gmaxwell@gmail.com:
Restrict site-wide JS and raw HTML injection to a smaller subset of users who have been specifically schooled in these issues.
Is it feasible to allow admins to use raw HTML as appropriate but not raw JS? Being able to fix MediaWiki: space messages with raw HTML is way too useful on the occasions where it's useful.
Possible yes, sensible no. Because if you can edit raw html, you can inject javascript.
-- daniel
Daniel Kinzler wrote:
David Gerard wrote:
2009/6/4 Gregory Maxwell gmaxwell@gmail.com:
Restrict site-wide JS and raw HTML injection to a smaller subset of users who have been specifically schooled in these issues.
Is it feasible to allow admins to use raw HTML as appropriate but not raw JS? Being able to fix MediaWiki: space messages with raw HTML is way too useful on the occasions where it's useful.
Possible yes, sensible no. Because if you can edit raw html, you can inject javascript.
-- daniel
Not if you sanitize the HTML after the fact: just cleaning out <script> tags and elements from the HTML stream should do the job.
After this has been done to the user-generated content, the desired locked-down script code can then be inserted at the final stages of page generation.
-- Neil
Neil Harris wrote:
Daniel Kinzler wrote:
David Gerard wrote:
2009/6/4 Gregory Maxwell gmaxwell@gmail.com:
Restrict site-wide JS and raw HTML injection to a smaller subset of users who have been specifically schooled in these issues.
Is it feasible to allow admins to use raw HTML as appropriate but not raw JS? Being able to fix MediaWiki: space messages with raw HTML is way too useful on the occasions where it's useful.
Possible yes, sensible no. Because if you can edit raw html, you can inject javascript.
-- daniel
Not if you sanitize the HTML after the fact: just cleaning out <script> tags and elements from the HTML stream should do the job.
After this has been done to the user-generated content, the desired locked-down script code can then be inserted at the final stages of page generation.
-- Neil
Come to think of it, you could also allow the carefully vetted loading of scripts from a very limited whitelist of Wikimedia-hosted and controlled domains and paths, when performing that sanitization.
Inline scripts remain a bad idea: there are just too many ways to obfuscate them and/or inject data into them to have any practical prospect of limiting them to safe features without heroic efforts.
However, writing a JavaScript sanitizer that restricted the user to a "safe" subset of the language, by first parsing and then resynthesizing the code using formal methods for validation, in a way similar to the current solution for TeX, would be an interesting project!
-- Neil
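To make the whitelist idea concrete, an illustrative check follows; the host list and function name are assumptions for the sketch, and in MediaWiki the real check would belong in the server-side sanitizer rather than in client JS:

// Illustrative only: a script src survives sanitization only if it matches an
// explicitly approved, Wikimedia-controlled host/path. The list here is an
// assumption.
var allowedScriptSources = [
    /^https?:\/\/upload\.wikimedia\.org\//,   // Wikimedia-hosted static files
    /^\/w\/index\.php\?/                      // same-origin script loader
];

function isAllowedScriptSrc(src) {
    for (var i = 0; i < allowedScriptSources.length; i++) {
        if (allowedScriptSources[i].test(src)) {
            return true;
        }
    }
    return false;
}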
On 04/06/2009, at 4:08 PM, Daniel Kinzler wrote:
David Gerard wrote:
2009/6/4 Gregory Maxwell gmaxwell@gmail.com:
Restrict site-wide JS and raw HTML injection to a smaller subset of users who have been specifically schooled in these issues.
Is it feasible to allow admins to use raw HTML as appropriate but not raw JS? Being able to fix MediaWiki: space messages with raw HTML is way too useful on the occasions where it's useful.
Possible yes, sensible no. Because if you can edit raw html, you can inject javascript.
When did we start treating our administrators as potentially malicious attackers? Any administrator could, in theory, add a cookie-stealing script to my user JS, steal my account, and grant themselves any rights they please.
We trust our administrators. If we don't, we should move the editinterface right further up the chain.
-- Andrew Garrett Contract Developer, Wikimedia Foundation agarrett@wikimedia.org http://werdn.us
On Thu, 2009-06-04 at 17:04 +0100, Andrew Garrett wrote:
When did we start treating our administrators as potentially malicious attackers? Any administrator could, in theory, add a cookie-stealing script to my user JS, steal my account, and grant themselves any rights they please.
We trust our administrators. If we don't, we should move the editinterface right further up the chain.
They are potentially malicious attackers, but we nevertheless trust them not to do bad things. "We" in this case refers only to most of Wikimedia, I guess, since there has been no shortage of paranoia both on bugzilla and this list recently - a sad state of affairs to be sure.
-Mike
On Thu, Jun 4, 2009 at 12:04 PM, Andrew Garrett agarrett@wikimedia.org wrote:
Is it feasible to allow admins to use raw HTML as appropriate but not raw JS? Being able to fix MediaWiki: space messages with raw HTML is way too useful on the occasions where it's useful.
Possible yes, sensible no. Because if you can edit raw html, you can inject javascript.
When did we start treating our administrators as potentially malicious attackers? Any administrator could, in theory, add a cookie-stealing script to my user JS, steal my account, and grant themselves any rights they please.
We trust our administrators. If we don't, we should move the editinterface right further up the chain.
90% of the possibly malicious things administrators could do are easily undoable. Most of the remainder is limited in scope, impacting few users at a time. Site-wide JS can cause irreparable harm to users' privacy, and do it to hundreds of thousands in an instant.
Outside of raw HTML and JS, no other admin feature grants the ability to completely disable third-party sites.
And forget malice: there is no reason for admins to add remotely loaded external resources. Any such addition is going to violate the privacy policy. Yet it keeps happening.
You don't have to be malicious to be completely unaware of the privacy implications or of the potential DOS risks. You don't have to be malicious to use the same sloppy editing practices which are used on easily and instantly revertible articles while editing site messages and JS (caching ensures that many JS and message mistakes aren't completely undone for many hours). Though we shouldn't preclude the possibility of occasional malice: it isn't as though we haven't had admins choose easily guessable passwords in the past, or admins flip their lids and attempt to cause problems.
In places where the harm is confined and can be undone, softer security measures make sense. As the destructive and disruptive power increases, the appropriate level of security also increases.
We impose stiffer regulation for access permissions like checkuser, even though an admin's ability to add web bugs is a significantly more powerful privacy-invasion tool than checkuser. (Checkusers can't see typical reader activities!)
Raw HTML and JS have drastically different implications than most other 'admin' functions. Accordingly, the optimal security behaviour is different. When there are few enough admins the problems are infrequent enough to ignore, but as things grow...
The number of uses of site-wide JS and raw HTML is fairly limited, as is the number of users with the technical skills required to use them correctly. Arguably, every instance of user manipulation of raw HTML and site-wide JS is a deficiency in MediaWiki.
Regarding HTML sanitization: raw HTML alone, without JS, is enough to violate users' privacy: just add a hidden image tag pointing to a remote site. Yes, you could sanitize out various bad things, but then that's not raw HTML anymore, is it?
I think the biggest obstacle to reducing access is that far more MediaWiki messages are uncooked than need to be. Were it not for this, I expect this access would have been curtailed somewhat a long time ago.
On Thu, Jun 4, 2009 at 11:56 AM, Neil Harris usenet@tonal.clara.co.uk wrote:
However, writing a JavaScript sanitizer that restricted the user to a "safe" subset of the language, by first parsing and then resynthesizing the code using formal methods for validation, in a way similar to the current solution for TeX, would be an interesting project!
Interesting, but probably not very useful. If we restricted JavaScript the way we restricted TeX, we'd have to ban function definitions, loops, conditionals, and most function calls. I suspect you'd have to make it pretty much unusable to make output of specific strings impossible.
On Thu, Jun 4, 2009 at 12:45 PM, Gregory Maxwell gmaxwell@gmail.com wrote:
Regarding HTML sanitization: raw HTML alone, without JS, is enough to violate users' privacy: just add a hidden image tag pointing to a remote site. Yes, you could sanitize out various bad things, but then that's not raw HTML anymore, is it?
It might be good enough for the purposes at hand, though. What are the use-cases for wanting raw HTML in messages, instead of wikitext or plaintext?
Actually, you can take a lesson from Google and every once in a while prefix all links, e.g. "http://en.wikipedia.org/url?http://en.wikipedia.org/wiki/Norflblarg" (some kind of recording redirector). How do I know Google does this every so often in their search results pages? I use DontGet{:*://*.google.*/url?*} in my wwwoffle.conf file, so the alarm bells ring, whereas the average user would never notice the difference.
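For clarity, here is a sketch of what such a recording redirector looks like on the client side; the /w/url endpoint and the sampling fraction are hypothetical:

// Occasionally rewrite links so clicks pass through a logging redirector
// before reaching their target (endpoint and rate are placeholders).
function rewriteLinksThroughRedirector(fraction) {
    if (Math.random() >= fraction) {
        return; // only instrument a sample of page views
    }
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
        var href = links[i].href;
        if (href && href.indexOf('http') === 0) {
            links[i].href = '/w/url?target=' + encodeURIComponent(href);
        }
    }
}
rewriteLinksThroughRedirector(0.01); // e.g. 1% of page views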
2009/6/4 Gregory Maxwell gmaxwell@gmail.com:
I think the biggest obstacle to reducing access is that far more MediaWiki messages are uncooked than need to be. Were it not for this, I expect this access would have been curtailed somewhat a long time ago.
I think you've hit the actual problem there. Someone with too much time on their hands who could go through all of the MediaWiki: space to see what really needs to be HTML rather than wikitext?
- d.
On Thu, Jun 4, 2009 at 2:32 PM, David Gerard dgerard@gmail.com wrote:
2009/6/4 Gregory Maxwell gmaxwell@gmail.com:
I think the biggest obstacle to reducing access is that far more MediaWiki messages are uncooked than need to be. Were it not for this, I expect this access would have been curtailed somewhat a long time ago.
I think you've hit the actual problem there. Someone with too much time on their hands who could go through all of the MediaWiki: space to see what really needs to be HTML rather than wikitext?
- d.
See bug 212[1], which is (sort of) a tracker for the wikitext-ification of the messages.
-Chad
2009/6/4 Andrew Garrett agarrett@wikimedia.org:
When did we start treating our administrators as potentially malicious attackers? Any administrator could, in theory, add a cookie-stealing script to my user JS, steal my account, and grant themselves any rights they please.
That's why I started this thread talking about things being done right now by well-meaning admins :-)
- d.
You don't have to inject JavaScript to do user tracking. This is possible with any kind of raw HTML that leads to inclusion of external elements, including style defs for ordinary markup. John
Daniel Kinzler wrote:
David Gerard wrote:
2009/6/4 Gregory Maxwell gmaxwell@gmail.com:
Restrict site-wide JS and raw HTML injection to a smaller subset of users who have been specifically schooled in these issues.
Is it feasible to allow admins to use raw HTML as appropriate but not raw JS? Being able to fix MediaWiki: space messages with raw HTML is way too useful on the occasions where it's useful.
Possible yes, sensible no. Because if you can edit raw html, you can inject javascript.
-- daniel
On Fri, Jun 5, 2009 at 1:17 AM, John at Darkstar vacuum@jeb.no wrote:
You don't have to inject JavaScript to do user tracking. This is possible with any kind of raw HTML that leads to inclusion of external elements, including style defs for ordinary markup.
The point on CSS is a good one. The site-wide stylesheets can use url(), which would be enough for tracking even with no raw HTML or JS allowed. Of course, JavaScript can allow more extensive and useful tracking, which is why it's used by things like Analytics.
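A quick illustration of how that could at least be audited from the client side; this is purely a sketch, and cross-origin stylesheets cannot be inspected this way:

// Scan readable stylesheets for url() references pointing at non-Wikimedia
// hosts, i.e. potential CSS-based web bugs.
function findExternalCssUrls() {
    var found = [];
    for (var s = 0; s < document.styleSheets.length; s++) {
        var rules;
        try {
            rules = document.styleSheets[s].cssRules || document.styleSheets[s].rules;
        } catch (e) {
            continue; // cross-origin stylesheet, not readable
        }
        if (!rules) {
            continue;
        }
        for (var r = 0; r < rules.length; r++) {
            var m = (rules[r].cssText || '').match(/url\(\s*["']?(https?:\/\/[^"')\s]+)/i);
            if (m && !/(\.|\/\/)(wikipedia|wikimedia)\.org/i.test(m[1])) {
                found.push(m[1]);
            }
        }
    }
    return found;
}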
David Gerard wrote:
Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole.
Are there any technical workarounds feasible? If not blocking the loading of external sites entirely (I understand hu:wp uses a web bug that isn't Google), perhaps at least listing the sites somewhere centrally viewable?
- d.
Make a filter for Google Analytics (plus all other known web bugs). When it matches, make an admin look at that user's contributions (you can just log it and have an admin review it daily, or make a complex mail notification system). Even if you blocked it, the admin can bypass any filter. A sysadmin reviewing the added code cannot be fooled so easily.
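Something along these lines, for instance; the host list and the notify function are placeholders for illustration:

// Flag edits to site-wide JS/CSS pages that mention known web-bug hosts, so
// a sysadmin can review them. KNOWN_WEB_BUGS and notifyReviewers() are
// illustrative placeholders, not existing infrastructure.
var KNOWN_WEB_BUGS = /google-analytics\.com|googlesyndication\.com|quantserve\.com|statcounter\.com/i;

function checkSiteScriptEdit(pageTitle, newText) {
    if (/^MediaWiki:.*\.(js|css)$/.test(pageTitle) && KNOWN_WEB_BUGS.test(newText)) {
        notifyReviewers(pageTitle); // hypothetical: log or mail for daily review
    }
}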
Taking advantage of this thread: take a look at http://es.wikipedia.org/wiki/Special:Search. Can those images be moved to a server under our control (WMF, Toolserver...)?
Potentially, those systems could be retrieving the search queries and IPs of all visitors via the referer.
Originally, some of those images were hosted by WMF-FR, but that was stopped because it overloaded their server.
On Sat, Jun 6, 2009 at 9:02 AM, Platonides Platonides@gmail.com wrote:
David Gerard wrote:
Keeping well-meaning admins from putting Google web bugs in the JavaScript is a game of whack-a-mole.
Are there any technical workarounds feasible? If not blocking the loading of external sites entirely (I understand hu:wp uses a web bug that isn't Google), perhaps at least listing the sites somewhere centrally viewable?
- d.
Make a filter for Google Analytics (plus all other known web bugs). When it matches, make an admin look at that user's contributions (you can just log it and have an admin review it daily, or make a complex mail notification system). Even if you blocked it, the admin can bypass any filter. A sysadmin reviewing the added code cannot be fooled so easily.
Taking advantage of this thread: take a look at http://es.wikipedia.org/wiki/Special:Search. Can those images be moved to a server under our control (WMF, Toolserver...)?
Potentially, those systems could be retrieving the search queries and IPs of all visitors via the referer.
Originally, some of those images were hosted by WMF-FR, but that was stopped because it overloaded their server.
They should be hosted like a standard image with a FUR and then included via an interface message; the servers should cache them, so load shouldn't cause an issue. (Although it might be wiser to ask someone better at MediaWiki/Wikipedia about that.)
K. Peachey wrote:
Taking advantage of this thread. Take a look at http://es.wikipedia.org/wiki/Special:Search Can those images be moved to a server under our control (WMF, Toolserver...)?
Potentially, those systems they could be retrieving the search queries and ips of all visitors via the referer.
Originally, some of those images were hosted by WMF-FR but was stopped because it overloaded their server.
They should be hosted like a standard image with a FUR and then included via an interface message; the servers should cache them, so load shouldn't cause an issue. (Although it might be wiser to ask someone better at MediaWiki/Wikipedia about that.)
eswiki doesn't allow Fair Use images. It doesn't even allow local uploads. I could upload it at Commons, but it would be deleted in hours. Uploading at enwiki might be an option, but not a good one: English Wikipedia images are not there to serve other projects, and although they could survive for more time, they would finally be deleted.
Another option would be the toolserver but: a) I don't have an account there (yet). b) I was afraid of producing too much load for the toolserver. Also, it might need a special provision, per Duesentrieb's email (that may need to be clarified at the rules page).
On Sat, Jun 6, 2009 at 3:31 PM, Platonides Platonides@gmail.com wrote:
eswiki doesn't allow Fair Use images. It doesn't even allow local uploads. I could upload it at Commons, but it would be deleted in hours. Uploading at enwiki might be an option, but not a good one: English Wikipedia images are not there to serve other projects, and although they could survive for more time, they would finally be deleted.
Another option would be the toolserver but: a) I don't have an account there (yet). b) I was afraid of producing too much load for the toolserver. Also, it might need a special provision, per Duesentrieb's email (that may need to be clarified at the rules page).
It makes absolutely no sense for ES wiki to forbid a particular kind of image but then allow it to be inlined, unless it's some file the WMF couldn't legally host (and also wouldn't forbid as a part of the WP content).
One of the main purposes in restricting non-free materials is to keep the content of the WP freely available. From the perspective of a user of Wikipedia, a remotely loaded inline image is equivalent to a locally hosted image, except that the remotely loaded image violates their privacy.
If ES wouldn't permit the upload for this purpose, then they ought not permit the inline injection.
Gregory Maxwell wrote:
It makes absolutely no sense for ES wiki to forbid a particular kind of image but then allow it to be inlined, unless it's some file the WMF couldn't legally host (and also wouldn't forbid as a part of the WP content).
One of the main purposes in restricting non-free materials is to keep the content of the WP freely available. From the perspective of a user of Wikipedia, a remotely loaded inline image is equivalent to a locally hosted image, except that the remotely loaded image violates their privacy.
If ES wouldn't permit the upload for this purpose, then they ought not permit the inline injection.
That's a reasonable point. It's a matter of discussion for the community, though. I agree that those remotely loaded images are undesirable, but the inline images at the search page are conceptually different from those in articles: the images at the search page "belong" to the skin. Not to mention that uploading them into a wiki where they shouldn't otherwise be allowed will inevitably lead to someone trying to use them in articles.
On Sat, Jun 6, 2009 at 5:05 PM, Platonides Platonides@gmail.com wrote:
Gregory Maxwell wrote:
It makes absolutely no sense for ES wiki to forbid a particular kind of image but then allow it to be inlined, unless it's some file the WMF couldn't legally host (and also wouldn't forbid as a part of the WP content).
One of the main purposes in restricting non-free materials is to keep the content of the WP freely available. From the perspective of a user of Wikipedia, a remotely loaded inline image is equivalent to a locally hosted image, except that the remotely loaded image violates their privacy.
If ES wouldn't permit the upload for this purpose, then they ought not permit the inline injection.
That's a reasonable point. It's a matter of discussion for the community, though. I agree that those remotely loaded images are undesirable, but the inline images at the search page are conceptually different from those in articles: the images at the search page "belong" to the skin. Not to mention that uploading them into a wiki where they shouldn't otherwise be allowed will inevitably lead to someone trying to use them in articles.
Bad image list is your friend.