Whoops, didn't cc. The censorship discussion is around again, but this question is about the feasibility (in terms of not melting the servers) of users being able to block images from particular categories.
---------- Forwarded message ----------
From: David Gerard dgerard@gmail.com
Date: 22 July 2010 21:01
Subject: Re: [Foundation-l] Discussion Questions for Potentially-Objectionable Content
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
On 22 July 2010 20:10, Excirial wp.excirial@gmail.com wrote:
I would, however, strongly support a system that gives users a choice to censor if they wish. It should be possible to categorize Commons in such a way that certain images can be blocked. For example, a user might choose to block "images of Muhammad" while allowing surgery-related images (others might swap these if they wish).
This is a perennial proposal. It's an idea I like, as it puts control in the hands of the viewer rather than third parties. All it requires is someone to code something that passes muster as being unlikely to melt the servers.
cc to wikitech-l - how feasible is something that allows users to stop display of arbitrary image categories and/or subcategories?
- d.
On Thu, Jul 22, 2010 at 4:02 PM, David Gerard dgerard@gmail.com wrote:
This is a perennial proposal. It's an idea I like, as it puts control in the hands of the viewer rather than third parties. All it requires is someone to code something that passes muster as being unlikely to melt the servers.
cc to wikitech-l - how feasible is something that allows users to stop display of arbitrary image categories and/or subcategories?
It's entirely feasible. I even have an outline written up:
http://www.mediawiki.org/wiki/User:Simetrical/Censorship
Maybe if I have time left after category sorting this summer, Wikimedia could have me do this.
2010/7/22 Aryeh Gregor Simetrical+wikilist@gmail.com:
It's entirely feasible. I even have an outline written up:
http://www.mediawiki.org/wiki/User:Simetrical/Censorship
Maybe if I have time left after category sorting this summer, Wikimedia could have me do this.
Note that, as Aryeh's proposal mentions, it'll have to rely on JavaScript in order to not totally kill our caching infrastructure. This means censoring will not work for users who have JavaScript disabled or use JavaScript-incapable browsers.
Roan Kattouw (Catrope)
On 7/22/10 2:10 PM, Roan Kattouw wrote:
Note that, as Aryeh's proposal mentions, it'll have to rely on JavaScript in order to not totally kill our caching infrastructure. This means censoring will not work for users who have JavaScript disabled or use JavaScript-incapable browsers.
Half of Aryeh's proposal is about generating CSS rules on the fly. But imagine if you could load content-blocking CSS like we have for skins today.
I imagine a system that lets you pick which categories to block, and then either creates or reuses a simple CSS file that can be cached forever.
So the CSS could set visibility:hidden for the offending content first. Then *if* the JS manages to run you get the click-through-to-view interface.
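A rough sketch of the "create or reuse a simple CSS file" idea, assuming each image carries one CSS class per blockable category. The `censor-` class prefix, the `block-<hash>.css` naming scheme and both function names are hypothetical; nothing here is an existing MediaWiki interface.

```python
from __future__ import annotations

import hashlib
import re


def category_class(category: str) -> str:
    """Turn a category name into a CSS-safe class name (hypothetical scheme).

    Non-ASCII names collapse to hyphens here, so a real scheme would need
    something collision-free, such as a category ID.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", category.lower()).strip("-")
    return f"censor-{slug}"


def stylesheet_for(blocked: set[str]) -> tuple[str, str]:
    """Return (file name, CSS text) for a set of blocked categories."""
    # Sort first so the same selection always yields the same file name,
    # regardless of the order in which the user picked the categories.
    ordered = sorted(blocked)
    digest = hashlib.sha1("\n".join(ordered).encode("utf-8")).hexdigest()[:12]
    rules = [f"img.{category_class(c)} {{ visibility: hidden; }}" for c in ordered]
    return f"block-{digest}.css", "\n".join(rules) + "\n"
```

Because the file name depends only on the set of categories, two users with the same selection would request the same URL, which is what would let the file be cached like any other static stylesheet. `visibility: hidden` keeps the image's box in the layout, so the page does not reflow.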
On 7/22/10 1:56 PM, Aryeh Gregor wrote:
On Thu, Jul 22, 2010 at 4:02 PM, David Gerard dgerard@gmail.com wrote:
This is a perennial proposal. It's an idea I like, as it puts control in the hands of the viewer rather than third parties. All it requires is someone to code something that passes muster as being unlikely to melt the servers.
cc to wikitech-l - how feasible is something that allows users to stop display of arbitrary image categories and/or subcategories?
It's entirely feasible. I even have an outline written up:
http://www.mediawiki.org/wiki/User:Simetrical/Censorship
Maybe if I have time left after category sorting this summer, Wikimedia could have me do this.
Interesting proposal. I think it's on the right track.
Pushing censorship to the browser means that we have to reimplement it wherever our content is viewed -- including mobile sites and other alternative ways of browsing Wikipedia and sister sites. But that seems like it's doable, particularly since you're exploiting CSS classes.
Blurring seems a bit deluxe to me -- it's probably adequate to just block the image and show something in its place with the same dimensions. (At Flickr, they use an image of greyish-black static for this).
But I think any proposal that works is going to look like yours, given the realities of how Wikimedia content is hosted.
On Thu, Jul 22, 2010 at 6:40 PM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
Pushing censorship to the browser means that we have to reimplement it wherever our content is viewed -- including mobile sites and other alternative ways of browsing Wikipedia and sister sites. But that seems like it's doable, particularly since you're exploiting CSS classes.
It would be nice if we could do it on the server side, but it seems infeasible. Even if we didn't have to worry about cache fragmentation, we're still talking about serving many versions of a page based on user preference, so it won't work well on sites that aren't using MediaWiki. However, once we have good-quality categorization of offensive images, third parties could always do their own blocking.
Blurring seems a bit deluxe to me -- it's probably adequate to just block the image and show something in its place with the same dimensions. (At Flickr, they use an image of greyish-black static for this).
This was already pointed out. I just didn't update the proposal. Using a stock image rather than blurring is both safer (you can't see anything about the image), and easier to implement.
On Thu, Jul 22, 2010 at 6:56 PM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
Half of Aryeh's proposal is about generating CSS rules on the fly. But imagine if you could load content-blocking CSS like we have for skins today.
I imagine a system that lets you pick which categories to block, and then either creates or reuses a simple CSS file that can be cached forever.
It can't be cached at all, if it depends on user preferences.
So the CSS could set visibility:hidden for the offending content first. Then *if* the JS manages to run you get the click-through-to-view interface.
So the fallback for no JS would be that some images mysteriously disappear with no reason given, and there's no way to access them? I'd personally be happy with that, but I think the anti-censorship people would have issues with it.
On 7/22/10 4:00 PM, Aryeh Gregor wrote:
On Thu, Jul 22, 2010 at 6:56 PM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
I imagine a system that lets you pick which categories to block, and then either creates or reuses a simple CSS file that can be cached forever.
It can't be cached at all, if it depends on user preferences.
I meant cached in the same way that skins are cached. Not cached in the sense of having a completely static page.
But the more I think about it, my idea of splitting it into CSS and JS was probably misguided. There are a number of problems I won't get into here (particularly, your idea of global blacklists and personal whitelists is hard to do right).
On Thu, Jul 22, 2010 at 8:23 PM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
I meant cached in the same way that skins are cached. Not cached in the sense of having a completely static page.
What do you mean when you talk about skins being cached?
On Fri, Jul 23, 2010 at 8:00 AM, Platonides Platonides@gmail.com wrote:
On any edit that changes an image's categories, you would need to reparse every page that includes the image, even if the image comes from Commons or another ForeignRepo. Not as easy, I think, but this is a long-wanted proposal, see bug 8298/9616.
Ouch. You're right, I hadn't thought of that. The categories for each image can't be stored in the parsed text; that would mean updates don't propagate until the next reparse.
Okay, so how about this instead: when generating the page, the server retrieves all categories for all images on the page, then checks against site and user preferences to see if any need to be censored, and if so, sticks a list into the <head>. That should work fine -- it should be okay to retrieve the categories (local and foreign) for each image on the page on every page load, right? However, the image loading would have to be revised somehow, to not delay the loading of uncensored images . . . as usual, IE is the problem here. Aside from older IE versions, we could use attribute selectors, img[src=http://......], assuming we can get the src easily on the server side for each image.
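A minimal sketch of that per-request step, assuming the server already has a mapping from each image's src to its categories. The function names and the example.org URLs used with them are made up for illustration; how the categories are actually fetched is left out.

```python
from __future__ import annotations


def censored_sources(image_categories: dict[str, set[str]],
                     site_blocked: set[str],
                     user_blocked: set[str]) -> list[str]:
    """Return the src of every image that is in at least one blocked category."""
    blocked = site_blocked | user_blocked
    return sorted(src for src, cats in image_categories.items() if cats & blocked)


def css_string(value: str) -> str:
    """Quote a value for use inside a CSS attribute selector."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def head_style(sources: list[str]) -> str:
    """Build a <style> block for the <head>; empty if nothing is censored.

    Assumes the URLs are already URL-encoded, so they cannot contain
    markup such as "</style>".
    """
    if not sources:
        return ""
    selectors = ",\n".join(f"img[src={css_string(s)}]" for s in sources)
    return f"<style>\n{selectors} {{ visibility: hidden; }}\n</style>"
```

For example, `head_style(censored_sources(images, site_prefs, user_prefs))` would give the fragment to emit. An `[src="..."]` selector only matches when the string is identical to the attribute as written in the HTML, so the server would have to emit exactly the same URL in both places.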
On 7/23/10 11:28 AM, Aryeh Gregor wrote:
On Thu, Jul 22, 2010 at 8:23 PM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
I meant cached in the same way that skins are cached. Not cached in the sense of having a completely static page.
What do you mean when you talk about skins being cached?
What I mean is they aren't generated on the fly or anything. The related resources can be cached in Squid or on the client as simple URLs. Your proposal adds CSS rules dynamically, in JS.
Knowing the way mailing lists work, I don't think there's any point describing an idea that I now believe to be flawed, as I'll probably get a point by point rebuttal anyway.
I misunderstood some parts of your proposal, and I also was thinking that maybe CSS is a little more likely to work than JS.
Neil Kandalgaonkar wrote:
On 7/23/10 11:28 AM, Aryeh Gregor wrote:
On Thu, Jul 22, 2010 at 8:23 PM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
I meant cached in the same way that skins are cached. Not cached in the sense of having a completely static page.
What do you mean when you talk about skins being cached?
What I mean is they aren't generated on the fly or anything. The related resources can be cached in Squid or on the client as simple URLs. Your proposal adds CSS rules dynamically, in JS.
Knowing the way mailing lists work, I don't think there's any point describing an idea that I now believe to be flawed, as I'll probably get a point by point rebuttal anyway.
I misunderstood some parts of your proposal, and I also was thinking that maybe CSS is a little more likely to work than JS.
Your answer is even more confusing. You can split the html pages in the wiki in two pieces: the content (the part that changes by editing the wiki source) and the chrome (anything else) that comes from the skin (sidebar, edit tabs, portlets...)
For logged-in users, the skin is not cached; it is generated on the fly. For anonymous users, it is cached as a static page in the Squids (this is the reason WMF sites don't show p-personal for anons, so that all anonymous users share a single cache).
If the CSS classes are always in the page, you can change the censoring level by loading a different stylesheet (e.g. alternate stylesheets). Similarly, it could be handled by JavaScript and personal rules stored in localStorage.
It's probably just there, but I don't see where your "cached but dynamic" goes into the structure.
2010/7/24 Platonides Platonides@gmail.com:
Your answer is even more confusing. You can split the html pages in the wiki in two pieces: the content (the part that changes by editing the wiki source) and the chrome (anything else) that comes from the skin (sidebar, edit tabs, portlets...)
For logged-in users, the skin is not cached; it is generated on the fly. For anonymous users, it is cached as a static page in the Squids (this is the reason WMF sites don't show p-personal for anons, so that all anonymous users share a single cache).
If the CSS classes are always in the page, you can change the censoring level by loading a different stylesheet (e.g. alternate stylesheets). Similarly, it could be handled by JavaScript and personal rules stored in localStorage.
It's probably just there, but I don't see where your "cached but dynamic" goes into the structure.
What Neil probably meant was the CSS and JS files in the /skins directory, which are static files and are cached aggressively.
Roan Kattouw (Catrope)
Aryeh Gregor wrote:
On Fri, Jul 23, 2010 at 8:00 AM, Platonides Platonides@gmail.com wrote:
On any edit that changes an image's categories, you would need to reparse every page that includes the image, even if the image comes from Commons or another ForeignRepo. Not as easy, I think, but this is a long-wanted proposal, see bug 8298/9616.
Ouch. You're right, I hadn't thought of that. The categories for each image can't be stored in the parsed text; that would mean updates don't propagate until the next reparse.
Okay, so how about this instead: when generating the page, the server retrieves all categories for all images on the page, then checks against site and user preferences to see if any need to be censored, and if so, sticks a list into the <head>. That should work fine --
Even if you remove censoring ability from anonymous users, you still need to purge from squid cache all pages that include the images when category changes.
it should be okay to retrieve the categories (local and foreign) for each image on the page on every page load, right?
There are some large galleries. For instance contains 746 files each with 4-5 categories. That's nearly 3000 categories being retrieved.
However, the image loading would have to be revised somehow, to not delay the loading of uncensored images . . . as usual, IE is the problem here. Aside from older IE versions, we could use attribute selectors, img[src=http://......], assuming we can get the src easily on the server side for each image.
Getting the URLs for each image should be no problem for us. Note that we can block any src beginning with the image's thumb folder; we don't need the actual thumb URL. I find that approach over-verbose, though.
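A small sketch of the thumb-folder variant, using the CSS3 prefix-match attribute selector `[src^="..."]`. The function name and the upload.example.org URL in the usage note are hypothetical.

```python
def thumb_folder_rule(thumb_folder_url: str) -> str:
    """Return a CSS rule hiding every thumbnail stored under one image's thumb folder."""
    # Force a single trailing slash so that ".../Example.jpg/" cannot also
    # match a different file such as ".../Example.jpg.png/...".
    folder = thumb_folder_url.rstrip("/") + "/"
    escaped = folder.replace("\\", "\\\\").replace('"', '\\"')
    return f'img[src^="{escaped}"] {{ visibility: hidden; }}'
```

Calling `thumb_folder_rule("http://upload.example.org/thumb/a/ab/Example.jpg")` yields one rule that covers every thumbnail size of that image. A page that embeds the full-size original, which does not live under the thumb folder, would presumably still need an exact-match rule.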
Aryeh Gregor wrote:
On Thu, Jul 22, 2010 at 4:02 PM, David Gerard dgerard@gmail.com wrote:
This is a perennial proposal. It's an idea I like, as it puts control in the hands of the viewer rather than third parties. All it requires is someone to code something that passes muster as being unlikely to melt the servers.
cc to wikitech-l - how feasible is something that allows users to stop display of arbitrary image categories and/or subcategories?
It's entirely feasible. I even have an outline written up:
http://www.mediawiki.org/wiki/User:Simetrical/Censorship
Maybe if I have time left after category sorting this summer, Wikimedia could have me do this.
On any edit that changes an image's categories, you would need to reparse every page that includes the image, even if the image comes from Commons or another ForeignRepo. Not as easy, I think, but this is a long-wanted proposal, see bug 8298/9616.