Message: 7
Date: Wed, 12 Oct 2011 11:07:54 -0300
From: Andrew Crawford acrawford@laetabilis.com
Subject: Re: [Foundation-l] Image filtering without undermining the category system
To: Wikimedia Foundation Mailing List foundation-l@lists.wikimedia.org
Message-ID: <CAE0LbZ5M_iN2CiTaObubtWC8Zd3rAf4NDH+Y5+kX+0d=NYgqgw@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
In general I think this is the best and most practical proposal so far.
Hi Andrew,
Thanks, I appreciate that.
Having filter users do the classifying is the only practical option. In my opinion, it is unfortunately still problematic.
- It is quite complicated from the user's point of view. Not only do they
have to register an account, but they have to find and understand these options. For the casual reader who just doesn't want to see any more penises, or pictures of Mohammed, that is quite a lot to ask. The effort it would take to implement a system like this might outweigh the benefit to the small number of readers who would actually go through this process.
Yes, my wording of the options is not ideal, and I'm hoping we can make it more user-friendly. But the process isn't very complex. If we create http://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-filter, it need be no more complex than http://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-watchlist
I'm pretty sure we can make it simpler than buying some censorship software with a credit card and then installing it on your PC.
- It is obviously subject to gaming. How long would it take 4chan to
figure out they can create new accounts, and start thumbs-upping newly-uploaded pictures of penises while mass thumbs-downing depictions of Mohammed?
Subject to gaming? Well, it's bound to be. But vulnerable to gaming? Hopefully not. Fans of penises are welcome to add their preferences. That's why I didn't include the option "Hide all images except those that a fellow filterer has whitelisted".
If some people find naked bodies wholesome but crucifixes troubling, and others the reverse, then that is an easy scenario for the filter to pick up on. Once you've indicated that you are happy to see one or the other, it will start giving a high score to things deemed objectionable by people who've made similar choices to you, or deemed wholesome by people whose tastes run counter to yours. Conversely, it will give low scores to images cleared by people whose tastes are highly similar to yours, or to images objected to by people whose tastes are the reverse of yours.
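To make that concrete, here is a rough sketch (in Python, with entirely made-up data and names - nothing here is part of the proposal itself) of the kind of similarity-weighted scoring I'm describing:

# Each filterer's choices: image -> +1 ("don't show me this again")
# or -1 ("happy to see this"). All data here is invented for illustration.
decisions = {
    "alice": {"img_nude": +1, "img_crucifix": -1, "img_new": +1},
    "bob":   {"img_nude": -1, "img_crucifix": +1},
    "carol": {"img_nude": +1, "img_crucifix": -1},
}

def similarity(a, b):
    """Agreement between two filterers over the images both have rated:
    +1 for identical tastes, -1 for opposite tastes, 0 if no overlap."""
    shared = set(decisions[a]) & set(decisions[b])
    if not shared:
        return 0.0
    return sum(decisions[a][i] * decisions[b][i] for i in shared) / len(shared)

def score(user, image):
    """Predicted objection score of `image` for `user`: high means people
    like you hid it (or people unlike you cleared it), so hide it;
    low means the reverse, so show it."""
    num = den = 0.0
    for other, theirs in decisions.items():
        if other == user or image not in theirs:
            continue
        sim = similarity(user, other)
        num += sim * theirs[image]
        den += abs(sim)
    return num / den if den else 0.0

# Alice hid img_new. Carol shares Alice's tastes, so it scores high
# (hidden) for her; Bob's tastes are the reverse, so it scores low (shown).
print(score("carol", "img_new"))   # 1.0  -> hidden for Carol
print(score("bob", "img_new"))     # -1.0 -> shown to Bob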
- How can we prevent the use of this data for censorship purposes?
We prevent the use of this data for censorship by not releasing the knowledge base, only showing logged-in users the results that are relevant to them, and not saying how we've come up with a score. If we only had a small number of images and a limited set of reasons why people could object to them, then it would be simple to infer the data in our knowledge base; but we have a large and complex system, and some aspects would be inherently difficult to extract by automated means. An experienced human looking at an image with a filter score would sometimes be able to guess what common reasons had caused a filterer or filterers not to want to see it again, but a computer would struggle, and often anyone but the filterer who'd applied that score would be baffled. If you had access to that individual's filter list it might be obvious that they were blocking images that triggered their vertigo, depicted people associated with a particular sports team, or showed train engines that lacked a boiler. But without the context of knowing which filter lists an image was on, it would be difficult to get meaningful information out of the system.
Would we
keep the reputation information of each image secret? I imagine many Wikipedians would want to access that data for legitimate editorial reasons.
Well, of course any of the editors could themselves have the filter set on and would know what the score was relative to their preferences. But otherwise the information would be secret. I don't see how we could give editors access to the reputation information without it leaking to censors, or indeed divulging it generally. Remember, the person with vertigo might not want that publicly known, and the pyromaniac who blocked images that might trigger their pyromania would almost certainly not want their filter to be public. As for "legitimate editorial reasons", I think it would be quite contentious if anyone started making editorial decisions based on the filter results, so best not to enable that - but I'll clarify that in the proposal.
Thanks for your feedback
WereSpielChequers
Cheers,
Andrew (Thparkth)

On Tue, Oct 11, 2011 at 5:55 PM, WereSpielChequers <werespielchequers@gmail.com> wrote:
OK, in a spirit of compromise I have designed an image filter which should meet most of the needs that people have expressed and resolve most of the objections that I'm aware of. Just as importantly, it should actually work.
http://meta.wikimedia.org/wiki/User:WereSpielChequers/filter
WereSpielChequers
Thanks for that and for your comments on
It need be no more complex than http://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-watchlist
In my opinion, it will need to be far simpler than that if it is to address the needs of the casual reader. We simply can't expect them to make any significant investment in understanding the process, unlike a casual editor who can be expected to make that investment. It needs to "just work". Primarily this is "just" an issue of interface design, but that has not historically been a strong point for us ;)
Subject to gaming? Well, it's bound to be. But vulnerable to gaming? Hopefully not. Fans of penises are welcome to add their preferences. That's why I didn't include the option "Hide all images except those that a fellow filterer has whitelisted".
Well, to be fair, you did initially include that option - but even without it, the system will be gameable as long as "Hide all new images unless they have been OK'd by a fellow filterer with similar preferences to me" exists as an option. Yet, that is the most powerful and potentially most popular option.
I had typed out a long description of precisely how this might be gamed here, but I started boring even myself, so let me just state the principle: if you create enough accounts that have preferences matching a particular statistical user cluster (easy to do by blocking all images of Mohammed), you can exercise disproportionate statistical control over which new images those real users see by having your army of ringers deliberately green-light new images.
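A toy example, using an intentionally naive similarity-weighted scorer and invented data rather than anything from the actual proposal, shows the effect:

# Deliberately naive similarity-weighted scorer, only to demonstrate the
# attack; every account name, image name and number is made up.
def similarity(a, b, decisions):
    shared = set(decisions[a]) & set(decisions[b])
    if not shared:
        return 0.0
    return sum(decisions[a][i] * decisions[b][i] for i in shared) / len(shared)

def score(user, image, decisions):
    num = den = 0.0
    for other, theirs in decisions.items():
        if other != user and image in theirs:
            sim = similarity(user, other, decisions)
            num += sim * theirs[image]
            den += abs(sim)
    return num / den if den else 0.0

# One real reader in the "hides depictions of Mohammed" cluster
# (+1 = hide, -1 = happy to see).
decisions = {"reader": {"img_mohammed_1": +1, "img_mohammed_2": +1}}

# Before the attack the newly uploaded shock image carries no signal.
print(score("reader", "img_new_shock", decisions))   # 0.0

# The attacker registers ringer accounts that copy the cluster's
# preferences and deliberately green-light the new image...
for n in range(50):
    decisions["ringer%d" % n] = {"img_mohammed_1": +1, "img_mohammed_2": +1,
                                 "img_new_shock": -1}

# ...and the naive scorer now confidently shows it to the real reader.
print(score("reader", "img_new_shock", decisions))   # -1.0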
Well of course any of the editors could themselves have the filter set on
and would know what the score was relative to their preferences. But otherwise the information would be secret. I don't see how we could give editors access to the reputation information without it leaking to censors, or indeed divulging it generally.
Although the per-user score might be the important one for the operation of the algorithm, it's obvious that per-image data will exist, or at least could be calculated. We *could* calculate statistics for each image like "likelihood of actually being seen by a viewer" without revealing any personal information. It would also be possible to show "likelihood of being seen by readers who have blocked image X" for image Y.
This information *could* be useful to censors, but if it was available only to logged-in users it would be practically difficult for them to access in an automated manner.
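As a rough illustration of how such per-image aggregates might be computed without exposing any individual's filter list - the data structures and names are assumptions for the example, not anything specified in the proposal:

# `hidden_for` maps an image to the set of opaque user ids whose filter
# hides it; only aggregate fractions are ever reported.
def visibility(image, all_users, hidden_for):
    """Fraction of filter users for whom `image` would actually be shown."""
    if not all_users:
        return 1.0
    return 1.0 - len(hidden_for.get(image, set())) / len(all_users)

def visibility_given_blocked(image_y, image_x, hidden_for):
    """Fraction of users who block image_x that would still see image_y."""
    blockers = hidden_for.get(image_x, set())
    if not blockers:
        return 1.0
    return 1.0 - len(blockers & hidden_for.get(image_y, set())) / len(blockers)

users = {"u1", "u2", "u3", "u4"}
hidden_for = {"img_a": {"u1", "u2"}, "img_b": {"u2", "u3"}}

print(visibility("img_a", users, hidden_for))                  # 0.5
print(visibility_given_blocked("img_b", "img_a", hidden_for))  # 0.5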
As for "legitimate editorial reasons", I think it would be quite contentious if anyone started making editorial decisions based on the filter results, so best not to enable that - but I'll clarify that in the proposal
I certainly don't think it would be more contentious than having the filter in the first place! There are certainly legitimate editorial reasons for wanting to know some of the information I mentioned. I personally believe that images are content, that they hold information, and that they are not mere decoration in an article. There are plenty of situations where the question "what percentage of readers of this page will not be shown this image" would inform the decision about whether to use the image at all, and inform decisions about the wording of the article copy.
Anyway, as I have said, I think your basic idea here is the only practical option if a filter is to be implemented. I am personally not in favour of doing so, for both idealistic and practical reasons (specifically, the amount of effort it will involve vs. the amount of benefit it will deliver, with special consideration for the fact that almost no one seems to have asked for an opt-in filter, but rather for Wikipedia just not to have those images in the first place).
Be all that as it may, if we are to have a filter, we need to have a working filter. We need to have the best filter we possibly can, with the least increase in workload for editors and the most functionality for readers. In my opinion the system you have outlined is not only the best option so far, but also the best option possible. I do encourage others to read your design (http://meta.wikimedia.org/wiki/User:WereSpielChequers/filter) and to get involved in figuring out how it might work.
Cheers,
Andrew (Thparkth)
Hoi, the category system is, as far as I am concerned, of little interest. It is not helpful for selecting one from a bunch. It is a sick dog and it is in misery. Thanks, GerardM
On Sat, Nov 26, 2011 at 6:27 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, the category system is, as far as I am concerned, of little interest. It is not helpful for selecting one from a bunch. It is a sick dog and it is in misery. Thanks, GerardM
Maybe you have stopped using Wikipedia as a reader altogether? I and many happy readers find great use for categories. I read on average three Wikipedia pages a day, and in about one in five of those page reads I have recourse to the category system. The fact that I get only about a 50% usefulness rate out of the categories doesn't mean they aren't useful to me.
Just because you think something, please give a bit of thought to the fact that you are not the supreme arbiter of what everybody should think. I do not object to your stating your personal opinion. But it is just your personal opinion. Do you have references to any pages or other sources that show there is *any* significant portion of Wikipedians or Wikipedia readers who feel even close to the way you feel?
Or are you just a voice crying out in the wilderness?