This paper (first reference below) is the result of a class project I was part of
almost two years ago for CSCI 5417 Information Retrieval Systems. It builds
on a class project I did in CSCI 5832 Natural Language Processing, which
I presented at Wikimania '07. The project ran very late; we didn't send
the final paper in until the day before New Year's. As far as I recall, this
technical report was never really announced, so I thought it would be
interesting to look briefly at the results. The goal of this paper was to
break articles down into surface features and latent features, and then use
those to study the rating system being used, predict article quality, and
rank results in a search engine. We used the [[random forests]] classifier,
which allowed us to analyze the contribution of each feature to performance
by looking directly at the weights that were assigned. While the surface
analysis was performed on the whole English Wikipedia, the latent analysis
was performed on the Simple English Wikipedia (it is more expensive to
compute).

= Surface features =

* Readability measures are the single best predictor of quality
that I have found, as defined by the Wikipedia Editorial Team (WET). The
[[Automated Readability Index]], [[Gunning Fog Index]] and [[Flesch-Kincaid
Grade Level]] were the strongest predictors, followed by length of article
HTML, number of paragraphs, [[Flesch Reading Ease]], [[SMOG Grading]], number
of internal links, [[Laesbarhedsindex Readability Formula]], number of words,
and number of references. Weakly predictive were: number of "to be"s, number
of sentences, [[Coleman-Liau Index]], number of templates, PageRank, number
of external links, and number of relative links. Not predictive (overall;
see the end of section 2 for the per-rating score breakdown): number of h2s
or h3s, number of conjunctions, number of images*, average word length,
number of h4s, number of prepositions, number of pronouns, number of
interlanguage links, average syllables per word, number of nominalizations,
article age (based on page id), proportion of questions, and average
sentence length.
:* Number of images was actually by far the single strongest predictor of
any class, but only for Featured articles. Because it was so good at picking
out Featured articles, and somewhat good at picking out A and G articles,
the classifier was confused in so many cases that the overall contribution
of this feature to classification performance is zero.
:* Number of external links is strongly predictive of Featured articles.
:* The B class is highly distinctive. It has a strong "signature," with high
predictive value assigned to many features. The Featured class is also very
distinctive. F, B and S (Stop/Stub) contain the most information.
:* A is the least distinct class, not being very different from F or G.

= Latent features =

The algorithm used for latent analysis, an analysis of the occurrence of
words in every document with respect to the link structure of the
encyclopedia ("concepts"), is [[Latent Dirichlet Allocation]]. This part of
the analysis was done by CS PhD student Praful Mangalath. An example of what
can be done with the result of this analysis: you provide a word (a search
query) such as "hippie". You can then look at the weight of every article
for the word hippie, pick the article with the largest weight, and look at
its link network. You can pick out the articles that this article links to,
and/or which link to this article, that are also weighted strongly for the
word hippie while also contributing maximally to this article's
"hippieness". We tried this query in our system (LDA), Google
(site:en.wikipedia.org hippie), and the Simple English Wikipedia's Lucene
search engine. The breakdown of the articles occurring
in the top ten search results for this word for those engines is:
* LDA only: [[Acid rock]], [[Aldeburgh Festival]], [[Anne Murray]], [[Carl Radle]], [[Harry Nilsson]], [[Jack Kerouac]], [[Phil Spector]], [[Plastic Ono Band]], [[Rock and Roll]], [[Salvador Allende]], [[Smothers brothers]], [[Stanley Kubrick]]
* Google only: [[Glam Rock]], [[South Park]]
* Simple only: [[African Americans]], [[Charles Manson]], [[Counterculture]], [[Drug use]], [[Flower Power]], [[Nuclear weapons]], [[Phish]], [[Sexual liberation]], [[Summer of Love]]
* LDA & Google & Simple: [[Hippie]], [[Human Be-in]], [[Students for a democratic society]], [[Woodstock festival]]
* LDA & Google: [[Psychedelic Pop]]
* Google & Simple: [[Lysergic acid diethylamide]], [[Summer of Love]]
(See the paper for the articles produced for the keywords philosophy and economics.)

= Discussion / Conclusion =

* The results of the latent analysis are totally up to your
perception. But what is interesting is that the LDA features predict the WET
quality ratings just as well as the surface-level features. Both feature
sets (surface and latent) pull out almost all of the information that the
rating system bears.
* The rating system devised by the WET is not distinctive. You can best tell
the difference between, grouped together, Featured, A and Good articles vs.
B articles. Featured, A and Good articles are also quite distinctive
(Figure 1). Note that in this study we didn't look at Starts and Stubs, but
we did in the earlier paper.
:* This is interesting when compared to a recent entry on the YouTube blog,
"Five Stars Dominate Ratings".
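For reference, the readability measures that top the surface-feature list above are simple closed-form functions of character, word and sentence counts. Here is a minimal sketch of the [[Automated Readability Index]] in Python; the tokenization is deliberately naive and this is an illustration, not the paper's actual code:

```python
import re

def automated_readability_index(text: str) -> float:
    """ARI = 4.71*(characters/words) + 0.5*(words/sentences) - 21.43.
    Sentence and word splitting here are naive sketches."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    chars = sum(len(w) for w in words)  # letters only, no punctuation
    if not words or not sentences:
        return 0.0
    return (4.71 * (chars / len(words))
            + 0.5 * (len(words) / len(sentences))
            - 21.43)
```

Higher scores correspond to text that requires more years of schooling to read; the other indices named above differ mainly in which surface counts they combine.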
I think a sane, well-researched (with actual subjects) rating system is
well within the purview of the Usability Initiative. Helping people find and
create good content is what Wikipedia is all about. Having a solid rating
system allows you to reorganize the user interface, the Wikipedia
namespace, and the main namespace around good and bad content as needed. If
you don't have a solid, information-bearing rating system, you don't know
what good content really is (really bad content is easy to spot).
:* My Wikimania talk was all about gathering data from people about articles
and using it to train machines to automatically pick out good content. You
ask people questions along dimensions that make sense to people, and give
the machine access to other surface features (such as a statistical measure
of readability, or length) and latent features (such as can be derived from
document word occurrence and encyclopedia link structure). I referenced page
262 of Zen and the Art of Motorcycle Maintenance to give an example of the
kind of qualitative features I would ask people about. It really depends on
which features end up bearing information, to be tested in "the lab". Each
word is an example dimension of quality: we have "*unity, vividness,
authority, economy, sensitivity, clarity, emphasis, flow, suspense,
brilliance, precision, proportion, depth and so on.*" You then use surface
and latent features to predict these values for all articles. You can also
say: when a person rates this article as high on the x scale, they also mean
that it has this much of these surface and these latent features.
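The latent retrieval procedure described in the Latent features section (take the article weighted most strongly for a query word, then keep the linked articles that are themselves strongly weighted) can be sketched with toy data; the weights, the threshold, and the link network below are invented for illustration, not taken from the paper:

```python
def related_by_weight(weights, links, top_article, threshold=0.5):
    """Neighbors of the top-weighted article that are themselves
    strongly weighted for the query word."""
    return sorted(a for a in links.get(top_article, ())
                  if weights.get(a, 0.0) >= threshold)

# Invented per-article weights for the query "hippie".
weights = {"Hippie": 0.9, "Summer of Love": 0.7,
           "Acid rock": 0.6, "South Park": 0.1}
# Invented link network: articles linked to/from "Hippie".
links = {"Hippie": {"Summer of Love", "Acid rock", "South Park"}}

top = max(weights, key=weights.get)  # the article with the largest weight
print(top, related_by_weight(weights, links, top))
```

In the real system the weights come from the LDA topic model rather than a hand-written dictionary, but the ranking step is this simple.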
= References =
- DeHoust, C., Mangalath, P., Mingus, B. (2008). *Improving search in
Wikipedia through quality and concept discovery*. Technical Report.
- Rassbach, L., Mingus, B., Blackford, T. (2007). *Exploring the
feasibility of automatically rating online article quality*. Technical
Report.
I have asked for and received permission to forward to you all this most
excellent bit of news.
The LINGUIST List is a most excellent resource for people interested in the
field of linguistics. As I mentioned some time ago, they held a funding
drive, and in that funding drive they asked for a certain amount of money in
a given number of days, in return for which they would run a project on
Wikipedia to learn what needs doing to get better coverage for the field of
linguistics. What you will read in this mail is that the whole community of
linguists is asked to cooperate. I am really thrilled, as it will also get
more linguists interested in what we do. My hope is that a fraction of them
will be interested in the languages that they care for and help make them
more relevant. As a member of the "language prevention committee", I would
love to get more knowledgeable people involved in our smaller projects. If
it means that we get more requests for more projects, we will really feel
embarrassed by all the new projects we will have to approve because of the
quality of the Incubator content and the quality of the linguistic arguments
for why we should approve yet another language :)
NB: Is this not a really clever way of raising money? Give us this much in
this time frame and we will then do this as a bonus...
---------- Forwarded message ----------
From: LINGUIST Network <linguist(a)linguistlist.org>
Date: Jun 18, 2007 6:53 PM
Subject: 18.1831, All: Call for Participation: Wikipedia Volunteers
LINGUIST List: Vol-18-1831. Mon Jun 18 2007. ISSN: 1068 - 4875.
Moderators: Anthony Aristar, Eastern Michigan U <aristar(a)linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry(a)linguistlist.org>
Reviews: Laura Welcher, Rosetta Project
The LINGUIST List is funded by Eastern Michigan University,
and donations from subscribers and publishers.
Editor for this issue: Ann Sawyer <sawyer(a)linguistlist.org>
To post to LINGUIST, use our convenient web form at
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers
-------------------------Message 1 ----------------------------------
Date: Mon, 18 Jun 2007 12:49:35
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers
As you may recall, one of our Fund Drive 2007 campaigns was called the
"Wikipedia Update Vote." We asked our viewers to consider earmarking their
donations to organize an update project on linguistics entries in the
English-language Wikipedia. You can find more background information on this.
The speed with which we met our goal, thanks to the interest and generosity
of our readers, was a sure sign that the linguistics community was
enthusiastic about the idea. Now that summer is upon us, and some of you may
have a bit of leisure time, we are hoping that you will be able to help us
get started on the Wikipedia project. The LINGUIST List's role in this
project is a purely organizational one. We will:
*Help, with your input, to identify major gaps in the Wikipedia materials or
pages that need improvement;
*Compile a list of linguistics pages that Wikipedia editors have identified
as "in need of attention from an expert on the subject" or "does not cite
any references or sources," etc.;
*Send out periodic calls for volunteer contributors on specific topics or
pages;
*Provide simple instructions on how to upload your entries into Wikipedia;
*Keep track of our project Wikipedians;
*Keep track of revisions and new entries;
*Work with the Wikimedia Foundation to publicize the linguistics
community's efforts.
We hope you are as enthusiastic about this effort as we are. Just to help us
get started looking at Wikipedia more critically, and to easily identify
entries needing improvement, we suggest that you take a look at the List of
linguists. Many people are not listed there; others need to have more facts
and references added. If you would like to participate in this exciting
update effort, respond by sending an email to LINGUIST Editor Hannah Morales
at hannah(a)linguistlist.org, suggesting what your role might be or which
entries you feel should be updated or added. Some linguists who saw our fund
drive campaign on the Internet have already written us with specific
suggestions, which we will share with you soon.
This update project will take major time and effort on all our parts. The
result will be a much richer internet resource of information on the breadth
and depth of the field of linguistics. Our efforts should also stimulate
students to consider studying linguistics and educate a wider public on what
we do. Please consider participating.
Editor, Wikipedia Update Project
Linguistic Field(s): Not Applicable
LINGUIST List: Vol-18-1831
There are an increasing number of organisations which have indicated
that their output is Creative Commons by default; however, there are
not as many that have a public IP policy which clearly allows staff to
publish "their" work.
i.e. we have moved on from the IP policy being the stick used to prevent
openness; the "work for hire" and "publish process" are the next barriers.
A few staff at the University of Canberra (UC) have written an IP policy
proposal which clearly gives staff ownership of their work, and
requires CC licensing if staff use organisational infrastructure
to create their work.
Otago Polytechnic adopted an IP policy like that in 2007.
Are there other examples, within or outside academia, where the
organisation empowers its staff by providing a policy which clarifies
when the "work for hire" principle is enforced in this murky world of
copyright?
Does the WMF have an intellectual property policy for works created by
employees? Employees edit and upload using free licenses under their own
names, but does the copyright belong to the employee or to the WMF?
Is anyone in our community going to:
Global Congress on Intellectual Property and the Public Interest
Washington College of Law
American University, Washington, DC
August 25-27, 2011
One of the side points about the recent image filter survey* was a
discussion of the idea of a "switch" for readers to turn ALL multimedia off,
primarily to reduce the bandwidth required to load a page if you want to.
Then, by chance, I was reading a WP article on my phone this weekend and
noted that there is now a link at the bottom of the mobile gateway
http://en.m.wikipedia.org/ that says "Disable images on mobile site"
(and the inverse, "Enable images on mobile site", is shown if you click it).
Is that a new feature, or was it always there and I just didn't notice it?
If it's a new feature, was it enabled as a result of these recent
discussions, or was it just by chance?
Are there any stats (or plans to collect stats) on how many people choose
this option and what kind of mobile device they are browsing from?
Since the "disable images" function is already working, would it be useful
to trial a subtle link to that function on the normal (non-mobile) site for
a couple of weeks? Place it right down the bottom next to the tiny "privacy
policy, about, disclaimers, mobile view" links? This would not be intrusive
and we could see, through this most-subtle of links, whether anyone takes us
up on the offer. If we find that more people click on the feature than
expected then that would potentially justify making the feature more
prominent (either in the toolbox, integrated into the "filter" software, as
a user-preference etc.)?
Just a suggestion.
* Yes, it was a survey, not a referendum. A referendum is where, according to
our favourite website, "...an entire electorate is asked to either accept or
reject a particular proposal." Personally I'm generally in favour of the
proposed personal image-hiding concept, but please let's stop calling it a
referendum.
Peace, love & metadata
(not responding to anyone in particular) I'm one of the people who tried to
participate in the discussion without taking a strong standpoint
(intentionally, because I'm quite nuanced on the issue and open to good
arguments from either side), and I have to fully agree with Ryan. I have so
far been unable to participate in this discussion without either being fully
ignored (nothing new to that, I agree) or being put in "the opposite camp". I
basically gave up.
So I do have to say that I agree with the sentiment that the discussion is
not very inviting, and is actually discouraging people who want to find a
middle-ground solution from participating. In that respect I do agree with
Sue's analysis. However, considering the background and the 'German issue',
I don't have the feeling it was particularly helpful in resolving that
either.
Anyhow, about the filter issue. I think at this stage it is very hard to
determine any opinion about "the filter", because everybody seems to have
their own idea of what it will look like, what the consequences will be, and
how it will affect their and other people's lives. I myself find it hard to
take a stance based on the little information available, and I applaud the
visionaries who can. Information I am missing even more, however (and I
think it would have been good to have that information *before* we took any
poll within our own community), is what our average 'reader on the street'
thinks about this. Do they feel they need it? What parts of society are they
from (i.e. is that a group we are representative of? Or one we barely have
any interaction with?) What kind of filter do they want (including the
option: none at all)? Obviously this should not be held just in the US, but
rather worldwide - as widely as possible.
With that information we can seriously consider how far we want to
go to give our readers what they want - or not at all. I don't think we
should be making that choice without trying to figure out what they actually
do want (unless I missed research into that). We are making way too many
assumptions here which don't strike me as entirely accurate (how people
get to an article page, for example (by Béria), or how many people are
offended by the image on the autofellatio article (by Erik)) - and we don't
have to do that if we would just ask those people we're talking about,
rather than talking about them from our ivory tower.
One final remark: I couldn't help but laugh a little when I read somewhere
that we are the experts and we are making decisions for our readers - and
that these readers should have to take that whole story as-is, because
what else is the use of having these experts sit together (probably I
interpreted this through my own thoughts). And I was always under the
impression that Wikipedia was about the masses participating in their own
way - why do we trust people to 'ruin' an article for others, but not just
for themselves?
Hoping for a constructive discussion and more data on what our 'readers'
actually want and/or need...
On 30 September 2011 at 11:40, Béria Lima <berialima(a)gmail.com> wrote:
> I'll go by pieces in your mail Erik.
> *The intro and footer of Sue's post say: "The purpose of this post is not
> > talk specifically about the referendum results or the image hiding
> > (...) So it's perhaps not surprising that she doesn't mention the de.wp
> > regarding the filter in a post that she says is not about the filter. ;-)
> > *
> It is quite surprising, yes, since she gave half of the post to the de.wiki
> main page "issue". And also, if we decide to
> ABF<http://en.wikipedia.org/wiki/Wikipedia:ABF> the other side (like
> that post pretty much does), I would say that she
> doesn't mention it because it would not help her case.
> *Now, it's completely fair to say that the filter issue remains the
> > in the room until it's resolved what will actually be implemented and
> > *
> You forgot the "*IF*": IF the elephant will be implemented or not.
> *What Sue is saying is that we sometimes fail to take the needs and
> > expectations of our readers fully into account
> > *
> Well, if we consider the "referendum" a good place to look for results, we
> can say that our readers are in doubt about that issue, pretty much 50%-50%
> in doubt - with the difference that our German readers are not: they DON'T
> WANT it.
> *Let me be specific. Let's take the good old autofellatio article (...) If
> > you visit http://en.wikipedia.org/wiki/Talk:Autofellatio , you'll
> > notice that there are two big banners: "Wikipedia is not censored" and
> > you find some images offensive you can configure your browser to mask
> > with further instructions. (...) And yet, it's a deeply imperfect
> > The autofellatio page has been viewed 85,000 times in September. The
> > associated discussion page has been viewed 400 times. The "options not
> > see an image" page, which is linked from many many of these pages, has
> > viewed 750 times. We can reasonably hypothesize without digging much
> > into the data that there's a significant number of people who are
> > by images they see in Wikipedia but who don't know how to respond.
> > *
> No, we can not. With 85,000 views, it would be childish to imagine that only
> a few people could see the "Discussion" tab over the article. If they got to
> the article (and the article is not on the MP) we need to assume that:
> 1. They looked for "*autofellatio*" in Google - therefore they knew what
> they might find.
> 2. They typed it into the search box - therefore they know at least a bit of
> how wikipedia works, and know what a discussion page is and how to get
> there.
> 3. They got to the article via the links in another article. And by the
> results of the "What Links here" feature, there are no articles unrelated to
> sex and sexuality that link to this one, so that reader would know what they
> would find - like 1 - and knows how wikipedia works - like 2.
> In any of these cases, I can only imagine that group 1 has any reason to be
> offended and not know how to find the talk page. Even in that case - if we
> divide the number of viewers by 3 (assuming here that 1, 2 and 3 contribute
> exactly equally to the number), that is 28,333 people. Which means that -
> from the other 56,667 people - only 400 decided to check the talk
> page. Which is 0.7% of the readers. From those, I can only see 3 people
> complaining, which is 0.75% of everyone who goes to the talk page. Can you
> see the idea? Only ~0.7% of all people who saw that article say it is
> offensive. So, no, we can't assume that people get offended.
> *An alternative would be, for example, to give Wikipedians a piece of wiki
> > syntax that they can use to selectively make images hideable on specific
> > articles. Imagine visiting the article Autofellatio and seeing small
> > at the top that says:
> > "This article contains explicit images that some readers may find
> > objectionable. [[Hide all images on this page]]."*
> That would indeed be a better idea - to be implemented as a gadget for
> logged-in users - and to be implemented in a way that prevents any kind of
> "censorship categories".
> *Our core community is 91% male, and that does lead to obvious perception
> > biases (and yes, occasional sexism and other -isms). Polls and
> > in our community are typically not only dominated by that core group,
> > they're sometimes in fact explicitly closed to people who aren't meeting
> > sufficient edit count criteria, etc.*
> Yes it is. That does not mean girls get more offended by that. The 9% of
> girls are not screaming to tear apart all images, are they? In the
> referendum we can see the same 50%-50% pro-oppose split in the female
> community as well. (As an example: the only 2 girls who commented here -
> phoebe and me - are on opposite sides. Having a vagina doesn't make us more
> or less offended to see one on the main page.)
> : Note: for a page that was elected a featured article, being on the main
> page is not an issue, whatever the subject is.
> : I don't, for the very simple reason that it was badly written, as several
> people have already said.
> *Béria Lima*
> <http://wikimedia.pt/>(351) 925 171 484
> *Imagine a world in which every person is given free access to the sum of
> all human knowledge. That is what we are doing
> <http://wikimediafoundation.org/wiki/Nossos_projetos>.*
> On 30 September 2011 09:44, Erik Moeller <erik(a)wikimedia.org> wrote:
> > On Wed, Sep 28, 2011 at 11:45 PM, David Gerard <dgerard(a)gmail.com>
> > > The complete absence of mentioning the de:wp poll that was 85% against
> > > any imposed filter is just *weird*.
> > The intro and footer of Sue's post say: "The purpose of this post is
> > not to talk specifically about the referendum results or the image
> > hiding feature"
> > She also wrote in the comments: "What I talk about in this post is
> > completely independent of the filter, and it’s worth discussing (IMO)
> > on its own merits"
> > So it's perhaps not surprising that she doesn't mention the de.wp poll
> > regarding the filter in a post that she says is not about the filter.
> > ;-)
> > Now, it's completely fair to say that the filter issue remains the
> > elephant in the room until it's resolved what will actually be
> > implemented and how. And it's understandable that lots of people are
> > responding accordingly. But I think it's pretty clear that Sue was
> > trying to start a broader conversation in good faith. I know that
> > she's done lots of thinking about the conversations so far including
> > the de.wp poll, and she's also summarized some of this in her report
> > to the Board:
> > The broader conversation she's seeking to kick off in her blog post
> > _can_, IMO, usefully inform the filter conversation.
> > What Sue is saying is that we sometimes fail to take the needs and
> > expectations of our readers fully into account. Whether you agree with
> > her specific examples or not, this is certainly generally true in a
> > community where decisions are generally made by whoever happens to
> > show up, and sometimes the people who show up are biased, stupid or
> > wrong. And even when the people who show up are thoughtful,
> > intelligent and wise, the existing systems, processes and expectations
> > may lead them to only be able to make imperfect decisions.
> > Let me be specific. Let's take the good old autofellatio article,
> > which was one of the first examples of an article with a highly
> > disputed explicit image on the English Wikipedia (cf.
> > http://en.wikipedia.org/wiki/Talk:Autofellatio/Archive_1 ).
> > If you visit http://en.wikipedia.org/wiki/Talk:Autofellatio , you'll
> > notice that there are two big banners: "Wikipedia is not censored" and
> > "If you find some images offensive you can configure your browser to
> > mask them", with further instructions.
> > Often, these kinds of banners come into being because people (readers
> > and active editors) find their way to the talk page and complain about
> > an image being offensive. They are intended to do two things: Explain
> > our philosophy, but also give people support in making more informed
> > choices.
> > This is, in other words, the result of reasonable discussion by
> > thoughtful, intelligent and wise people about how to deal with
> > offensive images (and in some cases, text).
> > And yet, it's a deeply imperfect solution. The autofellatio page has
> > been viewed 85,000 times in September. The associated discussion page
> > has been viewed 400 times. The "options not to see an image" page,
> > which is linked from many many of these pages, has been viewed 750
> > times.
> > We can reasonably hypothesize without digging much further into the
> > data that there's a significant number of people who are offended by
> > images they see in Wikipedia but who don't know how to respond, and we
> > can reasonably hypothesize that the responses that Wikipedians have
> > conceived so far to help them have been overall insufficient in doing
> > so. It would be great to have much more data -- but again, I think
> > these are reasonable hypotheses.
> > The image filter in an incarnation similar to the one that's been
> > discussed to-date is one possible response, but it's not the only one.
> > Indeed, nothing in the Board resolution prescribes a complex system
> > based on categories that exists adjacent to normal mechanisms of
> > editorial control.
> > An alternative would be, for example, to give Wikipedians a piece of
> > wiki syntax that they can use to selectively make images hideable on
> > specific articles. Imagine visiting the article Autofellatio and
> > seeing small print at the top that says:
> > "This article contains explicit images that some readers may find
> > objectionable. [[Hide all images on this page]]."
> > As requested by the Board resolution, it could then be trivial to
> > selectively unhide specific images.
> > If desired, it could be made easy to browse articles with that setting
> > on-by-default, which would be similar to the way the Arabic Wikipedia
> > handles some types of controversial content ( cf.
> > http://ar.wikipedia.org/wiki/%D9%88%D8%B6%D8%B9_%D8%AC%D9%86%D8%B3%D9%8A
> > ).
> > This could possibly be entirely implemented in JS and templates
> > without any complex additional software support, but it would probably
> > be nice to create a standardized tag for it and design the feature
> > itself for maximum usability.
> > Solutions of this type would have the advantage of giving
> > Wiki[mp]edians full editorial judgment and responsibility to use them
> > as they see fit, as opposed to being an imposition from WMF, with an
> > image filter tool showing up on the page about tangential
> > quadrilaterals, and with constant warfare about correct labeling of
> > controversial content. They would also be so broad as to be not very
> > useful for third party censorship.
> > Clearly, one wouldn't just want to tag all articles in this fashion if
> > people complain -- some complaints should be discussed and resolved,
> > not responded to by adding a "Hide it if you don't like it" tag; some
> > should be ignored. By putting the control of when to add the tag fully
> > in the hands of the community, one would also give communities the
> > option to say "Why would we use this feature? We don't need it!" This
> > could then lead to further internal and external conversations.
> > I don't think this would address all the concerns Sue expresses. For
> > example, I think we need to do more to bring readers into
> > conversations, and to treat them respectfully. Our core community is
> > 91% male, and that does lead to obvious perception biases (and yes,
> > occasional sexism and other -isms). Polls and discussions in our
> > community are typically not only dominated by that core group, they're
> > sometimes in fact explicitly closed to people who aren't meeting
> > sufficient edit count criteria, etc. For good reasons, of course --
> > but we need to find ways to hear those voices as well.
> > Overall, I think Sue's post was an effort to move the conversation
> > away from thinking of this issue purely in the terms of the debate as
> > it's taken place so far. I think that's a very worthwhile thing to do.
> > I would also point out that lots of good and thoughtful ideas have
> > been collected at:
> > http://meta.wikimedia.org/wiki/Image_filter_referendum/Next_steps/en
> > IMO the appropriate level of WMF attention to this issue is to 1) look
> > for simple technical help that we can give the community, 2) use the
> > resources that WMF and chapters have (in terms of dedicated, focused
> > attention) to help host conversations in the communities, and bring
> > new voices into the debate, to help us all be the best possible
> > versions of ourselves. And as Sue said, we shouldn't demonize each
> > other in the process. Everyone's trying to think about these topics in
> > a serious fashion, balancing many complex interests, and bringing
> > their own useful perspective.
> > Erik
> > _______________________________________________
> > foundation-l mailing list
> > foundation-l(a)lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
> (As an example: the only 2 girls who commented here - phoebe and me - are on
> opposite sides. ...)
Technically, you, Sarah Stierch, Phoebe, and Sue have all commented --
at least 4 women, not just 2.
Volunteer Development Coordinator
I am serious now; please read below as a serious proposal.
I was talking today with a friend about the image filter, and we came
to a possible solution. Of course, this assumes that those who are in favor
of censorship have honest intentions: to allow particular people to
access Wikipedia articles despite the problems which they have at their
workplace or in their country. If they don't have honest intentions, this is
a waste of time, but I could say that I tried.
* Create en.safe.wikipedia.org (ar.safe.wikiversity.org, and so on).
Those sites would have censored images and/or an image filter
implemented. The sites would be a kind of proxy for the equivalent
Wikimedia projects without "safe" in the middle. People who access
those sites would have the same privileges as people who access
the sites without "safe" in the domain name. Thus, everybody who wants
a "family friendly Wikipedia" would have it on a separate site;
everybody who wants to keep Wikipedia free would have it free.
* Create safe.wikimedia.org. That would be the site for
censoring/categorizing Commons images. It shouldn't be Commons itself,
but a virtual fork of it. The fork would consist of hashes of image
names, along with the images themselves. Thus, the image on Commons with
the name "Torre_de_H%C3%A9rcules_-_DivesGallaecia2012-62.jpg" would be
"fd37dae713526ee2da82f5a6cf6431de.jpg" on safe.wikimedia.org. Its
image preview located on upload.wikimedia.org would be translated as
"thumb/a1f3216e3344ea115bcac778937947f1.jpg" on safe.wikimedia.org.
(Note: md5 is not likely to be the best hashing system; some other
algorithm could be deployed.)
* The link from the real image name to its hash would exist just inside
the Wikimedia system. It would be easy to find the relation image=>hash,
but it would be very hard to find the relation in the other direction. Thus,
no entity outside Wikimedia would be able to build its own censorship
repository in relation to Commons; they would only be able to do that
in relation to safe.wikimedia.org, which is already censored.
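A minimal sketch of the name-hashing step described above. MD5 is used here only because the post uses it as an example; the exact mapping (whether the percent-decoded name is hashed, and whether the file extension survives) is an assumption for illustration:

```python
import hashlib
from urllib.parse import unquote

def safe_name(commons_name: str) -> str:
    """Map a Commons file name to an opaque name for the 'safe' proxy.
    One-way: computing name -> hash is easy, inverting it is hard."""
    decoded = unquote(commons_name)          # assume %C3%A9 etc. is decoded first
    digest = hashlib.md5(decoded.encode("utf-8")).hexdigest()
    ext = commons_name.rsplit(".", 1)[-1]    # assume the extension is kept
    return f"{digest}.{ext}"
```

Serving the fork then only needs a private lookup table from hash back to the real file, kept inside Wikimedia, which is exactly the asymmetry the bullet above relies on.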
Besides the technical benefits, only those interested in censoring
images would have to work on it; the Commons community would be spared
that job. The only reason for such an idea to be rejected by those who
are in favor of censorship would be their wet dreams of using the Commons
community to censor images for them. If they want to censor
images, they should find people interested in doing that; they
shouldn't force one community to do it.
Drawbacks are similar to any abuse of censorship: companies, states,
etc. which want to use that system for their own goals would be able
to do so by blocking everything which doesn't have the "safe" infix.
But, as said, that's a drawback of *any* censorship mechanism. Those who
access through the "safe" wrapper would have to write image names in
their hash format; but that's a small price for "family friendliness", I