Wikimedia-l September 2011

wikimedia-l@lists.wikimedia.org

170 participants
142 discussions

[Foundation-l] Improving search in Wikipedia through quality and concept discovery
by Brian J Mingus 09 Sep '22


This paper (first reference) is the result of a class project I was part of almost two years ago for CSCI 5417 Information Retrieval Systems. It builds on a class project I did in CSCI 5832 Natural Language Processing, which I presented at Wikimania '07. The project ran very late: we didn't send in the final paper until the day before New Year's. As far as I recall this technical report was never really announced, so I thought it would be interesting to look briefly at the results.

The goal of this paper was to break articles down into surface features and latent features, and then use those to study the rating system being used, predict article quality, and rank results in a search engine. We used the [[random forests]] classifier, which allowed us to analyze the contribution of each feature to performance by looking directly at the weights that were assigned. While the surface analysis was performed on the whole English Wikipedia, the latent analysis was performed on the Simple English Wikipedia (it is more expensive to compute).

= Surface features =

* Readability measures are the single best predictor of quality, as defined by the Wikipedia Editorial Team (WET), that I have found. The [[Automated Readability Index]], [[Gunning Fog Index]] and [[Flesch-Kincaid Grade Level]] were the strongest predictors, followed by length of article html, number of paragraphs, [[Flesch Reading Ease]], [[Smog Grading]], number of internal links, [[Laesbarhedsindex Readability Formula]], number of words and number of references.
* Weakly predictive were number of "to be" verbs, number of sentences, [[Coleman-Liau Index]], number of templates, PageRank, number of external links, number of relative links.
* Not predictive (overall - see the end of section 2 for the per-rating score breakdown): number of h2 or h3's, number of conjunctions, number of images*, average word length, number of h4's, number of prepositions, number of pronouns, number of interlanguage links, average syllables per word, number of nominalizations, article age (based on page id), proportion of questions, average sentence length.
:* Number of images was actually by far the single strongest predictor of any class, but only for Featured articles. Because it was so good at picking out Featured articles and only somewhat good at picking out A and G articles, the classifier was confused in so many cases that the overall contribution of this feature to classification performance is zero.
:* Number of external links is strongly predictive of Featured articles.
:* The B class is highly distinctive. It has a strong "signature," with high predictive value assigned to many features. The Featured class is also very distinctive. F, B and S (Start/Stub) contain the most information.
:* A is the least distinct class, not being very different from F or G.

= Latent features =

The algorithm used for latent analysis, which is an analysis of the occurrence of words in every document with respect to the link structure of the encyclopedia ("concepts"), is [[Latent Dirichlet Allocation]]. This part of the analysis was done by CS PhD student Praful Mangalath.

An example of what can be done with the result of this analysis: you provide a word (a search query) such as "hippie". You can then look at the weight of every article for the word hippie. You can pick the article with the largest weight, and then look at its link network. You can pick out the articles that this article links to and/or which link to this article that are also weighted strongly for the word hippie, while also contributing maximally to this article's "hippieness".
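The lookup just described can be sketched in a few lines of Python. This is only an illustration, not the code used for the paper: the function name `related_articles`, the dict-based inputs and the toy weights are invented here, and the sketch assumes that per-article weights for the query word have already been produced by an LDA model. It ranks link neighbours by their own weight for the word and makes no attempt at the "contributing maximally" part.

```python
def related_articles(weights, links, top_n=3):
    """Pick the top-weighted article for a query word, then rank its link neighbours.

    weights: dict mapping article title -> weight for the query word (hypothetical input)
    links:   dict mapping article title -> set of titles that article links to
    """
    # The article with the largest weight for the query word is the seed.
    seed = max(weights, key=weights.get)

    # Neighbours are the articles the seed links to plus the articles that link to it.
    neighbours = set(links.get(seed, ()))
    neighbours |= {title for title, outgoing in links.items() if seed in outgoing}
    neighbours.discard(seed)

    # Keep the neighbours that are themselves weighted most strongly for the word.
    ranked = sorted(neighbours, key=lambda title: weights.get(title, 0.0), reverse=True)
    return seed, ranked[:top_n]


# Toy data: the titles appear in the post, the numbers are made up for illustration.
toy_weights = {
    "Hippie": 0.9,
    "Woodstock festival": 0.6,
    "Human Be-in": 0.5,
    "Phil Spector": 0.1,
    "South Park": 0.05,
}
toy_links = {
    "Hippie": {"Woodstock festival", "Phil Spector"},
    "Human Be-in": {"Hippie"},
    "South Park": {"Phil Spector"},
}

seed, ranked = related_articles(toy_weights, toy_links, top_n=2)
print(seed, ranked)
```

A real system would take the weights from the fitted topic model and the links from a database dump; plain dicts are used here only to keep the sketch self-contained.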
We tried this query in our system (LDA), Google (site:en.wikipedia.org hippie), and the Simple English Wikipedia's Lucene search engine. The breakdown of articles occurring in the top ten search results for this word for those engines is:

* LDA only: [[Acid rock]], [[Aldeburgh Festival]], [[Anne Murray]], [[Carl Radle]], [[Harry Nilsson]], [[Jack Kerouac]], [[Phil Spector]], [[Plastic Ono Band]], [[Rock and Roll]], [[Salvador Allende]], [[Smothers brothers]], [[Stanley Kubrick]].
* Google only: [[Glam Rock]], [[South Park]].
* Simple only: [[African Americans]], [[Charles Manson]], [[Counterculture]], [[Drug use]], [[Flower Power]], [[Nuclear weapons]], [[Phish]], [[Sexual liberation]], [[Summer of Love]]
* LDA & Google & Simple: [[Hippie]], [[Human Be-in]], [[Students for a democratic society]], [[Woodstock festival]]
* LDA & Google: [[Psychedelic Pop]]
* Google & Simple: [[Lysergic acid diethylamide]], [[Summer of Love]]

(See the paper for the articles produced for the keywords philosophy and economics.)

= Discussion / Conclusion =

* How you judge the results of the latent analysis is entirely up to your own perception. But what is interesting is that the LDA features predict the WET ratings of quality just as well as the surface level features. Both feature sets (surface and latent) pull out almost all of the information that the rating system bears.
* The rating system devised by the WET is not distinctive. You can best tell the difference between Featured, A and Good articles, grouped together, and B articles. Featured, A and Good articles are also quite distinctive (Figure 1). Note that in this study we didn't look at Starts and Stubs, but in an earlier paper we did.
:* This is interesting when compared to this recent entry on the YouTube blog: "Five Stars Dominate Ratings" http://youtube-global.blogspot.com/2009/09/five-stars-dominate-ratings.html… I think a sane, well researched (with actual subjects) rating system is well within the purview of the Usability Initiative. Helping people find and create good content is what Wikipedia is all about. Having a solid rating system allows you to reorganize the user interface, the Wikipedia namespace, and the main namespace around good content and bad content as needed. If you don't have a solid, information-bearing rating system you don't know what good content really is (really bad content is easy to spot).
:* My Wikimania talk was all about gathering data from people about articles and using that to train machines to automatically pick out good content. You ask people questions along dimensions that make sense to people, and give the machine access to other surface features (such as a statistical measure of readability, or length) and latent features (such as can be derived from document word occurrence and encyclopedia link structure). I referenced page 262 of Zen and the Art of Motorcycle Maintenance to give an example of the kind of qualitative features I would ask people about. Which ones to use really depends on what features end up bearing information, to be tested in "the lab". Each word is an example dimension of quality: we have "*unity, vividness, authority, economy, sensitivity, clarity, emphasis, flow, suspense, brilliance, precision, proportion, depth and so on.*" You then use surface and latent features to predict these values for all articles. You can also say that when a person rates an article as high on the x scale, they also mean that it has this much of these surface features and these latent features.

= References =

- DeHoust, C., Mangalath, P., Mingus, B. (2008). *Improving search in Wikipedia through quality and concept discovery*. Technical Report. PDF<http://grey.colorado.edu/mediawiki/sites/mingus/images/6/68/DeHoustMangalat…>
- Rassbach, L., Mingus, B., Blackford, T. (2007). *Exploring the feasibility of automatically rating online article quality*. Technical Report. PDF<http://grey.colorado.edu/mediawiki/sites/mingus/images/d/d3/RassbachPincock…>
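For readers who have not met the readability measures named under "Surface features", here is a minimal Python sketch of two of them, the Automated Readability Index and the Coleman-Liau Index, using their standard published formulas. It is not the feature extractor from the paper: the tokenisation rules (regex word matching, splitting sentences on ".", "!" and "?", counting letters and digits for both indices) are simplifying assumptions, and the helper names are invented for this example.

```python
import re


def text_counts(text):
    """Return (characters, words, sentences) using deliberately simple rules."""
    # A word is a run of letters/digits, optionally followed by an apostrophe suffix.
    words = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)
    # Count only letters and digits inside words (apostrophes are skipped).
    chars = sum(ch.isalnum() for word in words for ch in word)
    # A sentence is any non-empty span between runs of ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return chars, len(words), len(sentences)


def automated_readability_index(text):
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43"""
    chars, words, sentences = text_counts(text)
    if words == 0 or sentences == 0:
        raise ValueError("text must contain at least one word and one sentence")
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def coleman_liau_index(text):
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S counted per 100 words."""
    chars, words, sentences = text_counts(text)
    if words == 0:
        raise ValueError("text must contain at least one word")
    letters_per_100_words = chars / words * 100
    sentences_per_100_words = sentences / words * 100
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


sample = "The cat sat. The dog ran."
print(automated_readability_index(sample), coleman_liau_index(sample))
```

Both indices need only character, word and sentence counts, which keeps them cheap to compute over a full dump; the syllable-based measures in the list (Flesch-Kincaid, Gunning Fog, SMOG) additionally need a syllable counter and are left out of this sketch.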


[Foundation-l] Fwd: 18.1831, All: Call for Participation: Wikipedia Volunteers
by GerardM 09 Aug '22


Hoi,

I have asked and received permission to forward to you all this most excellent bit of news. The LINGUIST List is a most excellent resource for people interested in the field of linguistics. As I mentioned some time ago, they held a funding drive in which they asked for a certain amount of money within a given number of days; in return they would run a project on Wikipedia to learn what needs doing to get better coverage of the field of linguistics.

What you will read in this mail is that the whole community of linguists is being asked to cooperate. I am really thrilled, as it will also get more linguists interested in what we do. My hope is that a fraction of them will be interested in the languages that they care for and will help them become more relevant.

As a member of the "language prevention committee", I would love to get more knowledgeable people involved in our smaller projects. If it means that we get more requests for more projects, we will really feel embarrassed with all the new projects we will have to approve because of the quality of the Incubator content and the quality of the linguistic arguments why we should approve yet another language :)

NB: Is this not a really clever way of raising money? Give us this much in this time frame and we will then do this as a bonus...

Thanks,
GerardM

---------- Forwarded message ----------
From: LINGUIST Network <linguist(a)linguistlist.org>
Date: Jun 18, 2007 6:53 PM
Subject: 18.1831, All: Call for Participation: Wikipedia Volunteers
To: LINGUIST(a)listserv.linguistlist.org

LINGUIST List: Vol-18-1831. Mon Jun 18 2007. ISSN: 1068 - 4875.

Subject: 18.1831, All: Call for Participation: Wikipedia Volunteers

Moderators: Anthony Aristar, Eastern Michigan U <aristar(a)linguistlist.org>
            Helen Aristar-Dry, Eastern Michigan U <hdry(a)linguistlist.org>

Reviews: Laura Welcher, Rosetta Project <reviews(a)linguistlist.org>

Homepage: http://linguistlist.org/

The LINGUIST List is funded by Eastern Michigan University, and donations from subscribers and publishers.

Editor for this issue: Ann Sawyer <sawyer(a)linguistlist.org>
================================================================

To post to LINGUIST, use our convenient web form at
http://linguistlist.org/LL/posttolinguist.html

===========================Directory==============================

1)
Date: 18-Jun-2007
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers

-------------------------Message 1 ----------------------------------
Date: Mon, 18 Jun 2007 12:49:35
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers

Dear subscribers,

As you may recall, one of our Fund Drive 2007 campaigns was called the "Wikipedia Update Vote." We asked our viewers to consider earmarking their donations to organize an update project on linguistics entries in the English-language Wikipedia. You can find more background information on this at: http://linguistlist.org/donation/fund-drive2007/wikipedia/index.cfm.

The speed with which we met our goal, thanks to the interest and generosity of our readers, was a sure sign that the linguistics community was enthusiastic about the idea. Now that summer is upon us, and some of you may have a bit more leisure time, we are hoping that you will be able to help us get started on the Wikipedia project. The LINGUIST List's role in this project is a purely organizational one.

We will:

*Help, with your input, to identify major gaps in the Wikipedia materials or pages that need improvement;
*Compile a list of linguistics pages that Wikipedia editors have identified as "in need of attention from an expert on the subject" or "does not cite any references or sources," etc;
*Send out periodic calls for volunteer contributors on specific topics or articles;
*Provide simple instructions on how to upload your entries into Wikipedia;
*Keep track of our project Wikipedians;
*Keep track of revisions and new entries;
*Work with Wikimedia Foundation to publicize the linguistics community's efforts.

We hope you are as enthusiastic about this effort as we are. Just to help us all get started looking at Wikipedia more critically, and to easily identify an area needing improvement, we suggest that you take a look at the List of Linguists page at: http://en.wikipedia.org/wiki/List_of_linguists. Many people are not listed there; others need to have more facts and information added.

If you would like to participate in this exciting update effort, please respond by sending an email to LINGUIST Editor Hannah Morales at hannah(a)linguistlist.org, suggesting what your role might be or which linguistics entries you feel should be updated or added. Some linguists who saw our campaign on the Internet have already written us with specific suggestions, which we will share with you soon.

This update project will take major time and effort on all our parts. The end result will be a much richer internet resource of information on the breadth and depth of the field of linguistics. Our efforts should also stimulate prospective students to consider studying linguistics and educate a wider public on what we do. Please consider participating.

Sincerely,
Hannah Morales
Editor, Wikipedia Update Project

Linguistic Field(s): Not Applicable

-----------------------------------------------------------
LINGUIST List: Vol-18-1831


[Foundation-l] Intellectual property policy for open organisations
by John Vandenberg 09 Jan '12


There are an increasing number of organisations which have indicated that their output is Creative Commons by default; however, there are not as many that have a public IP policy which clearly allows staff to publish "their" work. That is, we have moved on from the IP policy being the stick used to prevent openness, and "work for hire" and the "publish process" are the next frontier.

A few staff at University of Canberra (UC) have written an IP policy proposal which clearly gives staff ownership of their work, and requires CC licensing if staff use organisational infrastructure to create their work.

http://en.wikiversity.org/wiki/University_of_Canberra/Proposed_policy_on_in…

Otago Polytechnic adopted an IP policy like that in 2007.

http://wikieducator.org/Otago_Polytechnic/Intellectual_property

Are there other examples, within or outside academia, where the organisation empowers its staff by providing a policy which clarifies when the "work for hire" principle is enforced in this murky world of online collaboration?

Does the WMF have an intellectual property policy for works created by WMF employees? Employees edit and upload using free licenses under their own name, but does the copyright belong to the employee or to the WMF?

Is anyone in our community going to:

Global Congress on Intellectual Property and the Public Interest
Washington College of Law
American University, Washington, DC
August 25-27, 2011
http://infojustice.org/public-events/global-congress

--
John Vandenberg


[Foundation-l] Year: 2011 Week: 36 Number: 125
by EN Wikizine 18 Oct '11


******************************************
Wikizine.org
Year: 2011 Week: 36 Number: 125
******************************************

An independent internal news bulletin for the members of the Wikimedia community

//////////////////////////////////////////

=== Community ===

[Stewards election] - Candidate submission will last up to September 7th. Voting will be held between September 15th and October 6th.
http://meta.wikimedia.org/wiki/Stewards/elections_2011-2

[Writing contest] - In 2004 the Dutch language Wikipedia was the first Wikipedia to organize a writing contest. Now, on the 1st of September, the 8th edition will start; it runs for 2 months. Users can work alone or in a team on an article of their choice. At the end the jury will award the prizes to the winners: an image of the trophy that they can put on their user pages.
http://nl.wikipedia.org/wiki/Wikipedia:Schrijfwedstrijd

[Research committee] - The next Research committee meeting will be held on September 2nd.
http://meta.wikimedia.org/wiki/Research_Committee/Meetings/Meeting_2011-09-…

[Project closures] - Scots Wikipedia proposed for closure; proposal rejected the same day. Proposals for closing the Inuktitut and Old English Wikipedias were rejected, while the proposal for closing Asturianu Wikibooks was accepted using the standard procedure.
http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Sc…
http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_In…
http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_In…
http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Ol…
http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_As…
http://meta.wikimedia.org/wiki/Closing_projects_policy

=== Other news ===

[WikiLoves Monuments] - Have a date with a monument and send the proof to Commons! And, you never know, you may win one of many nice prizes. WikiLoves Monuments is a project in 18 European countries to get quality pictures of important items of cultural heritage. It is organized by national groups, with the exception of the program representing the Kingdom of Belgium and the Grand Duchy of Luxembourg, which is one project that crosses state lines and includes several languages. The project takes the form of an image contest. Users need to upload their pictures to Commons in September. A national jury, depending on the country where the monument is located, will judge the entries, proclaim the winners and award prizes. But the best prize is to get freely licensed pictures of our heritage out there.
http://www.wikilovesmonuments.eu/ --- see menu at the right for list of participating countries
http://tinyurl.com/3bzt2oc -- Wikimedia Germany about WikiLoves Monuments (Google translation)

=== Technical news ===

[AbuseFilter] - This is an extension for MediaWiki which helps prevent vandalism on wikis, and it is now active on all wikis. Previously you needed to ask (via a Bugzilla request) to have it enabled for your wiki.
http://blog.wikimedia.org/2011/08/24/filter-preventing-abusive-edits-all-wi…
http://thread.gmane.org/gmane.org.wikimedia.foundation/54632
http://en.wikipedia.org/wiki/Wikipedia:Edit_filter -- documentation about it

=== Foundation ===

[Results are in] - ... of the editor survey of April 2011. The actual report is available as a PDF on Meta. But there is also an extensive summary on Meta. And if that is also too long for you to read, check out the Wikipedia Signpost - they will probably give a short summary of it.
http://blog.wikimedia.org/2011/08/29/report-for-editor-survey-april-2011/
http://meta.wikimedia.org/wiki/Editor_Survey_2011 -- actual report here!
http://meta.wikimedia.org/wiki/Editor_Survey_2011/Executive_Summary
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost

=== Chapters ===

[Wikipedia.no] - YES! - The chapter Wikimedia Norway has unanimously decided to make the domain wikipedia.no a portal page instead of pointing it to the bokmål version of Wikipedia. In many countries the first thing internet users enter is <name>+national TLD when they look for a website. By sharing this important internet real estate, other, mostly very small, Wikipedias in languages of that country get exposure to visitors. Others, like wikipedia.be and wikipedia.be made this change long ago. Some, like wikipedia.de, choose not to.
http://www.wikipedia.no
http://tinyurl.com/3l7cchr -- WM Norway press release about it (Google translation)

[Chapters Planet] - There are many WM Chapters, so a special blog aggregator for all their postings seemed to be a good idea. User Bence, a student in Hungary, is offering this service at chaptersplanet.org. There is also an applet with the chapters' Twitter streams.
http://www.chaptersplanet.org/

[Wikimedia Washington DC] - Wiki Society of Washington, DC Inc. has been approved by the Chapters committee as a sub-national chapter under the name Wikimedia District of Columbia. It is now before the Board for discussion and approval.
http://meta.wikimedia.org/wiki/Chapters_committee/Resolutions/Approval_of_W…

[Wikimedia Hungary] - Wikimedia Hungary published their report for May 2011.
http://meta.wikimedia.org/wiki/Wikimedia_chapters/Reports/Wikim%C3%A9dia_Ma…

[Wikimedia Sweden] - Wikimedia Sweden published their report for June and July 2011.
http://lists.wikimedia.org/pipermail/wikimediaannounce-l/2011-August/000223…

[Wikimedia Denmark] - Wikimedia Denmark published their report for July 2011.
http://meta.wikimedia.org/wiki/Wikimedia_Danmark/Kvartalsrapport_juli_2011 (in Danish)
http://tinyurl.com/3kvnwxm -- Google translation

[Iberocoop] - Report from Wikimania published by Wikimedia Argentina (in Spanish).
http://www.wikimedia.org.ar/node/45 (in Spanish)
http://tinyurl.com/3g3vsyh -- Google translation
http://meta.wikimedia.org/wiki/Iberocoop

[Women and Wikimedia] - Wikimedia UK published the report "Girl Geeks V. Wikimeet - An exercise in real-time collaboration"
http://blog.wikimedia.org.uk/2011/08/girl-geeks-v-wikimeet-an-exercise-in-r…

[Wikimedia UK] - Wikimedia UK published their report for July 2011.
http://uk.wikimedia.org/wiki/Reports/2011/July

=== Meetups ===

[RegioWikiCamp Brest (France)] - September 2, 2011
http://wiki.regiowiki.eu/RegioWikiCamp_2011

[Hong Kong 57] - September 16, 2011
http://meta.wikimedia.org/wiki/Meetup/Hong_Kong/57

[Manchester wikimeet] - September 17, 2011
http://meta.wikimedia.org/wiki/Meetup/Manchester

[Berkeley Wikipedia Meetup] - September 17, 2011
http://en.wikipedia.org/wiki/Wikipedia:Meetup/Berkeley

(The Meetups section is filled from the Wikipedia:Meetup page at English Wikipedia. Please add your meetup there and it will be published by Wikizine.)
http://en.wikipedia.org/wiki/Wikipedia:Meetup

=== Media ===

[Wikipedia Signpost] - Volume 7, Issue 34 - 22 August 2011 has been published. Stories in this edition: Journalist regrets not checking citation, PR firms issue advice on how to "survive" Wikipedia (but U.S. Congressman caught red-handed), Girl Geeks edit while they dine, JJ Harrison on avian photography, After eleven moves, name for islands now under arbitration.
http://www.wikipediasignpost.com/blog/?p=374

=== Stats ===

[Wikipedia] - July 2011 was the best July ever by the number of very active users for Chinese, Hindi, Spanish, Portuguese, Russian, Indonesian, French and many more Wikipedias.
http://stats.wikimedia.org/EN/ChartsWikipediaZH.htm
http://stats.wikimedia.org/EN/ChartsWikipediaHI.htm
http://stats.wikimedia.org/EN/ChartsWikipediaES.htm
http://stats.wikimedia.org/EN/ChartsWikipediaPT.htm
http://stats.wikimedia.org/EN/ChartsWikipediaRU.htm
http://stats.wikimedia.org/EN/ChartsWikipediaID.htm
http://stats.wikimedia.org/EN/ChartsWikipediaFR.htm

=== In the news ===

[News.com.au] - Article "The Wikipedia article 'philosophy game' in numbers" published in news.com.au
http://www.news.com.au/technology/the-wikipedia-article-philosophy-game-in-…

[Daily India] - Daily India published an article about Wikimedia India (although wrongly named "Wikipedia-India")
http://www.dnaindia.com/academy/report_college-of-engineering-pune-students…

[Jimmy Wales on BBC] - Jimmy Wales will appear at BBC Radio 3's Free Thinking Festival 2011 at the beginning of November.
http://www.newsonnews.net/radio/11213-wikipedia-founder-heads-line-up-at-bb…

[Network World] - Network World authors Jim Metzler and Steve Taylor think that Wikipedia "is an open source, peer-reviewed online dictionary".
http://www.networkworld.com/newsletters/frame/2011/082211wan1.html

[Inquirer News] - Inquirer News author Paolo G. Montecillo thinks that "Wikipedia is our smartest teacher".
http://newsinfo.inquirer.net/45417/when-kids-go-online-unguided

=== From Wikipedia ===

[Khat] - Khat is a flowering plant native to tropical East Africa and the Arabian Peninsula. It contains the alkaloid called cathinone, an amphetamine-like stimulant which is said to cause excitement, loss of appetite, and euphoria. Khat is so popular in Yemen that its cultivation consumes much of the country's agricultural resources. It is estimated that 40% of the country's water supply goes towards irrigating it, with production increasing by about 10% to 15% every year.
http://en.wikipedia.org/wiki/Khat

[Kingdom Tower] - Kingdom Tower is a supertall skyscraper approved for construction in Jeddah, Saudi Arabia. Its preliminary cost is 1.23 billion USD and it will be at least 1,000 meters tall, becoming the tallest building in the world.
http://en.wikipedia.org/wiki/Kingdom_Tower

[Louie Louie] - In February, 1964, an outraged parent wrote to Robert Kennedy, then the Attorney General of the United States, alleging that the lyrics of "Louie Louie" were obscene. The Federal Bureau of Investigation investigated the complaint. In June 1965, the FBI laboratory obtained a copy of the Kingsmen recording and, after two years of investigation, concluded that the recording could not be interpreted, that it was "unintelligible at any speed" [...]
http://en.wikipedia.org/wiki/Louie_Louie#Lyrics_investigation

//////////////////////////////////////////
@@@@@@@@ Wikizine seeks editors @@@@@@@@@
°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°
Editor(s): Milos, Walter
Corrector(s): Nathan
Thanks to: Nathan
Contact: http://report.wikizine.org
Website: http://www.wikizine.org
°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°°
@@@@@@ Reader feedback is welcome @@@@@@
//////////////////////////////////////////

Wikizine.org makes no guarantee of accuracy, validity and especially, but not limited to, correct grammar and spelling. Satisfaction is not guaranteed. Some content can be highly inspired by or directly copied from other sources. Those sources are listed above at "Sources-Attributions". Wikizine.org is published by [[meta:user:Walter]]. Content is available under Creative Commons Attribution/Share-Alike License 3.0
http://creativecommons.org/licenses/by-sa/3.0/


[Foundation-l] all images on/off function
by Liam Wyatt 07 Oct '11


One of the side points about the recent image filter survey* was a discussion of the idea of a "switch" for readers to turn ALL multimedia off - primarily to reduce the bandwidth required to load a page if you want to.

Then, by chance, I was reading a WP article on my phone this weekend and noted that there is now a link at the bottom of the mobile gateway http://en.m.wikipedia.org/ that says "Disable images on mobile site<http://en.m.wikipedia.org/w/index.php?title=Main_Page&useformat=mobile&disa…>" (and the inverse, "enable images on the mobile site", is shown if you click it).

Is that a new feature or was it always there and I just didn't notice it? If it's a new feature, was it enabled as a result of these recent discussions or was it just by chance? Are there any stats (or plans to collect stats) on how many people choose this option and what kind of mobile device they are browsing from?

Since the "disable images" function is already working, would it be useful to trial a subtle link to that function on the normal (non-mobile) site for a couple of weeks? Place it right down the bottom next to the tiny "privacy policy, about, disclaimers, mobile view" links? This would not be intrusive and we could see, through this most subtle of links, whether anyone takes us up on the offer. If we find that more people click on the feature than expected, then that would potentially justify making the feature more prominent (in the toolbox, integrated into the "filter" software, as a user preference, etc.).

Just a suggestion.
-Liam

* Yes, it was a survey, not a referendum. A referendum is where, according to our favourite website, "...an entire electorate is asked to either accept or reject a particular proposal." Personally I'm generally in favour of the proposed personal image-hiding concept, but please let's stop calling it a referendum.

wittylama.com/blog
Peace, love & metadata


[Foundation-l] We need more information (was: Blog from Sue about ...)
by Lodewijk 05 Oct '11


(not responding to anyone in particular)

I'm one of the people who tried to participate in the discussion without taking a strong standpoint (intentionally, because I'm quite nuanced on the issue and open to good arguments from either side), and I have to fully agree with Ryan. I have so far been unable to participate in this discussion without either being ignored completely (nothing new there, I agree) or being put in "the opposite camp". I basically gave up.

So I do have to say that I agree with the sentiment that the discussion is not very inviting, and is actually discouraging people who want to find a solution in the middle from participating. In that respect I do agree with Sue's analysis. However, considering the background and the 'German issue', I don't have the feeling it was particularly helpful in resolving that either.

Anyhow, about the filter issue. I think at this stage it is very hard to form any opinion about "the filter", because everybody seems to have their own idea of what it will look like, what the consequences will be and how it will affect their and other people's lives. I myself find it hard to take a stance based on the little information available, and I applaud the visionaries that can.

The information I am missing even more, however (and I think it would have been good to have that information *before* we took any poll within our own community), is what our average 'reader on the street' thinks about this. Do they feel they need it? What parts of society are they from (i.e. is that a group we are representative of, or one we barely have any interaction with)? What kind of filter do they want (including the option: none at all)? Obviously this should not be asked only in the US, but rather worldwide, as widely as possible.

With that information we can seriously consider how far we want to go to give our readers what they want - or not at all. I don't think we should be making that choice without trying to figure out (unless I missed some research into that) what they actually do want. We are making way too many assumptions here which don't strike me as entirely accurate (how do people get to an article page for example (by Béria), or how many people are offended by the image on the autofellatio article (by Erik)) - and we don't have to do that if we would just ask those people we're talking about, rather than talking about them on our ivory mountain.

One final remark: I couldn't help but laugh a little when I read somewhere that we are the experts, and we are making decisions for our readers - and that these readers should have to take that whole complete story, because what else is the use of having these experts sit together. (I probably read my own thoughts into this.) And I was always thinking that Wikipedia was about masses participating in their own way - why do we trust people to 'ruin' an article for others, but not just for themselves?

Hoping for a constructive discussion and more data on what our 'readers' actually want and/or need...

Lodewijk

On 30 September 2011 at 11:40, Béria Lima <berialima(a)gmail.com> wrote:

> I'll go by pieces in your mail Erik.
>
> > *The intro and footer of Sue's post say: "The purpose of this post is not to talk specifically about the referendum results or the image hiding feature" (...) So it's perhaps not surprising that she doesn't mention the de.wp poll regarding the filter in a post that she says is not about the filter. ;-)*
>
> It is quite surprising, yes, since she gave half of the post to the de.wiki main page "issue"[1]. And also, if we decide to ABF<http://en.wikipedia.org/wiki/Wikipedia:ABF> of the other side (like that post pretty much does) I would say that she doesn't mention it because it would not help her case.
>
> > *Now, it's completely fair to say that the filter issue remains the elephant in the room until it's resolved what will actually be implemented and how.*
>
> You forgot the "*IF*": IF the elephant will be implemented or not.
>
> > *What Sue is saying is that we sometimes fail to take the needs and expectations of our readers fully into account*
>
> Well, if we consider the "referendum" a good place to go see results[2] we can say that our readers are in doubt about that issue, pretty much 50%-50% in doubt - with the difference that our German readers are not: They DON'T WANT it.
>
> > *Let me be specific. Let's take the good old autofellatio article (...) If you visit http://en.wikipedia.org/wiki/Talk:Autofellatio , you'll notice that there are two big banners: "Wikipedia is not censored" and "If you find some images offensive you can configure your browser to mask them", with further instructions. (...) And yet, it's a deeply imperfect solution. The autofellatio page has been viewed 85,000 times in September. The associated discussion page has been viewed 400 times. The "options not to see an image" page, which is linked from many many of these pages, has been viewed 750 times. We can reasonably hypothesize without digging much further into the data that there's a significant number of people who are offended by images they see in Wikipedia but who don't know how to respond.*
>
> No we can not. With 85,000 views, it would be childish to imagine that only 400 people could see the "Discussion" tab over the article. If they got to the article (and the article is not on MP) we need to assume that:
> 1. They looked for "*autofellatio*" in Google - therefore they knew what they might find.
> 2. They placed that into the search box - therefore they know at least a bit how Wikipedia works and know what a discussion page is and how to get there.
> 3. They got to the article by the links in another article. And by the links of the "What Links here<http://en.wikipedia.org/w/index.php?title=Special:WhatLinksHere/Autofellati…>" feature there is no article not related to sex and sexuality that links to this one, so that reader would know what they would find - like 1. - and knows how Wikipedia works - like 2.
>
> In any of the cases, I can only imagine that group 1 has any reason to be offended and not know how to find the talk page. Even in that case - if we divide the number of viewers by 3 (assuming here that 1, 2 and 3 contribute exactly equally to the number), that is 28,333 people each. Which means that - from the other 56,667 people - only 400 decided to check what the talk page is. Which is 0,7% of the readers. From those, I can only see 3 people complaining, which is 0,75% of everyone who goes to the talk page. Can you see the idea? Only ~0,7% of all people who saw that article are offended by it. So, no, we can't assume that people get offended.
>
> > *An alternative would be, for example, to give Wikipedians a piece of wiki syntax that they can use to selectively make images hideable on specific articles. Imagine visiting the article Autofellatio and seeing small print at the top that says:*
> >
> > *"This article contains explicit images that some readers may find objectionable. [[Hide all images on this page]]."*
>
> That would indeed be a better idea - to be implemented as a gadget for logged-in users - and to be implemented in a way that prevents any kind of "censorship categories".
>
> > *Our core community is 91% male, and that does lead to obvious perception biases (and yes, occasional sexism and other -isms). Polls and discussions in our community are typically not only dominated by that core group, they're sometimes in fact explicitly closed to people who aren't meeting sufficient edit count criteria, etc.*
>
> Yes it is.
That does not mean girls get more offend by that. The 9% of the > girls are not screaming to tire apart all images, are they? In the > opposite, > we can see the same 50%-50% pro-oppose in the female community as well. (As > example: the only 2 girls who commented here - phoebe and me - are in > opposite sides. Have a vagina don't make us more or less offend for see one > in the main page. > > > > [1]: Note there a page who was elected featured article be in the main page > is not a issue, whatever the subject is. > [2]: I don't, for the very simple reason that was badly written, as several > people already said. > _____ > *Béria Lima* > <http://wikimedia.pt/>(351) 925 171 484 > > *Imagine um mundo onde é dada a qualquer pessoa a possibilidade de ter > livre > acesso ao somatório de todo o conhecimento humano. É isso o que estamos a > fazer <http://wikimediafoundation.org/wiki/Nossos_projetos>.* > > > On 30 September 2011 09:44, Erik Moeller <erik(a)wikimedia.org> wrote: > > > On Wed, Sep 28, 2011 at 11:45 PM, David Gerard <dgerard(a)gmail.com> > wrote: > > > The complete absence of mentioning the de:wp poll that was 85% against > > > any imposed filter is just *weird*. > > > > The intro and footer of Sue's post say: "The purpose of this post is > > not to talk specifically about the referendum results or the image > > hiding feature" > > > > She also wrote in the comments: "What I talk about in this post is > > completely independent of the filter, and it’s worth discussing (IMO) > > on its own merits" > > > > So it's perhaps not surprising that she doesn't mention the de.wp poll > > regarding the filter in a post that she says is not about the filter. > > ;-) > > > > Now, it's completely fair to say that the filter issue remains the > > elephant in the room until it's resolved what will actually be > > implemented and how. And it's understandable that lots of people are > > responding accordingly. 
> > But I think it's pretty clear that Sue was trying to start a broader conversation in good faith. I know that she's done lots of thinking about the conversations so far including the de.wp poll, and she's also summarized some of this in her report to the Board:
> >
> > http://meta.wikimedia.org/wiki/Image_filter_referendum/Sue%27s_report_to_th…
> >
> > The broader conversation she's seeking to kick off in her blog post _can_, IMO, usefully inform the filter conversation.
> >
> > What Sue is saying is that we sometimes fail to take the needs and expectations of our readers fully into account. Whether you agree with her specific examples or not, this is certainly generally true in a community where decisions are generally made by whoever happens to show up, and sometimes the people who show up are biased, stupid or wrong. And even when the people who show up are thoughtful, intelligent and wise, the existing systems, processes and expectations may lead them to only be able to make imperfect decisions.
> >
> > Let me be specific. Let's take the good old autofellatio article, which was one of the first examples of an article with a highly disputed explicit image on the English Wikipedia (cf. http://en.wikipedia.org/wiki/Talk:Autofellatio/Archive_1 ).
> >
> > If you visit http://en.wikipedia.org/wiki/Talk:Autofellatio , you'll notice that there are two big banners: "Wikipedia is not censored" and "If you find some images offensive you can configure your browser to mask them", with further instructions.
> >
> > Often, these kinds of banners come into being because people (readers and active editors) find their way to the talk page and complain about an image being offensive. They are intended to do two things: Explain our philosophy, but also give people support in making more informed choices.
> >
> > This is, in other words, the result of reasonable discussion by thoughtful, intelligent and wise people about how to deal with offensive images (and in some cases, text).
> >
> > And yet, it's a deeply imperfect solution. The autofellatio page has been viewed 85,000 times in September. The associated discussion page has been viewed 400 times. The "options not to see an image" page, which is linked from many many of these pages, has been viewed 750 times.
> >
> > We can reasonably hypothesize without digging much further into the data that there's a significant number of people who are offended by images they see in Wikipedia but who don't know how to respond, and we can reasonably hypothesize that the responses that Wikipedians have conceived so far to help them have been overall insufficient in doing so. It would be great to have much more data -- but again, I think these are reasonable hypotheses.
> >
> > The image filter in an incarnation similar to the one that's been discussed to-date is one possible response, but it's not the only one. Indeed, nothing in the Board resolution prescribes a complex system based on categories that exists adjacent to normal mechanisms of editorial control.
> >
> > An alternative would be, for example, to give Wikipedians a piece of wiki syntax that they can use to selectively make images hideable on specific articles. Imagine visiting the article Autofellatio and seeing small print at the top that says:
> >
> > "This article contains explicit images that some readers may find objectionable. [[Hide all images on this page]]."
> >
> > As requested by the Board resolution, it could then be trivial to selectively unhide specific images.
> >
> > If desired, it could be made easy to browse articles with that setting on-by-default, which would be similar to the way the Arabic Wikipedia handles some types of controversial content ( cf. http://ar.wikipedia.org/wiki/%D9%88%D8%B6%D8%B9_%D8%AC%D9%86%D8%B3%D9%8A ).
> >
> > This could possibly be entirely implemented in JS and templates without any complex additional software support, but it would probably be nice to create a standardized tag for it and design the feature itself for maximum usability.
> >
> > Solutions of this type would have the advantage of giving Wiki[mp]edians full editorial judgment and responsibility to use them as they see fit, as opposed to being an imposition from WMF, with an image filter tool showing up on the page about tangential quadrilaterals, and with constant warfare about correct labeling of controversial content. They would also be so broad as to be not very useful for third party censorship.
> >
> > Clearly, one wouldn't just want to tag all articles in this fashion if people complain -- some complaints should be discussed and resolved, not responded to by adding a "Hide it if you don't like it" tag; some should be ignored. By putting the control of when to add the tag fully in the hands of the community, one would also give communities the option to say "Why would we use this feature? We don't need it!" This could then lead to further internal and external conversations.
> >
> > I don't think this would address all the concerns Sue expresses. For example, I think we need to do more to bring readers into conversations, and to treat them respectfully. Our core community is 91% male, and that does lead to obvious perception biases (and yes, occasional sexism and other -isms). Polls and discussions in our community are typically not only dominated by that core group, they're sometimes in fact explicitly closed to people who aren't meeting sufficient edit count criteria, etc. For good reasons, of course -- but we need to find ways to hear those voices as well.
> >
> > Overall, I think Sue's post was an effort to move the conversation away from thinking of this issue purely in the terms of the debate as it's taken place so far. I think that's a very worthwhile thing to do. I would also point out that lots of good and thoughtful ideas have been collected at:
> > http://meta.wikimedia.org/wiki/Image_filter_referendum/Next_steps/en
> >
> > IMO the appropriate level of WMF attention to this issue is to 1) look for simple technical help that we can give the community, 2) use the resources that WMF and chapters have (in terms of dedicated, focused attention) to help host conversations in the communities, and bring new voices into the debate, to help us all be the best possible versions of ourselves. And as Sue said, we shouldn't demonize each other in the process. Everyone's trying to think about these topics in a serious fashion, balancing many complex interests, and bringing their own useful perspective.
> >
> > Erik
> >
> > _______________________________________________
> > foundation-l mailing list
> > foundation-l(a)lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
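
For readers who want to check the view-count arithmetic in Béria's quoted reply, here is a minimal sketch. The inputs (85,000 article views, 400 talk-page views, 3 complaints, 3 equally weighted routes to the article) are the figures stated in the thread; the function name `reader_shares` and its parameters are hypothetical and only serve the illustration.

```python
def reader_shares(article_views, talk_views, complaints, routes=3):
    """Recompute the shares quoted in the thread from the stated inputs."""
    # Views attributed to one route, assuming all routes contribute equally.
    per_route = article_views / routes
    # Views attributed to the remaining routes.
    other_routes = article_views - per_route
    return {
        "per_route": round(per_route),
        "other_routes": round(other_routes),
        # Talk-page views as a percentage of the remaining article views.
        "talk_pct": round(100 * talk_views / other_routes, 1),
        # Complaints as a percentage of talk-page views.
        "complaint_pct": round(100 * complaints / talk_views, 2),
    }


if __name__ == "__main__":
    print(reader_shares(85_000, 400, 3))
```

The sketch reproduces the 28,333, 56,667, 0.7% and 0.75% figures. Note that the two percentages use different denominators: the first is relative to article views, the second to talk-page views.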

[Foundation-l] Dead Sea Scrolls
by emijrp 05 Oct '11

Hi all;

Finally, the Dead Sea Scrolls[1] have copyright[2]. Courtesy of The Israel Museum. Congratulations.

By the way: Hi Wikimedia Israel.

Regards,
emijrp

[1] http://dss.collections.imj.org.il/
[2] http://dss.collections.imj.org.il/terms_pg

Re: [Foundation-l] Blog from Sue about censorship, editorial judgement, and image filters
by Sumana Harihareswara 04 Oct '11

> (As example: the only 2 girls who commented here - phoebe and me - are in opposite sides. ...)

-*Béria Lima*

Technically, you, Sarah Stierch, Phoebe, and Sue have all commented -- at least 4 women, not just 2.

--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation

[Foundation-l] Possible solution for image filter
by Milos Rancic 02 Oct '11

I am serious now; please read the following as a serious proposal.

I was talking today with a friend about the image filter, and we came to a possible solution. It assumes, of course, that those who are in favor of censorship have honest intentions to allow particular people to access Wikipedia articles despite the problems they have at their workplace or in their country. If they don't have honest intentions, this is a waste of time, but I could say that I tried.

* Create en.safe.wikipedia.org (ar.safe.wikiversity.org and so on). Those sites would have censored images and/or an image filter implemented. The sites would be a kind of proxy for the equivalent Wikimedia projects without "safe" in the middle. People who access those sites would have the same privileges as people who access the sites without "safe" in the domain name. Thus, everybody who wants to have a "family friendly Wikipedia" would have it on a separate site; everybody who wants to keep Wikipedia free would have it free.
* Create safe.wikimedia.org. That would be the site for censoring/categorizing Commons images. It shouldn't be Commons itself, but its virtual fork. The fork would consist of hashes of image names together with the images themselves. Thus, the image on Commons with the name "Torre_de_H%C3%A9rcules_-_DivesGallaecia2012-62.jpg" would be "fd37dae713526ee2da82f5a6cf6431de.jpg" on safe.wikimedia.org. The image preview located on upload.wikimedia.org under the name "thumb/8/80/Torre_de_H%C3%A9rcules_-_DivesGallaecia2012-62.jpg/800px-Torre_de_H%C3%A9rcules_-_DivesGallaecia2012-62.jpg" would be translated to "thumb/a1f3216e3344ea115bcac778937947f1.jpg" on safe.wikimedia.org. (Note: md5 is not likely to be the best hashing system; some other algorithm could be deployed.)
* The link between the real image name and its hash would exist only inside the Wikimedia system. It would be easy to find the relation image=>hash, but it would be very hard to find the relation in the other direction. Thus, no entity outside Wikimedia would be able to build its censorship repository in relation to Commons; they would be able to do that only in relation to safe.wikimedia.org, which is already censored.

Besides the technical benefits, only those interested in censoring images would have to work on it. The Commons community would be spared that job.

The only reason why such an idea would be rejected by those who are in favor of censorship would be their wet dreams of using the Commons community to censor images for them. If they want to censor images, they should find people interested in doing that; they shouldn't force one community to do it.

The drawbacks are similar to those of any abuse of censorship: companies, states etc. that want to use the system for their own goals would be able to do so by blocking everything that doesn't have the "safe" infix. But, as said, that's a drawback of *any* censorship mechanism.

Those who access through the "safe" wrapper would have to write image names in their hash format; but that's a small price for "family friendliness", I suppose.

Thoughts?
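
The name-to-hash mapping described in the proposal could be sketched roughly as follows. This is only an illustration under stated assumptions, not an existing Wikimedia mechanism: the function names `md5_name` and `keyed_name` and the secret key are hypothetical, and the hash values quoted in the proposal were not checked against this code. The first function follows the proposal's MD5 idea literally. The second is an assumption added here: because Commons file names are public, anyone can recompute an unkeyed hash of them, so a keyed hash (HMAC) with a secret held only inside Wikimedia is one way the "hard to find the relation in the other direction" property might be approached.

```python
import hashlib
import hmac
import os.path


def md5_name(commons_name: str) -> str:
    """Unkeyed variant from the proposal: MD5 of the file name, extension kept."""
    _, ext = os.path.splitext(commons_name)
    digest = hashlib.md5(commons_name.encode("utf-8")).hexdigest()
    return digest + ext.lower()


def keyed_name(commons_name: str, secret_key: bytes) -> str:
    """Keyed variant (an assumption, not part of the proposal): HMAC-SHA256.

    Without the secret key, an outsider cannot recompute the mapping
    from the public list of Commons file names.
    """
    _, ext = os.path.splitext(commons_name)
    digest = hmac.new(secret_key, commons_name.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest + ext.lower()


if __name__ == "__main__":
    name = "Torre_de_H%C3%A9rcules_-_DivesGallaecia2012-62.jpg"
    print(md5_name(name))
    # Hypothetical key; a real deployment would keep it private.
    print(keyed_name(name, b"hypothetical-secret-kept-inside-wikimedia"))
```

Both functions are deterministic, so the same Commons name always maps to the same "safe" name. Only the keyed variant depends on something an outside party does not have.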

[Foundation-l] Blog from Sue about censorship, editorial judgement, and image filters
by Keegan Peterzell 01 Oct '11

http://suegardner.org/2011/09/28/on-editorial-judgment-and-empathy/

Pretty sound blog, no matter which position you take. Naturally, please discuss the blog on the blog and not thread this too much back to the conversation about the image filter.

--
~Keegan
http://en.wikipedia.org/wiki/User:Keegan
