This paper (first reference) is the result of a class project I was part of
almost two years ago for CSCI 5417 Information Retrieval Systems. It builds
on a class project I did in CSCI 5832 Natural Language Processing, which
I presented at Wikimania '07. The project ran very late: we didn't submit
the final paper until the day before New Year's. As far as I recall, this
technical report was never really announced, so I thought it would be
interesting to look briefly at the results. The goal of the paper was to
break articles down into surface features and latent features, and then use
those to study the rating system in use, predict article quality, and rank
results in a search engine. We used the [[random forests]] classifier, which
allowed us to analyze the contribution of each feature to performance by
looking directly at the weight it assigned to each one. While the surface
analysis was performed on the whole English Wikipedia, the latent analysis
was performed on the Simple English Wikipedia (it is more expensive to
compute).

= Surface features =

* Readability measures are the single best predictor of quality (as defined
by the Wikipedia Editorial Team, WET) that I have found. The
[[Automated Readability Index]], [[Gunning Fog Index]] and [[Flesch-Kincaid
Grade Level]] were the strongest predictors, followed by length of article
HTML, number of paragraphs, [[Flesch Reading Ease]], [[SMOG Grading]], number
of internal links, [[Laesbarhedsindex Readability Formula]], number of words
and number of references. Weakly predictive were number of forms of "to be",
number of sentences, [[Coleman-Liau Index]], number of templates, PageRank,
number of external links and number of relative links. Not predictive
(overall; see the end of section 2 for the per-rating score breakdown):
number of h2 or h3 headings, number of conjunctions, number of images*,
average word length, number of h4 headings, number of prepositions, number
of pronouns, number of interlanguage links, average syllables per word,
number of nominalizations, article age (based on page id), proportion of
questions and average sentence length.
:* Number of images was actually by far the single strongest predictor of
any class, but only for Featured articles. Because it was so good at picking
out Featured articles and somewhat good at picking out A and G articles, the
classifier was confused in so many cases that the overall contribution of
this feature to classification performance is zero.
:* Number of external links is strongly predictive of Featured articles.
:* The B class is highly distinctive. It has a strong "signature", with high
predictive value assigned to many features. The Featured class is also very
distinctive. F, B and S (Start/Stub) contain the most information.
:* A is the least distinct class, not being very different from F or G.

= Latent features =

The algorithm used for the latent analysis, which analyzes the occurrence of
words in every document with respect to the link structure of the
encyclopedia ("concepts"), is [[Latent Dirichlet Allocation]]. This part of
the analysis was done by CS PhD student Praful Mangalath. Here is an example
of what can be done with the result: you provide a word (a search query)
such as "hippie" and look at the weight of every article for that word. You
pick the article with the largest weight and then look at its link network,
picking out the articles that it links to and/or that link to it which are
also weighted strongly for "hippie", while also contributing maximally to
this article's "hippieness". We tried this query in our system (LDA), in
Google (site:en.wikipedia.org hippie), and in the Simple English Wikipedia's
Lucene search engine. The breakdown of articles occurring in the top ten
search results for this word, by engine, is:
* LDA only: [[Acid rock]], [[Aldeburgh Festival]], [[Anne Murray]], [[Carl Radle]], [[Harry Nilsson]], [[Jack Kerouac]], [[Phil Spector]], [[Plastic Ono Band]], [[Rock and Roll]], [[Salvador Allende]], [[Smothers brothers]], [[Stanley Kubrick]].
* Google only: [[Glam Rock]], [[South Park]].
* Simple only: [[African Americans]], [[Charles Manson]], [[Counterculture]], [[Drug use]], [[Flower Power]], [[Nuclear weapons]], [[Phish]], [[Sexual liberation]], [[Summer of Love]].
* LDA & Google & Simple: [[Hippie]], [[Human Be-in]], [[Students for a democratic society]], [[Woodstock festival]].
* LDA & Google: [[Psychedelic Pop]].
* Google & Simple: [[Lysergic acid diethylamide]], [[Summer of Love]].
(See the paper for the articles produced for the keywords "philosophy" and "economics".)

= Discussion / Conclusion =

* The results of the latent analysis are largely a matter of perception.
What is interesting, though, is that the LDA features predict the WET
quality ratings just as well as the surface features do. Both feature sets
(surface and latent) pull out almost all of the information that the rating
system bears.
* The rating system devised by the WET is not distinctive. The clearest
distinction is between Featured, A and Good articles, grouped together, and
B articles. Featured, A and Good articles are also quite distinctive
(Figure 1). Note that in this study we didn't look at Start and Stub
articles, but in an earlier paper we did.
:* This is interesting when compared to a recent entry on the YouTube blog,
"Five Stars Dominate Ratings".
I think a sane, well-researched (tested with actual subjects) rating system
is well within the purview of the Usability Initiative. Helping people find
and create good content is what Wikipedia is all about. Having a solid
rating system allows you to reorganize the user interface, the Wikipedia
namespace, and the main namespace around good content and bad content as
needed. If you don't have a solid, information-bearing rating system, you
don't know what good content really is (really bad content is easy to spot).
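One way to make "information-bearing" concrete is the Shannon entropy of the distribution of ratings: if nearly every rating is five stars, each individual rating tells you very little. The sketch below is only an illustration of that idea, not a method from the paper, and the vote samples in it are made up.

```python
import math
from collections import Counter


def rating_entropy_bits(ratings):
    """Shannon entropy, in bits, of the empirical distribution of ratings.

    Returns 0.0 for an empty list or when every rating is the same.
    """
    total = len(ratings)
    entropy = 0.0
    for count in Counter(ratings).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical vote samples (not real YouTube or Wikipedia data).
skewed = [5] * 90 + [4] * 4 + [3] * 2 + [2] * 1 + [1] * 3
uniform = [1, 2, 3, 4, 5] * 20

print(f"skewed:  {rating_entropy_bits(skewed):.2f} bits")
print(f"uniform: {rating_entropy_bits(uniform):.2f} bits")  # log2(5), about 2.32
```

A five-level scale used evenly carries log2(5), about 2.32 bits per rating, while the five-star-dominated sample carries well under one bit. Entropy only measures how spread out the ratings are; whether they actually track quality still has to be tested against other data.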
:* My Wikimania talk was all about gathering data from people about articles
and using it to train machines to pick out good content automatically. You
ask people questions along dimensions that make sense to people, and give
the machine access to other features: surface features (such as a
statistical measure of readability, or length) and latent features (such as
those derived from document word occurrence and encyclopedia link
structure). I referenced page 262 of *Zen and the Art of Motorcycle
Maintenance* to give an example of the kind of qualitative features I would
ask people about. It really depends on which features end up bearing
information, something to be tested in "the lab". Each word is an example
dimension of quality: "*unity, vividness, authority, economy, sensitivity,
clarity, emphasis, flow, suspense, brilliance, precision, proportion, depth
and so on.*" You then use surface and latent features to predict these
values for all articles. You can also say that when a person rates an
article as high on scale x, they also mean that it has this much of these
surface features and these latent features.
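The surface features above include several classic readability formulas. As a rough sketch (not the code used in the paper), three of them can be computed from word, sentence, character and syllable counts as follows. The tokenizer, sentence splitter and vowel-group syllable counter are crude assumptions, and the Gunning Fog "complex word" rule is simplified to "three or more syllables".

```python
import re


def count_syllables(word):
    """Crude heuristic: count runs of vowels, treating a final 'e' as silent."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def readability_scores(text):
    """Return ARI, Flesch-Kincaid grade level and Gunning Fog for `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    n_sentences = max(len(sentences), 1)  # avoid division by zero
    n_words = max(len(words), 1)
    n_chars = sum(len(w) for w in words)  # letters only
    syllables = [count_syllables(w) for w in words]
    n_complex = sum(1 for s in syllables if s >= 3)
    words_per_sentence = n_words / n_sentences
    return {
        "ARI": 4.71 * (n_chars / n_words) + 0.5 * words_per_sentence - 21.43,
        "Flesch-Kincaid grade": 0.39 * words_per_sentence
        + 11.8 * (sum(syllables) / n_words)
        - 15.59,
        "Gunning Fog": 0.4 * (words_per_sentence + 100 * n_complex / n_words),
    }


print(readability_scores("The cat sat on the mat. Readability formulas reward short sentences."))
```

On very short or very simple texts the grade-level formulas can go below zero, which follows from their linear form.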
= References =
- DeHoust, C., Mangalath, P., Mingus, B. (2008). *Improving search in
Wikipedia through quality and concept discovery*. Technical Report.
- Rassbach, L., Mingus, B., Blackford, T. (2007). *Exploring the
feasibility of automatically rating online article quality*. Technical
Report.
I have asked for and received permission to forward to you all this most
excellent bit of news.
The LINGUIST List is a most excellent resource for people interested in the
field of linguistics. As I mentioned some time ago, they have had a funding
drive, and in that funding drive they asked for a certain amount of money in
a given number of days; in return they would run a project on Wikipedia to
learn what needs doing to get better coverage of the field of linguistics.
What you will read in this mail is that the whole community of linguists is
asked to cooperate. I am really thrilled, as it will also get more linguists
interested in what we do. My hope is that some of them will be interested in
the languages that they care about and help those languages become more
relevant. As a member of the "language prevention committee", I would love
to get more knowledgeable people involved in our smaller projects. If that
means we get more requests for new projects, we will be really embarrassed
by all the new projects we will have to approve because of the quality of
the Incubator content and the quality of the linguistic arguments for
approving yet another language :)
NB: Is this not a really clever way of raising money? Give us this much in
this time frame and we will then do this as a bonus...
---------- Forwarded message ----------
From: LINGUIST Network <linguist(a)linguistlist.org>
Date: Jun 18, 2007 6:53 PM
Subject: 18.1831, All: Call for Participation: Wikipedia Volunteers
LINGUIST List: Vol-18-1831. Mon Jun 18 2007. ISSN: 1068 - 4875.
Subject: 18.1831, All: Call for Participation: Wikipedia Volunteers
Moderators: Anthony Aristar, Eastern Michigan U <aristar(a)linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry(a)linguistlist.org>
Reviews: Laura Welcher, Rosetta Project
The LINGUIST List is funded by Eastern Michigan University,
and donations from subscribers and publishers.
Editor for this issue: Ann Sawyer <sawyer(a)linguistlist.org>
To post to LINGUIST, use our convenient web form at
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers
-------------------------Message 1 ----------------------------------
Date: Mon, 18 Jun 2007 12:49:35
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers
As you may recall, one of our Fund Drive 2007 campaigns was called the
"Wikipedia Update Vote." We asked our viewers to consider earmarking their
donations to organize an update project on linguistics entries in the
English-language Wikipedia. You can find more background information on this
The speed with which we met our goal, thanks to the interest and generosity
of our readers, was a sure sign that the linguistics community was
enthusiastic about the idea. Now that summer is upon us, and some of you may
have a bit of leisure time, we are hoping that you will be able to help us
get started on this Wikipedia project. The LINGUIST List's role in this project is a purely
organizational one. We will:
*Help, with your input, to identify major gaps in the Wikipedia materials or
pages that need improvement;
*Compile a list of linguistics pages that Wikipedia editors have identified
"in need of attention from an expert on the subject" or " does not cite any
references or sources," etc;
*Send out periodical calls for volunteer contributors on specific topics or
*Provide simple instructions on how to upload your entries into Wikipedia;
*Keep track of our project Wikipedians;
*Keep track of revisions and new entries;
*Work with Wikimedia Foundation to publicize the linguistics community's
We hope you are as enthusiastic about this effort as we are. Just to help us
get started looking at Wikipedia more critically, and to easily identify an
needing improvement, we suggest that you take a look at the List of
Many people are not listed there; others need to have more facts and
added. If you would like to participate in this exciting update effort,
please respond by sending an email to LINGUIST Editor Hannah Morales at
hannah(a)linguistlist.org, suggesting what your role might be or which
entries you feel should be updated or added. Some linguists who saw our
on the Internet have already written us with specific suggestions, which we
will share with you soon.
This update project will take major time and effort on all our parts. The
result will be a much richer internet resource of information on the breadth
and depth of the field of linguistics. Our efforts should also stimulate
students to consider studying linguistics and to educate a wider public on
what we do. Please consider participating.
Editor, Wikipedia Update Project
Linguistic Field(s): Not Applicable
LINGUIST List: Vol-18-1831
>>> The people who are loudest in their demands for consensus
>>> do not represent the Wikimedia movement.
>> The voices loudest for the WMF doing something against the
>> Trump administration are not representative of the Wikimedia
>> movement either....
> Is the Community Process Steering Committee currently
> prepared to "engage more 'quiet' members of our community"
> with a statistically robust snap survey to resolve this question?
Anyone can go to Recent Changes and send a SurveyMonkey link to the
most recent few hundred editors with contributions at least a year
old, to get an accurate answer.
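The selection step suggested here can be sketched in a few lines. This is a hypothetical illustration: it assumes the recent-changes entries and each user's first-edit date have already been fetched (for example from the MediaWiki Action API's `list=recentchanges` and `list=usercontribs` modules), and it reads "contributions at least a year old" as "first edit at least 365 days ago". The function name and parameters are made up for this sketch.

```python
from datetime import datetime, timedelta, timezone


def pick_survey_sample(recent_changes, first_edit, now, limit=300, min_age_days=365):
    """Pick up to `limit` distinct editors, most recently active first,
    whose first edit is at least `min_age_days` old.

    recent_changes: iterable of (username, timestamp) pairs, newest first
    first_edit: dict mapping username -> datetime of that user's first edit
    """
    cutoff = now - timedelta(days=min_age_days)
    picked, seen = [], set()
    for user, _timestamp in recent_changes:
        if user in seen:
            continue  # count each editor only once
        seen.add(user)
        first = first_edit.get(user)
        if first is not None and first <= cutoff:
            picked.append(user)
            if len(picked) >= limit:
                break
    return picked


if __name__ == "__main__":
    utc = timezone.utc
    now = datetime(2017, 2, 1, tzinfo=utc)
    changes = [
        ("Alice", datetime(2017, 1, 31, 12, 0, tzinfo=utc)),
        ("Bob", datetime(2017, 1, 31, 11, 0, tzinfo=utc)),
        ("Alice", datetime(2017, 1, 31, 10, 0, tzinfo=utc)),
        ("Carol", datetime(2017, 1, 30, 9, 0, tzinfo=utc)),
    ]
    first_edits = {
        "Alice": datetime(2010, 5, 1, tzinfo=utc),
        "Bob": datetime(2016, 12, 20, tzinfo=utc),  # less than a year ago
        "Carol": datetime(2015, 3, 3, tzinfo=utc),
    }
    print(pick_survey_sample(changes, first_edits, now))  # ['Alice', 'Carol']
```

Taking the most recent editors favors whoever happened to be active in the last few hours; sampling from a longer window would trade speed for a less time-of-day-dependent sample.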
Will a respected member of the community please do this? I would like
to know what the actual editing community thinks of the travel ban and
their idea of an appropriate response. I don't want to see community
governance by opt-in participation in obscure RFCs.
I would offer to do this myself, but I value keeping my real name
unassociated with my enwiki userid.
This is a request for your input and possible ideas (if any) regarding my management of the fallout from Jimmy's announcing me as the 2018 Wikimedian of the Year.
The emails below are a copy of my ongoing consultations with Wikimedia Foundation staff and other Wikimedians I personally know, as well as a report on what's already brewing in my region of Russia after this unexpected outcome.
I would be grateful if you could advise me on how to properly steer the enthusiasm of the regional government, mass media, NGOs, etc., which have just discovered the possibility of participating in the Wikimedia movement (bear in mind that hardly anything from the U.S. gets much in-depth coverage in Russian in the sources that regional public figures, NGOs, teachers or regional journalists read) & are now placing great hopes on teaching the whole of Tatarstan how to wiki & on engaging all Tatars globally (3/4 live outside Tatarstan, 1/5 outside Russia).
* Selet WikiSchool was presented to the wider public at the press conference for Tatar-speaking journalists (July 31), at the poster session of the World Congress of Tatars Youth Forum (Aug. 3), and later at a Tatar projects fair in the Downtown park of Kazan (Aug. 5).
* The contribution of those writing Wikipedia in Tatar was recognized by choosing me as one of this year's recipients of the "For the great service to Tatar nation" medal (Aug. 3).
* We recorded a 40-minute interview in Turkish (Aug. 4, see https://www.youtube.com/watch?v=HK3tFBWMWcs ).
* We haven't met the President of the Republic yet (his schedule changed once again), but I got another firm request to meet with the Minister for Youth Affairs (another good acquaintance of mine, an ex-member of a local Comedy Club-type group), as well as with the Head of the World Congress of Tatars' Executive Bureau, and more TV & print interviews are lining up.
* I am trying to manage these guys' optimism & desire to move quickly, keeping in mind that they are not really familiar with the Wiki way or our communities' policies & practices.
* The Bashkir (ba) & Sakha (sah) communities have shared advice about how to ensure that the local Wikimedia community stays clearly independent in a cultural environment where neither Education nor GLAM programs will attract local partners unless they have strong support from regional or federal government entities.
* Looking forward to meeting my Wikimania-2017 roommate from the Philippines in Singapore on Aug. 16 to collect some more input from Asia.
-------- Forwarded message --------
03.08.2018, 19:35, "Фархад Фаткуллин / Farkhad Fatkullin" <frhd(a)yandex.com>:
Thursday, 2 August 2018
Republic of Tatarstan Ministry for Informatization & Communication @ IT-park, Kazan
Meeting with Tatarstan Deputy Prime Minister - Minister for Informatization & Communications Roman Shaykhutdinov
on opportunities Wikimedia projects & popularization thereof among the population of the Republic can bring for Tatarstan
* Almaz A. Valiullin, Director General of Tatarstan Center for Information Technologies http://mic.tatarstan.ru/eng/valiullin.htm
* Tatyana S. Kamaletdinova, CEO of Tatarstan Center for Information Technologies http://mic.tatarstan.ru/eng/kamaletdinova.htm
* Anna V. Yakovleva, Head of Ministry's Press-Service http://mic.tatarstan.ru/rus/about/structure_new?department_id=80576
* Farkhad N. Fatkullin, 2018 Wikimedian of the Year, Wikimedia Russia member https://meta.wikimedia.org/wiki/User:Frhdkazan
1. Content creation contests
2. Education projects
3. How can these be organized in a systemic way, engaging students throughout all municipal districts of Tatarstan & Tatar diaspora globally, and assessing effectiveness of measures
4. Larger Tatar language internet development, promoting content creation in the language
5. .tatar domain
7. Generating lists for content creation
8. Free licenses
10. Regional educational Wiki-seminar for beginners, introduction & orientation
11. Smartphone-oriented educational projects around Tatar & Tatar Wikipedia (beginning with a Tatar version of www.kakprav.com & moving on to a WOK Master type)
12. OSM in Tatar
13. The way forward: visiting WMF to discover other opportunities, MoU with WMF?
1. Almaz Valiullin to head the Working Group on behalf of the Ministry
2. Next meeting set for / around 20th of August
3. Minister requested his staff to draft documents regarding moving all Regional & Municipal budget funded websites to CC-BY type free licenses
4. Minister proposed every organization website place a link to a Wikipedia article about it (in Tatar, Russian or English, depending on the language used)
5. Help is requested with developing simple samples or article creation guidelines for various institutions of Tatarstan & diaspora organizations, so they can learn what the Wiki community considers appropriate & how representatives of the respective entities should
6. Annual Tatar Internet Awards Ceremony to include awards to leading editors of Wikimedia projects in Tatar
7. Help is requested to develop a framework for Wikipedia Article Contests for Secondary School Children of Tatarstan, with article quality assessment schemes (prizes & organization to be funded by the Tatarstan government, promotion in local mass media, schools, etc.)
8. Help is requested to develop a framework for a sustainable functioning of a Tatarstan & Tatar language oriented Wikimedia thematic organization to be responsible for systemic work around developing & promoting Wikimedia projects in Tatar, as well as Tatarstan-oriented content creation for various Wikis
9. Readiness to consider awarding *.tatar domain names to those, who develop attractive projects in Tatar that can't be hosted on Wikimedia platform or otherwise need a Tatar digital identity
10. Help is requested regarding drafting a program proposal, necessary budget & expected results for a Tatarstan oriented introductory Wiki-seminar, open for a wide public
11. Help is requested to provide links to Wikidata educational materials, Practical Use Cases & expert responses to the Deputy Minister (in English)
12. Help is requested in organizing a learning visit to WMF Headquarters during Tatarstan Delegation annual fall visit to Silicon Valley & other places in U.S. to discover other opportunities
Farkhad Fatkullin - Фархад Фаткуллин http://sikzn.ru/ Tel.+79274158066 / skype:frhdkazan / Wikipedia:frhdkazan
03.08.2018, 11:42, "Фархад Фаткуллин / Farkhad Fatkullin" <frhd(a)yandex.com>:
> Hi Nichole & Kui,
> Thank you for your responses & readiness to help.
> Events on the ground are developing along a predictable course - we had a very constructive 3-hour discussion last night with Tatarstan's Deputy Prime Minister (IT minister) & his team, whilst earlier on Thursday I got an invite to meet with another Deputy Prime Minister (President of the National Council of Trustees for the World Congress of Tatars Association).
> More TV & magazine interviews are expected, and some in-depth articles in Russian promoting Wikimedia projects are in the pipeline (I expect to see the texts to make necessary corrections & add links to the respective WMRU, Meta, WMF & related independent media articles, thanks to ComCom).
> I hope to find time to prepare & email both you (in English) & the IT ministry's Working Group (in Russian) my summary of yesterday's late-evening meeting, with attendees, topics discussed, requests & proposals, etc. They want to move as fast as possible, because this will help them meet the task of promoting Tatar language use online that the region's President has charged them with.
> Just a few to begin with:
> * Ready to move all regional and municipal government websites into CC-BY, add links to respective WP articles from all these
> * Willing to have Tatarstan & Tatar language oriented thematic organization in place to work with secondary schools & Universities, GLAMS, diaspora
> * Ready to sponsor prizes for article writing contests (either through WMRU or this new entity)
> * Requesting guidance from myself & Wikimedia community on what's the best way to make this all successful
> * Interested in supporting WMRU organized general public educational Wiki-Seminar in Tatarstan Academy of Sciences or other respected facility with a large conference hall open to all public (September-October?)
> * Willing to visit WMF headquarters in November to meet & learn more about what's there (I admit I don't read all mailing lists or Meta discussions, don't have time to see all the wonderful YouTube videos available, and haven't yet had time to visit a single Wikimedia Conference) & possibly sign MoUs or whatever would help speed up Wikimedia acceptance & popularity growth among the population of the region.
> I am in touch with Wikimedia RU & the Wikimedia Languages of Russia Community regarding ongoing developments, benefiting from the collective wisdom of Wikimedians I know.
> Should you be ready to bring any ideas to the table, please shoot them my way.
> 02.08.2018, 16:35, "Nichole Saad" <nsaad(a)wikimedia.org>:
>> Hi Farhad,
>> First, congratulations on being named "Wikimedian of the Year!" We have received your letter, and are looking forward to engaging with your ideas. Realistically, I'll be able to provide a more in-depth response early next week.
>> best regards,
>> On Wed, Aug 1, 2018 at 2:45 PM Фархад Фаткуллин / Farkhad Fatkullin <frhd(a)yandex.com> wrote:
>>> Dear Sirs,
>>> This is from Farhad, User:frhdkazan (ComCom member, now a.k.a. 2018 Wikimedian of the Year) with some proposals about how to leverage a fleeting opportunity I got. Please read the text below this letter ASAP & think how you can help seize the moment. I'm looking forward to first ideas within 24 hours from now.
>>> Any support of yours would be greatly appreciated.
>>> Thanks a million.
>>> P.S. More on where I'm coming from can be found at http://frhd.narod.ru/resume-en.htm (dated; my retainer contract with the Office of Tatarstan President stopped in September 2013) & more up-to-date but in Russian at http://sikzn.narod.ru/index/0-4 . I have previously translated & provided digital media with the link to the Russian text of the Wikimedia Blog post about Jess Wade https://ru.wikinews.org/wiki/?curid=176108 , & have now contacted Asian colleagues to get similar content about Nahid Sultan & Wikimedia Bangladesh. Also on my agenda is to record a video interview in Turkish for Wikimedia Turkey's communication campaign - I speak the language & we are in touch with the Turkish community. I also need to get back to interpreting Wikimania-2018 videos available on YouTube into Russian - maybe later this week, before I join my family for a few days off & then a week-long interpretation assignment in Singapore.
>>> Following Jimmy's unexpected pass during the Wikimania-2018 Closing Ceremony, I became an instant celebrity of sorts - I alone have seen over 40 articles with positive PR about Wikipedia & the wider body of Wikimedia projects:
>>> * one by RT in Russian collecting over 1.1 million Facebook likes in under 8 hours from publication
>>> * short positive ones from Russian Federation government's Official gazette & Russkiy Mir Foundation
>>> * first-ever mention that people can donate to the Wikimedia Russia volunteers organization so that we can fund local Wiki-seminars, conferences & contests (WMF is not funding Wikimedia Russia & can't allow us to use a donate link from Russian Wikipedia or to place a link to the "Thank you, but we don't accept donations from Russia" page to which users from Russia are routed)
>>> * about to get one with links published to various WMF & Wikimedia Russia projects, including promoting free licenses, Education & GLAM programs, etc.
>>> I'm periodically collecting these @ https://tt.wikipedia.org/wiki/Википедия:Безнең_турында_матбугатта#Ел_викиме… when time permits, but there's more in my Facebook Messenger.
>>> This is also generating very positive attention from the Republic of Tatarstan government, as well as our regional NGO partner, Selet Youth Education Foundation, with whom the yet-unrecognized Tatar Wikimedians are jointly running the Selet WikiSchool project. https://outreach.wikimedia.org/wiki/Education/News/May_2018/Selet_WikiSchool The latter want to introduce this joint initiative, as well as Wikimedia Russia executive director Stanislav Kozlovsky (Ph.D. in Psychology, Researcher & a Senior lecturer at Moscow State University) and myself, to the President of Tatarstan during his visit to their Annual International Forum of Tatar-speaking High-School Age Youth on August 6th. Separately, yesterday I was called up by a good acquaintance of mine (Tatarstan Vice Prime-Minister, Minister for IT https://en.wikipedia.org/wiki/Roman_Shaykhutdinov ), who invited me to have a talk with him tomorrow about how promoting Wikimedia projects in the wider Tatar-speaking world (only about 25% of Tatars live in Tatarstan) can help him achieve what the current President wants him to do to develop Tatar language use online. There are some other Tatarstan Prime-Minister Office level things in the pipeline, & I am also waiting for a call from Russian Federation ex-IT Minister https://en.wikipedia.org/wiki/Nikolay_Nikiforov , whom I know from his years in the Tatarstan government as well (until 2012). FYI: the Wikimedia movement is still unknown in Russia, with local mass media outlets predominantly speaking about Wikipedia in Russian when there is some lapse that can be exploited to show one's Russian patriotism & thus score some points (basically the same political circus as what we are seeing in the U.S. with the Russia-collusion story in overdrive, just in the opposite direction), and here we get something to reverse the situation big time, explaining to the wide public that it's not only OK, but even desirable, to engage with Wikimedia projects.
>>> Keeping in mind that the Tatarstan President is the head of both the Russia-Islamic World Strategic Vision Group & the Association of Innovative Regions of Russia, Jimmy gave me a-hell-of-an-opportunity for an elevator pitch for the Wikimedia movement in Russia and some adjacent countries (think Turkey & Central Asia), and I would really hate to see it go unused. I will do my part here (about CC-BY for regional government controlled websites, Wikipedia Education Program as an extracurricular activity in all secondary schools & universities, GLAM, as well as some locally funded carrots for participants), but I also want your help in driving this to a home run. Every fall the Tatarstan President & his delegation visit Silicon Valley & other places of interest in the U.S., & it would be great if we could set up a physical visit to the Wikimedia Foundation, with a first-hand presentation of best international practices the movement is proud of, & have some MoU on cooperation signed (as a bureaucratic basis for continuing the conversation) with some department of the Tatarstan government, just like the one you signed with the Mexican Ministry of Culture www.eluniversal.com.mx/cultura/secretaria-de-cultura-y-wikimedia-firman-con…. To set the context, Tatarstan has such formal documents with a number of American companies & institutions: our private English-speaking IT University @ https://en.wikipedia.org/wiki/Innopolis was developed jointly with Carnegie Mellon University (with consultancy fees paid by Tatarstan), we have Russia's first publicly funded hospital certified as meeting https://en.wikipedia.org/wiki/Joint_Commission requirements, & plenty of U.S. investors @ our https://en.wikipedia.org/wiki/Alabuga_Special_Economic_Zone .
>>> On top of this, we could benefit from having the Selet Youth Educational movement http://selet.biz/en/ embrace Wiki even further, so maybe a similar MoU on collaboration between them and Wikiedu.org, signed simultaneously, would be great (I met LiAnna and Jami in Montreal last year & was really impressed with what these guys are doing).
>>> Please shoot something my way before the same time tomorrow, for me to handle the meeting with Tatarstan Deputy Prime Minister more effectively to progressively open other opportunities I've touched on. On top of what I described above & my last year's ideas @ https://meta.wikimedia.org/wiki/User:Frhdkazan/Wiki4RegionalDevt (in Russian), I'll be bringing to the table topics of:
>>> * Wikidata
>>> * OSM, as well as a
>>> * WOK-like project for all things in Tatar (I was contacted by a local 9th grader who, on his own initiative with the help of a teacher, did something similar for students who want to train for the Russian-language SAT/ACT type exam www.kakprav.com & now offered to develop such a thing or a bigger one for Tatar). For more on WOK, see https://www.vanguardngr.com/2018/01/wikipedia-wok-seek-nigerian-content-onl…
>>> * whatever else you or anybody else can help me think of until then & or later opportunities
>> Nichole Saad
>> Wikimedia Foundation | Senior Program Manager, Education
>> user: NSaad (WMF)
-------- End of forwarded message --------
I was asked by a volunteer for help getting stats on the gender gap in
content on a certain Wikipedia, and came up with simple Wikidata Query
Service queries that pulled the total number of articles on a given
Wikipedia about men and about women, to calculate *the proportion of
articles about women out of all articles about humans*.
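The exact queries were not included in this message. A minimal sketch of what such a count could look like is below, assuming Wikidata's usual modeling of biographies (instance of (P31) human (Q5); sex or gender (P21) female (Q6581072) or male (Q6581097)) and the query service's sitelink triples (`schema:about`, `schema:isPartOf`). The Python helper names are hypothetical; they only build the query text and compute the ratio, taking "all articles about humans" to mean men plus women, as described above.

```python
FEMALE = "Q6581072"  # Wikidata item for "female"
MALE = "Q6581097"    # Wikidata item for "male"


def biography_count_query(wiki_url, gender_qid):
    """Build a SPARQL query counting articles on one Wikipedia about humans
    (P31 = Q5) with the given sex-or-gender value (P21).

    wiki_url is the site root with a trailing slash, e.g. "https://he.wikipedia.org/".
    """
    return (
        "SELECT (COUNT(DISTINCT ?article) AS ?count) WHERE {\n"
        "  ?item wdt:P31 wd:Q5 ;\n"
        f"        wdt:P21 wd:{gender_qid} .\n"
        "  ?article schema:about ?item ;\n"
        f"           schema:isPartOf <{wiki_url}> .\n"
        "}"
    )


def share_of_women(women, men):
    """Proportion of biographies about women, using men + women as the total."""
    total = women + men
    if total == 0:
        raise ValueError("no biographies counted")
    return women / total


print(biography_count_query("https://he.wikipedia.org/", FEMALE))
# Hypothetical counts, for illustration only:
print(f"{share_of_women(15_000, 85_000):.1%}")  # 15.0%
```

On the largest wikis a single count like this may run into the query service's time limit.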
Then I was curious about how that wiki compared to other wikis, so I ran
the queries on a bunch of languages, and gathered the results into a table,
(please see the *caveat* there.)
I don't have time to fully write-up everything I find interesting in those
results, but I will quickly point out the following:
1. The Nepali statistic is simply astonishing! There must be a story
there. I'm keen on learning more about this, if anyone can shed light.
2. Evidently, ~13%-17% seems to be a robust average proportion of
articles about women among all biographies.
3. Among the top 10 largest wikis, Japanese is the least imbalanced. Good
job, Japanese Wikipedians! I wonder if you have a good sense of what
drives this relatively better balance. (My instinctive guess is pop culture.)
4. Among the top 10 largest wikis, Russian is the most imbalanced.
5. I intend to re-generate these stats every two months or so, to
eventually have some sense of trends and changes.
6. Your efforts, particularly on small-to-medium wikis, can really make a
dent in these numbers! For example, it seems I am personally
responsible for almost 1% of the coverage of women on Hebrew Wikipedia!
7. I encourage you to share these numbers with your communities. Perhaps
you'd like to overtake the wiki just above yours? :)
8. I'm happy to add additional languages to the table, by request. Or you
can do it yourself, too. :)
 Yay #100wikidays :) https://meta.wikimedia.org/wiki/100wikidays
Wikimedia Foundation <http://www.wikimediafoundation.org>
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us make it a reality!
As I mentioned in an earlier thread, we will be running reader
surveys across a number of Wikipedia languages to learn about the
reader needs and motivations in these languages as well as some of
their demographic information (and perhaps the correlations between
demographics and user motivations and characteristics).
If your language community is interested in having statistics on the
distribution of reader gender, age, education, native language, and
geographic region (rural/urban) in your language (and depending on how
much data we collect in your language, perhaps more insights), this is
your chance to indicate interest at:
I initially communicated 2019-02-15 as the deadline to sign up. Since
then, we have run a pilot test on enwiki and we are investigating some
of the results to see if any changes in the survey questions are
needed. You now have until 2019-03-15 to indicate interest.
As always: this call is primarily a service to your language
community. If you like it, take action on it. If you don't, no action
is needed. :)
In an attempt to move the discussion on from unprofitable and
inappropriate speculations about information shared in confidence,
let's look at one of the aspects that is made public. When the WMF
issues a WMF Global Ban in line with
https://meta.wikimedia.org/wiki/WMF_Global_Ban_Policy it has been in
the habit of doing so by login identity or pseudonym as at
This makes perfect sense in terms of blocking users from logging in,
but the bans are not only issued against individuals personally, rather
than against specific account names ("A Foundation global ban is placed
against an individual instead of against a specific username"), but also
apply to real-world activities such as events and meetings ("as well
as any in-person events hosted, sponsored or funded by the
Foundation"), for which people typically register and pay under a real
name.
Has the time not come for WMF Global Bans to name people under
their real names, where known? In answer to one likely objection:
this is not outing, since that applies only to members of the
Wikimedia community. People subject to WMF Global Bans are no longer
members of that community: the ban permanently and irrevocably
removes them from membership ("Foundation global bans are final; they
are not appealable, not negotiable and not reversible.").
June 30, 2019
Wikimedia District of Columbia is deeply concerned by recent events that
have occurred on the English Wikipedia, including community controversy
regarding a ban imposed by the Wikimedia Foundation.
Protecting editors from harassment is crucial to the continued success of
the Wikimedia movement. Many of us have been targets of harassment as a
result of our contributions to the Wikimedia projects, and have witnessed
harassment of our colleagues, and we are grateful to the Wikimedia
Foundation's Trust & Safety team for their support in those incidents.
We make no judgement on the case at the center of the current controversy
as the Foundation—as per long-standing practice to protect the privacy of
all concerned—did not identify the specifics of the behavior publicly. We
are not endorsing or opposing a specific case, policy, or process. However,
in light of these events, we publicly affirm our support for the following:
- We support the Wikimedia Foundation's efforts in general to make the
English Wikipedia welcoming and accessible to people of all backgrounds and
- We believe there are circumstances where the Wikimedia Foundation
when it is necessary to protect people of all backgrounds and gender
- We support collaboration between the Foundation and the English
Wikipedia community to inform the policies and processes surrounding these
- We oppose the use of discriminatory, racist, and homophobic language
in all Wikimedia discussions, and encourage the community to avoid it,
regardless of context or intent.
Board of Directors
Wikimedia District of Columbia
Please keep in mind the plausible scenario that one or more people
contacted T&S and asserted that editor X is extremely distressed about
harassment arising from on-wiki edits. Fram can be literally telling the
truth when they say that they are unaware of any off-wiki communications,
while T&S is in possession of information that cannot simply be summarized
as on-wiki or off-wiki. It may well be real-life observations made off-wiki,
unrelated to any off-wiki communications involving Fram.