This paper (first reference) is the result of a class project I was part of
almost two years ago for CSCI 5417 Information Retrieval Systems. It builds
on a class project I did in CSCI 5832 Natural Language Processing, which I
presented at Wikimania '07. The project ran very late; we didn't send in the
final paper until the day before New Year's. As far as I recall, this
technical report was never really announced, so I thought it would be
interesting to look briefly at the results. The goal of the paper was to
break articles down into surface features and latent features, and then use
those features to study the rating system in use, predict article quality
and rank results in a search engine. We used the [[random forests]]
classifier, which allowed us to analyze the contribution of each feature to
performance by looking directly at the importance weights it assigned. The
surface analysis was performed on the whole English Wikipedia, while the
latent analysis was performed on the Simple English Wikipedia (it is more
expensive to compute).

= Surface features =
* Readability measures are the single best predictor I have found of quality
as defined by the Wikipedia Editorial Team (WET). The [[Automated Readability
Index]], [[Gunning Fog Index]] and [[Flesch-Kincaid Grade Level]] were the
strongest predictors, followed by length of article HTML, number of
paragraphs, [[Flesch Reading Ease]], [[SMOG]] grading, number of internal
links, [[Laesbarhedsindex Readability Formula]], number of words and number of
references. Weakly predictive were number of forms of "to be", number of
sentences, [[Coleman-Liau Index]], number of templates, PageRank, number of
external links and number of relative links. Not predictive (overall; see the
end of section 2 for the per-rating score breakdown): number of h2 or h3
headings, number of conjunctions, number of images*, average word length,
number of h4 headings, number of prepositions, number of pronouns, number of
interlanguage links, average syllables per word, number of nominalizations,
article age (based on page ID), proportion of questions and average sentence
length.
:* Number of images was actually by far the single strongest predictor of any
class, but only for Featured articles. Because it was so good at picking out
Featured articles and somewhat good at picking out A and G articles, the
classifier was confused in so many cases that the overall contribution of
this feature to classification performance is zero.
:* Number of external links is strongly predictive of Featured articles.
:* The B class is highly distinctive. It has a strong "signature," with high
predictive value assigned to many features. The Featured class is also very
distinctive. F, B and S (Start/Stub) contain the most information.
:* A is the least distinct class, not being very different from F or G.

= Latent features =
The algorithm used for the latent analysis, which analyzes the occurrence of
words in every document with respect to the link structure of the
encyclopedia ("concepts"), is [[Latent Dirichlet Allocation]]. This part of
the analysis was done by CS PhD student Praful Mangalath. As an example of
what can be done with the result of this analysis, suppose you provide a word
(a search query) such as "hippie". You can then look at the weight of every
article for the word hippie and pick the article with the largest weight.
Next, you look at that article's link network and pick out the articles it
links to and/or that link to it which are also weighted strongly for the word
hippie, while also contributing maximally to this article's "hippieness". We
tried this query in our system (LDA), Google (site:en.wikipedia.org hippie),
and the Simple English Wikipedia's Lucene search engine. The breakdown of
articles occurring in the top ten search results for this word across those
engines is:
* LDA only: [[Acid rock]], [[Aldeburgh Festival]], [[Anne Murray]], [[Carl
Radle]], [[Harry Nilsson]], [[Jack Kerouac]], [[Phil Spector]], [[Plastic
Ono Band]], [[Rock and Roll]], [[Salvador Allende]], [[Smothers brothers]],
[[Stanley Kubrick]].
* Google only: [[Glam Rock]], [[South Park]].
* Simple only: [[African Americans]], [[Charles Manson]], [[Counterculture]],
[[Drug use]], [[Flower Power]], [[Nuclear weapons]], [[Phish]], [[Sexual
liberation]], [[Summer of Love]]
* LDA & Google & Simple: [[Hippie]], [[Human Be-in]], [[Students for a
democratic society]], [[Woodstock festival]]
* LDA & Google: [[Psychedelic Pop]]
* Google & Simple: [[Lysergic acid diethylamide]], [[Summer of Love]]
(See the paper for the articles produced for the keywords philosophy and
economics.)

= Discussion / Conclusion =
* The results of the latent analysis are entirely a matter of perception.
What is interesting, though, is that the LDA features predict the WET quality
ratings just as well as the surface-level features. Both feature sets
(surface and latent) pull out almost all of the information that the rating
system bears.
* The rating system devised by the WET is not distinctive. You can best tell
apart Featured, A and Good articles, grouped together, from B articles.
Featured, A and Good articles are also quite distinctive (Figure 1). Note
that in this study we didn't look at Start and Stub articles, but we did in
the earlier paper.
:* This is interesting when compared to this recent entry on the YouTube
blog: "Five Stars Dominate Ratings".
I think a sane, well-researched (with actual subjects) rating system is well
within the purview of the Usability Initiative. Helping people find and
create good content is what Wikipedia is all about. Having a solid rating
system allows you to reorganize the user interface, the Wikipedia namespace,
and the main namespace around good content and bad content as needed. If you
don't have a solid, information-bearing rating system, you don't know what
good content really is (really bad content is easy to spot).
:* My Wikimania talk was all about gathering data from people about articles
and using that data to train machines to automatically pick out good content.
You ask people questions along dimensions that make sense to people, and give
the machine access to other surface features (such as a statistical measure
of readability, or length) and latent features (such as those that can be
derived from document word occurrence and encyclopedia link structure). I
referenced page 262 of *Zen and the Art of Motorcycle Maintenance* to give an
example of the kind of qualitative features I would ask people about. It
really depends on which features end up bearing information, which has to be
tested in "the lab". Each word is an example dimension of quality: we have
"*unity, vividness, authority, economy, sensitivity, clarity, emphasis, flow,
suspense, brilliance, precision, proportion, depth and so on.*" You then use
surface and latent features to predict these values for all articles. You can
also say that when a person rates an article as high on the x scale, they
also mean that it has this much of these surface features and these latent
features.
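As a rough illustration of "a statistical measure of readability" used as a surface feature, here is a minimal sketch (not the code used in the paper) that computes the three measures the surface analysis found most predictive: the Automated Readability Index, Flesch-Kincaid Grade Level and Gunning Fog Index, using their standard published formulas. The helper names are hypothetical, and the sentence splitter, word tokenizer and vowel-group syllable counter are simplifying assumptions, so absolute scores will differ somewhat from dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate English syllables: count vowel groups, drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # e.g. "approximate" -> 4 rather than 5
    return max(count, 1)  # every word has at least one syllable


def text_stats(text: str) -> dict:
    """Sentence, word, character, syllable and complex-word counts (heuristic)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    return {
        "sentences": len(sentences),
        "words": len(words),
        "chars": sum(ch.isalnum() for w in words for ch in w),
        "syllables": sum(syllables),
        # Gunning Fog "complex" words, simplified here to: 3 or more syllables
        "complex_words": sum(1 for s in syllables if s >= 3),
    }


def readability(text: str) -> dict:
    """ARI, Flesch-Kincaid Grade Level and Gunning Fog for a plain-text string."""
    s = text_stats(text)
    words_per_sentence = s["words"] / s["sentences"]
    return {
        "ari": 4.71 * s["chars"] / s["words"] + 0.5 * words_per_sentence - 21.43,
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * s["syllables"] / s["words"]
        - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + 100 * s["complex_words"] / s["words"]),
    }


if __name__ == "__main__":
    print(readability("The cat sat."))
    print(readability("Readability formulas approximate comprehension difficulty."))
```

In a setup like the one described, each score would become one column of the article's feature vector, alongside counts such as paragraphs or internal links, before training the classifier.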
= References =
- DeHoust, C., Mangalath, P., Mingus, B. (2008). *Improving search in
Wikipedia through quality and concept discovery*. Technical Report.
- Rassbach, L., Mingus, B., Blackford, T. (2007). *Exploring the
feasibility of automatically rating online article quality*. Technical Report.
I have asked for and received permission to forward this most excellent bit
of news to you all.
The LINGUIST List is a most excellent resource for people interested in the
field of linguistics. As I mentioned some time ago, they have had a funding
drive, and in that funding drive they asked for a certain amount of money in
a given number of days; in return they would run a project on Wikipedia to
learn what needs doing to get better coverage of the field of linguistics.
What you will read in this mail is that the whole community of linguists is
asked to cooperate. I am really thrilled, as it will also get more linguists
interested in what we do. My hope is that some of them will be interested in
the languages they care about and help those languages become more relevant.
As a member of the "language prevention committee", I love to get more
knowledgeable people involved in our smaller projects. If that means we get
more requests for new projects, we will really feel embarrassed by all the
new projects we will have to approve because of the quality of the Incubator
content and the quality of the linguistic arguments for why we should approve
yet another language :)
NB: Is this not a really clever way of raising money? Give us this much in
this time frame and we will then do this as a bonus...
---------- Forwarded message ----------
From: LINGUIST Network <linguist(a)linguistlist.org>
Date: Jun 18, 2007 6:53 PM
Subject: 18.1831, All: Call for Participation: Wikipedia Volunteers
LINGUIST List: Vol-18-1831. Mon Jun 18 2007. ISSN: 1068 - 4875.
Moderators: Anthony Aristar, Eastern Michigan U <aristar(a)linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry(a)linguistlist.org>
Reviews: Laura Welcher, Rosetta Project
The LINGUIST List is funded by Eastern Michigan University,
and donations from subscribers and publishers.
Editor for this issue: Ann Sawyer <sawyer(a)linguistlist.org>
To post to LINGUIST, use our convenient web form at
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers
-------------------------Message 1 ----------------------------------
Date: Mon, 18 Jun 2007 12:49:35
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers
As you may recall, one of our Fund Drive 2007 campaigns was called the
"Wikipedia Update Vote." We asked our viewers to consider earmarking their
donations to organize an update project on linguistics entries in the
English-language Wikipedia. You can find more background information on this
The speed with which we met our goal, thanks to the interest and generosity
of our readers, was a sure sign that the linguistics community was enthusiastic
about the idea. Now that summer is upon us, and some of you may have a bit of
leisure time, we are hoping that you will be able to help us get started on the
Wikipedia project. The LINGUIST List's role in this project is a purely
organizational one. We will:
*Help, with your input, to identify major gaps in the Wikipedia materials or
pages that need improvement;
*Compile a list of linguistics pages that Wikipedia editors have identified as
"in need of attention from an expert on the subject" or "does not cite any
references or sources," etc.;
*Send out periodical calls for volunteer contributors on specific topics or
*Provide simple instructions on how to upload your entries into Wikipedia;
*Keep track of our project Wikipedians;
*Keep track of revisions and new entries;
*Work with Wikimedia Foundation to publicize the linguistics community's
We hope you are as enthusiastic about this effort as we are. Just to help us
get started looking at Wikipedia more critically, and to easily identify an
needing improvement, we suggest that you take a look at the List of
Many people are not listed there; others need to have more facts and
added. If you would like to participate in this exciting update effort,
respond by sending an email to LINGUIST Editor Hannah Morales at
hannah(a)linguistlist.org, suggesting what your role might be or which
entries you feel should be updated or added. Some linguists who saw our
on the Internet have already written us with specific suggestions, which we
will share with you soon.
This update project will take major time and effort on all our parts. The
result will be a much richer internet resource of information on the breadth
and depth of the field of linguistics. Our efforts should also stimulate
students to consider studying linguistics and help educate a wider public on
what we do. Please consider participating.
Editor, Wikipedia Update Project
Linguistic Field(s): Not Applicable
LINGUIST List: Vol-18-1831
>>> The people who are loudest in their demands for consensus
>>> do not represent the Wikimedia movement.
>> The voices loudest for the WMF doing something against the
>> Trump administration are not representative of the Wikimedia
>> movement either....
> Is the Community Process Steering Committee currently
> prepared to "engage more 'quiet' members of our community"
> with a statistically robust snap survey to resolve this question?
Anyone can go to Recent Changes and send a SurveyMonkey link to the
most recent few hundred editors with contributions at least a year
old, to get an accurate answer.
Will a respected member of the community please do this? I would like
to know what the actual editing community thinks of the travel ban and
their idea of an appropriate response. I don't want to see community
governance by opt-in participation in obscure RFCs.
I would offer to do this myself, but I value keeping my real name
unassociated with my enwiki userid.
This is a request for your input and ideas (if any) on how I should manage the fallout from Jimmy's announcement of me as the 2018 Wikimedian of the Year.
The emails below are copies of my ongoing consultations with Wikimedia Foundation staff and other Wikimedians I personally know, as well as a report on what's already brewing in my region of Russia after this unexpected outcome.
I would be grateful if you could advise me on how to properly steer the enthusiasm on the part of the regional government, mass media, NGOs, etc., which have just discovered the possibility of participating in the Wikimedia movement (keep in mind that hardly anything from the U.S. gets in-depth coverage in the Russian-language sources that regional public figures, NGOs, teachers or regional journalists read) & are now placing great hopes on teaching the whole of Tatarstan how to Wiki & on engaging all Tatars globally (3/4 live outside Tatarstan, 1/5 outside Russia).
* Selet WikiSchool was presented to the wider public at the press conference for Tatar-speaking journalists (July 31), at the poster session of the World Congress of Tatars Youth Forum (Aug. 3), and later at a Tatar projects fair in the Downtown park of Kazan (Aug. 5).
* The contribution of those writing Wikipedia in Tatar was recognized by my being chosen as one of this year's recipients of the "For the great service to Tatar nation" medal (Aug. 3).
* We recorded a 40-minute interview in Turkish (Aug. 4, see https://www.youtube.com/watch?v=HK3tFBWMWcs ).
* We haven't met the President of the Republic yet (his schedule changed once again), but I got another firm request to meet with the Minister for Youth Affairs (another good acquaintance of mine, formerly part of a local Comedy Club-type activity), as well as with the Head of the World Congress of Tatars' Executive Bureau, and more TV & print interviews are lining up.
* I am trying to manage these people's optimism & desire to move quickly, keeping in mind that they are not really familiar with the Wiki-way or our communities' policies & practices.
* The Bashkir (ba) & Sakha (sah) communities have shared advice on how to ensure that the local Wikimedia community stays clearly independent in a cultural environment where neither Education nor GLAM programs will attract local partners unless those programs have strong support from regional or federal government entities.
* I'm looking forward to meeting my Wikimania-2017 roommate from the Philippines in Singapore on Aug. 16 to collect more input from Asia.
-------- Forwarded message --------
03.08.2018, 19:35, "Фархад Фаткуллин / Farkhad Fatkullin" <frhd(a)yandex.com>:
Thursday, 2 August 2018
Republic of Tatarstan Ministry for Informatization & Communication @ IT-park, Kazan
Meeting with Tatarstan Deputy Prime Minister - Minister for Informatization & Communications Roman Shaykhutdinov
on the opportunities that Wikimedia projects, & their popularization among the population of the Republic, can bring to Tatarstan
Attendees:
* Almaz A. Valiullin, Director General of Tatarstan Center for Information Technologies http://mic.tatarstan.ru/eng/valiullin.htm
* Tatyana S. Kamaletdinova, CEO of Tatarstan Center for Information Technologies http://mic.tatarstan.ru/eng/kamaletdinova.htm
* Anna V. Yakovleva, Head of Ministry's Press-Service http://mic.tatarstan.ru/rus/about/structure_new?department_id=80576
* Farkhad N. Fatkullin, 2018 Wikimedian of the Year, Wikimedia Russia member https://meta.wikimedia.org/wiki/User:Frhdkazan
Topics discussed:
1. Content creation contests
2. Education projects
3. How can these be organized in a systemic way, engaging students throughout all municipal districts of Tatarstan & Tatar diaspora globally, and assessing effectiveness of measures
4. Broader development of the Tatar-language internet, promoting content creation in the language
5. .tatar domain
7. Generating lists for content creation
8. Free licenses
10. Regional educational Wiki-seminar for beginners, introduction & orientation
11. Smartphone-oriented educational projects around Tatar & Tatar Wikipedia (beginning with a Tatar version of www.kakprav.com & moving on to a WOK Master-type project)
12. OSM in Tatar
13. The way forward: visiting WMF to discover other opportunities, MoU with WMF?
Outcomes & requests:
1. Almaz Valiullin to head the Working Group on behalf of the Ministry
2. Next meeting set for / around 20th of August
3. The Minister asked his staff to draft documents on moving all regional & municipal budget-funded websites to CC-BY-type free licenses
4. The Minister proposed that every organization's website place a link to the Wikipedia article about it (in Tatar, Russian or English, depending on the language used)
5. Help is requested with developing simple samples or article creation guidelines for various institutions of Tatarstan & diaspora organizations, to learn what the Wiki-community considers appropriate & how representatives of respective entities should
6. Annual Tatar Internet Awards Ceremony to include awards to leading editors of Wikimedia projects in Tatar
7. Help is requested to develop a framework of Wikipedia Article Contests for Secondary School Children of Tatarstan, with article quality assessment schemes (prizes & organization to be funded by the Tatarstan government, promotion in local mass media, schools, etc.)
8. Help is requested to develop a framework for the sustainable functioning of a Tatarstan & Tatar language oriented Wikimedia thematic organization, to be responsible for systemic work on developing & promoting Wikimedia projects in Tatar, as well as Tatarstan-oriented content creation for various Wikis
9. Readiness to consider awarding *.tatar domain names to those, who develop attractive projects in Tatar that can't be hosted on Wikimedia platform or otherwise need a Tatar digital identity
10. Help is requested regarding drafting a program proposal, necessary budget & expected results for a Tatarstan oriented introductory Wiki-seminar, open for a wide public
11. Help is requested to provide links to Wikidata educational materials, Practical Use Cases & expert responses to the Deputy Minister (in English)
12. Help is requested in organizing a learning visit to WMF Headquarters during the Tatarstan delegation's annual fall visit to Silicon Valley & other places in the U.S., to discover other opportunities
Farkhad Fatkullin - Фархад Фаткуллин http://sikzn.ru/ Tel. +79274158066 / skype:frhdkazan / Wikipedia:frhdkazan
03.08.2018, 11:42, "Фархад Фаткуллин / Farkhad Fatkullin" <frhd(a)yandex.com>:
> Hi Nichole & Kui,
> Thank you for your responses & readiness to help.
> Events on the ground are developing along a predictable course: we had a very constructive 3-hour discussion last night with Tatarstan's Deputy Prime Minister (IT minister) & his team, whilst earlier on Thursday I got an invitation to meet with another Deputy Prime Minister (President of the National Council of Trustees for the World Congress of Tatars Association).
> More TV & magazine interviews are expected, and some in-depth articles in Russian promoting Wikimedia projects are in the pipeline (I expect to see the texts to make necessary corrections & add links to the respective WMRU, Meta, WMF & related independent media articles, thanks to ComCom).
> I hope to find time to prepare & email my summary of yesterday's late-evening meeting, with attendees, topics discussed, requests & proposals, etc., both to you (in English) & to the IT ministry's Working Group (in Russian). They want to move as fast as possible, because this will help them meet the task of promoting Tatar language use online that the region's President has charged them with.
> Just a few to begin with:
> * Ready to move all regional and municipal government websites into CC-BY, add links to respective WP articles from all these
> * Willing to have Tatarstan & Tatar language oriented thematic organization in place to work with secondary schools & Universities, GLAMS, diaspora
> * Ready to sponsor prizes for article writing contests (either through WMRU or this new entity)
> * Requesting guidance from myself & Wikimedia community on what's the best way to make this all successful
> * Interested in supporting WMRU organized general public educational Wiki-Seminar in Tatarstan Academy of Sciences or other respected facility with a large conference hall open to all public (September-October?)
> * Willing to visit WMF headquarters in November to meet & learn more about what's there (I admit I don't read all mailing lists or Meta discussions, don't have time to see all the wonderful YouTube videos available, and haven't yet had time to attend a single Wikimedia Conference) & possibly sign MoUs or whatever would help speed up Wikimedia acceptance & popularity growth among the population of the region.
> I am in touch with Wikimedia RU & the Wikimedia Languages of Russia community regarding ongoing developments, benefiting from the collective wisdom of Wikimedians I know.
> Should you be ready to bring any ideas to the table, please shoot them my way.
> 02.08.2018, 16:35, "Nichole Saad" <nsaad(a)wikimedia.org>:
>> Hi Farhad,
>> First, congratulations on being named "Wikimedian of the Year!" We have received your letter, and are looking forward to engaging with your ideas. Realistically, I'll be able to provide a more in-depth response early next week.
>> best regards,
>> On Wed, Aug 1, 2018 at 2:45 PM Фархад Фаткуллин / Farkhad Fatkullin <frhd(a)yandex.com> wrote:
>>> Dear Sirs,
>>> This is from Farhad, User:frhdkazan (ComCom member, now a.k.a. 2018 Wikimedian of the Year) with some proposals about how to leverage a fleeting opportunity I got. Please read the text below this letter ASAP & think how you can help seize the moment. I'm looking forward to first ideas within 24 hours from now.
>>> Any support of yours would be greatly appreciated.
>>> Thanks a million.
>>> P.S. More on where I'm coming from can be found @ http://frhd.narod.ru/resume-en.htm (dated; I ended my retainer contract with the Office of Tatarstan President in September 2013) & more up-to-date, but in Russian, @ http://sikzn.narod.ru/index/0-4 . I have previously translated the Wikimedia Blog post about Jess Wade & provided digital media with the link to its Russian text https://ru.wikinews.org/wiki/?curid=176108 , & have now contacted Asian colleagues to get similar content about Nahid Sultan & Wikimedia Bangladesh. Also on my agenda is recording a video interview in Turkish for Wikimedia Turkey's communication campaign - I speak the language & we are in touch with the Turkish community. I also need to get back to interpreting Wikimania-2018 videos available on YouTube into Russian - maybe later this week, before I join my family for a few days off & then a week-long interpretation assignment in Singapore.
>>> Following Jimmy's unexpected pass during the Wikimania-2018 Closing Ceremony, I became an instant celebrity of sorts - I alone have seen over 40 articles with positive PR about Wikipedia & the wider body of Wikimedia projects,
>>> * one by RT in Russian collecting over 1.1 million Facebook likes in under 8 hours from publication
>>> * short positive ones from Russian Federation government's Official gazette & Russkiy Mir Foundation
>>> * first ever mention that people can donate to the Wikimedia Russia volunteers organization so that we can fund local Wiki-seminars, conferences & contests (WMF is not funding Wikimedia Russia & can't allow us to use a donate link from Russian Wikipedia or place a link to the "Thank you, but we don't accept donations from Russia" page to which users from Russia are routed)
>>> * about to get one with links published to various WMF & Wikimedia Russia projects, including promoting free licenses, Education & GLAM programs, etc.
>>> I'm periodically collecting these @ https://tt.wikipedia.org/wiki/Википедия:Безнең_турында_матбугатта#Ел_викиме… when time permits, but there's more in my Facebook Messenger.
>>> This is also generating very positive attention on the part of the Republic of Tatarstan government, as well as our regional NGO partner, Selet Youth Education Foundation, with whom yet unrecognized Tatar Wikimedians are jointly running the Selet WikiSchool project. https://outreach.wikimedia.org/wiki/Education/News/May_2018/Selet_WikiSchool The latter want to introduce this joint initiative, as well as Wikimedia Russia executive director Stanislav Kozlovsky (Ph.D. in Psychology, Researcher & a Senior lecturer at Moscow State University) and myself, to the President of Tatarstan during his visit to their Annual International Forum of Tatar-speaking High-School Age Youth on August 6th. Unrelated to that, yesterday I was called by a good acquaintance of mine (Tatarstan Vice Prime-Minister, Minister for IT https://en.wikipedia.org/wiki/Roman_Shaykhutdinov ), who invited me to talk with him tomorrow about how promoting Wikimedia projects in the wider Tatar-speaking world (only about 25% of Tatars live in Tatarstan) can help him achieve what the current President wants him to do to develop Tatar language use online. There are some other Tatarstan Prime-Minister Office level things in the pipeline, & I am also waiting for a call from Russian Federation ex-IT Minister https://en.wikipedia.org/wiki/Nikolay_Nikiforov , whom I know from his years in the Tatarstan government as well (until 2012). FYI: the Wikimedia movement is still unknown in Russia, with local mass media outlets predominantly speaking about Wikipedia in Russian when there is some lapse that can be exploited to show one's Russian patriotism & thus score some points (basically the same political circus as what we are seeing in the U.S. with the Russia-collusion story in overdrive, just in the opposite direction), and here we get something to reverse the situation big time, explaining to the wide public that it's not only OK, but even desirable, that they engage with Wikimedia projects.
>>> Keeping in mind that the Tatarstan President is the head of both the Russia-Islamic World Strategic Vision Group & the Association of Innovative Regions of Russia, Jimmy gave me a hell of an opportunity for an elevator pitch for the Wikimedia movement in Russia and some adjacent countries (think Turkey & Central Asia), and I would really hate to see it go unused. I will do my part here (about CC-BY for regional government controlled websites, the Wikipedia Education Program as an extracurricular activity in all secondary schools & universities, GLAM, as well as some locally funded carrots for participants), but I also want your help in driving this to a home run. Every fall the Tatarstan President & his delegation visit Silicon Valley & other places of interest in the U.S., & it would be great if we could set up a physical visit to the Wikimedia Foundation, with a first-hand presentation of the best international practices the movement is proud of, & have some MoU on cooperation signed (as a bureaucratic basis for continuing the conversation) with some department in the Tatarstan government, just like the one you signed with the Mexican Ministry of Culture www.eluniversal.com.mx/cultura/secretaria-de-cultura-y-wikimedia-firman-con…. To set the context, Tatarstan has such formal documents with a number of American companies & institutions: our private English-speaking IT University @ https://en.wikipedia.org/wiki/Innopolis was developed jointly with Carnegie Mellon University (with consultancy fees paid by Tatarstan), we have Russia's first publicly funded hospital certified as meeting https://en.wikipedia.org/wiki/Joint_Commission requirements & plenty of U.S. investors @ our https://en.wikipedia.org/wiki/Alabuga_Special_Economic_Zone .
On top of this, we could benefit from having the Selet Youth Educational movement http://selet.biz/en/ embrace Wiki even further, so maybe a similar MoU on collaboration between them and Wikiedu.org, signed simultaneously, would be great (I met LiAnna and Jami in Montreal last year & was really impressed with what these guys are doing).
>>> Please shoot something my way before the same time tomorrow, for me to handle the meeting with Tatarstan Deputy Prime Minister more effectively to progressively open other opportunities I've touched on. On top of what I described above & my last year's ideas @ https://meta.wikimedia.org/wiki/User:Frhdkazan/Wiki4RegionalDevt (in Russian), I'll be bringing to the table topics of:
>>> * Wikidata
>>> * OSM, as well as a
>>> * WOK-like project for all things in Tatar (I was contacted by a local 9th grader who, on his own initiative & with the help of a teacher, did something similar for students who want to train for the Russian-language SAT/ACT-type exam www.kakprav.com & has now offered to develop such a thing, or something bigger, for Tatar). For more on WOK, see https://www.vanguardngr.com/2018/01/wikipedia-wok-seek-nigerian-content-onl…
>>> * whatever else you or anybody else can help me think of before then, or for later opportunities
>> Nichole Saad
>> Wikimedia Foundation | Senior Program Manager, Education
>> user: NSaad (WMF)
-------- End of forwarded message --------
I was asked by a volunteer for help getting stats on the gender gap in
content on a certain Wikipedia, and came up with simple Wikidata Query
Service queries that pulled the total number of articles on a given
Wikipedia about men and about women, to calculate *the proportion of
articles about women out of all articles about humans*.
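For anyone wanting to reproduce numbers like these for their own wiki, here is a minimal sketch of one way to build such queries for the Wikidata Query Service. It is an assumed reconstruction, not necessarily the exact queries behind the table: it counts articles on a given Wikipedia whose subject is a human (P31 = Q5) with sex or gender (P21) female (Q6581072) or male (Q6581097), and it takes women / (women + men) as the proportion, which ignores biographies with other or missing gender values. The function names are hypothetical.

```python
# Well-known Wikidata items used in the query (P31 = instance of, P21 = sex or gender).
HUMAN = "Q5"
FEMALE = "Q6581072"
MALE = "Q6581097"


def count_query(lang: str, gender_qid: str) -> str:
    """Return a SPARQL query for the Wikidata Query Service that counts articles
    on <lang>.wikipedia.org about humans with the given sex-or-gender item.

    Assumed reconstruction: not necessarily the exact query used for the table.
    """
    site = f"https://{lang}.wikipedia.org/"
    return (
        "SELECT (COUNT(DISTINCT ?article) AS ?count) WHERE {\n"
        f"  ?person wdt:P31 wd:{HUMAN} ;\n"
        f"          wdt:P21 wd:{gender_qid} .\n"
        "  ?article schema:about ?person ;\n"
        f"           schema:isPartOf <{site}> .\n"
        "}"
    )


def share_of_women(women: int, men: int) -> float:
    """Proportion of biographies about women, using women + men as the
    denominator (assumption: other or missing gender values are ignored)."""
    total = women + men
    if total == 0:
        raise ValueError("no biographies were counted")
    return women / total


if __name__ == "__main__":
    print(count_query("he", FEMALE))
    print(count_query("he", MALE))
    # Hypothetical counts, for illustration only: 150 women, 850 men.
    print(share_of_women(150, 850))
```

Pasting the two printed queries into the query service gives the two counts; feeding them to `share_of_women` yields the proportion for that wiki.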
Then I was curious about how that wiki compared to other wikis, so I ran
the queries on a bunch of languages and gathered the results into a table
(please see the *caveat* there).
I don't have time to fully write up everything I find interesting in those
results, but I will quickly point out the following:
1. The Nepali statistic is simply astonishing! There must be a story
there. I'm keen on learning more about this, if anyone can shed light.
2. Evidently, ~13%-17% seems like a robust average of the proportion of
articles about women among all biographies.
3. Among the top 10 largest wikis, Japanese is the least imbalanced. Good
job, Japanese Wikipedians! I wonder if you have a good sense of what
drives this relatively better balance. (My instinctive guess is pop culture.)
4. Among the top 10 largest wikis, Russian is the most imbalanced.
5. I intend to re-generate these stats every two months or so, to
eventually have some sense of trends and changes.
6. Your efforts, particularly on small-to-medium wikis, can really make a
dent in these numbers! For example, it seems I am personally
responsible for almost 1% of the coverage of women on Hebrew Wikipedia!
7. I encourage you to share these numbers with your communities. Perhaps
you'd like to overtake the wiki just above yours? :)
8. I'm happy to add additional languages to the table, by request. Or you
can do it yourself, too. :)
 Yay #100wikidays :) https://meta.wikimedia.org/wiki/100wikidays
Wikimedia Foundation <http://www.wikimediafoundation.org>
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us make it a reality!
As I mentioned in an earlier thread, we will be running reader
surveys across a number of Wikipedia languages to learn about
reader needs and motivations in these languages, as well as some of
their demographic information (and perhaps the correlations between
demographics and user motivations and characteristics).
If your language community is interested in having statistics on the
distribution of reader gender, age, education, native language, and
geographic region (rural/urban) in your language (and, depending on how
much data we collect in your language, perhaps more insights), this is
your chance to indicate interest at:
I initially communicated 2019-02-15 as the deadline to sign up. Since
then, we have run a pilot test on enwiki and are investigating some
of the results to see whether any changes to the survey questions are
needed. You now have until 2019-03-15 to indicate interest.
As always: this call is primarily a service to your language
community. If you like it, take action on it. If you don't, no action
is needed. :)
Two weeks ago I sent this email to my strategy working group (resource
allocation). I didn't plan to send a public email; I only meant to share with
the rest of the group my reasons for leaving and then quietly disappear.
I received feedback from many of the group members, along with requests for
permission to share it with others outside the group, which led to
more conversations around it.
Last week we had our weekly phone call, during which we discussed our
feelings and opinions about the process so far. From our long conversation
and the conversations with others, I learned that many of these
feelings exist among the other members, as well as some ideas on how to make
it easier and less demanding and at the same time publishing the
Yesterday, following a good conversation about it with one of the WMF's board
members, I was asked to share these thoughts with the movement's
list, so that the community can give its feedback as well.
---------- Forwarded message ---------
From: Itzik - Wikimedia Israel <itzik(a)wikimedia.org.il>
Date: Wed, Mar 13, 2019 at 2:08 PM
Subject: I decided to leave the working group
For a long time I have been considering leaving the working group but each
time I decided to give it another chance. Yesterday, after long
consideration, I decided to write this email.
I must be honest: I was skeptical about this process from the first moment.
The huge amount of money the board allocated to it, together with the
complicated and (very) long process planned for it, made me doubt that it
could produce a real outcome in a reasonable time.
For the past two years, it has seemed to me that the strategy took over almost
every movement event and activity. I feel bad about investing millions of
dollars from our donations and uncounted hours of volunteer time into this
process.
I also felt the Foundation was acting hypocritically: while "freezing"
grant programs (such as APG) and holding affiliates back from increasing their
programs and budgets "because of the strategy process," it was
simultaneously approving increases of tens of percent to its own budget and
staff, year after year.
Despite my distrust of the chances of this process and my criticism
of it, I instructed my organization to give it the full support we had been
asked for, as our whole movement did. Later on, I decided to join this working
group, as I felt we had almost reached the final step of the process and I
wanted to help shape the recommendations. I was totally wrong.
In the first months of the working groups, I felt it was a complete waste of
time. I saw how wonderful volunteers tried to lead the process within each
group (thank you Daria!), but it wasn't their job, nor any of ours. I felt
like I was back at university: every few weeks I received
instructions and homework from the lecturer, with assignments for the
following week, and in between we were expected to lead it and solve things
by ourselves. It took the core team a few months to change this and bring in
external support, but even after that (right) change, it continues to feel
like I came *to work for* the strategic process, not with it.
I felt as though nothing had happened in the past year (or years?) before the
working groups started to operate. As if we hadn't had hundreds of
meetings around the world, with a total of tens of thousands of people and
an enormous number of hours of conversations: aside from a few short
sentences of a strategic direction, we started from scratch. Completely
from scratch, with discussions about what this process is, its definitions
and concepts. What is the problem with the current system? What are the
challenges? What did people share during the first phase? That information
wasn't available and ready for the group, and still isn't. Eight months
after we started, the real conversation about the subject I joined to
discuss and help shape recommendations on has not even begun; it is
still far, far away.
The more I spoke with people who are part of the process, the more I
realized that this despair is not mine alone but shared by many. But we are
real Wikimedians, and we are committed to the things we start. We are bad
at stopping things when they don't work, or at really reviewing the
things we do, when we believe they are the right thing. I have
completely stopped thinking this is the right thing for our movement.
Last month, in our in-person meeting in Berlin, one of the opening
activities was to sum up the number of years we were all members of the
movement. Just think about doing the same, and sum up the number of
volunteer (and staff) hours invested in this process until now. We are
talking about tens of thousands of hours of work, not even taking into
account the huge amount of money involved.
And the end of the process is very far away.
In one of our discussions, we debated whether to include volunteers as a
resource that can be allocated. In the end we decided they can't be treated
as such, but just try to imagine they could, and think about whatever future
resource allocation body or structure we end up with: how would it handle the
decision whether to approve such a huge amount of volunteer time and money
for this process? Did the WMF's board even consider and discuss these
resources and how they would affect the movement during the years of the
process? I doubt it.
We tend to say that the movement's newest project is Wikidata. I think we may
need to start treating WikiStrategy as the newest project. Just think about
what we could do with that amount of resources.
The idea to massively involve the wide community within this process was
the right decision - but the implementations from my point of view were
If the last strategy process was handled entirely by outsiders, we took
this one to the opposite extreme, without finding the right balance.
A strategy process is important, there is no doubt. And our movement needs
one, there is no doubt.
But a strategy process can't take over the organization's activities for
I want to warmly thank you, my teammates. It is heartwarming to see the
commitment and amazing energy of all the members of this process, and of
course, the core team which is dedicated to bringing a change. I have no
doubt that we all want to secure the future of our movement for years to
come, and I don't know of such a high level of engagement and commitment
anywhere else. But at the same time, I think we should put limits on it and
reconsider it, and think about how to make it shorter, lighter, less demanding
and less expensive, both in staff/volunteer time and in money.
:: Apologies for cross-posting to multiple mailing lists. We want to ensure
we spread the word about this opportunity to as many people as possible. ::
We are writing today to invite you to be a part of a community review on
Wikimedia brand research and strategy.
Recently, the Wikimedia Foundation set out to better understand how the
world sees Wikimedia and Wikimedia projects as brands. We wanted to get
a sense of the general visibility of our different projects, and evaluate
public support of our mission to spread free knowledge.
We launched a global brand study to research these questions, as part of
our planning toward our 2030 strategic goals. The study was commissioned
by the Board, carried out by the brand consultancy Wolff Olins, and
directed by the Foundation’s Communications team. It collected
perspectives from internet users in seven countries (India, China,
Nigeria, Egypt, Germany, Mexico and the US) on Wikimedia projects and
The study revealed some interesting trends:
- Awareness of Wikipedia is above 80% in Western Europe and North America.
- Awareness of Wikipedia averages above 40% in emerging markets, and is
- There is awareness of other projects, but it is significantly lower. For
example, awareness of Wikisource was at 30%, Wiktionary at 25%, Wikidata at
20%, and Wikivoyage at 8%.
- There was significant confusion around the name Wikimedia. Respondents
reported they had either not heard of it, or extrapolated its relationship
- In spite of lack of awareness about Wikimedia, respondents showed a high
level of support for our mission.
Following from these research insights, the Wolff Olins team also made a
strategic suggestion to refine the Wikimedia brand system. The
- Use Wikipedia as the central movement brand rather than Wikimedia.
- Provide clearer connections to the Movement projects from Wikipedia to
drive increased awareness, usage and contributions to smaller projects.
- Retain Wikimedia project names, with the exception of Wikimedia Commons
which is recommended to be shortened to Wikicommons to be consistent with
- Explore new naming conventions for the Foundation and affiliate groups
that use Wikipedia rather than Wikimedia.
- Consider expository taglines and other naming conventions to reassert the
connections between projects (e.g. “______ - A Wikipedia project”).
This is not a new idea.
By definition, Wikimedia brands are shared among the communities who give
them meaning. So in considering this change, the Wikimedia Foundation is
collecting feedback from across our communities. Our goal is to speak with
more than 80% of affiliates and as many individual contributors as possible
before May 2019, when we will offer the Board of Trustees a summary of
We invite you to look at a project summary, the brand research,
and the brand strategy suggestion that Wolff Olins prepared working with us.
For feedback, please add comments on the Community Review talk page or
email brandproject(a)wikimedia.org with direct feedback. You can also use
either of these channels to request to join a group meeting.
We know this is a big topic and we’re excited to hear from you!
- Zack McCune and the Wikimedia Foundation Communications department
Zack McCune (he/him)
Senior Global Brand Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Could anyone please have a look at what's going on with the Wikimedia
India blog? There has been spam for dating sites there for several
days (if not weeks).
This is not exactly obvious when you look at the main page¹ unless
you scroll down a bit, but search (Ctrl+F) for “dating” in the RSS feed² or
in the Planet RSS feed³ and you'll see it.
Thanks and best regards,
Wikimedia Sverige is proud to be the recipient of three new grants totaling
around USD 500,000. We hope to work with many of you as part of these
projects. If you are interested in getting involved or receiving updates
please let me know.
Furthermore, the chapter also has a new heavily subsidized agreement for
our office space.
Project 1: Wikispeech – The Speech Data Collector
The first project is a continuation of the Wikispeech project, a
text-to-speech (TTS) system that converts written text into speech. From
September 2019 to April 2021 we aim to finalize building the MediaWiki
extension and to build tools to collect speech data to add pronunciations
to Wikipedia, Wiktionary and Wikidata and to add more languages to the
text-to-speech solution. The tools should also be possible to use for oral
The work happens in partnership with the Royal Technical Institute, STTS (a
language processing company), Mozilla Foundation, Wikimedia Deutschland and
the Swedish Dyslexia Association.
As always, you can find the full application on our wiki (in Swedish):
Project 2: Wikipedia in Libraries
From 2019 to 2020 Wikimedia Sverige, together with the National Library of
Sweden, will develop an online training module for Swedish librarians
focused around free knowledge and the Wikimedia platforms. This will be a
mandatory training for all of Sweden's 5,000 public librarians. Our hope is
to give all of them a basic understanding of the Wikimedia projects, as
well as to complement the online training with advanced courses for the
most dedicated. The advanced courses will give them the tools to organize
activities and events independently and on an ongoing basis at their libraries across the
Furthermore, the librarians will be engaged in the #1Lib1Ref and
There is great potential to receive continued funding over the coming three
years if the project is successful.
As always, you can find the full application on our wiki (in Swedish):
Project 3: Bibliographical data on Wikidata
We continue our work to include bibliographical data on Wikidata. The
project details are still being negotiated with the funder. The project
will start in mid-2019 and last until 2020.
Starting from March 2019 we have a new agreement in place for a heavily
subsidized coworking space office from the Swedish Internet Foundation.
Through the agreement we will save around USD 30,000 per year compared
to when we had an office of our own.
We have received this generous subsidy because Wikipedia is considered so
important for the infrastructure of the Internet. We are very happy that
the agreement does not have an end date and that we have the possibility to
grow significantly over time as well (while keeping the generous subsidy).
Please contact John Andersson (john.andersson(a)wikimedia.se) if you have any
- - - -
Visiting address: Goto10, Hammarby Kaj 10D, 120 32 Stockholm