This paper (first reference) is the result of a class project I was part of
almost two years ago for CSCI 5417 Information Retrieval Systems. It builds
on a class project I did in CSCI 5832 Natural Language Processing, which
I presented at Wikimania '07. The project ran very late: we didn't send
in the final paper until the day before New Year's. As far as I recall, this
technical report was never really announced, so I thought it would be
interesting to look briefly at the results. The goal of this paper was to break articles
down into surface features and latent features and then use those to study
the rating system being used, predict article quality and rank results in a
search engine. We used the [[random forests]] classifier which allowed us to
analyze the contribution of each feature to performance by looking directly
at the weights that were assigned. While the surface analysis was performed
on the whole English Wikipedia, the latent analysis was performed on the
Simple English Wikipedia (it is more expensive to compute).

= Surface features =

* Readability measures are the single best predictor of quality, as defined by the Wikipedia Editorial Team (WET), that I have found. The [[Automated Readability Index]], [[Gunning Fog Index]] and [[Flesch-Kincaid Grade Level]] were the strongest predictors, followed by length of article HTML, number of paragraphs, [[Flesch Reading Ease]], [[Smog Grading]], number of internal links, [[Laesbarhedsindex Readability Formula]], number of words and number of references.
* Weakly predictive were number of to be's, number of sentences, [[Coleman-Liau Index]], number of templates, PageRank, number of external links, number of relative links.
* Not predictive (overall - see the end of section 2 for the per-rating score breakdown): number of h2 or h3's, number of conjunctions, number of images*, average word length, number of h4's, number of prepositions, number of pronouns, number of interlanguage links, average syllables per word, number of nominalizations, article age (based on page id), proportion of questions, average sentence length.
:* Number of images was actually by far the single strongest predictor of any class, but only for Featured articles. Because it was so good at picking out Featured articles and only somewhat good at picking out A and G articles, the classifier was confused in so many cases that the overall contribution of this feature to classification performance is zero.
:* Number of external links is strongly predictive of Featured articles.
:* The B class is highly distinctive. It has a strong "signature," with high predictive value assigned to many features. The Featured class is also very distinctive. F, B and S (Start/Stub) contain the most information.
:* A is the least distinct class, not being very different from F or G.

= Latent features =

The algorithm used for latent analysis, which is an
analysis of the occurrence of words in every document with respect to the
link structure of the encyclopedia ("concepts"), is [[Latent Dirichlet
Allocation]]. This part of the analysis was done by CS PhD student Praful
Mangalath. As an example of what can be done with the result of this
analysis, you provide a word (a search query) such as "hippie". You can then
look at the weight of every article for the word hippie. You can pick the
article with the largest weight, and then look at its link network. You can
pick out the articles that this article links to and/or which link to this
article that are also weighted strongly for the word hippie, while also
contributing maximally to this article's "hippieness". We tried this query in
our system (LDA), Google (site:en.wikipedia.org hippie), and the Simple
English Wikipedia's Lucene search engine. The breakdown of articles occurring
in the top ten search results for this word for those engines is:

* LDA only: [[Acid rock]], [[Aldeburgh Festival]], [[Anne Murray]], [[Carl Radle]], [[Harry Nilsson]], [[Jack Kerouac]], [[Phil Spector]], [[Plastic Ono Band]], [[Rock and Roll]], [[Salvador Allende]], [[Smothers brothers]], [[Stanley Kubrick]].
* Google only: [[Glam Rock]], [[South Park]].
* Simple only: [[African Americans]], [[Charles Manson]], [[Counterculture]], [[Drug use]], [[Flower Power]], [[Nuclear weapons]], [[Phish]], [[Sexual liberation]], [[Summer of Love]]
* LDA & Google & Simple: [[Hippie]], [[Human Be-in]], [[Students for a democratic society]], [[Woodstock festival]]
* LDA & Google: [[Psychedelic Pop]]
* Google & Simple: [[Lysergic acid diethylamide]], [[Summer of Love]]

(See the paper for the articles produced for the keywords philosophy and economics.)

= Discussion / Conclusion =

* The results of the latent analysis are totally up to your
perception. But what is interesting is that the LDA features predict the WET ratings of quality just as well as the surface level features. Both feature sets (surface and latent) pull out almost all of the information that the rating system bears.
* The rating system devised by the WET is not distinctive. You can best tell the difference between Featured, A and Good articles (grouped together) and B articles. Featured, A and Good articles are also quite distinctive (Figure 1). Note that in this study we didn't look at Starts and Stubs, but in an earlier paper we did.
:* This is interesting when compared to this recent entry on the YouTube blog, "Five Stars Dominate Ratings".
* I think a sane, well-researched (with actual subjects) rating system is well within the purview of the Usability Initiative. Helping people find and create good content is what Wikipedia is all about. Having a solid rating system allows you to reorganize the user interface, the Wikipedia namespace, and the main namespace around good content and bad content as needed. If you don't have a solid, information-bearing rating system you don't know what good content really is (really bad content is easy to spot).
:* My Wikimania talk was all about gathering data from people about articles and using that to train machines to automatically pick out good content. You ask people questions along dimensions that make sense to people, and give the machine access to other surface features (such as a statistical measure of readability, or length) and latent features (such as can be derived from document word occurrence and encyclopedia link structure). I referenced page 262 of Zen and the Art of Motorcycle Maintenance to give an example of the kind of qualitative features I would ask people about. It really depends on which features end up bearing information, to be tested in "the lab". Each word is an example dimension of quality: We have "*unity, vividness, authority, economy, sensitivity, clarity, emphasis, flow, suspense, brilliance, precision, proportion, depth and so on.*" You then use surface and latent features to predict these values for all articles. You can also say that when a person rates an article as high on the x scale, they also mean that it
has this much of these surface and these latent features.
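
The readability measures listed under surface features are simple functions of character, word and sentence counts. As a minimal sketch (this is not the code used for the paper, and the tokenisation rules here are my own crude assumptions), the Automated Readability Index can be computed like this:

```python
import re


def automated_readability_index(text: str) -> float:
    """Return ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43.

    Assumptions for illustration only: a "word" is a run of letters or
    digits (optionally joined by ' or -), and a "sentence" ends at any
    run of '.', '!' or '?'. Real text (abbreviations, decimals) needs a
    proper tokeniser.
    """
    words = re.findall(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Count only letters and digits as characters, as the ARI definition does.
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    dense = "Encyclopedic articles frequently contain considerably longer sentences."
    print(automated_readability_index(simple))
    print(automated_readability_index(dense))
```

Longer words and longer sentences both push the score up, which is why a single number like this can be fed to a classifier as one surface feature alongside counts such as links or references.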
= References =
- DeHoust, C., Mangalath, P., Mingus, B. (2008). *Improving search in
Wikipedia through quality and concept discovery*. Technical Report.
- Rassbach, L., Mingus, B., Blackford, T. (2007). *Exploring the
feasibility of automatically rating online article quality*. Technical
Report.
I have asked for and received permission to forward this most excellent bit
of news to you all.
The LINGUIST List is a most excellent resource for people interested in the
field of linguistics. As I mentioned some time ago, they held a funding
drive in which they asked for a certain amount of money within a given
number of days; if they got it, they would start a project on Wikipedia to
learn what needs doing to get better coverage of the field of linguistics.
What you will read in this mail is that the whole community of linguists is
being asked to cooperate. I am really thrilled, as it will also get more
linguists interested in what we do. My hope is that a fraction of them will
take an interest in the projects in the languages they care for and help
them become more relevant. As a member of the "language prevention
committee", I would love to get more knowledgeable people involved in our
smaller projects. If it means that we get more requests for more projects,
we will really feel embarrassed by all the new projects we will have to
approve because of the quality of the Incubator content and the quality of
the linguistic arguments for why we should approve yet another language :)
NB Is this not a really clever way of raising money: give us this much in
this time frame and we will then do this as a bonus?
---------- Forwarded message ----------
From: LINGUIST Network <linguist(a)linguistlist.org>
Date: Jun 18, 2007 6:53 PM
Subject: 18.1831, All: Call for Participation: Wikipedia Volunteers
LINGUIST List: Vol-18-1831. Mon Jun 18 2007. ISSN: 1068 - 4875.
Subject: 18.1831, All: Call for Participation: Wikipedia Volunteers
Moderators: Anthony Aristar, Eastern Michigan U <aristar(a)linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry(a)linguistlist.org>
Reviews: Laura Welcher, Rosetta Project
The LINGUIST List is funded by Eastern Michigan University,
and donations from subscribers and publishers.
Editor for this issue: Ann Sawyer <sawyer(a)linguistlist.org>
To post to LINGUIST, use our convenient web form at
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers
-------------------------Message 1 ----------------------------------
Date: Mon, 18 Jun 2007 12:49:35
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers
As you may recall, one of our Fund Drive 2007 campaigns was called the
"Wikipedia Update Vote." We asked our viewers to consider earmarking their
donations to organize an update project on linguistics entries in the
English-language Wikipedia. You can find more background information on this
The speed with which we met our goal, thanks to the interest and generosity of
our readers, was a sure sign that the linguistics community was enthusiastic
about the idea. Now that summer is upon us, and some of you may have a bit of
leisure time, we are hoping that you will be able to help us get started on the
Wikipedia project. The LINGUIST List's role in this project is a purely
organizational one. We will:
*Help, with your input, to identify major gaps in the Wikipedia materials or
pages that need improvement;
*Compile a list of linguistics pages that Wikipedia editors have identified as
"in need of attention from an expert on the subject" or "does not cite any
references or sources," etc;
*Send out periodical calls for volunteer contributors on specific topics or
*Provide simple instructions on how to upload your entries into Wikipedia;
*Keep track of our project Wikipedians;
*Keep track of revisions and new entries;
*Work with Wikimedia Foundation to publicize the linguistics community's
We hope you are as enthusiastic about this effort as we are. Just to help us
get started looking at Wikipedia more critically, and to easily identify an
needing improvement, we suggest that you take a look at the List of
Many people are not listed there; others need to have more facts and
added. If you would like to participate in this exciting update effort, please
respond by sending an email to LINGUIST Editor Hannah Morales at
hannah(a)linguistlist.org, suggesting what your role might be or which
entries you feel should be updated or added. Some linguists who saw our
on the Internet have already written us with specific suggestions, which we will
share with you soon.
This update project will take major time and effort on all our parts. The
result will be a much richer internet resource of information on the breadth and
depth of the field of linguistics. Our efforts should also stimulate
students to consider studying linguistics and educate a wider public on what
we do. Please consider participating.
Editor, Wikipedia Update Project
Linguistic Field(s): Not Applicable
LINGUIST List: Vol-18-1831
>>> The people who are loudest in their demands for consensus
>>> do not represent the Wikimedia movement.
>> The voices loudest for the WMF doing something against the
>> Trump administration are not representative of the Wikimedia
>> movement either....
> Is the Community Process Steering Committee currently
> prepared to "engage more 'quiet' members of our community"
> with a statistically robust snap survey to resolve this question?
Anyone can go to Recent Changes and send a SurveyMonkey link to the
most recent few hundred editors with contributions at least a year
old, to get an accurate answer.
Will a respected member of the community please do this? I would like
to know what the actual editing community thinks of the travel ban and
their idea of an appropriate response. I don't want to see community
governance by opt-in participation in obscure RFCs.
I would offer to do this myself, but I value keeping my real name
unassociated with my enwiki userid.
This is a request for your input and ideas (if any) regarding my management of the fallout from Jimmy's announcing me as the 2018 Wikimedian of the Year.
The emails below are a copy of my ongoing consultations with Wikimedia Foundation staff and other Wikimedians I personally know, as well as a report on what's already brewing in my region of Russia after this unexpected outcome.
I would be grateful if you could advise me on how to properly steer the enthusiasm of the regional government, mass media, NGOs, etc., which have just discovered the possibility of participating in the Wikimedia movement (keep in mind that anything from the U.S. does not get much in-depth coverage in Russian in the sources that regional public figures, NGOs, teachers or general regional journalists read) & are now placing great hopes on teaching the whole of Tatarstan how to Wiki & on engaging all Tatars globally (3/4 live outside of Tatarstan, 1/5 outside of Russia).
* Selet WikiSchool was presented to the wider public at the press conference for Tatar-speaking journalists (July 31), at the poster session of the World Congress of Tatars Youth Forum (Aug. 3), and later at a Tatar projects fair in the Downtown park of Kazan (Aug. 5)
* The contribution of those writing Wikipedia in Tatar was recognized by choosing me as one of this year's recipients of the "For the great service to Tatar nation" medal (Aug. 3)
* We recorded a 40 min interview in Turkish (Aug. 4, see https://www.youtube.com/watch?v=HK3tFBWMWcs )
* We haven't yet met the President of the Republic (his schedule changed once again), but I got another firm request to meet with the Minister for Youth Affairs (another good acquaintance of mine, ex-member of local Comedy Club type activity), as well as with the Head of World Congress of Tatars' Executive Bureau, and more TV & written press interviews are lining up.
* I am trying to manage these guys' optimism & desire to move quickly, keeping in mind they are not really familiar with the Wiki-way or our communities' policies & practices
* The Bashkir (ba) & Sakha (sah) communities have shared advice about how to ensure that the local Wikimedia community stays clearly independent in a cultural environment where neither Education nor GLAM programs will attract local partners unless they have strong support from regional or federal government entities.
* Looking forward to meeting my Wikimania-2017 roommate from the Philippines in Singapore on Aug. 16 to collect some more input from Asia
-------- Forwarded message --------
03.08.2018, 19:35, "Фархад Фаткуллин / Farkhad Fatkullin" <frhd(a)yandex.com>:
Thursday, 2 August 2018
Republic of Tatarstan Ministry for Informatization & Communication @ IT-park, Kazan
Meeting with Tatarstan Deputy Prime Minister - Minister for Informatization & Communications Roman Shaykhutdinov
on opportunities Wikimedia projects & popularization thereof among the population of the Republic can bring for Tatarstan
* Almaz A. Valiullin, Director General of Tatarstan Center for Information Technologies http://mic.tatarstan.ru/eng/valiullin.htm
* Tatyana S. Kamaletdinova, CEO of Tatarstan Center for Information Technologies http://mic.tatarstan.ru/eng/kamaletdinova.htm
* Anna V. Yakovleva, Head of Ministry's Press-Service http://mic.tatarstan.ru/rus/about/structure_new?department_id=80576
* Farkhad N. Fatkullin, 2018 Wikimedian of the Year, Wikimedia Russia member https://meta.wikimedia.org/wiki/User:Frhdkazan
1. Content creation contests
2. Education projects
3. How can these be organized in a systemic way, engaging students throughout all municipal districts of Tatarstan & Tatar diaspora globally, and assessing effectiveness of measures
4. Larger Tatar language internet development, promoting content creation in the language
5. .tatar domain
7. Generating lists for content creation
8. Free licenses
10. Regional educational Wiki-seminar for beginners, introduction & orientation
11. Smartphone oriented educational projects around Tatar & Tatar Wikipedia (beginning with Tatar version of www.kakprav.com & on to WOK Master type )
12. OSM in Tatar
13. The way forward: visiting WMF to discover other opportunities, MoU with WMF?
1. Almaz Valiullin to head the Working Group on behalf of the Ministry
2. Next meeting set for / around 20th of August
3. Minister requested his staff to draft documents regarding moving all Regional & Municipal budget funded websites to CC-BY type free licenses
4. Minister proposed every organization website place a link to a Wikipedia article about it (in Tatar, Russian or English, depending on the language used)
5. Help is requested about developing simple samples or article creation guidelines for various institutions of Tatarstan & diaspora organizations to learn what is considered appropriate by Wiki-community & how should representatives of respective entities
6. Annual Tatar Internet Awards Ceremony to include awards to leading editors of Wikimedia projects in Tatar
7. Help is requested to develop a framework of Wikipedia Article Contests for Secondary School Children of Tatarstan, with article quality assessment schemes (prizes & organization to be funded by the Tatarstan government, promotion in local mass media, schools, etc.)
8. Help is requested to develop a framework for a sustainable functioning of a Tatarstan & Tatar language oriented Wikimedia thematic organization to be responsible for systemic work around developing & promoting Wikimedia projects in Tatar, as well as Tatarstan-oriented content creation for various Wikis
9. Readiness to consider awarding *.tatar domain names to those, who develop attractive projects in Tatar that can't be hosted on Wikimedia platform or otherwise need a Tatar digital identity
10. Help is requested regarding drafting a program proposal, necessary budget & expected results for a Tatarstan oriented introductory Wiki-seminar, open for a wide public
11. Help is requested to provide links to Wikidata educational materials, Practical Use Cases & expert responses to the Deputy Minister (in English)
12. Help is requested in organizing a learning visit to WMF Headquarters during Tatarstan Delegation annual fall visit to Silicon Valley & other places in U.S. to discover other opportunities
Farkhad Fatkullin - Фархад Фаткуллин http://sikzn.ru/ Тел.+79274158066 / skype:frhdkazan / Wikipedia:frhdkazan
03.08.2018, 11:42, "Фархад Фаткуллин / Farkhad Fatkullin" <frhd(a)yandex.com>:
> Hi Nichole & Kui,
> Thank you for your responses & readiness to help.
> Events on the ground are developing along a predictable course: we had a very constructive 3-hour discussion last night with Tatarstan's Deputy Prime Minister (IT minister) & his team, whilst earlier on Thursday I got an invite to meet with another Deputy Prime Minister (President of the National Council of Trustees for the World Congress of Tatars Association).
> More TV & magazine interviews are expected, and some in-depth articles in Russian promoting Wikimedia projects are in the pipeline (I expect to see the texts so that I can make necessary corrections & add links to respective WMRU, Meta, WMF & related independent media articles, thanks to ComCom).
> I hope to find time to prepare & email both you (in English) & the IT ministry's Working Group (in Russian) my summary of yesterday's late-evening meeting, with attendees, topics discussed, requests & proposals, etc. They want to move as fast as possible, because this will help them accomplish the task of promoting Tatar language use online, which they are charged with by the region's President.
> Just a few to begin with:
> * Ready to move all regional and municipal government websites into CC-BY, add links to respective WP articles from all these
> * Willing to have Tatarstan & Tatar language oriented thematic organization in place to work with secondary schools & Universities, GLAMS, diaspora
> * Ready to sponsor prizes for article writing contests (either through WMRU or this new entity)
> * Requesting guidance from myself & Wikimedia community on what's the best way to make this all successful
> * Interested in supporting WMRU organized general public educational Wiki-Seminar in Tatarstan Academy of Sciences or other respected facility with a large conference hall open to all public (September-October?)
> * Willing to visit WMF headquarters in November to meet & learn more about what's there (I admit I don't read all mailing lists or Meta discussions, don't have time to see all the wonderful YouTube videos available, & haven't yet had time to visit a single Wikimedia Conference) & possibly sign MoUs or whatever else would help speed up Wikimedia acceptance & popularity growth among the population of the region.
> I am in touch with Wikimedia RU & the Wikimedia Languages of Russia Community regarding ongoing developments, benefiting from the collective wisdom of Wikimedians I know.
> Should you be ready to bring any ideas to the table, please shoot them my way.
> Farkhad Fatkullin - Фархад Фаткуллин http://sikzn.ru/ Тел.+79274158066 / skype:frhdkazan / Wikipedia:frhdkazan
> 02.08.2018, 16:35, "Nichole Saad" <nsaad(a)wikimedia.org>:
>> Hi Farhad,
>> First, congratulations on being named "Wikimedian of the Year!" We have received your letter, and are looking forward to engaging with your ideas. Realistically, I'll be able to provide a more in-depth response early next week.
>> best regards,
>> On Wed, Aug 1, 2018 at 2:45 PM Фархад Фаткуллин / Farkhad Fatkullin <frhd(a)yandex.com> wrote:
>>> Dear Sirs,
>>> This is from Farhad, User:frhdkazan (ComCom member, now a.k.a. 2018 Wikimedian of the Year) with some proposals about how to leverage a fleeting opportunity I got. Please read the text below this letter ASAP & think how you can help seize the moment. I'm looking forward to first ideas within 24 hours from now.
>>> Any support of yours would be greatly appreciated.
>>> Thanks a million.
>>> P.S. More on where I'm coming from can later be discovered @ http://frhd.narod.ru/resume-en.htm (dated, stopped my retainer contract with the Office of Tatarstan President in September 2013) & more up-to-date but in Russian @ http://sikzn.narod.ru/index/0-4 . I have previously translated & provided digital media with the link to Russian text of Wikimedia Blog post about Jess Wade https://ru.wikinews.org/wiki/?curid=176108 , & now contacted Asian colleagues to get similar content about Nahid Sultan & Wikimedia Bangladesh. Also on my agenda is to record a video-interview in Turkish for Wikimedia Turkey's communication campaign - I speak the language & we are in touch with Turkish community. I also need to get back to interpreting into Russian videos of Wikimania-2018 available on YouTube - maybe later this week, before I join my family for a few days off & then a week-long interpretation assignment in Singapore.
>>> Farkhad Fatkullin - Фархад Фаткуллин http://sikzn.ru/ Тел.+79274158066 / skype:frhdkazan / Wikipedia:frhdkazan
>>> Following Jimmy's unexpected pass during the Wikimania-2018 Closing Ceremony, I became an instant celebrity of sorts. I alone have seen over 40 articles with positive PR about Wikipedia & the wider body of Wikimedia projects:
>>> * one by RT in Russian collecting over 1.1 million Facebook likes in under 8 hours from publication
>>> * short positive ones from Russian Federation government's Official gazette & Russkiy Mir Foundation
>>> * first ever mention that people can donate to the Wikimedia Russia volunteers' organization so that we can fund local Wiki-seminars, conferences & contests (WMF is not funding Wikimedia Russia & can't allow us to use the donate link from Russian Wikipedia or place a link to the "Thank you, but we don't accept donations from Russia" page where users from Russia are being routed)
>>> * about to get one with links published to various WMF & Wikimedia Russia projects, including promoting free licenses, Education & GLAM programs, etc.
>>> I'm periodically collecting these @ https://tt.wikipedia.org/wiki/Википедия:Безнең_турында_матбугатта#Ел_викиме… when time permits, but there's more in my Facebook Messenger.
>>> This is also generating very positive attention from the Republic of Tatarstan government, as well as from our regional NGO partner, Selet Youth Education Foundation, with whom the as yet unrecognized Tatar Wikimedians are jointly running the Selet WikiSchool project. https://outreach.wikimedia.org/wiki/Education/News/May_2018/Selet_WikiSchool The latter want to introduce this joint initiative, as well as Wikimedia Russia executive director Stanislav Kozlovsky (Ph.D. in Psychology, Researcher & a Senior lecturer at Moscow State University) and myself, to the President of Tatarstan during his visit to their Annual International Forum of Tatar-speaking High-School Age Youth on August 6th. Unrelated to that, I was called up yesterday by a good acquaintance of mine (Tatarstan Vice Prime-Minister, Minister for IT https://en.wikipedia.org/wiki/Roman_Shaykhutdinov ), who invited me to have a talk with him tomorrow about how promoting Wikimedia projects in the wider Tatar-speaking world (only about 25% of Tatars live in Tatarstan) can help him achieve whatever the current President wants him to do to develop Tatar language use online. There are some other Tatarstan Prime-Minister Office level things in the pipeline, & I am also waiting for a call from Russian Federation ex-IT Minister https://en.wikipedia.org/wiki/Nikolay_Nikiforov , whom I know from his years in the Tatarstan government as well (until 2012). FYI: the Wikimedia movement is still unknown in Russia, with local mass media outlets predominantly speaking about Wikipedia in Russian only when there is some lapse that can be exploited to show one's Russian patriotism & thus score some points (basically the same political circus as what we are seeing in the U.S. with the Russia-collusion story in overdrive, just in the opposite direction), and here we get something to reverse the situation big time, explaining to the wide public that it's not only OK, but even desirable, that they engage with Wikimedia projects.
>>> Keeping in mind that the Tatarstan President is the head of both the Russia-Islamic World Strategic Vision Group & the Association of Innovative Regions of Russia, Jimmy gave me a hell of an opportunity for an elevator pitch for the Wikimedia movement in Russia and some adjacent countries (think Turkey & Central Asia), and I would really hate to see it go unused. I will do my part here (about CC-BY for regional government controlled websites, the Wikipedia Education Program as an extracurricular activity in all secondary schools & universities, GLAM, as well as some locally funded carrots for participants), but I also want your help in driving this to a home run. Every fall the Tatarstan President & his delegation visit Silicon Valley & other places of interest in the U.S., & it would be great if we could set up a physical visit to the Wikimedia Foundation, with a first-hand presentation of the best international practices the movement is proud of, & have some MoU on cooperation signed (as a bureaucratic basis for continuing the conversation) with some department in the Tatarstan government, just like the one you signed with the Mexican Ministry of Culture www.eluniversal.com.mx/cultura/secretaria-de-cultura-y-wikimedia-firman-con…. To set the context, Tatarstan has such formal documents with a number of American companies & institutions: our private English-speaking IT University @ https://en.wikipedia.org/wiki/Innopolis was developed jointly with Carnegie Mellon University (with consultancy fees paid by Tatarstan), we have Russia's first publicly funded hospital certified as meeting https://en.wikipedia.org/wiki/Joint_Commission requirements & plenty of U.S. investors @ our https://en.wikipedia.org/wiki/Alabuga_Special_Economic_Zone .
>>> On top of this, we could benefit from having the Selet Youth Educational movement http://selet.biz/en/ embrace Wiki even further, so maybe a similar MoU on collaboration between them and Wikiedu.org, signed simultaneously, would be great (I met LiAnna and Jami in Montreal last year & was really impressed with what these guys are doing).
>>> Please shoot something my way before the same time tomorrow, for me to handle the meeting with Tatarstan Deputy Prime Minister more effectively to progressively open other opportunities I've touched on. On top of what I described above & my last year's ideas @ https://meta.wikimedia.org/wiki/User:Frhdkazan/Wiki4RegionalDevt (in Russian), I'll be bringing to the table topics of:
>>> * Wikidata
>>> * OSM, as well as a
>>> * WOK-like project for all things in Tatar (I was contacted by a local 9th grader who, on his own initiative with a help of a teacher, did something similar for students who want to train for Russian-language SAT/ACT type exam www.kakprav.com & now offered to develop such or bigger thing for Tatar). For more on WOK, see https://www.vanguardngr.com/2018/01/wikipedia-wok-seek-nigerian-content-onl…
>>> * whatever else you or anybody else can help me think of until then & or later opportunities
>> Nichole Saad
>> Wikimedia Foundation | Senior Program Manager, Education
>> user: NSaad (WMF)
-------- End of forwarded message --------
Farkhad Fatkullin - Фархад Фаткуллин http://sikzn.ru/ Тел.+79274158066 / skype:frhdkazan / Wikipedia:frhdkazan
I was asked by a volunteer for help getting stats on the gender gap in
content on a certain Wikipedia, and came up with simple Wikidata Query
Service queries that pulled the total number of articles on a given
Wikipedia about men and about women, to calculate *the proportion of
articles about women out of all articles about humans*.
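
For readers who want to reproduce this kind of count, here is a rough sketch of what such queries could look like. It is my own reconstruction, not the exact queries I ran: the function names are made up for illustration, the Wikidata identifiers (P31 "instance of", Q5 "human", P21 "sex or gender", Q6581072 "female", Q6581097 "male") are the standard ones, and the denominator is assumed to be men plus women only, ignoring other gender values. The query text is meant to be pasted into the Wikidata Query Service; the script itself makes no network calls.

```python
HUMAN = "Q5"          # Wikidata item: human
FEMALE = "Q6581072"   # Wikidata item: female
MALE = "Q6581097"     # Wikidata item: male


def count_query(wiki_url: str, gender_qid: str) -> str:
    """Build a SPARQL query counting humans of one gender with an article on one wiki.

    wiki_url is the site root as stored in Wikidata sitelinks,
    e.g. "https://he.wikipedia.org/" (trailing slash included).
    """
    return f"""
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:P31 wd:{HUMAN} ;
        wdt:P21 wd:{gender_qid} .
  ?article schema:about ?item ;
           schema:isPartOf <{wiki_url}> .
}}""".strip()


def share_of_women(women: int, men: int) -> float:
    """Proportion of articles about women among articles about women or men."""
    total = women + men
    if total == 0:
        raise ValueError("no biographies counted")
    return women / total


if __name__ == "__main__":
    print(count_query("https://he.wikipedia.org/", FEMALE))
    print(count_query("https://he.wikipedia.org/", MALE))
    # Plug in the two counts returned by the query service (made-up numbers here):
    print(f"{share_of_women(150, 850):.1%}")
```

On the largest wikis a count like this may time out on the public query service, in which case the counts have to be obtained some other way.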
Then I was curious about how that wiki compared to other wikis, so I ran
the queries on a bunch of languages, and gathered the results into a table,
(please see the *caveat* there.)
I don't have time to fully write up everything I find interesting in those
results, but I will quickly point out the following:
1. The Nepali statistic is simply astonishing! There must be a story
there. I'm keen on learning more about this, if anyone can shed light.
2. Evidently, ~13%-17% seems like a robust average of the proportion of
articles about women among all biographies.
3. Among the top 10 largest wikis, Japanese is the least imbalanced. Good
job, Japanese Wikipedians! I wonder if you have a good sense of what
drives this relatively better balance. (my instinctive guess is pop culture
4. Among the top 10 largest wikis, Russian is the most imbalanced.
5. I intend to re-generate these stats every two months or so, to
eventually have some sense of trends and changes.
6. Your efforts, particularly on small-to-medium wikis, can really make a
dent in these numbers! For example, it seems I am personally
responsible for almost 1% of the coverage of women on Hebrew Wikipedia!
7. I encourage you to share these numbers with your communities. Perhaps
you'd like to overtake the wiki just above yours? :)
8. I'm happy to add additional languages to the table, by request. Or you
can do it yourself, too. :)
 Yay #100wikidays :) https://meta.wikimedia.org/wiki/100wikidays
Wikimedia Foundation <http://www.wikimediafoundation.org>
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us make it a reality!
As I mentioned in an earlier thread, we will be running reader
surveys across a number of Wikipedia languages to learn about the
reader needs and motivations in these languages as well as some of
their demographic information (and perhaps the correlations between
demographics and user motivations and characteristics).
If your language community is interested in having statistics on the
distribution of reader gender, age, education, native language, and
geographic region (rural/urban) in your language (and depending on how
much data we collect in your language, perhaps more insights), this is
your chance to indicate interest at:
I initially communicated 2019-02-15 as the deadline to sign up. Since
then, we have run a pilot test on enwiki and we are investigating some
of the results to see if any changes in the survey questions are
needed. You now have until 2019-03-15 to indicate interest.
As always: this call is primarily a service to your language
community. If you like it, take action on it. If you don't, no action
is needed. :)
Recently, the "draft recommendations" of the strategy working groups have
been published. As Nicole informed us, they are "key tools" for the future
of the movement. These documents are the result of one year of work of the
If I am not mistaken, the Wikimedia volunteers now have one month to give
feedback. In October, the process of refining and finalizing has to be
ready, and in November, the movement will have to start with implementing
Having seen now more of the documents, my conclusion can only be one: the
documents are simply not ready for this stage of the process. They are much
more unready than they should be for being put to the eyes of the Wikimedia
There are documents in which there is only one question answered, by one
sentence. Other documents don't show that any research has been used to
back the statements. Many obvious arguments and links are missing. On at least
one occasion I read as an answer to an important question: "todo".
The proposals often give the impression that they are not thought through.
There should be quotas for admins, but we see nowhere an explanation how
that would relate to the right to remain anonymous. There is the statement
that minorities sometimes can only express themselves with ND and NC
content, but the two links in the document hardly back that claim. After
years in which the Wikimedia organizations and other free and open content
organizations taught us that NC is problematic, now such a drastic change?
And there is this already infamous sentence: Instead of being informed
about the possible negative impacts of NC and ND, we only read: "All change
has negative connotations to some members of the community."
I find it stunning that there was nobody who went through the documents
before publication and said: we cannot publish this sentence, it is giving
a very bad impression about our attitude towards the community (= the very
same people we are asking to invest their time for giving feedback).
This does not mean that all documents or all sections and recommendations
are unusable or damaging. I also cannot judge the efforts invested,
as I have no insight into the inner workings. But it is very frustrating for
me to read the documents and often have to guess what they actually mean.
And it seems to me, given the comments on the user pages on Meta Wiki, on
this list, on de:WP:Kurier and on Facebook, that I am not the only one who
feels this frustration.
Therefore, I ask the people responsible: please reconsider the timeline. If
these documents are the result of one year of work, then the documents will
not be ready within two and a half months. Consider several months for the
working groups to use the present feedback for a redraft, and then give the
Wikimedia volunteers at least the same amount of time for giving feedback
Dear fellow Wikimedians,
They’re here! We are delighted to announce that the first round of
draft recommendations for structural change within our movement has
been published. The recommendations have been developed by the nine
Wikimedia 2030 working groups and are a key tool to help us build the
future of our movement.
Working group members have been working tirelessly for a year to
research the movement, analyze community input shared via community
conversations, and gain insight into external trends. A huge thank you
to each and every member for helping us reach this key milestone.
The draft recommendations are a first look at ways we can adapt our
movement’s structures to help us advance in our strategic direction.
They are the starting point for conversations about what kind of
future we want to create together.
The recommendations are not final. In order to get them to that stage,
your input is needed! We would like to hear from you all what these
changes would mean for you in your local or thematic context, what
you like about them, and where you potentially see any red flags. And
of course, always critically question whether these recommendations
support the strategic direction.
There are a few ways to do this:
* Read through the recommendations online and provide your input
directly on Meta. 
* If you will be at Wikimania, join us in the Wikimedia 2030 space. 
* Attend a Strategy Salon hosted by an affiliate where you live. 
* Reach out to a Strategy Liaison in your language to share feedback,
or lead a conversation of your own. 
Over the next month, working groups will take the input they receive
into the recommendations, alongside external advice and research, and
use it to refine and finalize them. Share your views, and help shape
what Wikimedia will look like in 2030 and beyond.
If you have any questions or feedback, please feel free to get in touch.
Adviser International Relations
Program Manager Wikimedia 2030 Movement Strategy
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Tel. (030) 219 158 26-0
Our vision is a world in which all people can share in, use, and add to
the knowledge of humanity. Help us
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.
V. Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a
charitable organization by the Finanzamt für Körperschaften I Berlin,
Some of you might be recovering from the Wikimania fatigue. Those of you
who have already recovered, I wanted to pick your brain about something
that came up multiple times during discussions but to which no one really
seems to have a clear answer.
Which script (writing system) would a speaker of an oral language use for
creating an entry on (gateway) projects like Wiktionary or Wikibooks, or
even for uploading a list of words on Commons using a tool like Lingua Libre?
Will it be the script used for the official language of the region where
the former language is from? This is a bit controversial as native
speakers of many indigenous languages would see this as a form of
colonization. Will it be the w:International Phonetic Alphabet (IPA)? This
is probably the least controversial, but an average user might not
be able to read IPA, as it was created by linguists for linguistic and
scholarly study rather than for everyday use.
Wikimedians who are native speakers of languages with less written/recorded
documentation, and individuals who work on such languages, are especially
encouraged to share their input based on past experience.
1. Gateway project: This is a made-up term to define the Wikimedia projects
that are more welcoming to newbies and do not require stringent citation as
almost all oral languages would lack that. It was fascinating to see Amir
challenging that it only takes about 30 seconds to add an entry to
On Sunday 25 August 2019, Wikimedia Australia held its Annual General
Two long-term Committee members, Gideon Digby and Tom Hograth, have taken a
step back from WMAU as Committee members but will still be involved within
Current WMAU Committee as of 25 August 2019
- Pru Mitchell - President
- Alex Lum - Vice-President
- Robert Myers - Secretary
- Steven Crossin - Treasurer
Ordinary Committee Members
- Caddie Brain
- Matthew Moore
- Jacinta Sutton
- Sam Wilson
Secretary - Wikimedia Australia
M: +61 400 670 288
Wikimedia Australia Inc. is an independent charitable organisation which
supports the efforts of the Wikimedia Foundation in Australia. We welcome
your support by membership or donations to keep the Wikimedia mission alive.