This paper (first reference) is the result of a class project I was part of
almost two years ago for CSCI 5417 Information Retrieval Systems. It builds
on a class project I did in CSCI 5832 Natural Language Processing, which I
presented at Wikimania '07. The project was very late, as we didn't send the
final paper in until the day before New Year's. As far as I recall, this
technical report was never really announced, so I thought it would be
interesting to look briefly at the results.

The goal of this paper was to break articles down into surface features and
latent features and then use those to study the rating system being used,
predict article quality and rank results in a search engine. We used the
[[random forests]] classifier, which allowed us to analyze the contribution
of each feature to performance by looking directly at the weights that were
assigned. While the surface analysis was performed on the whole English
Wikipedia, the latent analysis was performed on the Simple English Wikipedia
(it is more expensive to compute).

= Surface features =

* Readability measures are the single best predictor that I have found of
quality as defined by the Wikipedia Editorial Team (WET). The [[Automated
Readability Index]], [[Gunning Fog Index]] and [[Flesch-Kincaid Grade
Level]] were the strongest predictors, followed by length of article HTML,
number of paragraphs, [[Flesch Reading Ease]], [[Smog Grading]], number of
internal links, [[Laesbarhedsindex Readability Formula]], number of words
and number of references. Weakly predictive were number of to be's, number
of sentences, [[Coleman-Liau Index]], number of templates, PageRank, number
of external links and number of relative links. Not predictive (overall; see
the end of section 2 for the per-rating score breakdown) were number of h2
or h3's, number of conjunctions, number of images*, average word length,
number of h4's, number of prepositions, number of pronouns, number of
interlanguage links, average syllables per word, number of nominalizations,
article age (based on page id), proportion of questions and average sentence
length.
:* Number of images was actually by far the single strongest predictor of any
class, but only for Featured articles. Because it was so good at picking out
Featured articles and only somewhat good at picking out A and G articles,
the classifier was confused in so many cases that the overall contribution
of this feature to classification performance is zero.
:* Number of external links is strongly predictive of Featured articles.
:* The B class is highly distinctive. It has a strong "signature," with high
predictive value assigned to many features. The Featured class is also very
distinctive. F, B and S (Start/Stub) contain the most information.
:* A is the least distinct class, not being very different from F or G.

= Latent features =

The algorithm used for latent analysis, which is an analysis of the
occurrence of words in every document with respect to the link structure of
the encyclopedia ("concepts"), is [[Latent Dirichlet Allocation]]. This part
of the analysis was done by CS PhD student Praful Mangalath.

As an example of what can be done with the result of this analysis, you
provide a word (a search query) such as "hippie". You can then look at the
weight of every article for the word hippie. You can pick the article with
the largest weight, and then look at its link network. You can pick out the
articles that this article links to and/or which link to this article that
are also weighted strongly for the word hippie, while also contributing
maximally to this article's "hippieness".

We tried this query in our system (LDA), Google (site:en.wikipedia.org
hippie), and the Simple English Wikipedia's Lucene search engine. The
breakdown of articles occurring in the top ten search results for this word
for those engines is:

* LDA only: [[Acid rock]], [[Aldeburgh Festival]], [[Anne Murray]], [[Carl
Radle]], [[Harry Nilsson]], [[Jack Kerouac]], [[Phil Spector]], [[Plastic
Ono Band]], [[Rock and Roll]], [[Salvador Allende]], [[Smothers brothers]],
[[Stanley Kubrick]]
* Google only: [[Glam Rock]], [[South Park]]
* Simple only: [[African Americans]], [[Charles Manson]],
[[Counterculture]], [[Drug use]], [[Flower Power]], [[Nuclear weapons]],
[[Phish]], [[Sexual liberation]], [[Summer of Love]]
* LDA & Google & Simple: [[Hippie]], [[Human Be-in]], [[Students for a
democratic society]], [[Woodstock festival]]
* LDA & Google: [[Psychedelic Pop]]
* Google & Simple: [[Lysergic acid diethylamide]], [[Summer of Love]]

(See the paper for the articles produced for the keywords philosophy and
economics.)

= Discussion / Conclusion =

* How good the results of the latent analysis are is totally up to your
perception. But what is interesting is that the LDA features predict the WET
ratings of quality just as well as the surface level features. Both feature
sets (surface and latent) pull out almost all of the information that the
rating system bears.
* The rating system devised by the WET is not distinctive. You can best tell
the difference between, grouped together, Featured, A and Good articles vs B
articles. Featured, A and Good articles are also quite distinctive (Figure
1). Note that in this study we didn't look at Starts and Stubs, but in the
earlier paper we did.
:* This is interesting when compared to this recent entry on the YouTube
blog, "Five Stars Dominate Ratings".
I think a sane, well-researched (with actual subjects) rating system is
well within the purview of the Usability Initiative. Helping people find and
create good content is what Wikipedia is all about. Having a solid rating
system allows you to reorganize the user interface, the Wikipedia
namespace, and the main namespace around good content and bad content as
needed. If you don't have a solid, information-bearing rating system, you
don't know what good content really is (really bad content is easy to spot).
:* My Wikimania talk was all about gathering data from people about articles
and using that to train machines to automatically pick out good content. You
ask people questions along dimensions that make sense to people, and give
the machine access to other surface features (such as a statistical measure
of readability, or length) and latent features (such as can be derived from
document word occurrence and encyclopedia link structure). I referenced page
262 of Zen and the Art of Motorcycle Maintenance to give an example of the
kind of qualitative features I would ask people about. Which ones to use
really depends on what features end up bearing information, to be tested in
"the lab". Each word is an example dimension of quality: We have "*unity,
vividness, authority, economy, sensitivity, clarity, emphasis, flow,
suspense, brilliance, precision, proportion, depth and so on.*" You then use
surface and latent features to predict these values for all articles. You
can also say, when a person rates this article as high on the x scale, they
also mean that it has this much of these surface and these latent features.
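
To make "a statistical measure of readability" concrete, here is a minimal sketch of one such surface feature, the Automated Readability Index, using its standard published formula. This is not the code used in the paper; the tokenization below (the regular expressions and the function name) is my own simplifying assumption, and a real pipeline would first strip wiki markup and HTML.

```python
import re


def automated_readability_index(text: str) -> float:
    """Return the Automated Readability Index (ARI) of plain text.

    ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

    Assumption: words are runs of letters and digits (with inner
    apostrophes), and sentences end at '.', '!' or '?'.
    """
    words = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Count only letters and digits as characters, not apostrophes.
    characters = sum(ch.isalnum() for word in words for ch in word)
    return (4.71 * characters / len(words)
            + 0.5 * len(words) / len(sentences)
            - 21.43)
```

Longer words and longer sentences both push the score up, which is why a score like this can be computed for every article and handed to a classifier as a single number alongside the other surface features.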
= References =
- DeHoust, C., Mangalath, P., Mingus, B. (2008). *Improving search in
Wikipedia through quality and concept discovery*. Technical Report.
- Rassbach, L., Mingus, B., Blackford, T. (2007). *Exploring the
feasibility of automatically rating online article quality*. Technical
I have asked and received permission to forward to you all this most
excellent bit of news.
The LINGUIST List is an excellent resource for people interested in the
field of linguistics. As I mentioned some time ago, they held a funding
drive in which they asked for a certain amount of money within a given
number of days; if they got it, they would start a project on Wikipedia to
learn what needs doing to get better coverage for the field of linguistics.
What you will read in this mail is that the whole community of linguists is
asked to cooperate. I am really thrilled, as it will also get more
linguists interested in what we do. My hope is that a fraction of them will
be interested in the languages that they care for and help them become more
relevant. As a member of the "language prevention committee", I would love
to get more knowledgeable people involved in our smaller projects. If it
means that we get more requests for more projects, we will really feel
embarrassed by all the new projects we will have to approve because of the
quality of the Incubator content and the quality of the linguistic arguments
for why we should approve yet another language :)
NB Is this not a really clever way of raising money; give us this much in
this time frame and we will then do this as a bonus...
---------- Forwarded message ----------
From: LINGUIST Network <linguist(a)linguistlist.org>
Date: Jun 18, 2007 6:53 PM
Subject: 18.1831, All: Call for Participation: Wikipedia Volunteers
LINGUIST List: Vol-18-1831. Mon Jun 18 2007. ISSN: 1068 - 4875.
Subject: 18.1831, All: Call for Participation: Wikipedia Volunteers
Moderators: Anthony Aristar, Eastern Michigan U <aristar(a)linguistlist.org>
Helen Aristar-Dry, Eastern Michigan U <hdry(a)linguistlist.org>
Reviews: Laura Welcher, Rosetta Project
The LINGUIST List is funded by Eastern Michigan University,
and donations from subscribers and publishers.
Editor for this issue: Ann Sawyer <sawyer(a)linguistlist.org>
To post to LINGUIST, use our convenient web form at
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers
-------------------------Message 1 ----------------------------------
Date: Mon, 18 Jun 2007 12:49:35
From: Hannah Morales < hannah(a)linguistlist.org >
Subject: Wikipedia Volunteers
As you may recall, one of our Fund Drive 2007 campaigns was called the
"Wikipedia Update Vote." We asked our viewers to consider earmarking their
donations to organize an update project on linguistics entries in the
English-language Wikipedia. You can find more background information on this
The speed with which we met our goal, thanks to the interest and generosity
of our readers, was a sure sign that the linguistics community was
enthusiastic about the idea. Now that summer is upon us, and some of you may
have a bit of leisure time, we are hoping that you will be able to help us
get started on the Wikipedia project. The LINGUIST List's role in this
project is a purely
organizational one. We will:
*Help, with your input, to identify major gaps in the Wikipedia materials or
pages that need improvement;
*Compile a list of linguistics pages that Wikipedia editors have identified
as "in need of attention from an expert on the subject" or "does not cite any
references or sources," etc;
*Send out periodical calls for volunteer contributors on specific topics or
*Provide simple instructions on how to upload your entries into Wikipedia;
*Keep track of our project Wikipedians;
*Keep track of revisions and new entries;
*Work with Wikimedia Foundation to publicize the linguistics community's
We hope you are as enthusiastic about this effort as we are. Just to help us
get started looking at Wikipedia more critically, and to easily identify an
needing improvement, we suggest that you take a look at the List of
Many people are not listed there; others need to have more facts and
added. If you would like to participate in this exciting update effort,
respond by sending an email to LINGUIST Editor Hannah Morales at
hannah(a)linguistlist.org, suggesting what your role might be or which
entries you feel should be updated or added. Some linguists who saw our
on the Internet have already written us with specific suggestions, which we
share with you soon.
This update project will take major time and effort on all our parts. The
result will be a much richer internet resource of information on the breadth
and depth of the field of linguistics. Our efforts should also stimulate
students to consider studying linguistics and educate a wider public on
what we do. Please consider participating.
Editor, Wikipedia Update Project
Linguistic Field(s): Not Applicable
LINGUIST List: Vol-18-1831
>>> The people who are loudest in their demands for consensus
>>> do not represent the Wikimedia movement.
>> The voices loudest for the WMF doing something against the
>> Trump administration are not representative of the Wikimedia
>> movement either....
> Is the Community Process Steering Committee currently
> prepared to "engage more 'quiet' members of our community"
> with a statistically robust snap survey to resolve this question?
Anyone can go to Recent Changes and send a SurveyMonkey link to the
most recent few hundred editors with contributions at least a year
old, to get an accurate answer.
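
A rough sketch of that selection step, in case it helps whoever picks this up. Everything here is hypothetical: the function name, the shape of the input data, and the 300-editor default are my own assumptions, and fetching the Recent Changes entries and each editor's first contribution date is left out.

```python
from datetime import datetime, timedelta


def pick_survey_recipients(recent_changes, first_edit, now, limit=300):
    """Pick the most recently active editors whose accounts have history.

    recent_changes: iterable of (username, timestamp), newest first.
    first_edit: dict mapping username -> datetime of earliest contribution.
    now: reference datetime used for the "at least a year old" cutoff.
    limit: maximum number of editors to return.
    """
    cutoff = now - timedelta(days=365)
    chosen = []
    seen = set()
    for username, _timestamp in recent_changes:
        if username in seen:
            continue  # count each editor once, at their newest edit
        seen.add(username)
        started = first_edit.get(username)
        if started is not None and started <= cutoff:
            chosen.append(username)
            if len(chosen) == limit:
                break
    return chosen
```

Taking editors in the order they appear in Recent Changes, rather than asking for volunteers, is what avoids the opt-in bias I am complaining about.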
Will a respected member of the community please do this? I would like
to know what the actual editing community thinks of the travel ban and
their idea of an appropriate response. I don't want to see community
governance by opt-in participation in obscure RFCs.
I would offer to do this myself, but I value keeping my real name
unassociated with my enwiki userid.
This is a request for your input and any ideas you may have regarding my management of the fallout from Jimmy's announcing me as the 2018 Wikimedian of the Year.
The emails below are a copy of my ongoing consultations with Wikimedia Foundation staff and other Wikimedians I personally know, as well as a report on what's already brewing in my region of Russia after this unexpected outcome.
I would be grateful if you could advise me on how to properly steer the enthusiasm on the part of the regional government, mass media, NGOs, etc., which have just discovered the possibility of participating in the Wikimedia movement (keep in mind that anything from the U.S. does not get much in-depth coverage in Russian in the sources that regional public figures, NGOs, teachers or general regional journalists read) & are now placing great hopes on teaching the whole of Tatarstan how to wiki & on engaging all Tatars globally (3/4 live outside Tatarstan, 1/5 outside Russia).
* Selet WikiSchool was presented to the wider public at the press conference for Tatar-speaking journalists (July 31), at the poster session in the framework of the World Congress of Tatars Youth Forum (Aug.3), and later at a Tatar projects fair in the Downtown park of Kazan (Aug.5)
* The contribution of those writing in the Tatar Wikipedia was recognized by choosing me as one of this year's recipients of the "For the great service to Tatar nation" medal. (Aug.3)
* We recorded a 40 min interview in Turkish (Aug.4, see https://www.youtube.com/watch?v=HK3tFBWMWcs )
* We didn't yet meet the President of the Republic (his schedule changed once again), but I got another firm request to meet with the Minister for Youth Affairs (another good acquaintance of mine, ex-member of local Comedy Club type activity), as well as with the Head of World Congress of Tatars' Executive Bureau, and more TV & written press interviews are lining up.
* I am trying to manage these guys' optimism & desire to move quickly, keeping in mind they are not really familiar with the Wiki-way or our communities' policies & practices
* Bashkir (ba) & Sakha (sah) communities have shared advice about how to ensure that the local Wikimedia community stays clearly independent in a cultural environment where neither Education nor GLAM programs will attract local partners unless they have the strong support of regional or federal government entities.
* Looking forward to meeting my Wikimania-2017 roommate from the Philippines in Singapore on Aug.16 to collect some more input from Asia
-------- Forwarded message --------
03.08.2018, 19:35, "Фархад Фаткуллин / Farkhad Fatkullin" <frhd(a)yandex.com>:
Thursday, 2 August 2018
Republic of Tatarstan Ministry for Informatization & Communication @ IT-park, Kazan
Meeting with Tatarstan Deputy Prime Minister - Minister for Informatization & Communications Roman Shaykhutdinov
on opportunities Wikimedia projects & popularization thereof among the population of the Republic can bring for Tatarstan
* Almaz A. Valiullin, Director General of Tatarstan Center for Information Technologies http://mic.tatarstan.ru/eng/valiullin.htm
* Tatyana S. Kamaletdinova, CEO of Tatarstan Center for Information Technologies http://mic.tatarstan.ru/eng/kamaletdinova.htm
* Anna V. Yakovleva, Head of Ministry's Press-Service http://mic.tatarstan.ru/rus/about/structure_new?department_id=80576
* Farkhad N. Fatkullin, 2018 Wikimedian of the Year, Wikimedia Russia member https://meta.wikimedia.org/wiki/User:Frhdkazan
1. Content creation contests
2. Education projects
3. How can these be organized in a systemic way, engaging students throughout all municipal districts of Tatarstan & Tatar diaspora globally, and assessing effectiveness of measures
4. Larger Tatar language internet development, promoting content creation in the language
5. .tatar domain
7. Generating lists for content creation
8. Free licenses
10. Regional educational Wiki-seminar for beginners, introduction & orientation
11. Smartphone oriented educational projects around Tatar & Tatar Wikipedia (beginning with Tatar version of www.kakprav.com & on to WOK Master type )
12. OSM in Tatar
13. The way forward: visiting WMF to discover other opportunities, MoU with WMF?
1. Almaz Valiullin to head the Working Group on behalf of the Ministry
2. Next meeting set for / around 20th of August
3. Minister requested his staff to draft documents regarding moving all Regional & Municipal budget funded websites to CC-BY type free licenses
4. Minister proposed every organization website place a link to a Wikipedia article about it (in Tatar, Russian or English, depending on the language used)
5. Help is requested about developing simple samples or article creation guidelines for various institutions of Tatarstan & diaspora organizations to learn what is considered appropriate by Wiki-community & how should representatives of respective entities
6. Annual Tatar Internet Awards Ceremony to include awards to leading editors of Wikimedia projects in Tatar
7. Help is requested to develop a framework of Wikipedia Article Contests for Secondary School Children of Tatarstan, with article quality assessment schemes (prizes & organization to be funded by Tatarstan government, promotion in local mass-media, schools, etc.)
8. Help is requested to develop a framework for a sustainable functioning of a Tatarstan & Tatar language oriented Wikimedia thematic organization to be responsible for systemic work around developing & promoting Wikimedia projects in Tatar, as well as Tatarstan-oriented content creation for various Wikis
9. Readiness to consider awarding *.tatar domain names to those, who develop attractive projects in Tatar that can't be hosted on Wikimedia platform or otherwise need a Tatar digital identity
10. Help is requested regarding drafting a program proposal, necessary budget & expected results for a Tatarstan oriented introductory Wiki-seminar, open for a wide public
11. Help is requested to provide links to Wikidata educational materials, Practical Use Cases & expert responses to the Deputy Minister (in English)
12. Help is requested in organizing a learning visit to WMF Headquarters during Tatarstan Delegation annual fall visit to Silicon Valley & other places in U.S. to discover other opportunities
Farkhad Fatkullin - Фархад Фаткуллин http://sikzn.ru/ Tel. +79274158066 / skype:frhdkazan / Wikipedia:frhdkazan
03.08.2018, 11:42, "Фархад Фаткуллин / Farkhad Fatkullin" <frhd(a)yandex.com>:
> Hi Nichole & Kui,
> Thank you for your responses & readiness to help.
> Events on the ground are developing along a predictable course: we had a very constructive three-hour discussion last night with Tatarstan's Deputy Prime Minister (IT minister) & his team, and earlier on Thursday I got an invitation to meet with another Deputy Prime Minister (President of the National Council of Trustees for the World Congress of Tatars Association).
> More TV & magazine interviews are expected, and some in-depth articles in Russian promoting Wikimedia projects are in the pipeline (I expect to see the texts so that I can make any necessary corrections & add links to the respective WMRU, Meta, WMF & related independent media articles, thanks to ComCom).
> I hope to find time to prepare & email both you (in English) & the IT ministry's Working Group (in Russian) my summary of yesterday's late-evening meeting, with attendees, topics discussed, requests & proposals, etc. They want to move as fast as possible, because this will help them fulfil the task of promoting Tatar language use online that they are charged with by the region's President.
> Just a few to begin with:
> * Ready to move all regional and municipal government websites into CC-BY, add links to respective WP articles from all these
> * Willing to have Tatarstan & Tatar language oriented thematic organization in place to work with secondary schools & Universities, GLAMS, diaspora
> * Ready to sponsor prizes for article writing contests (either through WMRU or this new entity)
> * Requesting guidance from myself & Wikimedia community on what's the best way to make this all successful
> * Interested in supporting WMRU organized general public educational Wiki-Seminar in Tatarstan Academy of Sciences or other respected facility with a large conference hall open to all public (September-October?)
> * Willing to visit WMF headquarters in November to meet & learn more about what's there (I admit I don't read all mailing lists or Meta discussions, don't have time to see all the wonderful YouTube videos available & haven't yet had time to visit a single Wikimedia Conference) & possibly sign MoUs or whatever would help speed up Wikimedia acceptance & popularity growth among the population of the region.
> I am in touch with Wikimedia RU & the Wikimedia Languages of Russia Community regarding ongoing developments, benefiting from the collective wisdom of Wikimedians I know.
> Should you be ready to bring any ideas to the table, please shoot them my way.
> Farkhad Fatkullin - Фархад Фаткуллин http://sikzn.ru/ Tel. +79274158066 / skype:frhdkazan / Wikipedia:frhdkazan
> 02.08.2018, 16:35, "Nichole Saad" <nsaad(a)wikimedia.org>:
>> Hi Farhad,
>> First, congratulations on being named "Wikimedian of the Year!" We have received your letter, and are looking forward to engaging with your ideas. Realistically, I'll be able to provide a more in-depth response early next week.
>> best regards,
>> On Wed, Aug 1, 2018 at 2:45 PM Фархад Фаткуллин / Farkhad Fatkullin <frhd(a)yandex.com> wrote:
>>> Dear Sirs,
>>> This is from Farhad, User:frhdkazan (ComCom member, now a.k.a. 2018 Wikimedian of the Year) with some proposals about how to leverage a fleeting opportunity I got. Please read the text below this letter ASAP & think how you can help seize the moment. I'm looking forward to first ideas within 24 hours from now.
>>> Any support of yours would be greatly appreciated.
>>> Thanks a million.
>>> P.S. More on where I'm coming from can later be discovered @ http://frhd.narod.ru/resume-en.htm (dated, stopped my retainer contract with the Office of Tatarstan President in September 2013) & more up-to-date but in Russian @ http://sikzn.narod.ru/index/0-4 . I have previously translated & provided digital media with the link to Russian text of Wikimedia Blog post about Jess Wade https://ru.wikinews.org/wiki/?curid=176108 , & now contacted Asian colleagues to get similar content about Nahid Sultan & Wikimedia Bangladesh. Also on my agenda is to record a video-interview in Turkish for Wikimedia Turkey's communication campaign - I speak the language & we are in touch with Turkish community. I also need to get back to interpreting into Russian videos of Wikimania-2018 available on YouTube - maybe later this week, before I join my family for a few days off & then a week-long interpretation assignment in Singapore.
>>> Farkhad Fatkullin - Фархад Фаткуллин http://sikzn.ru/ Tel. +79274158066 / skype:frhdkazan / Wikipedia:frhdkazan
>>> Following Jimmy's unexpected pass during the Wikimania-2018 Closing Ceremony, I became an instant celebrity of sorts. I alone have seen over 40 articles with positive PR about Wikipedia & the wider body of Wikimedia projects:
>>> * one by RT in Russian collecting over 1.1 million Facebook likes in under 8 hours from publication
>>> * short positive ones from Russian Federation government's Official gazette & Russkiy Mir Foundation
>>> * first ever mention that people can donate to the Wikimedia Russia volunteer organization for us to be able to fund local Wiki-seminars, conferences & contests (WMF is not funding Wikimedia Russia & can't allow us to use the donate link from Russian Wikipedia or place a link to the "Thank you, but we don't accept donations from Russia" page where users from Russia are being routed)
>>> * about to get one with links published to various WMF & Wikimedia Russia projects, including promoting free licenses, Education & GLAM programs, etc.
>>> I'm periodically collecting these @ https://tt.wikipedia.org/wiki/Википедия:Безнең_турында_матбугатта#Ел_викиме… when time permits, but there's more in my Facebook Messenger.
>>> This is also generating very positive attention on the part of the Republic of Tatarstan government, as well as our regional NGO partner, Selet Youth Education Foundation, with whom the as yet unrecognized Tatar Wikimedians are jointly running the Selet WikiSchool project. https://outreach.wikimedia.org/wiki/Education/News/May_2018/Selet_WikiSchool The latter want to introduce this joint initiative, as well as Wikimedia Russia executive director Stanislav Kozlovsky (Ph.D. in Psychology, Researcher & a Senior lecturer at Moscow State University) and myself, to the President of Tatarstan during his visit to their Annual International Forum of Tatar-speaking High-School Age Youth on August 6th. Unrelated to that, I was called yesterday by a good acquaintance of mine (Tatarstan Vice Prime-Minister, Minister for IT https://en.wikipedia.org/wiki/Roman_Shaykhutdinov ), who invited me to have a talk with him tomorrow about how promoting Wikimedia projects in the wider Tatar-speaking world (only about 25% of Tatars live in Tatarstan) can help him do whatever the current President wants him to do to develop Tatar language use online. There are some other Tatarstan Prime-Minister Office level things in the pipeline, & I am also waiting for a call from Russian Federation ex-IT Minister https://en.wikipedia.org/wiki/Nikolay_Nikiforov , whom I know from his years in Tatarstan government as well (until 2012). FYI: the Wikimedia movement is still unknown in Russia, with local mass media outlets predominantly speaking about Wikipedia in Russian when there is some lapse that can be exploited to show one's Russian patriotism & thus score some points (basically the same political circus as what we are seeing in the U.S. with the Russia-collusion story in overdrive, just in the opposite direction), and here we get something to reverse the situation big time, explaining to the wide public that it's not only OK, but even desirable that they engage with Wikimedia projects.
>>> Keeping in mind that the Tatarstan President is the head of both the Russia-Islamic World Strategic Vision Group & the Association of Innovative Regions of Russia, Jimmy gave me a-hell-of-an-opportunity for an elevator pitch for the Wikimedia movement in Russia and some adjacent countries (think Turkey & Central Asia), and I would really hate seeing it go unused. I will do my part here (about CC-BY for regional government controlled websites, Wikipedia Education Program as an extracurricular activity in all secondary schools & universities, GLAM, as well as some locally funded carrots for participants), but I also want your help in driving this to a home run. Every fall the Tatarstan President & his delegation visit Silicon Valley & other places of interest in the U.S., & it would be great if we could set up a physical visit to the Wikimedia Foundation, with a first-hand presentation of the best international practices the movement is proud of, & have some MoU on cooperation signed (as a bureaucratic basis for continuing the conversation) with some department in the Tatarstan government, just like the one you signed with the Mexican Ministry of Culture www.eluniversal.com.mx/cultura/secretaria-de-cultura-y-wikimedia-firman-con…. To set the context, Tatarstan has such formal documents with a number of American companies & institutions: our private English-speaking IT University @ https://en.wikipedia.org/wiki/Innopolis was developed jointly with Carnegie Mellon University (with consultancy fees paid by Tatarstan), we have Russia's first publicly funded hospital certified as meeting https://en.wikipedia.org/wiki/Joint_Commission requirements & plenty of U.S. investors @ our https://en.wikipedia.org/wiki/Alabuga_Special_Economic_Zone .
On top of this, we could benefit from having Selet Youth Educational movement http://selet.biz/en/ to embrace Wiki even further, so maybe a similar MoU on collaboration between them and Wikiedu.org signed simultaneously would be great (I met LiAnna and Jami in Montreal last year & was really impressed with what these guys are doing).
>>> Please shoot something my way before the same time tomorrow, for me to handle the meeting with Tatarstan Deputy Prime Minister more effectively to progressively open other opportunities I've touched on. On top of what I described above & my last year's ideas @ https://meta.wikimedia.org/wiki/User:Frhdkazan/Wiki4RegionalDevt (in Russian), I'll be bringing to the table topics of:
>>> * Wikidata
>>> * OSM, as well as a
>>> * WOK-like project for all things in Tatar (I was contacted by a local 9th grader who, on his own initiative and with the help of a teacher, did something similar for students who want to train for the Russian-language SAT/ACT type exam www.kakprav.com & has now offered to develop such a thing, or a bigger one, for Tatar). For more on WOK, see https://www.vanguardngr.com/2018/01/wikipedia-wok-seek-nigerian-content-onl…
>>> * whatever else you or anybody else can help me think of until then & or later opportunities
>> Nichole Saad
>> Wikimedia Foundation | Senior Program Manager, Education
>> user: NSaad (WMF)
-------- End of forwarded message --------
Farkhad Fatkullin - Фархад Фаткуллин http://sikzn.ru/ Tel. +79274158066 / skype:frhdkazan / Wikipedia:frhdkazan
I was asked by a volunteer for help getting stats on the gender gap in
content on a certain Wikipedia, and came up with simple Wikidata Query
Service queries that pulled the total number of articles on a given
Wikipedia about men and about women, to calculate *the proportion of
articles about women out of all articles about humans*.
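
For anyone who wants to try this on their own wiki, here is a minimal sketch of the kind of query involved. It is not necessarily the exact query I ran: the helper names are made up for illustration, it only builds the SPARQL text (paste it into the Wikidata Query Service yourself), and the proportion is computed over men plus women only, ignoring other gender values.

```python
FEMALE = "Q6581072"  # Wikidata item "female"
MALE = "Q6581097"    # Wikidata item "male"


def biography_count_query(wiki_url: str, gender_qid: str) -> str:
    """Build a SPARQL query counting humans of one gender with an article.

    wiki_url is the site root with a trailing slash, for example
    "https://he.wikipedia.org/".
    """
    return f"""
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {{
  ?item wdt:P31 wd:Q5 ;            # instance of: human
        wdt:P21 wd:{gender_qid} .  # sex or gender
  ?article schema:about ?item ;
           schema:isPartOf <{wiki_url}> .
}}
""".strip()


def proportion_women(women: int, men: int) -> float:
    """Share of articles about women among articles about women and men."""
    total = women + men
    if total == 0:
        raise ValueError("no biographies counted")
    return women / total
```

Run the query once with `FEMALE` and once with `MALE` for the same wiki, then feed the two counts to `proportion_women`.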
Then I was curious about how that wiki compared to other wikis, so I ran
the queries on a bunch of languages, and gathered the results into a table,
(please see the *caveat* there.)
I don't have time to fully write up everything I find interesting in those
results, but I will quickly point out the following:
1. The Nepali statistic is simply astonishing! There must be a story
there. I'm keen on learning more about this, if anyone can shed light.
2. Evidently, ~13%-17% seems like a robust average of the proportion of
articles about women among all biographies.
3. Among the top 10 largest wikis, Japanese is the least imbalanced. Good
job, Japanese Wikipedians! I wonder if you have a good sense of what
drives this relatively better balance. (my instinctive guess is pop culture
4. Among the top 10 largest wikis, Russian is the most imbalanced.
5. I intend to re-generate these stats every two months or so, to
eventually have some sense of trends and changes.
6. Your efforts, particularly on small-to-medium wikis, can really make a
dent in these numbers! For example, it seems I am personally
responsible for almost 1% of the coverage of women on Hebrew Wikipedia!
7. I encourage you to share these numbers with your communities. Perhaps
you'd like to overtake the wiki just above yours? :)
8. I'm happy to add additional languages to the table, by request. Or you
can do it yourself, too. :)
 Yay #100wikidays :) https://meta.wikimedia.org/wiki/100wikidays
Wikimedia Foundation <http://www.wikimediafoundation.org>
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us make it a reality!
Based on comments that I received on Wikimedia-l, I would like to invite
people to a casual online meetup one hour before the monthly WMF Metrics
and Activities Meeting.
There will be no set agenda. You can come with questions or ideas that you
would like to discuss. Please be willing to listen to questions and ideas
from other Wikimedians.
I will host the meeting with the Zoom software. You can join with software
or by using your phone. If you join by phone then your phone number will be
visible to other participants.
The primary language of the meeting will be English, but if people would
like to communicate in diverse languages then that is okay too. We can
facilitate translation by text chat. Many Wikimedians, myself included, are
multilingual in varying degrees, so we might try to have live
Here is information about how to connect:
Join from PC, Mac, Linux, iOS or Android: https://zoom.us/j/136978210
Or iPhone one-tap :
Dial (for higher quality, dial a number based on your current location):
Argentina: +54 341 512 2188
Australia: +61 (0) 2 8015 2088 or +61 (0) 8 7150 1149
Canada: +1 647 558 0588
Hong Kong, China: +852 5808 6088
France: +33 (0) 1 8288 0188 or +33 (0) 7 5678 4048
Germany: +49 (0) 30 3080 6188 or +49 (0) 30 5679 5800
Israel: +972 (0) 3 978 6688
Italy: +39 069 480 6488
Japan: +81 (0) 3 4578 1488 or +81 524 564 439
Mexico: +52 229 910 0061 or +52 554 161 4288
Spain: +34 84 368 5025 or +34 91 198 0188
Sweden: +46 (0) 7 6692 0434 or +46 (0) 8 4468 2488
Russia: +7 495 283 9788
United Kingdom: +44 (0) 20 3051 2874 or +44 (0) 20 3695 0088
US: +1 408 638 0986 or +1 646 558 8665
Meeting ID: 136 978 210
International numbers available: https://zoom.us/u/ekaPibJIy
The first "Wikimedia Café" meetup will be on 30 August 2018, at 17:00 UTC.
Let me emphasize that the environment won't be like this
so please don't feel intimidated if you are nervous about public speaking.
(If a conversation feels to me like it is becoming uncivil or intimidating,
then I will ask the debaters to quiet themselves or to move to somewhere
else.) The meeting will generally have an environment that is more like this
I anticipate that few people will come, which is okay. I hope that if you
come then you will enjoy the environment and conversation.
Until next time,
( https://meta.wikimedia.org/wiki/User:Pine )
In November 2016, I presented the results of joint research that
helped us understand English Wikipedia readers better. (Presentation
at https://www.youtube.com/watch?v=xIaMuWA84bY .) I talked about how
we used English, Persian, and Spanish Wikipedia readers' inputs to
build a taxonomy of Wikipedia use-cases along several dimensions,
capturing users’ motivations to visit Wikipedia, the depth of
knowledge they are seeking, and their knowledge of the topic of
interest prior to visiting Wikipedia. I also talked about the results
of the study we did to quantify the prevalence of these use-cases via
a large-scale user survey conducted on English Wikipedia. In that
study, we also matched survey responses to the respondents’ digital
traces in Wikipedia’s server logs, which enabled us to discover
behavioral patterns associated with specific use-cases. You can read
the full study at https://arxiv.org/abs/1702.05379 .
==What do we want to do now?==
There are quite a few directions in which this research can continue, and
the most immediate one is to understand whether the results that we
observe (in English Wikipedia) are robust across languages/cultures.
For this, we are going to repeat the study, but this time in more
languages. Here are the languages on our list: Arabic, Dutch, English,
Hindi, Japanese, Spanish (thanks to all the volunteers who have been
helping us translate all survey-related documents into these languages).
==What about your language?==
If your language is not one of the six languages above and you'd like
to learn about the readers of Wikipedia in it (in the specific ways
described above), please get back to me by Monday, April 24, AoE. I
cannot guarantee that we can run the study in your language, however,
I guarantee that we will give it a good try if you're interested. The
decision to include more languages will depend on: our capacity to do
the analysis, the speed at which your community can help us translate
the material to the language, the traffic to that language, a couple
of sentences on how you'd think the result can help your community,
and your willingness to help us document the results for your language.
(Quite a bit of work will need to go into making readable, usable
documentation available, and we are too small to be able to guarantee
that on our own for many languages.)
Senior Research Scientist
Mindful of the ongoing discussions about conferences, I think that it
would be helpful to have a big picture understanding of the goals, plans,
and budgets for conferences collectively.
As far as I know, these are the types of recurring conferences:
(1) Wikimania, which seems to be a multi-purpose international conference,
with somewhat open admission if someone can afford to attend, is willing to
attend, and can get the necessary legal permissions;
(2) the Wikimedia Summit (which I hope will get a name change to reflect
its actual scope, because it's not an all-Wikimedia summit) which will
focus on WMF, WMF committees that work with WMF affiliate organizations,
and WMF affiliate organizations;
(3) thematic conferences, such as the Wikisource Conference;
(4) regional conferences, such as WikiConference North America;
(5) organization-specific meetings of various kinds, including affiliate
organizations' annual general meetings and WMF All Hands, and
(6) the Wikimedia Technical Conference.
I believe that WMF intended to do some strategic planning for the
collection of conferences as a part of the larger WMF-led strategic
planning process. Is this type of planning underway for conferences, and if
so can we get an update from someone who is familiar with the situation? If
the person who will respond is a paid staff member, then please feel free
to wait to respond until a convenient workday next week. In the meantime,
other people may wish to comment or ask questions.
( https://meta.wikimedia.org/wiki/User:Pine )
Semantic Web languages allow expressing ontologies and knowledge bases in a
way meant to be particularly amenable to the Web. Ontologies formalize the
shared understanding of a domain. But the most expressive and widespread
languages that we know of are human natural languages, and the largest
knowledge base we have is the wealth of text written in human languages.
We look for a path to bridge the gap between knowledge representation
languages such as OWL and human natural languages such as English. We
propose a project that simultaneously exposes that gap, allows collaboration
on closing it, makes progress widely visible, and is highly attractive and
valuable in its own right: a Wikipedia written in an abstract language to
be rendered into any natural language on request. This would make current
Wikipedia editors about 100x more productive, and increase the content of
Wikipedia by 10x. For billions of users this will unlock knowledge they
currently do not have access to.
My first talk on this topic will be on October 10, 2018, 16:45-17:00, at
the Asilomar in Monterey, CA during the Blue Sky track of ISWC. My second,
longer talk on the topic will be at the DL workshop in Tempe, AZ, October
27-29. Comments are very welcome as I prepare the slides and the talk.
Link to the paper: http://simia.net/download/abstractwikipedia.pdf