Hello
The Wiki Loves Women team launched a podcast a few weeks ago.
We have released 5 episodes so far, with a frequency of two episodes per
month.
All episodes are available on the usual podcast platforms, or may be
accessed on the Wiki Loves Women website with additional notes about each
episode.
https://podcast.wikiloveswomen.org
The latest episode features Angela Lungati, current CEO of Ushahidi.
If you would like to receive a brief message on your talk page each time
a new episode is published, please add your name here:
https://meta.wikimedia.org/wiki/Wiki_Loves_Women/Podcast#Subscribe
Anthere
------------------
About Inspiring Open
Inspiring Open is a podcast series about women from Wiki Loves Women
that celebrates the inspirational women whose careers and personal
ethics intersect with the Open movement. Each episode features a dynamic
woman from Africa who has pushed the boundaries of what it means to
build communities and succeed as a collective. As a podcast series, it
is available anytime, anywhere, to amplify the motivational stories of
each guest, as spoken in their own voice. Listen to their personal
journeys in conversation with host Betty Kankam-Boadu.
Join Inspiring Open as we raise the global visibility and profiles of
women who are redefining and reclaiming the Open sector.
Be inspired • Be challenged • Be bold!
This paper (first reference) is the result of a class project I was part of
almost two years ago for CSCI 5417 Information Retrieval Systems. It builds
on a class project I did in CSCI 5832 Natural Language Processing, which
I presented at Wikimania '07. The project was very late, as we didn't send
in the final paper until the day before New Year's. As far as I recall, this
technical report was never really announced, so I thought it would be
interesting to look briefly at the results. The goal of this paper was to
break articles down into surface features and latent features, and then use
those to study the rating system being used, predict article quality, and
rank results in a search engine. We used the [[random forests]] classifier,
which allowed us to analyze the contribution of each feature to performance
by looking directly at the weights that were assigned. While the surface
analysis was performed on the whole English Wikipedia, the latent analysis
was performed on the
Simple English Wikipedia (it is more expensive to compute).
= Surface features =
* Readability measures are the single best predictor that I have found of
quality as defined by the Wikipedia Editorial Team (WET). The
[[Automated Readability Index]], [[Gunning Fog Index]] and [[Flesch-Kincaid
Grade Level]] were the strongest predictors, followed by length of article
html, number of paragraphs, [[Flesch Reading Ease]], [[Smog Grading]], number
of internal links, [[Laesbarhedsindex Readability Formula]], number of words
and number of references. Weakly predictive were number of to be's, number
of sentences, [[Coleman-Liau Index]], number of templates, PageRank, number
of external links, number of relative links. Not predictive (overall - see
the end of section 2 for the per-rating score breakdown): number of h2 or
h3's, number of conjunctions, number of images*, average word length, number
of h4's, number of prepositions, number of pronouns, number of interlanguage
links, average syllables per word, number of nominalizations, article age
(based on page id), proportion of questions, average sentence length.
:* Number of images was actually by far the single strongest predictor of any
class, but only for Featured articles. Because it was so good at picking out
Featured articles and somewhat good at picking out A and G articles, the
classifier was confused in so many cases that the overall contribution of
this feature to classification performance is zero.
:* Number of external links is strongly predictive of Featured articles.
:* The B class is highly distinctive. It has a strong "signature," with high
predictive value assigned to many features. The Featured class is also very
distinctive. F, B and S (Start/Stub) contain the most information.
:* A is the least distinct class, not being very different from F or G.
= Latent features =
The algorithm used for latent analysis, which is an
analysis of the occurrence of words in every document with respect to the
link structure of the encyclopedia ("concepts"), is [[Latent Dirichlet
Allocation]]. This part of the analysis was done by CS PhD student Praful
Mangalath. As an example of what can be done with the result of this
analysis, you provide a word (a search query) such as "hippie". You can then
look at the weight of every article for the word hippie. You can pick the
article with the largest weight, and then look at its link network. You can
pick out the articles that this article links to and/or which link to this
article that are also weighted strongly for the word hippie, while also
contributing maximally to this article's "hippieness". We tried this query in
our system (LDA), Google (site:en.wikipedia.org hippie), and the Simple
English Wikipedia's Lucene search engine. The breakdown of articles occurring
in the top ten search results for this word for those engines is:
* LDA only: [[Acid rock]], [[Aldeburgh Festival]], [[Anne Murray]], [[Carl
Radle]], [[Harry Nilsson]], [[Jack Kerouac]], [[Phil Spector]], [[Plastic
Ono Band]], [[Rock and Roll]], [[Salvador Allende]], [[Smothers brothers]],
[[Stanley Kubrick]].
* Google only: [[Glam Rock]], [[South Park]].
* Simple only: [[African Americans]], [[Charles Manson]], [[Counterculture]],
[[Drug use]], [[Flower Power]], [[Nuclear weapons]], [[Phish]], [[Sexual
liberation]], [[Summer of Love]]
* LDA & Google & Simple: [[Hippie]], [[Human Be-in]], [[Students for a
democratic society]], [[Woodstock festival]]
* LDA & Google: [[Psychedelic Pop]]
* Google & Simple: [[Lysergic acid diethylamide]], [[Summer of Love]]
(See the paper for the articles produced for the keywords philosophy and
economics.)
= Discussion / Conclusion =
* The results of the latent analysis are totally up to your
perception. But what is interesting is that the LDA features predict the WET
ratings of quality just as well as the surface level features. Both feature
sets (surface and latent) pull out almost all of the information that the
rating system bears.
* The rating system devised by the WET is not distinctive. You can best tell
the difference between, grouped together, Featured, A and Good articles vs B
articles. Featured, A and Good articles are also quite distinctive (Figure
1). Note that in this study we didn't look at Starts and Stubs, but in an
earlier paper we did.
:* This is interesting when compared to this recent entry on the YouTube
blog, "Five Stars Dominate Ratings":
http://youtube-global.blogspot.com/2009/09/five-stars-dominate-ratings.html…
I think a sane, well researched (with actual subjects) rating system is
well within the purview of the Usability Initiative. Helping people find and
create good content is what Wikipedia is all about. Having a solid rating
system allows you to reorganize the user interface, the Wikipedia
namespace, and the main namespace around good content and bad content as
needed. If you don't have a solid, information bearing rating system you
don't know what good content really is (really bad content is easy to spot).
:* My Wikimania talk was all about gathering data from people about articles
and using that to train machines to automatically pick out good content. You
ask people questions along dimensions that make sense to people, and give
the machine access to other surface features (such as a statistical measure
of readability, or length) and latent features (such as can be derived from
document word occurrence and encyclopedia link structure). I referenced page
262 of Zen and the Art of Motorcycle Maintenance to give an example of the
kind of qualitative features I would ask people about. It really depends on
what features end up bearing information, to be tested in "the lab". Each
word is an example dimension of quality: We have "*unity, vividness,
authority, economy, sensitivity, clarity, emphasis, flow, suspense,
brilliance, precision, proportion, depth and so on.*" You then use surface
and latent features to predict these values for all articles. You can also
say, when a person rates this article as high on the x scale, they also mean
that it has this much of these surface and these latent features.
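As an illustration of the kind of surface feature discussed above, here is a
minimal sketch of the [[Automated Readability Index]] in Python. It uses the
standard published formula, but the tokenization (the regular expressions for
words and sentences) and the function name `automated_readability_index` are
assumptions made for this example, not the feature extraction used in the
paper.

```python
import re


def automated_readability_index(text: str) -> float:
    """Return the Automated Readability Index (ARI) of `text`.

    ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

    Characters are letters and digits only. Word and sentence splitting
    here is a rough heuristic (an assumption of this sketch).
    """
    # Words: runs of letters, digits and apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Sentences: non-empty chunks between runs of ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Count only alphanumeric characters, so apostrophes are excluded.
    characters = sum(ch.isalnum() for word in words for ch in word)
    return (4.71 * characters / len(words)
            + 0.5 * len(words) / len(sentences)
            - 21.43)


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    dense = "Encyclopedic articles frequently incorporate sophisticated terminology."
    print(round(automated_readability_index(simple), 2))
    print(round(automated_readability_index(dense), 2))
```

A higher score means longer words and longer sentences. Splitting sentences
on punctuation will miscount abbreviations and decimal numbers, so a real
feature extractor would need a more careful tokenizer.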
= References =
- DeHoust, C., Mangalath, P., Mingus, B. (2008). *Improving search in
Wikipedia through quality and concept discovery*. Technical Report.
PDF<http://grey.colorado.edu/mediawiki/sites/mingus/images/6/68/DeHoustMangalat…>
- Rassbach, L., Mingus, B., Blackford, T. (2007). *Exploring the
feasibility of automatically rating online article quality*. Technical
Report. PDF<http://grey.colorado.edu/mediawiki/sites/mingus/images/d/d3/RassbachPincock…>
Hello everyone,
Wikimedia is participating in the winter edition of this year's Outreachy <
https://www.outreachy.org/> [1] (December 2022–March 2023)! The deadline to
submit projects on the Outreachy website is September 30th, 2022. We are
currently working on a list of interesting project ideas. If you have some
ideas for coding or non-coding (design, documentation, translation,
outreach, research) projects, share them here: <
https://phabricator.wikimedia.org/T313361> [2].
*About the Outreachy program*
Outreachy offers three-month internships to work remotely in Free and Open
Source Software (FOSS), coding, and non-coding projects with experienced
mentors. These internships run twice a year–from May to August and December
to March. Interns are paid a stipend of USD 7000 for the three months of
work. Interns often find employment after their internship with Outreachy
sponsors or jobs that use the skills they learned during their internship.
This program is open to both students and non-students. Outreachy expressly
invites the following people to apply:
* Women (both cis and trans), trans men, and genderqueer people.
* Anyone who faces under-representation, systematic bias, or discrimination
in the technology industry in their country of residence.
* Residents and nationals of the United States of any gender who are
Black/African American, Hispanic/Latinx, Native American/American Indian,
Alaska Native, Native Hawaiian, or Pacific Islander.
See a blog post highlighting the experiences and outcomes of interns who
participated in a previous round of Outreachy with Wikimedia <
https://techblog.wikimedia.org/2021/06/02/outreachy-round-21-experiences-an…>
[3]
*Tips for mentors for proposing projects*
* Follow this task description template when you propose a project in
Phabricator: <
https://phabricator.wikimedia.org/tag/outreach-programs-projects> [4]. Add
#Outreachy-Round-25 tag.
* A project should take an experienced developer ~15 days and a newcomer
~3 months to complete.
* Each project should have at least two mentors, with one of them holding a
technical background.
* Ideally, the project has no tight deadlines, a moderate learning curve,
and fewer dependencies on Wikimedia's core infrastructure. Projects
addressing the needs of a language community are most welcome.
* If you don't have an idea in mind and would like to pick one from an
existing list, check out these projects: <
https://phabricator.wikimedia.org/tag/outreach-programs-projects/> [4]
* To learn more about the roles and responsibilities of mentors, visit our
resources on MediaWiki.org: <
https://www.mediawiki.org/wiki/Outreachy/Mentors> [5].
We look forward to your participation!
Cheers,
Srishti
[1] https://www.outreachy.org/
[2] https://phabricator.wikimedia.org/T313361
[3]
https://techblog.wikimedia.org/2021/06/02/outreachy-round-21-experiences-an…
[4] https://phabricator.wikimedia.org/tag/outreach-programs-projects/
[5] https://www.mediawiki.org/wiki/Outreachy/Mentors
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
I want to raise a legal concern here about Google's misuse of our content. [It came up today on Twitter](https://twitter.com/epineda/status/1564143156702199813?s=20&t=z2xu…) that the GoogleTV app showed a movie description in Catalan (which in principle should be good news for language normalization). However, shortly afterwards a Wikipedian colleague realised that the text was taken in full from the Catalan Wikipedia. Once I had downloaded the app myself, I double-checked that Google does not specify anywhere (or at least nowhere minimally visible that I could find) that those lines come from Wikipedia: neither the origin, nor the license, nor a link to the full article or to the CC license.
I'd like to recall the licensing footer on Wikipedia ("Text is available under the [Creative Commons Attribution-ShareAlike License 3.0](https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attri…)") and its conditions, and to ask others to check whether there are more situations like this one. It's worth noting how harmful this is to minoritised-language Wikipedias: beyond the legal issue itself, there are the legitimate clicks and views that we end up losing, the confusion among readers who think this is a win by Google (as in the example I shared, with both screenshots enclosed), and even a subsequent chicken-and-egg situation that can lead to articles being deleted by users who think the content was taken from Google rather than the other way round.
I remember that there was a previous thread here, not so long ago, about the problems of Google taking over our data and thereby diminishing clicks to the Wikimedia projects. Since I am fully against the GAFAM drift that the WMF is increasingly adopting by benefiting from Google in our human, economic and digital structures, I prefer to share it here as well, and not only with the legal team of the WMF (cc'd).
Kind regards,
Xavier Dengra
Dear Wikimedians,
I'm delighted to invite you to episode 15 of WikiAfrica Hour, titled
*Wikimedians In Residence*.
The session focuses on shining a light on the marriage between Wikimedia
and some host organisations and their members who are interested in a
productive relationship with the encyclopedia and its community.
To make the experience a fun and memorable one, we have invited the
following guest speakers:
1. Bobby Shabangu - WiR, United Nations Development Programme, South
Africa
2. Florence Devouard - WiR, World Intellectual Property Organization
(WIPO)
3. Nicolas Vigneron - WiR, Clermont Auvergne University
4. Alice Kibombo - WiR, African Library and Information Associations
(AfLIA)
5. Daniel Obiokeke - WiR, The Africa Narrative
Date: 2nd September 2022
Time: 4pm UTC
Details: https://w.wiki/5dft <https://t.co/Z5FogXbr5l>
Regards,
Ceslause Ogbonnaya
*Host, WikiAfrica Hour*
Dear all,
The Africa Knowledge Initiative (AKI) working group is happy to announce a
call for applications for Implementing partners for the various campaigns
in the project
<https://diff.wikimedia.org/2022/08/04/africa-knowledge-initiative-welcomes-…>
.
The Implementing partners will be responsible for organizing a campaign in
one of these African Union Holidays:
- Africa Youth Day – Nov 1, 2022
- Environment/Wangari Maathai Day – Mar 3, 2023
- Africa Day – May 25, 2023
Interested organizations, affiliates (user groups/chapters) and groups are
welcome to submit applications via this application form
<https://docs.google.com/forms/d/e/1FAIpQLSfyaySvNhTPF-F0WvwMdRHIUcWGgyYAtDc…>
.
*Minimum Requirements*
- Must be an organization, affiliate (user groups or chapters) or a group
- Experience organizing international campaigns
- Experience organizing around the topical area of interest
- Must be in good standing with the grants system/affcom requirements at
the WMF
*NB:* Only shortlisted applicants will be contacted.
*Application deadline is 4th September 2022*.
Enquiries about the application or the Africa Knowledge Initiative (AKI)
project should be sent to campaigns(a)wikimedia.org.
Regards,
Ceslause Ogbonnaya
*Wikimedian In Residence, Africa Knowledge Initiative (AKI)*
Dear Wikimedians,
The Africa Knowledge Initiative (AKI) working group is pleased to
announce a call for applications for implementing partners for the
project's various campaigns
<https://diff.wikimedia.org/2022/08/04/africa-knowledge-initiative-welcomes-…>
.
The implementing partners will be responsible for organizing a
campaign on one of these African Union holidays:
- Africa Youth Day – November 1, 2022
- Environment/Wangari Maathai Day – March 3, 2023
- Africa Day – May 25, 2023
Interested organizations, affiliates (user
groups/chapters) and groups are invited to submit
applications via this application form
<https://docs.google.com/forms/d/e/1FAIpQLSfyaySvNhTPF-F0WvwMdRHIUcWGgyYAtDc…>
.
*Minimum requirements*
- Must be an organization, an affiliate (user group or
chapter) or a group
- Experience organizing international campaigns
- Experience organizing around the topical area of interest
- Must be in good standing with the grants system /
AffCom requirements at the Wikimedia Foundation
*NOTE:* Only shortlisted applicants will be contacted.
*The application deadline is 4 September 2022*.
Enquiries about the application or the Africa
Knowledge Initiative (AKI) project should be sent to campagnes(a)wikimedia.org.
Regards,
Ceslause Ogbonnaya
*Wikimedian in Residence, Africa Knowledge Initiative (AKI)*
Hi,
Tomorrow at the HOPE 2022 conference, I'm giving a talk titled, "How to
Run a Top-10 Website, Publicly and Transparently", discussing the impact
of transparency in Wikimedia's technical spaces. A number of people have
expressed interest in watching, including non-technical users, so I'm
advertising it a bit more broadly.
I apologize for the short notice; I didn't realize the stream would be
free to watch until yesterday (thanks Ori!).
Time: 2022-07-23 17:00 UTC (1pm ET) -
https://zonestamp.toolforge.org/1658595637
Stream: https://hope.net/416dac.html
If you can't watch it live, a recording will be uploaded later on.
I've documented all of this on-wiki, including the full abstract:
<https://meta.wikimedia.org/wiki/User:Legoktm/HOPE_2022>.
I am of course happy to answer any questions people might have after the
talk!
Thanks,
-- Kunal / Legoktm