On behalf of Wikimedia Taiwan, we would like to say that this is long overdue. For more than half a decade, good-faith volunteers from Hong Kong, the Chinese mainland, and Taiwan have raised concerns about dangerous members of that organization, including in the Signpost(1) (repeatedly(2)). It is not the kind of threat that communities, or even larger ones like our wiki, can deal with entirely on their own. These have been very exhausting years for us.
Now there is some hope. But we have a lot of work ahead of us as a volunteer community, and we call upon the Foundation to meet its commitment of support as we do. We need to rebuild an inclusive wiki that welcomes everyone from all places who wants to contribute to Chinese language Wikipedia in good faith. Many people have felt very unsafe for years, so restoring a shared sense of comfort will likely take a long time. Doing this work is very important to get back to focusing on knowledge and Wikipedia’s five pillars that should unite our community.
Yuan Chang, Chairman of Wikimedia Taiwan
(1) https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2019-10-31/In_fo…
(2) https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2021-07-25/Speci…
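Chinese version:
On behalf of Wikimedia Taiwan, we hereby state that this action is a long-overdue effort. For years, users from Hong Kong, mainland China, and Taiwan have repeatedly voiced concern about dangerous members and conduct within that organization, including but not limited to what was covered in an earlier Signpost report(1) (and another report(2)). This is not a threat that the community, or even the whole of Chinese Wikipedia, can handle on its own. These years have left us exhausted.
The present measures make us feel that there is finally some hope. But as a volunteer organization we still have much work to do, and we hope the Foundation will give us strong support. We need to rebuild an inclusive Wikipedia that welcomes participants from all regions who are willing to contribute to Chinese-language knowledge in good faith. Over the past several years many participants have felt unsafe, and restoring the former peaceful atmosphere will take sustained effort over a considerable time. This work is very important: it will help us focus our energy on knowledge and on the five pillars of Wikipedia that unite our community.
Yuan Chang, Chairman of Wikimedia Taiwan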
Original Link of the statement: https://meta.wikimedia.org/wiki/Wikimedia_Taiwan/Declaration/Wikimedia_Taiw…
Dear All,
Please join me in welcoming Luis Bitencourt-Emilio to the Wikimedia
Foundation Board of Trustees. Luis was unanimously appointed to a 3-year
term and replaces a board-selected Trustee, Lisa Lewin, whose term ended in
November 2021 [1].
Currently based in São Paulo, Luis is the Chief Technology Officer at Loft,
a technology startup in the real-estate industry. He brings product and
technology experience from a globally diverse career that has spanned large
technology companies including Microsoft, online networking sites like
Reddit, and a series of entrepreneurial technology ventures focused in the
USA and Latin America. Luis has led product and technology teams across
Latin America, the United States, Europe and Asia. He is passionately
involved in building and promoting the entrepreneurial ecosystem for Latin
American-based startups.
Luis has more than two decades of experience across product development,
software engineering, and data science. At Microsoft, he led engineering
teams shipping multiple Microsoft Office products. At Reddit, he led the
Knowledge Group, an engineering team that owned critical functions such as
data, machine learning, abuse detection and search. He was deeply involved
in Reddit’s growth stage and worked closely with Reddit’s communities in
that evolution. Luis also co-founded a fintech startup to help millennials
manage and automate their finances.
His career has also been shaped by a visible commitment to recruiting
diverse leaders. At Reddit, Luis was a key member of the recruitment
efforts that achieved equal representation of women engineering directors.
Luis says his proudest achievement at Microsoft was building their
Brazilian talent pipeline by working closely with local universities to
place thousands of engineering candidates at Microsoft, as well as his
involvement in expanding global recruitment to markets including Ukraine,
Poland, Great Britain, the EU and Mexico.
Luis was educated in Brazil and the United States, receiving a Bachelor of
Science in Computer Engineering with Honors from the University of
Maryland. He is fluent in Portuguese, Spanish and English. He is also a
proud father and dog lover.
I would like to thank the Governance Committee, chaired by Dariusz
Jemielniak, for this nomination process as well as volunteers in our
Spanish and Portuguese speaking communities who also met with Luis or
shared their experiences.
You can find an official announcement here [2].
PS. You can help translate or find translations of this message on
Meta-Wiki:
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Board_noticeboard/Janu…
[1] Lisa Lewin served from January 2019 till November 2021:
https://foundation.wikimedia.org/wiki/Resolution:Renewing_Lisa_Lewin%E2%80%…
[2]
https://diff.wikimedia.org/2022/01/12/luis-bitencourt-emilio-joins-wikimedi…
Best regards,
antanana / Nataliia Tymkiv
Chair, Wikimedia Foundation Board of Trustees
*NOTICE: You may have received this message outside of your normal working
hours/days, as I usually can work more as a volunteer during the weekend. You
should not feel obligated to answer it during your days off. Thank you in
advance!*
As others mentioned in the thread, WMF can't enforce this directly as it is
not the copyright holder. However, in past instances, we have raised the
issue with Google (similar to the KPN example) and will do so for this one
as well.
I am meeting with Google later today and will flag this to remind them of
the copyright obligations that come with using this text.
Thanks for surfacing this,
Nicholas
On Tue, Aug 30, 2022 at 12:58 PM <wikimedia-l-request(a)lists.wikimedia.org>
wrote:
> Today's Topics:
>
> 1. Re: Is GoogleTV violating Wikipedia's license? (Peter Southwood)
> 2. Re: Is GoogleTV violating Wikipedia's license? (Ciell Wikipedia)
> 3. Re: [Small Wiki Toolkit] Writing Wikidata Queries Using WDQS Tool
> Workshop On Tuesday, August 30th, 16:00 UTC
> (Seyram Komla Sapaty)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 30 Aug 2022 11:18:53 +0200
> From: "Peter Southwood" <peter.southwood(a)telkomsa.net>
> Subject: [Wikimedia-l] Re: Is GoogleTV violating Wikipedia's license?
> To: "'F. Xavier Dengra i Grau'" <xavier.dengra(a)protonmail.com>,
> "'Wikimedia Mailing List'" <wikimedia-l(a)lists.wikimedia.org>
> Message-ID: <002201d8bc51$8963c6a0$9c2b53e0$(a)telkomsa.net>
> Content-Type: multipart/alternative;
> boundary="----=_NextPart_000_0023_01D8BC62.4CF06730"
>
> If I understand the CC-by-sa licence correctly, Wikipedia and WMF
> themselves do not own the copyright, it is owned by the contributors who
> created the text. They can take this up with Google, the WMF cannot. If you
> are one of those contributors you can approach Google as misusing your
> copyright.
>
> Cheers, Peter
>
>
>
> From: F. Xavier Dengra i Grau via Wikimedia-l [mailto:
> wikimedia-l(a)lists.wikimedia.org]
> Sent: 29 August 2022 19:00
> To: Wikimedia Mailing List; legal(a)wikimedia.org
> Cc: F. Xavier Dengra i Grau
> Subject: [Wikimedia-l] Is GoogleTV violating Wikipedia's license?
>
>
>
> Hi all,
>
>
>
> I want to bring a legal concern here on Google's misuse of our content. It
> came up today <
> https://twitter.com/epineda/status/1564143156702199813?s=20&t=z2xu6PMB29vvk…>
> on Twitter that the GoogleTV app had linked a movie description text in
> Catalan language (which in principle it should be good news regarding
> language normalization). However, shortly after a wikipedian colleague
> realised that the text was fully taken by the Catalan Wikipedia. Once I
> downloaded the app by myself, I double-checked that Google does not specify
> anywhere (or at least that I could find minimally visible) that those lines
> belong to Wikipedia: neither the origin, the license, nor a link to the
> full article or to the CC license.
>
>
>
> I'd like to recall the licensing footpage on Wikipedia (Text is available
> under the Creative Commons Attribution-ShareAlike License 3.0 <
> https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attributio…>
> ) and its conditions, as well as to ask others to check whether there's
> more situations like this one. It's worth noting how wrong this is to
> minoritised language Wikipedias: not only the legal issue itself, but also
> the lack of legitimate clicks and views that we end up losing, the
> confusion and misunderstandings from the readers that think this is a win
> by Google (the example I shared, with both screenshots enclosed), and even
> a subsequent chicken-and-egg situation that can lead to deleted articles by
> some users thinking that the content was stolen from Google and not
> actually the opposite.
>
>
>
> I remember that there was a previous thread here, not so long ago, about
> the problems of Google taking over our data and therefore diminishing
> clicks to the Wikimedia projects. Considering that I am fully against the
> GAFAM-drift that the WMF is increasingly adopting by benefiting from Google
> in our human, economical and digital structures, I prefer to share it here
> as well -and not only to the legal team of the WMF (cced).
>
>
>
> Kind regards,
>
>
>
> Xavier Dengra
Hello
The Wiki Loves Women team launched a podcast a few weeks ago.
We have released 5 episodes so far, with a frequency of two episodes per
month.
All episodes are available on the usual podcast platforms, or may be
accessed on Wiki Loves Women website with additional notes about each
episode.
https://podcast.wikiloveswomen.org
The latest episode features Angela Lungati, current CEO of Ushahidi.
If you would like to receive a brief message on your talk page each time
a new episode is published, please drop your name here:
https://meta.wikimedia.org/wiki/Wiki_Loves_Women/Podcast#Subscribe
Anthere
------------------
About Inspiring Open
Inspiring Open is a podcast series about women from Wiki Loves Women
that celebrates the inspirational women whose careers and personal
ethics intersect with the Open movement. Each episode features a dynamic
woman from Africa who has pushed the boundaries of what it means to
build communities and succeed as a collective. As a podcast series, it
is available anytime, anywhere, to amplify the motivational stories of
each guest, as spoken in their own voice. Listen to their personal
journeys in conversation with host Betty Kankam-Boadu.
Join Inspiring Open as we raise the global visibility and profiles of
women who are redefining and reclaiming the Open sector.
Be inspired • Be challenged • Be bold!
This paper (first reference) is the result of a class project I was part of
almost two years ago for CSCI 5417 Information Retrieval Systems. It builds
on a class project I did in CSCI 5832 Natural Language Processing, which
I presented at Wikimania '07. The project was very late, as we didn't send
in the final paper until the day before New Year's. As far as I recall, this
technical report was never really announced, so I thought it would be
interesting to look briefly at the results.

The goal of this paper was to break articles down into surface features and
latent features and then use those to study the rating system being used,
predict article quality and rank results in a search engine. We used the
[[random forests]] classifier, which allowed us to analyze the contribution
of each feature to performance by looking directly at the weights that were
assigned. While the surface analysis was performed on the whole English
Wikipedia, the latent analysis was performed on the Simple English Wikipedia
(it is more expensive to compute).

= Surface features =
* Readability measures are the single best predictor of quality, as defined
by the Wikipedia Editorial Team (WET), that I have found. The [[Automated
Readability Index]], [[Gunning Fog Index]] and [[Flesch-Kincaid Grade
Level]] were the strongest predictors, followed by length of article HTML,
number of paragraphs, [[Flesch Reading Ease]], [[Smog Grading]], number of
internal links, [[Laesbarhedsindex Readability Formula]], number of words
and number of references.
* Weakly predictive were number of "to be" verbs, number of sentences,
[[Coleman-Liau Index]], number of templates, PageRank, number of external
links, number of relative links.
* Not predictive (overall - see the end of section 2 for the per-rating
score breakdown): number of h2 or h3's, number of conjunctions, number of
images*, average word length, number of h4's, number of prepositions, number
of pronouns, number of interlanguage links, average syllables per word,
number of nominalizations, article age (based on page id), proportion of
questions, average sentence length.
:* Number of images was actually by far the single strongest predictor of
any class, but only for Featured articles. Because it was so good at picking
out Featured articles and only somewhat good at picking out A and G
articles, the classifier was confused in so many cases that the overall
contribution of this feature to classification performance is zero.
:* Number of external links is strongly predictive of Featured articles.
:* The B class is highly distinctive. It has a strong "signature," with high
predictive value assigned to many features. The Featured class is also very
distinctive. F, B and S (Start/Stub) contain the most information.
:* A is the least distinct class, not being very different from F or G.

= Latent features =
The algorithm used for latent analysis, which is an
analysis of the occurrence of words in every document with respect to the
link structure of the encyclopedia ("concepts"), is [[Latent Dirichlet
Allocation]]. This part of the analysis was done by CS PhD student Praful
Mangalath.

An example of what can be done with the result of this analysis is that you
provide a word (a search query) such as "hippie". You can then look at the
weight of every article for the word hippie. You can pick the article with
the largest weight, and then look at its link network. You can pick out the
articles that this article links to and/or which link to this article that
are also weighted strongly for the word hippie, while also contributing
maximally to this article's "hippieness".

We tried this query in our system (LDA), Google (site:en.wikipedia.org
hippie), and the Simple English Wikipedia's Lucene search engine. The
breakdown of articles occurring in the top ten search results for this word
for those engines is:
* LDA only: [[Acid rock]], [[Aldeburgh Festival]], [[Anne Murray]], [[Carl
Radle]], [[Harry Nilsson]], [[Jack Kerouac]], [[Phil Spector]], [[Plastic
Ono Band]], [[Rock and Roll]], [[Salvador Allende]], [[Smothers brothers]],
[[Stanley Kubrick]].
* Google only: [[Glam Rock]], [[South Park]].
* Simple only: [[African Americans]], [[Charles Manson]],
[[Counterculture]], [[Drug use]], [[Flower Power]], [[Nuclear weapons]],
[[Phish]], [[Sexual liberation]], [[Summer of Love]]
* LDA & Google & Simple: [[Hippie]], [[Human Be-in]], [[Students for a
democratic society]], [[Woodstock festival]]
* LDA & Google: [[Psychedelic Pop]]
* Google & Simple: [[Lysergic acid diethylamide]], [[Summer of Love]]
(See the paper for the articles produced for the keywords philosophy and
economics.)

= Discussion / Conclusion =
* The results of the latent analysis are totally up to your
perception. But what is interesting is that the LDA features predict the WET
ratings of quality just as well as the surface level features. Both feature
sets (surface and latent) pull out almost all of the information that the
rating system bears.
* The rating system devised by the WET is not distinctive. You can best tell
the difference between, grouped together, Featured, A and Good articles vs B
articles. Featured, A and Good articles are also quite distinctive (Figure
1). Note that in this study we didn't look at Starts and Stubs, but in an
earlier paper we did.
:* This is interesting when compared to this recent entry on the YouTube
blog, "Five Stars Dominate Ratings":
http://youtube-global.blogspot.com/2009/09/five-stars-dominate-ratings.html…
I think a sane, well-researched (with actual subjects) rating system is well
within the purview of the Usability Initiative. Helping people find and
create good content is what Wikipedia is all about. Having a solid rating
system allows you to reorganize the user interface, the Wikipedia
namespace, and the main namespace around good content and bad content as
needed. If you don't have a solid, information-bearing rating system you
don't know what good content really is (really bad content is easy to spot).
:* My Wikimania talk was all about gathering data from people about articles
and using that to train machines to automatically pick out good content. You
ask people questions along dimensions that make sense to people, and give
the machine access to other surface features (such as a statistical measure
of readability, or length) and latent features (such as can be derived from
document word occurrence and encyclopedia link structure). I referenced page
262 of Zen and the Art of Motorcycle Maintenance to give an example of the
kind of qualitative features I would ask people about. It really depends on
what features end up bearing information, to be tested in "the lab". Each
word is an example dimension of quality: We have "*unity, vividness,
authority, economy, sensitivity, clarity, emphasis, flow, suspense,
brilliance, precision, proportion, depth and so on.*" You then use surface
and latent features to predict these values for all articles. You can also
say, when a person rates this article as high on the x scale, they also mean
that it has this much of these surface and these latent features.
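
As an illustration of the "statistical measure of readability" surface
features mentioned above, here is a minimal Python sketch of two of the named
indices that need only character, word and sentence counts: the Automated
Readability Index and the Coleman-Liau Index. The coefficients are the
standard published ones. The tokenisation (a simple regular expression, and
counting letters and digits alike) is an assumption made for this sketch and
is not taken from the paper, so scores will differ somewhat from whatever
tooling the study actually used.

```python
import re


def text_counts(text):
    """Return (characters, words, sentences) for a plain-text string.

    Assumption for this sketch: a word is a run of letters, digits or
    apostrophes, and every run of '.', '!' or '?' ends one sentence.
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Count only letters and digits, ignoring apostrophes inside words.
    chars = sum(ch.isalnum() for word in words for ch in word)
    # Treat text without terminal punctuation as a single sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return chars, len(words), sentences


def automated_readability_index(text):
    """ARI = 4.71 * chars/words + 0.5 * words/sentences - 21.43."""
    chars, words, sentences = text_counts(text)
    if words == 0:
        return 0.0
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def coleman_liau_index(text):
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S per 100 words."""
    chars, words, sentences = text_counts(text)
    if words == 0:
        return 0.0
    letters_per_100_words = 100.0 * chars / words
    sentences_per_100_words = 100.0 * sentences / words
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8
```

Both functions take the plain text of an article and return a grade-level
style score, with higher values for longer words and longer sentences. Very
short, simple passages can score below zero.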
= References =
- DeHoust, C., Mangalath, P., Mingus, B. (2008). *Improving search in
Wikipedia through quality and concept discovery*. Technical Report.
PDF <http://grey.colorado.edu/mediawiki/sites/mingus/images/6/68/DeHoustMangalat…>
- Rassbach, L., Mingus, B., Blackford, T. (2007). *Exploring the
feasibility of automatically rating online article quality*. Technical
Report. PDF <http://grey.colorado.edu/mediawiki/sites/mingus/images/d/d3/RassbachPincock…>
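
The per-engine breakdown of the "hippie" search results given under Latent
features (LDA only, Google only, LDA & Google, and so on) can be produced
mechanically from the three top-ten lists. The Python sketch below groups
each article by the exact set of engines that returned it. The function name
and the example lists are hypothetical: the lists are an abbreviated subset
of the article titles mentioned in this message, not the full top-ten
results.

```python
from collections import defaultdict


def overlap_breakdown(results_by_engine):
    """Group each article by the exact set of engines that returned it."""
    engines_for_article = defaultdict(set)
    for engine, articles in results_by_engine.items():
        for article in articles:
            engines_for_article[article].add(engine)

    breakdown = defaultdict(list)
    for article, engines in engines_for_article.items():
        breakdown[frozenset(engines)].append(article)
    # Sort titles so the output is stable regardless of input order.
    return {engines: sorted(articles) for engines, articles in breakdown.items()}


# Abbreviated, illustrative result lists (not the full top ten per engine).
example_results = {
    "LDA": ["Hippie", "Acid rock", "Psychedelic Pop"],
    "Google": ["Hippie", "Glam Rock", "Psychedelic Pop", "Lysergic acid diethylamide"],
    "Simple": ["Hippie", "Counterculture", "Lysergic acid diethylamide"],
}
example_breakdown = overlap_breakdown(example_results)

for engines, articles in example_breakdown.items():
    print(" & ".join(sorted(engines)), "->", ", ".join(articles))
```

Keying on a frozenset of engine names means an article found by two engines
is listed once under that pair rather than under each engine separately,
which matches the "only" versus "&" grouping used in the list above.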
Hello everyone,
Wikimedia is participating in the winter edition of this year's Outreachy <
https://www.outreachy.org/> [1] (December 2022–March 2023)! The deadline to
submit projects on the Outreachy website is September 30th, 2022. We are
currently working on a list of interesting project ideas. If you have some
ideas for coding or non-coding (design, documentation, translation,
outreach, research) projects, share them here: <
https://phabricator.wikimedia.org/T313361> [2].
*About the Outreachy program*
Outreachy offers three-month remote internships on Free and Open Source
Software (FOSS) projects, both coding and non-coding, with experienced
mentors. These internships run twice a year, from May to August and from
December to March. Interns are paid a stipend of USD 7000 for the three months of
work. Interns often find employment after their internship with Outreachy
sponsors or jobs that use the skills they learned during their internship.
This program is open to both students and non-students. Outreachy expressly
invites the following people to apply:
* Women (both cis and trans), trans men, and genderqueer people.
* Anyone who faces under-representation, systematic bias, or discrimination
in the technology industry in their country of residence.
* Residents and nationals of the United States of any gender who are
Black/African American, Hispanic/Latinx, Native American/American Indian,
Alaska Native, Native Hawaiian, or Pacific Islander.
See a blog post highlighting the experiences and outcomes of interns who
participated in a previous round of Outreachy with Wikimedia <
https://techblog.wikimedia.org/2021/06/02/outreachy-round-21-experiences-an…>
[3]
*Tips for mentors for proposing projects*
* Follow this task description template when you propose a project in
Phabricator: <
https://phabricator.wikimedia.org/tag/outreach-programs-projects> [4]. Add
#Outreachy-Round-25 tag.
* A project should take an experienced developer ~15 days and a newcomer
~3 months to complete.
* Each project should have at least two mentors, with one of them holding a
technical background.
* Ideally, the project has no tight deadlines, a moderate learning curve,
and fewer dependencies on Wikimedia's core infrastructure. Projects
addressing the needs of a language community are most welcome.
* If you don't have an idea in mind and would like to pick one from an
existing list, check out these projects: <
https://phabricator.wikimedia.org/tag/outreach-programs-projects/> [4]
* To learn more about the roles and responsibilities of mentors, visit our
resources on MediaWiki.org: <
https://www.mediawiki.org/wiki/Outreachy/Mentors> [5].
We look forward to your participation!
Cheers,
Srishti
[1] https://www.outreachy.org/
[2] https://phabricator.wikimedia.org/T313361
[3]
https://techblog.wikimedia.org/2021/06/02/outreachy-round-21-experiences-an…
[4] https://phabricator.wikimedia.org/tag/outreach-programs-projects/
[5] https://www.mediawiki.org/wiki/Outreachy/Mentors
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
I want to raise a legal concern here about Google's misuse of our content. [It came up today on Twitter](https://twitter.com/epineda/status/1564143156702199813?s=20&t=z2xu…) that the GoogleTV app had shown a movie description in Catalan (which in principle should be good news for language normalization). However, shortly afterwards a Wikipedian colleague realised that the text was taken in full from the Catalan Wikipedia. Once I downloaded the app myself, I double-checked that Google does not specify anywhere (or at least nowhere that I could find minimally visible) that those lines come from Wikipedia: neither the origin, the license, nor a link to the full article or to the CC license.
I'd like to recall the licensing footer on Wikipedia (Text is available under the [Creative Commons Attribution-ShareAlike License 3.0](https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attri…)) and its conditions, and to ask others to check whether there are more situations like this one. It's worth noting how harmful this is to minoritised-language Wikipedias: not only the legal issue itself, but also the legitimate clicks and views that we end up losing, the confusion among readers who think this is a win by Google (the example I shared, with both screenshots enclosed), and even a subsequent chicken-and-egg situation in which some users could get articles deleted, thinking that the content was stolen from Google and not the opposite.
I remember that there was a previous thread here, not so long ago, about the problems of Google taking over our data and therefore diminishing clicks to the Wikimedia projects. Considering that I am fully against the GAFAM drift that the WMF is increasingly adopting by benefiting from Google in our human, economic and digital structures, I prefer to share it here as well, and not only with the legal team of the WMF (cc'd).
Kind regards,
Xavier Dengra
Dear Wikimedians,
I'm delighted to invite you to episode 15 of WikiAfrica Hour, titled
*Wikimedians In Residence*.
The session is focused on shining a light on the marriage between Wikimedia
and some host organisations and their members who are interested in a
productive relationship with the encyclopedia and its community.
To make the experience a fun and memorable one, we have invited the
following guest speakers:
1. Bobby Shabangu - WiR, United Nations Development Programme, South
Africa
2. Florence Devouard - WiR, World Intellectual Property Organization
(WIPO)
3. Nicolas Vigneron - WiR, Clermont Auvergne University
4. Alice Kibombo - WiR, African Library and Information Associations
(AfLIA)
5. Daniel Obiokeke - WiR, The Africa Narrative
Date: 2nd September 2022
Time: 4pm UTC
Details: https://w.wiki/5dft
Regards,
Ceslause Ogbonnaya
*Host, WikiAfrica Hour*