Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on the
consultation page.
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health" of the various communities that make up the
Wikimedia movement.
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. This could be used by
both community members to make decisions on their community direction and
Wikimedia Foundation staff to point anti-harassment tool development in the
right direction.
We have a set of metrics we are thinking about including in the kit,
ranging from the ratio of active users to active administrators to
administrator confidence levels to off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data will
vary accordingly.
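As an entirely hypothetical illustration of the first metric mentioned above, here is one way an active-users-to-administrators ratio might be computed. The field names and sample data are invented for this sketch and do not reflect any real dataset or API.

```python
# Hypothetical sketch: computing a "ratio of active users to active
# administrators" metric. All field names and sample records are invented.

def admins_per_active_users(users):
    """Return (active user count, active admin count, users per admin)."""
    active = [u for u in users if u["active"]]
    admins = [u for u in active if "sysop" in u["groups"]]
    if not admins:
        return len(active), 0, None  # avoid division by zero
    return len(active), len(admins), len(active) / len(admins)

sample = [
    {"name": "A", "active": True,  "groups": ["sysop"]},
    {"name": "B", "active": True,  "groups": []},
    {"name": "C", "active": True,  "groups": []},
    {"name": "D", "active": False, "groups": ["sysop"]},  # inactive admin
]
active, admins, ratio = admins_per_active_users(sample)
print(active, admins, ratio)  # 3 active users, 1 active admin, ratio 3.0
```

A wiki with a high users-per-admin ratio might signal an overloaded administrator corps, which is the kind of comparison such a metric could support.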
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <
https://stats.wikimedia.org/v2/#/all-projects> — how might these tools
complement one another?
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you're interested
in direct messaging going forward, please feel free to indicate your
interest by signing up on the consultation page.
Looking forward to reading your thoughts.
P.S.: Please feel free to CC me in conversations that might happen on this
thread.
 What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well, and where it could improve. This
project aims to provide a variety of useful data points to inform
community decisions that would benefit from objective data.
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
The next Research Showcase will be live-streamed this Wednesday, June 26,
at 11:30 AM PDT/18:30 UTC. We will have three presentations this showcase,
all relating to Wikipedia blocks.
YouTube stream: https://www.youtube.com/watch?v=WiUfpmeJG7E
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
This month's presentations:
Trajectories of Blocked Community Members: Redemption, Recidivism and
Departure
By Jonathan Chang, Cornell University
Community norm violations can impair constructive communication and
collaboration online. As a defense mechanism, community moderators often
address such transgressions by temporarily blocking the perpetrator. Such
actions, however, come with the cost of potentially alienating community
members. Given this tradeoff, it is essential to understand to what extent,
and in which situations, this common moderation practice is effective in
reinforcing community rules. In this work, we introduce a computational
framework for studying the future behavior of blocked users on Wikipedia.
After their block expires, they can take several distinct paths: they can
reform and adhere to the rules, but they can also recidivate, or
straight-out abandon the community. We reveal that these trajectories are
tied to factors rooted both in the characteristics of the blocked
individual and in whether they perceived the block to be fair and
justified. Based on these insights, we formulate a series of prediction
tasks aiming to determine which of these paths a user is likely to take
after being blocked for their first offense, and demonstrate the
feasibility of these new tasks. Overall, this work builds towards a more
nuanced approach to moderation by highlighting the tradeoffs that are
involved.
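To make the prediction tasks described above concrete, here is a toy sketch — not the paper's actual framework — of a three-way classifier that assigns one of the trajectories (reform, recidivate, depart) using a nearest-centroid rule. Every feature name and number below is invented for illustration.

```python
# Toy trajectory classifier. Invented per-user features:
# [prior warnings, edits per week after the block, appealed block (0/1)].
from statistics import mean

def centroid(rows):
    """Mean of each feature column."""
    return [mean(col) for col in zip(*rows)]

# Invented training examples, grouped by observed trajectory.
training = {
    "reform":     [[0, 30, 1], [1, 25, 1], [0, 40, 0]],
    "recidivate": [[5, 8, 0], [7, 12, 0], [6, 5, 1]],
    "depart":     [[2, 1, 0], [3, 0, 0], [1, 2, 1]],
}
centroids = {label: centroid(rows) for label, rows in training.items()}

def predict(features):
    """Assign the trajectory whose centroid is nearest (squared L2)."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sqdist(features, centroids[label]))

# A user with no prior warnings who keeps editing actively after the block.
print(predict([0, 35, 1]))  # prints "reform"
```

The real work uses richer signals (including whether the user perceived the block as fair); this sketch only shows the shape of the classification task.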
Automatic Detection of Online Abuse in Wikipedia
By Lane Rasberry, University of Virginia
Researchers analyzed all English Wikipedia blocks prior to 2018 using
machine learning. With insights gained, the researchers examined all
English Wikipedia users who are not blocked against the identified
characteristics of blocked users. The results were a ranked set of
predictions of users who are not blocked, but who have a history of conduct
similar to that of blocked users. This research and process model a system
for the use of computing to aid human moderators in identifying conduct on
English Wikipedia which merits a block.
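The ranking step described above can be sketched as follows — a hedged, toy illustration only: score unblocked users by the cosine similarity between their conduct-feature vectors and the average vector of blocked users. The feature names and all numbers are invented; the actual research used machine learning over real block data.

```python
# Hypothetical sketch: rank unblocked users by similarity to the average
# conduct profile of blocked users. All features and values are invented.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Invented features: [reverted-edit rate, warnings received, incivility score]
blocked = [[0.8, 6, 0.9], [0.7, 4, 0.8], [0.9, 7, 0.7]]
profile = [sum(col) / len(col) for col in zip(*blocked)]  # average vector

unblocked = {
    "UserX": [0.75, 5, 0.85],  # conduct resembles blocked users
    "UserY": [0.05, 0, 0.10],  # little resemblance
}
ranking = sorted(unblocked, key=lambda u: cosine(unblocked[u], profile),
                 reverse=True)
print(ranking)  # ['UserX', 'UserY']
```

The output of such a ranking would be a prioritized list for human moderators to review, mirroring the "computing aids human moderators" framing of the talk.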
First Insights from Partial Blocks in Wikimedia Wikis
By Morten Warncke-Wang, Wikimedia Foundation
The Anti-Harassment Tools team at the Wikimedia Foundation released the
partial block feature in early 2019. Where previously blocks on Wikimedia
wikis were sitewide (users were blocked from editing an entire wiki),
partial blocks make it possible to block users from editing specific pages
and/or namespaces. The Italian Wikipedia was the first wiki to start using
this feature, and it has since been rolled out to other wikis as well. In
this presentation, we will look at how this feature has been used in the
first few months since release.
Janna Layton (she, her)
Administrative Assistant - Audiences & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
As part of https://phabricator.wikimedia.org/T225306 I need to reboot the
an-coord1001 host, which runs the Hive server/metastore and Oozie. Tomorrow,
June 26th, I'll reboot the host at around 9 AM CEST; the maintenance window
should last roughly 10–15 minutes. This means that Hive jobs might
fail during that timeframe, so please let me know if that is a problem.
Thanks in advance,
Luca (on behalf of the Analytics team)
At 10 AM CEST the SRE/Analytics team will take down db1107 and db1108
(where the log database is stored and accessed) for maintenance. It should
last about half an hour.
If you have any questions, please reach out to me (elukey) on IRC or to the
a-team in the #wikimedia-analytics channel on Freenode.