Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
Meta-Wiki:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health"[1] of various communities that make up the
Wikimedia movement:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. It could be used both
by community members, to make decisions on their community's direction, and
by Wikimedia Foundation staff, to point anti-harassment tool development in
the right direction.
We have a set of metrics we are thinking about including in the kit,
ranging from the ratio of active users to active administrators, through
administrator confidence levels, to off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data will
vary.
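To make the first of these metrics concrete, here is a minimal sketch of how
a ratio of active users to active administrators might be computed. The
function name, the wiki names, and the counts are all made up for
illustration; the kit's actual definitions of "active" and its data sources
are not decided here.

```python
def users_per_admin(active_users, active_admins):
    """Return active users per active administrator.

    Returns None when a wiki has no active administrators, because the
    ratio is undefined in that case.
    """
    if active_users < 0 or active_admins < 0:
        raise ValueError("counts must be non-negative")
    if active_admins == 0:
        return None
    return active_users / active_admins


# Hypothetical counts: (active users, active administrators) per wiki.
sample_counts = {
    "wiki_a": (1200, 40),
    "wiki_b": (90, 3),
    "wiki_c": (15, 0),
}

ratios = {
    wiki: users_per_admin(users, admins)
    for wiki, (users, admins) in sample_counts.items()
}

for wiki, ratio in ratios.items():
    print(wiki, "no active administrators" if ratio is None else round(ratio, 1))
```

Returning None rather than zero or infinity for a wiki without active
administrators keeps that case visible instead of folding it into the scale.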
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats
<https://stats.wikimedia.org/v2/#/all-projects>. How might these tools
coexist?
Your opinions will help to guide this project. We'll be reaching out at
different stages, so if you'd like to receive direct messages about it,
please feel free to indicate your interest by signing up on the
consultation page.
Looking forward to reading your thoughts.
best,
Joe
P.S.: Please feel free to CC me in conversations that might happen on this
list!
[1] What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well, and where it could improve. This
project aims to provide a variety of useful data points that will inform
community decisions that will benefit from objective data.
--
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
Wikimedia Foundation
joesutherland.rocks
Hi all!
We'll be upgrading Spark to 2.4.4 in the Analytics (Hadoop) cluster on
Tuesday, November 5th. We don't expect any real downtime, but if you have
regularly scheduled jobs that are built with Spark 2.3, you'll likely want
to rebuild them to use Spark 2.4 after Tuesday.
You can follow along here:
https://phabricator.wikimedia.org/T222253
- Andrew Otto (Systems Engineer) & Analytics Engineering Team
Hello!
After a long process, we are finally ready to officially disable MySQL
support for EventLogging. Almost all data is now available in Hive in
the event database.
We'll be turning off the ingestion of data into MySQL this week, and will be
archiving and deleting data from the MySQL instance and eventually
repurposing the hardware.
More info here and in sub tickets:
https://phabricator.wikimedia.org/T159170
Thanks!
- Andrew Otto (Systems Engineer) & Analytics Engineering Team
Hi all,
The next Research Showcase will be live-streamed next Wednesday, October
16, at 9:30 AM PDT/16:30 UTC.
YouTube stream: https://www.youtube.com/watch?v=KZ35weAVlIU
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past Research Showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Elections Without Fake: Deploying Real Systems to Counter Misinformation
Campaigns
By Fabrício Benevenuto, Computer Science Department, Universidade Federal
de Minas Gerais (UFMG), Brazil
The political debate and electoral dispute in the online space during the
2018 Brazilian elections were marked by an information war. In order to
mitigate the misinformation problem, we created the project Elections
Without Fake <http://www.eleicoes-sem-fake.dcc.ufmg.br/> and developed a
few technological solutions able to reduce the abuse of misinformation
campaigns in the online space. Particularly, we created a system to monitor
public groups in WhatsApp and a system to monitor ads on Facebook. Our
systems proved to be fundamental for fact-checking and investigative
journalism, and are currently being used by over 150 journalists with
editorial lines and various fact-checking agencies.
More information on the second talk, by Francesca Spezzano, to come.
--
Janna Layton (she, her)
Administrative Assistant - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi everybody,
the Analytics team is going to stop HDFS and Yarn services for a
(hopefully) brief time window tomorrow, Tue Oct 15th, from 14:30 to 15:30
CEST.
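For anyone in another time zone, the window can be converted to UTC with a
few lines of Python. This is only a convenience sketch: the year is not
stated in this message, so 2019 is an assumption (15 October fell on a
Tuesday that year), and CEST is modelled as a fixed UTC+2 offset.

```python
from datetime import datetime, timedelta, timezone

# CEST is UTC+2; a fixed offset avoids needing a time zone database.
CEST = timezone(timedelta(hours=2), "CEST")

# Assumption: the year (2019) is not given in the announcement.
window_start = datetime(2019, 10, 15, 14, 30, tzinfo=CEST)
window_end = datetime(2019, 10, 15, 15, 30, tzinfo=CEST)


def to_utc(moment):
    """Return the same instant expressed in UTC."""
    return moment.astimezone(timezone.utc)


print(
    to_utc(window_start).strftime("%H:%M %Z"),
    "to",
    to_utc(window_end).strftime("%H:%M %Z"),
)
# prints "12:30 UTC to 13:30 UTC"
```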
We are going to swap the Zookeeper cluster from the one currently used by
all the Kafka production services to a dedicated one within the Analytics
VLAN. More details in https://phabricator.wikimedia.org/T217057
Side effects: the move should be transparent to all users, except the ones
relying on history in Yarn (for example, checking it via yarn.wikimedia.org).
The history is in fact stored in Zookeeper, and to keep things simple we
are not copying znodes over to the new cluster. Please let us know if this
impacts you in any way.
As usual, if the time window affects your work, please let us know and we'll
find a new maintenance window.
Thanks!
Luca (on behalf of the Analytics team).