Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
Meta-Wiki:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health"[1] of various communities that make up the
Wikimedia movement:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. This could be used
both by community members to make decisions about their community's
direction and by Wikimedia Foundation staff to point anti-harassment tool
development in the right direction.
We are considering a range of metrics for the kit, including the ratio of
active users to active administrators, administrator confidence levels,
and off-wiki factors such as freedom to participate. It's ambitious, and
our methods of collecting such data will vary.
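As a rough illustration of how the first metric could be computed, this sketch reads a wiki's site statistics from the MediaWiki Action API (`action=query&meta=siteinfo&siprop=statistics`), which reports `activeusers` and `admins` counts. This is an assumption about one possible data source, not a description of how the Metrics Kit will work. In particular, `admins` counts every administrator rather than only active ones, so the result is only an approximation. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def siteinfo_stats_url(api_endpoint: str) -> str:
    """Build a MediaWiki Action API query for the site statistics."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return api_endpoint + "?" + urllib.parse.urlencode(params)


def active_user_admin_ratio(payload: dict) -> float:
    """Active users per administrator, from a siteinfo statistics response.

    Note: "admins" counts all members of the sysop group, active or not,
    so this only approximates a ratio of active users to *active* admins.
    """
    stats = payload["query"]["statistics"]
    admins = stats["admins"]
    if admins == 0:
        raise ValueError("the wiki reports no administrators")
    return stats["activeusers"] / admins


def fetch_ratio(api_endpoint: str) -> float:
    """Query a live wiki (network access required) and return the ratio."""
    request = urllib.request.Request(
        siteinfo_stats_url(api_endpoint),
        # Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
        headers={"User-Agent": "metrics-kit-sketch/0.1 (example only)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return active_user_admin_ratio(json.load(response))
```

For example, `fetch_ratio("https://en.wikipedia.org/w/api.php")` would return English Wikipedia's current value. Parsing is kept separate from fetching so the calculation can be checked against a saved API response.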
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats
(https://stats.wikimedia.org/v2/#/all-projects). How might these tools
coexist?
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you'd like to be
contacted directly, please indicate your interest by signing up on the
consultation page.
Looking forward to reading your thoughts.
best,
Joe
P.S.: Please feel free to CC me in conversations that might happen on this
list!
[1] What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well and where it could improve. This
project aims to provide a variety of useful data points to inform
community decisions that would benefit from objective data.
--
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
Wikimedia Foundation
joesutherland.rocks
Hi everybody,
The stat1004, stat1006, stat1007, notebook1003 and notebook1004 hosts will
be rebooted tomorrow (May 21st) during the EU morning for security upgrades (Linux
kernel upgrades). Please let me or anybody in the Analytics team know if
this is problematic for your work so we can schedule a better maintenance
window.
Thanks!
Luca (on behalf of the Analytics team)
Hi everybody,
As an FYI, I am going to upgrade Superset to 0.32 tomorrow (May 15th). This
will involve moving to a new host based on Debian Buster and Python 3.7, so
the move will take some time; it should hopefully be complete early in the
EU morning.
Tracking task: https://phabricator.wikimedia.org/T211706
Luca (on behalf of the Analytics team)
Hi all!
Over the last couple of months we've been working on improving the
experience of exploring past data on Wikistats 2. Until now, simple
questions like "who were the top editors in June 2010" or "what countries
were visiting Arabic Wikipedia the most in 2004" were difficult to answer
because of the very limited time selection options in the UI.
[image: topeditors.gif]
This week we deployed the *new time range selector* on Wikistats. As
opposed to the old one, it works for both top and time-series metrics. It
can be used to share links to specific periods in any metric of any wiki.
And we've added a toggle button to switch between monthly and daily
granularities. Check out the big slump in editors
<http://localhost:5000/dist-dev/#/tr.wikipedia.org/contributing/editors/norm…>
on trwiki since Turkey started blocking Wikipedia:
[image: mobile.gif]
(Yes, the new time selector is also mobile-friendly!)
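The same kind of time-range query can also be made outside the UI. A minimal sketch, assuming the editor counts come from the Wikimedia REST API's `metrics/editors/aggregate` endpoint (this post does not say which API Wikistats uses): it builds a request URL for a project, a date range and a monthly or daily granularity. The helper name, the default filters and the example dates are illustrative choices.

```python
from datetime import date

# Assumed base path of the Wikimedia REST API editors endpoint.
AQS_EDITORS = "https://wikimedia.org/api/rest_v1/metrics/editors/aggregate"


def editors_url(project: str, start: date, end: date,
                granularity: str = "monthly",
                editor_type: str = "all-editor-types",
                page_type: str = "content",
                activity_level: str = "all-activity-levels") -> str:
    """Build a request URL for editor counts between start and end."""
    if granularity not in ("daily", "monthly"):
        raise ValueError("granularity must be 'daily' or 'monthly'")
    if start >= end:
        raise ValueError("start must be before end")
    # Path segments are joined in the order the endpoint is assumed to expect.
    return "/".join([
        AQS_EDITORS, project, editor_type, page_type, activity_level,
        granularity, start.strftime("%Y%m%d"), end.strftime("%Y%m%d"),
    ])
```

Switching `granularity` between `"monthly"` and `"daily"` mirrors the new toggle in the UI, e.g. `editors_url("tr.wikipedia.org", date(2016, 1, 1), date(2019, 1, 1))` for a multi-year view of trwiki.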
Take it for a spin and let us know if you have comments, suggestions, or if
you find anything weird.
Thank you, and have a good weekend!
Fran / the A-Team
--
*Francisco Dans*
Software Engineer, Analytics Team
Wikimedia Foundation
Hi all,
I have a question about the completeness of the clickstream dataset. I downloaded the dumps for 2018 from https://dumps.wikimedia.org/other/clickstream/ (English Wikipedia only). When I filter for the article pair "Climate change" and "Global warming" (with either article as prev or curr) for all of 2018, this is what I get:
  prev            curr            type      n month
  <chr>           <chr>           <chr> <dbl> <chr>
1 Global_warming  Climate_change  link    755 2018-04
2 Global_warming  Climate_change  link    810 2018-05
3 Climate_change  Global_warming  link   3730 2018-05
4 Climate_change  Global_warming  link   3962 2018-09
5 Climate_change  Global_warming  link   5865 2018-11
6 Climate_change  Global_warming  link   5491 2018-12
7 Global_warming  Climate_change  link   2227 2018-12
The visit numbers seem plausible. But why is there no data for, e.g., January to March? And why is there data for both directions in May and December, but not in the other months? This seems implausible given the popularity of the articles.
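The filtering described above can be repeated for each month to list which months and directions have no row at all. A minimal sketch, assuming each monthly dump is a gzipped, tab-separated file with no header row and the columns prev, curr, type and n in that order; the function names and local file paths are hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator, List, Tuple

# (prev, curr, type, n) for one clickstream row
Row = Tuple[str, str, str, int]


def read_month(path: str) -> Iterator[List[str]]:
    """Stream the rows of one gzipped monthly clickstream TSV file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        # QUOTE_NONE: page titles may contain quote characters.
        yield from csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)


def filter_pair(rows: Iterable[List[str]], a: str, b: str) -> List[Row]:
    """Keep rows whose (prev, curr) is (a, b) or (b, a)."""
    wanted = {(a, b), (b, a)}
    hits: List[Row] = []
    for prev, curr, link_type, n in rows:
        if (prev, curr) in wanted:
            hits.append((prev, curr, link_type, int(n)))
    return hits


def missing_directions(hits_by_month: Dict[str, List[Row]],
                       a: str, b: str) -> Dict[str, List[Tuple[str, str]]]:
    """For each month, list the directions of the pair that have no row."""
    gaps: Dict[str, List[Tuple[str, str]]] = {}
    for month, hits in sorted(hits_by_month.items()):
        seen = {(prev, curr) for prev, curr, _, _ in hits}
        absent = [d for d in ((a, b), (b, a)) if d not in seen]
        if absent:
            gaps[month] = absent
    return gaps
```

With the twelve 2018 files saved locally (under hypothetical names such as `clickstream-enwiki-2018-01.tsv.gz`), each month would be filtered with `filter_pair(read_month(path), "Climate_change", "Global_warming")` and the per-month results passed to `missing_directions`.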
Here's another example:
  prev           curr           type      n month
  <chr>          <chr>          <chr> <dbl> <chr>
1 Smog           Air_pollution  link    140 2018-01
2 Air_pollution  Smog           link     82 2018-02
3 Air_pollution  Smog           link    295 2018-04
4 Air_pollution  Smog           link    215 2018-05
5 Smog           Air_pollution  link     85 2018-06
6 Air_pollution  Smog           link    233 2018-07
7 Air_pollution  Smog           link     45 2018-09
8 Smog           Air_pollution  link     96 2018-10
9 Smog           Air_pollution  link     90 2018-12
Am I missing something here?
Thanks in advance,
Simon
Hi all,
As you may be aware, over the last 3 weeks I've been looking into the
accuracy of active user statistics on English Wikipedia.
I haven't had a chance to upload the final results to
https://en.wikipedia.org/wiki/User:RhinosF1/activeuser yet, but I have
finished gathering the statistics and have attached a PDF of the results
to this email.
I found the sudden drop in the number of active users interesting. I half
expected it, and it was what I set out to find, but I want to look into it
more deeply.
I'd like to see whether this is down to blocks or to users simply not
continuing, and to assess whether the time requirement or the edit
requirement has the bigger impact.
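One way to separate the effect of the time requirement from that of the edit requirement is to recount active users while varying one threshold at a time. A minimal sketch, assuming per-user edit timestamps are already available from the queries; the function names are hypothetical. MediaWiki's Special:ActiveUsers counts users with any kind of activity in the last 30 days, whereas this sketch only counts edits.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def count_active(edits_by_user: Dict[str, List[datetime]], now: datetime,
                 window_days: int = 30, min_edits: int = 1) -> int:
    """Count users with at least `min_edits` edits in the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    active = 0
    for timestamps in edits_by_user.values():
        recent = sum(1 for t in timestamps if cutoff <= t <= now)
        if recent >= min_edits:
            active += 1
    return active


def sensitivity_grid(edits_by_user: Dict[str, List[datetime]], now: datetime,
                     windows: Iterable[int],
                     minimums: Iterable[int]) -> Dict[Tuple[int, int], int]:
    """Active-user counts for every (window_days, min_edits) combination."""
    minimums = list(minimums)  # iterated once per window
    return {(w, m): count_active(edits_by_user, now, w, m)
            for w in windows for m in minimums}
```

Comparing how much the count moves along each axis of the grid gives a first indication of which requirement matters more for a given set of users.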
I look forward to any feedback and help in the research.
The plan for the next stages is as follows:
1. Allow about 10-14 days for people receiving this email to respond.
2. Run the new list of queries for about 2-3 weeks to gather some data to
show.
3. Show the data to enwiki users and ask for feedback and help collecting
data.
4. Present the results in 2-3 months' time.
5. Gather wide feedback on the results.
6. Possibly take action to improve things, if it becomes clear what needs
doing.
As you will see, most of the data is from around 9pm UTC, so in future
stages I would appreciate help collecting data across a wider range of times.
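To check how evenly future samples cover the day, the collection times could be bucketed by UTC hour. A minimal sketch, assuming each sample's collection time is recorded as an ISO 8601 timestamp with an explicit UTC offset; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def hour_coverage(timestamps: Iterable[str]) -> Tuple[Counter, List[int]]:
    """Count samples per UTC hour and list the hours with no samples at all.

    Timestamps must carry an offset, e.g. "2019-05-20T21:05:00+00:00".
    """
    per_hour: Counter = Counter()
    for ts in timestamps:
        moment = datetime.fromisoformat(ts)
        if moment.tzinfo is None:
            raise ValueError(f"timestamp without UTC offset: {ts}")
        # Normalise to UTC before taking the hour.
        per_hour[moment.astimezone(timezone.utc).hour] += 1
    uncovered = [hour for hour in range(24) if per_hour[hour] == 0]
    return per_hour, uncovered
```

The list of uncovered hours shows directly which times of day still need volunteers to run the queries.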
Thanks in advance,
RhinosF1