Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health" of various communities that make up the
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. This could be used by
both community members to make decisions on their community direction and
Wikimedia Foundation staff to point anti-harassment tool development in the
We have a set of metrics we are thinking about including in the kit,
ranging from the ratio of active users to active administrators, to
administrator confidence levels, to off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data will
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <
https://stats.wikimedia.org/v2/#/all-projects> — how might these tools
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you're interested
in direct messaging going forward, please feel free to indicate your
interest by signing up on the consultation page.
Looking forward to reading your thoughts.
P.S.: Please feel free to CC me in conversations that might happen on this
 What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well, and where it could improve. This
project aims to provide a variety of useful data points that will inform
community decisions that will benefit from objective data.
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
As you may be aware, over the last 3 weeks I've been looking into the
accuracy of active user statistics on English Wikipedia.
I haven't had a chance to upload the final results to
https://en.wikipedia.org/wiki/User:RhinosF1/activeuser but I have completed
the gathering of statistics and have attached a .pdf of the results to this
I found it interesting that there is a sudden drop in the number of
active users. I half expected this and set out to find it, but I want to
look deeper.
I'd like to see whether this is down to blocks or users just not
continuing, and assess whether the time requirements or the edit
requirements have the bigger impact.
I look forward to any feedback and help in the research.
The plan for the next stages is as follows:
1. About 10-14 days for people getting this email to respond.
2. Run the new list of queries for about 2-3 weeks to gather some data to
3. Show the data to enwiki users and ask for feedback / help collecting
4. Present results in 2-3 months' time.
5. Gather wide feedback on results
6. Maybe take action to improve it if we can see what action needs doing
As you will see, most of the data is from around 9pm UTC, so in future
stages I would appreciate data collection from a wider range of times.
Thanks in advance,
On Monday 22nd the SRE Data Persistence team will reboot the Analytics
dbstore database hosts for a Linux kernel and MariaDB upgrade during the
early EU morning. It shouldn't affect anybody, but please let me know if
you have any issues with it.
Luca (on behalf of Analytics and Data Persistence)
The next Research Showcase, “Group Membership and Contributions to Public
Information Goods: The Case of WikiProject” and “Thanks for Stopping By: A
Study of ‘Thanks’ Usage on Wikimedia,” will be live-streamed next
Wednesday, April 17, 2019, at 11:30 AM PDT/19:30 UTC.
YouTube stream: https://www.youtube.com/watch?v=zmb5LoJzOoE
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
This month's presentations:
Group Membership and Contributions to Public Information Goods: The Case of
By Ark Fangzhou Zhang
We investigate the effects of group identity on contribution behavior on
the English Wikipedia, the largest online encyclopedia that gives free
access to the public. Using an instrumental variable approach that exploits
the variations in one’s exposure to WikiProject, we find that joining a
WikiProject has a significant impact on one’s level of contribution, with
an average increase of 79 revisions or 8,672 characters per month. To
uncover the potential mechanism underlying the treatment effect, we use the
size of home page for WikiProject as a proxy for the number of
recommendations from a project. The results show that the users who join a
WikiProject with more recommendations significantly increase their
contribution to articles under the joined project, but not to articles
under other projects.
Thanks for Stopping By: A Study of ‘Thanks’ Usage on Wikimedia
By Swati Goel
The Thanks feature on Wikipedia, also known as "Thanks," is a tool with
which editors can quickly and easily send one another positive feedback. The
aim of this project is to better understand this feature: its scope, the
characteristics of a typical "Thanks" interaction, and the effects of
receiving a thank on individual editors. We study the motivational impacts
of "Thanks" because maintaining editor engagement is a central problem for
crowdsourced repositories of knowledge such as Wikimedia. Our main findings
are that most editors have not been exposed to the Thanks feature (meaning
they have never given nor received a thank), thanks are typically sent
upwards (from less experienced to more experienced editors), and receiving
a thank is correlated with having high levels of editor engagement. Though
the prevalence of "Thanks" usage varies by editor experience, the impact of
receiving a thank seems mostly consistent for all users. We empirically
demonstrate that receiving a thank has a strong positive effect on
short-term editor activity across the board and provide preliminary
evidence that thanks could compound to have long-term effects as well.
Janna Layton (she, her)
Administrative Assistant - Audiences & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
The Analytics team is planning to upgrade the Hadoop cluster to CDH 5.16.1
(changelog in https://phabricator.wikimedia.org/T218343) on Wed Apr 17th at
15:00 CET. All services (HDFS, Hive, Oozie, Notebooks, etc.) will be
unavailable for one hour if everything goes according to plan, but please
allow for at least two hours if you need to schedule important work, so
that any unexpected issue is less likely to affect you.
If this timeline impacts something really important that you have planned
or scheduled please let us know (via the above task or on IRC -
#wikimedia-analytics) and we'll re-schedule the maintenance accordingly.
Luca (on behalf of the Analytics team)
I've started the project and posted information at
Feel free to run the queries and gather data.
I'll do the first week or so independently, then mention it on the Village
pump to draw attention to the research.
It seems that the daily dumps of pagecounts, i.e.
have not been working since March 25th.
I can't find out online whether it is a temporary issue or if it will be
Do you have any information about that?
Please let me know, and thanks in advance.
I'm having trouble getting yesterday's pageviews data. There is no hourly
dump file for 23:00 4/1/19 (though all other hours are accounted for), and
the pageviews API is returning "not found" errors for requests like the
In the past I've been told that it's not unusual for there to be occasional
long delays in pageviews data becoming available at the start of the month.
Does this explain the outage? Is there any way to predict when the data
will be available, or do I just have to continue checking back? And does
anybody know what caused the slowdown, and if I should expect it to
Thank you very much,
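
For anyone who wants to check a day's hourly dump listing for gaps like the one described above, here is a minimal sketch. It assumes the hourly files are named like pageviews-YYYYMMDD-HH0000.gz; that pattern, and the function names, are illustrative assumptions rather than anything confirmed in this thread. It only compares a list of file names you supply and does not contact any server.

```python
from datetime import date, datetime, timedelta

# Assumed naming pattern for hourly dump files; adjust if the real names differ.
ASSUMED_PATTERN = "pageviews-%Y%m%d-%H0000.gz"


def expected_hourly_names(day, pattern=ASSUMED_PATTERN):
    """Return the 24 file names expected for one UTC day, in hour order."""
    start = datetime(day.year, day.month, day.day)
    return [(start + timedelta(hours=h)).strftime(pattern) for h in range(24)]


def missing_hours(day, available_names, pattern=ASSUMED_PATTERN):
    """Return the expected file names that are absent from available_names."""
    available = set(available_names)
    return [n for n in expected_hourly_names(day, pattern) if n not in available]


if __name__ == "__main__":
    day = date(2019, 4, 1)
    # Hypothetical listing in which only the 23:00 file is absent.
    listing = expected_hourly_names(day)[:-1]
    print(missing_hours(day, listing))
```

Passing in the names from a real directory listing instead of the hypothetical one would show which hours, if any, have not been published yet.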