Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on the
consultation page.
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health" of the various communities that make up the
Wikimedia movement.
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. This could be used by
both community members to make decisions on their community direction and
Wikimedia Foundation staff to point anti-harassment tool development in the
right direction.
We have a set of metrics we are thinking about including in the kit,
ranging from the ratio of active users to active administrators, to
administrator confidence levels, to off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data are
still being worked out.
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <
https://stats.wikimedia.org/v2/#/all-projects> — how might these tools
complement each other?
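To make one of the candidate metrics concrete: the ratio of active users to administrators can already be approximated from the standard MediaWiki siteinfo API (`action=query&meta=siteinfo&siprop=statistics`). A rough sketch only; note that the `admins` field counts all administrators, not just active ones, so this is a proxy for the metric described above:

```python
import json
from urllib.request import urlopen

def admin_ratio(stats):
    """Active users per administrator, from a MediaWiki siteinfo
    'statistics' payload (fields 'activeusers' and 'admins')."""
    return stats["activeusers"] / stats["admins"]

def fetch_site_stats(api_base="https://zh.wikipedia.org/w/api.php"):
    """Fetch the site statistics block for a wiki (any MediaWiki wiki
    works; zh.wikipedia is just an example)."""
    url = api_base + "?action=query&meta=siteinfo&siprop=statistics&format=json"
    with urlopen(url) as resp:
        return json.load(resp)["query"]["statistics"]

# e.g. admin_ratio(fetch_site_stats()) gives active users per admin
```

A production metric would presumably need a tighter definition of "active administrator" than the API's default windows provide.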
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you're interested
in direct messaging going forward, please feel free to indicate your
interest by signing up on the consultation page.
Looking forward to reading your thoughts.
P.S.: Please feel free to CC me in conversations that might happen on this
topic.
 What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well, and where it could improve. This
project aims to provide a variety of useful data points to inform
community decisions that would benefit from objective data.
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
As part of https://phabricator.wikimedia.org/T205846 we are going to ask
all stat1005 users to move to stat1007 during the next two weeks. The
deadline is November 14th, by which time ssh access to stat1005 will be
removed.
Background: on stat1005 we have a GPU (more details in
https://phabricator.wikimedia.org/T148843) that has been sitting there for
almost two years, and it would be great to try to make it work during the
next months. This effort will require a lot of tests/reboots/etc. that can
of course impact the ongoing work of all of you, so we prefer to move everybody
to another identical machine beforehand.
Please reach out to me or to the Analytics team in T205846 or on IRC
(#wikimedia-analytics on Freenode) if you have any
questions/doubts/blockers/etc. We are not going to enforce the deadline if
anybody raises concerns or blockers, of course. It would be great to
move everybody by Nov 14th, but we surely don't want to disrupt any ongoing
work.
I am going to update the Wikitech documentation about stat1005 and stat1007
as soon as possible, for the moment keep in mind that stat1007 will take
over completely everything that stat1005 currently does.
I have already copied over all the stat1005 directories to stat1007, and
I'll periodically sync them over the following days. If you find that
something important is missing, please add a note in T205846.
Thanks a lot and sorry for the trouble,
Luca (on behalf of the Analytics team)
I'm an assistant professor in the Department of Communication at Stanford. My co-author, Molly Roberts (Political Science, UCSD), and I are working on a paper examining the effect of China's 2015 block of Chinese-language Wikipedia on pageviews, which builds on our previous work on censorship in China.
We are using the block to conduct an interrupted time series design to measure the effect of censorship on Chinese users. Our main finding is that Chinese users were using Wikipedia to browse (starting at the home page), and the block affected users' ability to explore and encounter unexpected information.
One question we have is whether the pageviews we observe are driven by bots and spiders. We know that the Wikimedia REST API provides this information going back to July 1, 2015. Since the China block of Wikipedia happened on May 19, 2015, we are wondering if there is pageview data by agent type for zh.wikipedia.org pages (all, or some subset like the most popular) going back to May 2015 (specifically May 18-21, 2015)? From https://meta.wikimedia.org/wiki/Research:Timeline_of_Wikimedia_analytics,
it says that pageview data is available in bulk starting on May 1, 2015, so we thought there was some chance this data exists.
Any suggestions would be greatly appreciated, and if this is not possible, please let us know.
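The interrupted time series design described above can be illustrated as a segmented regression. This is a minimal, hypothetical sketch (not the authors' actual model): it fits a level shift and a trend change at the intervention date by ordinary least squares.

```python
import numpy as np

def its_fit(y, t0):
    """Segmented regression for an interrupted time series:
        y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t + e_t,
    where post_t = 1 for t >= t0 (the intervention, e.g. the block).
    Returns (b0, b1, b2, b3): b2 is the immediate level change at the
    intervention, b3 the change in trend afterwards."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= t0).astype(float)
    X = np.column_stack([np.ones_like(t), t, post, (t - t0) * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

Fitting daily (log) pageview counts with t0 at May 19, 2015 would estimate the level drop attributable to the block, under the usual ITS assumptions (no co-occurring shocks, correctly specified trend).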
The Analytics team will upgrade the Druid cluster behind Superset/Turnilo
(druid100[1-3]) to version 0.12.3 on Thursday 25th at 11AM CEST. At the
same time, we'll upgrade Turnilo to version 1.8.1. Since it will be a
rolling upgrade, you shouldn't see a major impact, but possibly sporadic
failures while the maintenance is ongoing.
- Druid https://phabricator.wikimedia.org/T206839
- Turnilo https://phabricator.wikimedia.org/T197276
Please let us know on IRC, email, or Phabricator if this is going to
be a problem for you.
Are views of republished Wikimedia content, such as on Google and YouTube,
something that we could include in addition to Wikimedia pageview
statistics? I imagine that this would require cooperation from Alphabet and
other companies that reuse Wikimedia content. It would be nice if we could
get that cooperation.
Also, is this republication taken into account in website traffic rankings?
My guess is that the answer is no, and that other types of republication
such as embedded YouTube videos are not taken into account in their
content providers' site rankings, although I think that YouTube would count
views of embedded videos in its own statistics of video views. I am
thinking that for YouTube and Wikipedia, and other similar sites for which
republication or embedding are common, site rankings which are based on
pageviews could significantly underestimate the popularity and influence of
such sites.
( https://meta.wikimedia.org/wiki/User:Pine )
The next Research Showcase will be live-streamed this Wednesday, October
17, 2018, at 11:30 AM PDT (18:30 UTC).
YouTube stream: https://www.youtube.com/watch?v=UJrJLWuNvXo
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here: https://www.mediawiki.
This month's presentation:
*"Welcome" Changes? Descriptive and Injunctive Norms in a Wikipedia
Sub-community*
*By Jonathan T. Morgan, Wikimedia Foundation and Anna Filippova, GitHub*
Open online communities rely on social norms for behavior regulation, group
cohesion, and sustainability. Research on the role of social norms online
has mainly focused on one source of influence at a time, making it
difficult to separate different normative influences and understand their
interactions. In this study, we use the Focus Theory to examine
interactions between several sources of normative influence in a Wikipedia
sub-community: local descriptive norms, local injunctive norms, and norms
imported from similar sub-communities. We find that exposure to injunctive
norms has a stronger effect than descriptive norms, that the likelihood of
performing a behavior is higher when both injunctive and descriptive norms
are congruent, and that conflicting social norms may negatively impact
pro-normative behavior. We contextualize these findings through member
interviews, and discuss their implications for both future research on
normative influence in online groups and the design of systems that support
pro-normative behavior.
*The pipeline of online participation inequalities: The case of Wikipedia
editing*
*By Aaron Shaw, Northwestern University and Eszter Hargittai, University of
Zurich*
Participatory platforms like the Wikimedia projects have unique potential
to facilitate more equitable knowledge production. However, digital
inequalities such as the Wikipedia gender gap undermine this democratizing
potential. In this talk, I present new research in which Eszter Hargittai
and I conceptualize a "pipeline" of online participation and model distinct
levels of awareness and behaviors necessary to become a contributor to the
participatory web. We test the theory in the case of Wikipedia editing,
using new survey data from a diverse, national sample of adult internet
users in the U.S.
The results show that Wikipedia participation consistently reflects
inequalities of education and internet experiences and skills. We find that
the gender gap only emerges later in the pipeline whereas gaps along racial
and socioeconomic lines explain variations earlier in the pipeline. Our
findings underscore the multidimensionality of digital inequalities and
suggest new pathways toward closing knowledge gaps by highlighting the
importance of education and Internet skills.
We conclude that future research and interventions to overcome digital
participation gaps should not focus exclusively on gender or class
differences in content creation, but expand to address multiple aspects of
digital inequality across pipelines of participation. In particular, when
it comes to overcoming gender gaps in the case of Wikipedia, our results
suggest that continued emphasis on recruiting female editors should include
efforts to disseminate the knowledge that Wikipedia can be edited. Our
findings support broader efforts to overcome knowledge- and skill-based
barriers to entry among potential contributors to the open web.
Administrative Assistant - Audiences & Technology
1 Montgomery St. Suite 1600
San Francisco, CA 94104
I stopped EventLogging completely from 14:16 to 14:17 UTC to allow a host
reboot for kernel upgrades. This might show up as a dip in some Kafka
throughput metrics related to the EventLogging schemas.
If you have any questions please feel free to follow up with me or the
Analytics team.