Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on the
consultation page.
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health" of the various communities that make up the
Wikimedia movement.
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. This could be used by
both community members to make decisions on their community direction and
Wikimedia Foundation staff to point anti-harassment tool development in the
right direction.
We have a set of metrics we are thinking about including in the kit,
ranging from the ratio of active users to active administrators, to
administrator confidence levels, to off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data are
still being worked out.
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <
https://stats.wikimedia.org/v2/#/all-projects> — how might these tools
complement each other?
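To make one of the candidate metrics concrete: the ratio of active users to administrators can already be approximated from the standard MediaWiki siteinfo API (`action=query&meta=siteinfo&siprop=statistics`). A rough sketch only; note that the `admins` field counts all administrators, not just active ones, so this is a proxy for the metric described above:

```python
import json
from urllib.request import urlopen

def admin_ratio(stats):
    """Active users per administrator, from a MediaWiki siteinfo
    'statistics' payload (fields 'activeusers' and 'admins')."""
    return stats["activeusers"] / stats["admins"]

def fetch_site_stats(api_base="https://zh.wikipedia.org/w/api.php"):
    """Fetch the site statistics block for a wiki (any MediaWiki wiki
    works; zh.wikipedia is just an example)."""
    url = api_base + "?action=query&meta=siteinfo&siprop=statistics&format=json"
    with urlopen(url) as resp:
        return json.load(resp)["query"]["statistics"]

# e.g. admin_ratio(fetch_site_stats()) gives active users per admin
```

A production metric would presumably need a tighter definition of "active administrator" than the API's default windows provide.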
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you're interested
in direct messaging going forward, please feel free to indicate your
interest by signing up on the consultation page.
Looking forward to reading your thoughts.
P.S.: Please feel free to CC me in conversations that might happen on this
topic.
 What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well, and where it could improve. This
project aims to provide a variety of useful data points to inform
community decisions that would benefit from objective data.
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
As part of https://phabricator.wikimedia.org/T205846 we are going to ask
all stat1005 users to move to stat1007 during the next two weeks. The
deadline is November 14th, by which time ssh access to stat1005 will be
removed.
Background: on stat1005 we have a GPU (more details in
https://phabricator.wikimedia.org/T148843) that has been sitting there for
almost two years, and it would be great to try to make it work during the
next months. This effort will require a lot of tests/reboots/etc. that can
of course impact the ongoing work of all of you, so we prefer to move everybody
to another identical machine beforehand.
Please reach out to me or to the Analytics team in T205846 or on IRC
(#wikimedia-analytics on Freenode) if you have any
questions/doubts/blockers/etc. We are not going to enforce the deadline if
anybody raises concerns or blockers, of course. It would be great to
move everybody by Nov 14th, but we surely don't want to disrupt any ongoing
work.
I am going to update the Wikitech documentation about stat1005 and stat1007
as soon as possible, for the moment keep in mind that stat1007 will take
over completely everything that stat1005 currently does.
I have already copied over all the stat1005 directories to stat1007, and
I'll periodically sync them over the following days. If you find that
something important is missing, please add a note in T205846.
Thanks a lot and sorry for the trouble,
Luca (on behalf of the Analytics team)
I'm an assistant professor in the Department of Communication at Stanford. My co-author, Molly Roberts (Political Science, UCSD), and I are working on a paper examining the effect of China's 2015 block of Chinese-language Wikipedia on pageviews, which builds on our previous work on censorship in China.
We are using the block to conduct an interrupted time series design to measure the effect of censorship on Chinese users. Our main finding is that Chinese users were using Wikipedia to browse (starting at the home page), and the block affected users' ability to explore and encounter unexpected information.
One question we have is whether the pageviews we observe are driven by bots and spiders. We know that the Wikimedia REST API provides this information going back to July 1, 2015. Since the China block of Wikipedia happened on May 19, 2015, we are wondering if there is pageview data by agent type for zh.wikipedia.org pages (all, or some subset like the most popular) going back to May 2015 (specifically May 18-21, 2015)? From https://meta.wikimedia.org/wiki/Research:Timeline_of_Wikimedia_analytics,
it says that pageview data is available in bulk starting on May 1, 2015, so we thought there was some chance this data exists.
Any suggestions would be greatly appreciated, and if this is not possible, please let us know.
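The interrupted time series design described above can be illustrated as a segmented regression. This is a minimal, hypothetical sketch (not the authors' actual model): it fits a level shift and a trend change at the intervention date by ordinary least squares.

```python
import numpy as np

def its_fit(y, t0):
    """Segmented regression for an interrupted time series:
        y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t + e_t,
    where post_t = 1 for t >= t0 (the intervention, e.g. the block).
    Returns (b0, b1, b2, b3): b2 is the immediate level change at the
    intervention, b3 the change in trend afterwards."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    post = (t >= t0).astype(float)
    X = np.column_stack([np.ones_like(t), t, post, (t - t0) * post])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

Fitting daily (log) pageview counts with t0 at May 19, 2015 would estimate the level drop attributable to the block, under the usual ITS assumptions (no co-occurring shocks, correctly specified trend).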
The Analytics team will upgrade the Druid cluster behind Superset/Turnilo
(druid100[1-3]) to version 0.12.3 on Thursday 25th at 11AM CEST. At the
same time, we'll upgrade Turnilo to version 1.8.1. Since it will be a
rolling upgrade, you shouldn't see a major impact, but possibly sporadic
failures while the maintenance is ongoing.
- Druid https://phabricator.wikimedia.org/T206839
- Turnilo https://phabricator.wikimedia.org/T197276
Please let us know on IRC, email, or Phabricator if this is going to
be a problem for you.
Are views of republished Wikimedia content, such as on Google and YouTube,
something that we could include in addition to Wikimedia pageview
statistics? I imagine that this would require cooperation from Alphabet and
other companies that reuse Wikimedia content. It would be nice if we could
get that cooperation.
Also, is this republication taken into account in website traffic rankings?
My guess is that the answer is no, and that other types of republication
such as embedded YouTube videos are not taken into account in their
content providers' site rankings, although I think that YouTube would count
views of embedded videos in its own statistics of video views. I am
thinking that for YouTube and Wikipedia, and other similar sites for which
republication or embedding are common, site rankings which are based on
pageviews could significantly underestimate the popularity and influence of
such sites.
( https://meta.wikimedia.org/wiki/User:Pine )
The next Research Showcase will be live-streamed this Wednesday, October
17, 2018, at 11:30 AM PDT (18:30 UTC).
YouTube stream: https://www.youtube.com/watch?v=UJrJLWuNvXo
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here: https://www.mediawiki.
This month's presentation:
*"Welcome" Changes? Descriptive and Injunctive Norms in a Wikipedia
Sub-community*
*By Jonathan T. Morgan, Wikimedia Foundation and Anna Filippova, GitHub*
Open online communities rely on social norms for behavior regulation, group
cohesion, and sustainability. Research on the role of social norms online
has mainly focused on one source of influence at a time, making it
difficult to separate different normative influences and understand their
interactions. In this study, we use the Focus Theory to examine
interactions between several sources of normative influence in a Wikipedia
sub-community: local descriptive norms, local injunctive norms, and norms
imported from similar sub-communities. We find that exposure to injunctive
norms has a stronger effect than descriptive norms, that the likelihood of
performing a behavior is higher when both injunctive and descriptive norms
are congruent, and that conflicting social norms may negatively impact
pro-normative behavior. We contextualize these findings through member
interviews, and discuss their implications for both future research on
normative influence in online groups and the design of systems that support
pro-normative behavior.
*The pipeline of online participation inequalities: The case of Wikipedia
editing*
*By Aaron Shaw, Northwestern University and Eszter Hargittai, University of
Zurich*
Participatory platforms like the Wikimedia projects have unique potential
to facilitate more equitable knowledge production. However, digital
inequalities such as the Wikipedia gender gap undermine this democratizing
potential. In this talk, I present new research in which Eszter Hargittai
and I conceptualize a "pipeline" of online participation and model distinct
levels of awareness and behaviors necessary to become a contributor to the
participatory web. We test the theory in the case of Wikipedia editing,
using new survey data from a diverse, national sample of adult internet
users in the U.S.
The results show that Wikipedia participation consistently reflects
inequalities of education and internet experiences and skills. We find that
the gender gap only emerges later in the pipeline whereas gaps along racial
and socioeconomic lines explain variations earlier in the pipeline. Our
findings underscore the multidimensionality of digital inequalities and
suggest new pathways toward closing knowledge gaps by highlighting the
importance of education and Internet skills.
We conclude that future research and interventions to overcome digital
participation gaps should not focus exclusively on gender or class
differences in content creation, but expand to address multiple aspects of
digital inequality across pipelines of participation. In particular, when
it comes to overcoming gender gaps in the case of Wikipedia, our results
suggest that continued emphasis on recruiting female editors should include
efforts to disseminate the knowledge that Wikipedia can be edited. Our
findings support broader efforts to overcome knowledge- and skill-based
barriers to entry among potential contributors to the open web.
Administrative Assistant - Audiences & Technology
1 Montgomery St. Suite 1600
San Francisco, CA 94104
I stopped EventLogging completely from 14:16 to 14:17 UTC to allow a host
reboot for kernel upgrades. This might show up as a dip in some Kafka
throughput metrics related to the EventLogging schemas.
If you have any questions please feel free to follow up with me or the
Analytics team.