Analytics December 2018

analytics@lists.wikimedia.org

14 participants
10 discussions

Community health metrics kit: Input needed!

by Joe Sutherland

Hello everyone - apologies for cross-posting!

*TL;DR*: We would like your feedback on our Metrics Kit project. Please have a look and comment on Meta-Wiki: https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit

The Wikimedia Foundation's Trust and Safety team, in collaboration with the Community Health Initiative, is working on a Metrics Kit designed to measure the relative "health"[1] of various communities that make up the Wikimedia movement: https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit

The ultimate outcome will be a public suite of statistics and data looking at various aspects of Wikimedia project communities. This could be used both by community members to make decisions on their community direction and by Wikimedia Foundation staff to point anti-harassment tool development in the right direction.

We have a set of metrics we are thinking about including in the kit, ranging from the ratio of active users to active administrators to administrator confidence levels and off-wiki factors such as freedom to participate. It's ambitious, and our methods of collecting such data will vary.

Right now, we'd like to know:

* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <https://stats.wikimedia.org/v2/#/all-projects>. How might these tools coexist?

Your opinions will help to guide this project going forward. We'll be reaching out at different stages of this project, so if you're interested in direct messaging going forward, please feel free to indicate your interest by signing up on the consultation page.

Looking forward to reading your thoughts.

best,
Joe

P.S.: Please feel free to CC me in conversations that might happen on this list!

[1] What do we mean by "health"? There is no standard definition of what makes a Wikimedia community "healthy", but there are many indicators that highlight where a wiki is doing well, and where it could improve. This project aims to provide a variety of useful data points to inform community decisions that would benefit from objective data.

--
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
Wikimedia Foundation
joesutherland.rocks

4 years, 2 months

Hive EventLogging bug caused NULL fields since 2018-11-29

by Andrew Otto

Hi all, A bug in the code that imports EventLogging data into Hive caused three top-level EventCapsule <https://meta.wikimedia.org/wiki/Schema:EventCapsule> fields to be set to NULL in all Hive EventLogging tables since 2018-11-29T17:00:00. The affected fields were recvFrom, seqId, and (more importantly) userAgent. We've fixed the bug and are backfilling the data now. https://phabricator.wikimedia.org/T211833 has more info. Sorry for the inconvenience! Follow the Phabricator ticket for updates on when the backfill has completed. -Andrew Otto Systems Engineer, WMF
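For readers who have already exported EventLogging rows from Hive, the following is a minimal sketch of how one might count rows that fall in the affected window. It is not the Analytics team's tooling. The field names and the start time come from the announcement; the dict-per-row layout, the `dt` timestamp key, its format, and the function name `count_affected_rows` are assumptions made for illustration.

```python
from datetime import datetime

# Fields named in the announcement as having been set to NULL.
AFFECTED_FIELDS = ("recvFrom", "seqId", "userAgent")

# Start of the affected window, taken from the announcement.
BUG_START = datetime(2018, 11, 29, 17, 0, 0)


def count_affected_rows(rows, timestamp_key="dt"):
    """Count rows at or after BUG_START in which all affected fields are None.

    `rows` is assumed to be an iterable of dicts, one per exported event,
    with a timestamp string under `timestamp_key` (a hypothetical column name).
    """
    affected = 0
    for row in rows:
        ts = datetime.strptime(row[timestamp_key], "%Y-%m-%dT%H:%M:%S")
        if ts < BUG_START:
            continue  # Rows before the bug was introduced are not affected.
        if all(row.get(field) is None for field in AFFECTED_FIELDS):
            affected += 1
    return affected


# Made-up sample rows, for illustration only.
sample_rows = [
    {"dt": "2018-11-29T16:59:59", "recvFrom": "host-a", "seqId": 1, "userAgent": "ua"},
    {"dt": "2018-11-29T17:00:00", "recvFrom": None, "seqId": None, "userAgent": None},
    {"dt": "2018-12-01T08:30:00", "recvFrom": None, "seqId": None, "userAgent": None},
    {"dt": "2018-12-02T10:00:00", "recvFrom": "host-b", "seqId": 7, "userAgent": "ua"},
]
print(count_affected_rows(sample_rows))
```

A count that drops to zero after the backfill would suggest the exported data has been repaired, though the Phabricator task remains the authoritative source for status.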

5 years, 3 months

project Grant Wikipedia Cultural Diversity Observatory (WCDO)

by Marc Miquel

Hello everyone, I am writing to tell you that we have presented a plan for a second phase to extend the Wikipedia Cultural Diversity Observatory (WCDO) project. As a reminder, the WCDO aims to provide valuable strategic data to help increase cultural diversity in each Wikipedia language edition. In the previous phase, we collected the Cultural Context Content (CCC) datasets for all 300 language editions and provided lists of top-priority articles for topics such as women-men and geolocated articles, among others (named Top CCC articles). The infrastructure for the project (datasets and website) has been set up. In this new phase <https://meta.wikimedia.org/wiki/Grants:Project/WCDO/Culture_Gap_Monthly_Mon…>, we plan to create many more tools and visualizations, including Top CCC article lists based on community member suggestions. Most importantly, we plan to create a tool that monitors the gaps on a monthly basis and delivers the results as a newsletter. This way, editors will be able to see the effort they dedicate each month to creating geolocated articles or cultural context content to bridge the gaps. We also plan to research marginalized languages in order to see which have the most potential to become new Wikipedia language editions, start creating content about their cultural context ("decolonizing the Internet"), and increase the overall cultural diversity of the project. If you think you can join the project or provide some feedback, please write to us at tools.wcdo(a)tools.wmflabs.org. If you think this may be helpful, please support us by providing feedback and endorsing the project. You can check the project here: https://meta.wikimedia.org/wiki/Grants:Project/WCDO/Culture_Gap_Monthly_Mon… Thanks in advance for your time. Best, Marc Miquel

5 years, 4 months

Does prefetch count as a pageview?

by Chenqi Zhu

Hi everyone, I am trying to better understand the pageview data, and I have a quick question. I apologize if it has been asked before or is too naive. If the web browser prefetches a Wikipedia page, does it count as one pageview in the pageview data? By "prefetching", I mean the case where X's Wikipedia page shows up in the search results and the browser prefetches/preloads the search results, but I do not click on X's Wikipedia page. If so, the pageview data seem to over-count the number of visits to X's Wikipedia page. Thanks in advance for any insight. Chenqi Zhu New York University 44 W 4th St., Suite 10-185(B), New York, NY 10012, U.S.A.

5 years, 4 months

Superset going down for a few hours

by Nuria Ruiz

Team: Superset will be going down for a few hours today as we roll back the update we were trying to do. It turns out that the newest versions of Superset are VERY backwards-incompatible: they use Python 3.6, which is not available on our Debian distro, and they introduce a bunch of other bugs. We will be working on our fork from now on so that we have a more stable basis for changes: https://github.com/wikimedia/incubator-superset More updates here: https://phabricator.wikimedia.org/T211605 Thanks, Nuria

5 years, 4 months

Wikistats2 - Metrics available for project families

by Nuria Ruiz

Hello! The Analytics team would like to announce that Wikistats2 now has metrics available for what we are calling (for lack of a better name) "project families". That is, "all wikipedias", "all wikibooks", etc. See, for example, bytes added by users to all wikibooks in the last month: https://stats.wikimedia.org/v2/#/all-wikibooks-projects/content/net-bytes-d… And "all wikibooks top editors" [1]: https://stats.wikimedia.org/v2/#/all-wikibooks-projects/contributing/top-ed… Not all metrics are available for project families yet; most notably, we do not yet have pageviews. As always, please file bugs [2] if you find any, and let us know what we can do better. Thanks, Nuria [1] https://meta.wikimedia.org/wiki/Research:Wikistats_metrics/Top_editors [2] https://phabricator.wikimedia.org/maniphest/task/edit/?title=Wikistats%20Bu…

5 years, 4 months

[Wikimedia Research Showcase] Wednesday, December 12 at 11:30 AM PST, 19:30 UTC

by Janna Layton

Hello everyone, The next Research Showcase, *Why the World Reads Wikipedia*, will be live-streamed this Wednesday, December 12, 2018, at 11:30 AM PST/19:30 UTC. This presentation is about Wikipedia usage across languages. YouTube stream: https://www.youtube.com/watch?v=RKMFvi_CCB0 As usual, you can join the conversation on IRC at #wikimedia-research. You can also watch our past research showcases here: https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase This month's presentation: *Why the World Reads Wikipedia* By Florian Lemmerich, RWTH Aachen University; Diego Sáez-Trumper, Wikimedia Foundation; Robert West, EPFL; and Leila Zia, Wikimedia Foundation So far, little is known about why users across the world read Wikipedia's various language editions. To bridge this gap, we conducted a comparative study by combining a large-scale survey of Wikipedia readers across 14 language editions with a log-based analysis of user activity. For analysis, we proceeded in three steps: First, we analyzed the survey results to compare the prevalence of Wikipedia use cases across languages, discovering commonalities, but also substantial differences, among Wikipedia languages with respect to their usage. Second, we matched survey responses to the respondents' traces in Wikipedia's server logs to characterize behavioral patterns associated with specific use cases, finding that distinctive patterns consistently mark certain use cases across language editions. Third, we could show that certain Wikipedia use cases are more common in countries with certain socio-economic characteristics; e.g., in-depth reading of Wikipedia articles is substantially more common in countries with a low Human Development Index. The outcomes of this study provide a deeper understanding of Wikipedia readership in a wide range of languages, which is important for Wikipedia editors, developers, and the reusers of Wikipedia content. 
-- Janna Layton Administrative Assistant - Audiences & Technology Wikimedia Foundation 1 Montgomery St. Suite 1600 San Francisco, CA 94104

5 years, 4 months

Fwd: Cron <root@stat1005> /usr/local/bin/published-datasets-sync -q

by Chase Pettet

Possibly from changes made to secure rsync? I've seen a few of these so I thought I would forward. Cheers!

---------- Forwarded message ---------
From: Cron Daemon <root(a)stat1005.eqiad.wmnet>
Date: Tue, Dec 11, 2018 at 7:49 AM
Subject: Cron <root@stat1005> /usr/local/bin/published-datasets-sync -q
To: <root(a)stat1005.eqiad.wmnet>

rsync: failed to connect to thorium.eqiad.wmnet (2620:0:861:108:10:64:53:26): Connection timed out (110)
rsync: failed to connect to thorium.eqiad.wmnet (10.64.53.26): Connection timed out (110)
rsync error: error in socket IO (code 10) at clientserver.c(125) [sender=3.1.2]

--
Chase Pettet
chasemp on phabricator <https://phabricator.wikimedia.org/p/chasemp/> and IRC
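When several of these cron mails pile up, it can help to pull the failing host, address, and error number out of the rsync output mechanically. The sketch below is only an illustration and is not part of `published-datasets-sync`. The pattern is written against the two "failed to connect" lines quoted in the forwarded message, and the names `CONNECT_FAILURE` and `parse_connect_failures` are hypothetical.

```python
import re

# Matches lines shaped like the "failed to connect" lines in the forwarded mail:
#   rsync: failed to connect to HOST (ADDRESS): REASON (ERRNO)
CONNECT_FAILURE = re.compile(
    r"rsync: failed to connect to (?P<host>\S+) \((?P<address>[^)]+)\): "
    r"(?P<reason>.+?) \((?P<errno>\d+)\)"
)


def parse_connect_failures(text):
    """Return one dict per 'failed to connect' line found in `text`."""
    return [
        {
            "host": match.group("host"),
            "address": match.group("address"),
            "reason": match.group("reason"),
            "errno": int(match.group("errno")),
        }
        for match in CONNECT_FAILURE.finditer(text)
    ]


# The rsync output quoted in the forwarded cron mail.
cron_output = (
    "rsync: failed to connect to thorium.eqiad.wmnet "
    "(2620:0:861:108:10:64:53:26): Connection timed out (110)\n"
    "rsync: failed to connect to thorium.eqiad.wmnet "
    "(10.64.53.26): Connection timed out (110)\n"
    "rsync error: error in socket IO (code 10) at clientserver.c(125) "
    "[sender=3.1.2]\n"
)

failures = parse_connect_failures(cron_output)
for failure in failures:
    print(failure["host"], failure["address"], failure["reason"], failure["errno"])
```

In this mail both the IPv6 and the IPv4 attempt time out against the same host, which is at least consistent with the guess that a connection-level restriction is involved.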

5 years, 4 months

Save the date: Wiki Workshop 2019 to be hosted at The Web Conference 2019 in San Francisco (May 13-14, 2019)

by Dario Taraborelli

Hi everyone, We are thrilled to announce that the *6th annual Wiki Workshop* [1] will be hosted at *The Web Conference 2019* (formerly known as WWW) in San Francisco, CA, on May 13 or 14, 2019 [2]. The workshop provides an annual forum for researchers exploring all aspects of Wikipedia, Wikidata, and other Wikimedia projects to present their work. We'd love to have your contributions, so please take a look at the details in this call: http://wikiworkshop.org/2019/#call Please note that *January 31, 2019* is the submission deadline if you want your paper to appear in the (archival) conference proceedings, and *March 14, 2019* is the deadline for all other, non-archival submissions. [3] Following last year's format, the workshop will include invited talks and a poster session, and will offer an opportunity for participants to meet and discuss future research directions. We look forward to receiving your submissions and seeing you in San Francisco in May! Best, Dario on behalf of the organizers [4] [1] http://wikiworkshop.org/ [2] https://www2019.thewebconf.org/ [3] http://wikiworkshop.org/2019/#dates [4] http://wikiworkshop.org/2019/#organization

5 years, 4 months

Investigation ongoing about data loss error for Webrequest 2018-12-01 hour 14

by Luca Toscano

Hi everybody, Over the weekend, Oozie alerted us to a suspected data loss in the Webrequest dataset for hour 14 of 2018-12-01. We opened a task to investigate: https://phabricator.wikimedia.org/T211000 This means that related datasets will be missing until we have a final fix/answer. Apologies for the delay. Luca (on behalf of the Analytics team)

5 years, 4 months
