Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
Meta-Wiki:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health"[1] of various communities that make up the
Wikimedia movement:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. It could be used both by
community members, to make decisions about their community's direction, and by
Wikimedia Foundation staff, to point anti-harassment tool development in the
right direction.
We have a set of metrics we are considering for the kit, ranging from the
ratio of active users to active administrators, to administrator confidence
levels, to off-wiki factors such as freedom to participate. It's an ambitious
list, and our methods of collecting such data will vary.
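As a rough illustration of one such metric (a sketch only, not part of the
Metrics Kit itself): the ratio of active users to administrators can be
approximated from the public MediaWiki API's siteinfo statistics, e.g.:

    # Sketch only: approximate "active users per administrator" for one wiki
    # using the public MediaWiki API (meta=siteinfo, siprop=statistics).
    import requests

    def active_users_per_admin(api_url="https://en.wikipedia.org/w/api.php"):
        params = {
            "action": "query",
            "meta": "siteinfo",
            "siprop": "statistics",
            "format": "json",
        }
        stats = requests.get(api_url, params=params).json()["query"]["statistics"]
        # "admins" counts members of the sysop group, not necessarily *active*
        # administrators, so this is only a first approximation.
        return stats["activeusers"] / stats["admins"]

    print(active_users_per_admin())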
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <
https://stats.wikimedia.org/v2/#/all-projects> — how might these tools
coexist?
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you're interested
in direct messaging going forward, please feel free to indicate your
interest by signing up on the consultation page.
Looking forward to reading your thoughts.
best,
Joe
P.S.: Please feel free to CC me in conversations that might happen on this
list!
[1] What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well, and where it could improve. This
project aims to provide a variety of useful data points to inform
community decisions that would benefit from objective data.
--
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
Wikimedia Foundation
joesutherland.rocks
Hi everybody,
stat1005 was replaced almost a year ago by stat1007 to allow GPU research
and testing (https://phabricator.wikimedia.org/T148843). After a
long journey we are happy to add stat1005 back in the pool of available
Analytics client hosts. I have updated the documentation in:
https://wikitech.wikimedia.org/wiki/Stat1005
https://wikitech.wikimedia.org/wiki/Analytics/Data_access#Analytics_clients
The host is now a Hadoop client like stat1004, and also offers an AMD GPU (
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/AMD_GPU). For
the moment we are limiting access to the GPU to people who explicitly need
it, since it is still a testing environment and we'd like to give some
priority to people who already have projects relying on it. The end goal is
to give everybody access to the GPU by default, so stay tuned. If you wish
to participate in the testing effort, please reach out to the Analytics
team! (
https://wikitech.wikimedia.org/wiki/Analytics/Data_access#GPU_usage).
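If you already have GPU access and want to confirm that the AMD GPU is
visible from your environment, a minimal check could look like the sketch
below. It assumes a ROCm-enabled TensorFlow build (e.g. the tensorflow-rocm
package) is installed, which may not be the case on every host:

    # Sanity check, assuming a ROCm-enabled TensorFlow build is available.
    import tensorflow as tf

    # An empty list means the AMD GPU is not usable from this environment.
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        print("GPU(s) visible to TensorFlow:", gpus)
    else:
        print("No GPU visible - check your ROCm setup or request GPU access.")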
Last but not least: stat1005 is running Debian 10 (Buster), with openjdk-8
instead of the version shipped with Debian (openjdk-11), since the Hadoop
cluster is not ready to migrate yet. Everything seems to be running fine in
our tests, but please report anything that looks strange to us.
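If you want to double-check which JDK your session is picking up (openjdk-8
is the expected one here), a quick sketch:

    # Print the JDK version in use; `java -version` writes to stderr.
    import subprocess

    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    print(result.stderr.strip() or result.stdout.strip())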
Thanks in advance!
Luca (on behalf of the Analytics team)
Hello everyone,
The next Research Showcase will be live-streamed next Wednesday, September
18, at 9:30 AM PT/16:30 UTC. This will be the new regular time for Research
Showcases going forward, to make them more accessible across time zones.
YouTube stream: https://www.youtube.com/watch?v=fDhAnHrkBks
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Citation Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's
Verifiability
By Miriam Redi, Research, Wikimedia Foundation
Among Wikipedia's core guiding principles, verifiability policies have a
particularly important role. Verifiability requires that information
included in a Wikipedia article be corroborated against reliable secondary
sources. Because of the manual labor needed to curate and fact-check
Wikipedia at scale, however, its contents do not always evenly comply with
these policies. Citations (i.e. references to external sources) may not
conform to verifiability requirements or may be missing altogether,
potentially weakening the reliability of specific topic areas of the free
encyclopedia. In this project
<https://meta.wikimedia.org/wiki/Research:Identification_of_Unsourced_Statem…>,
we aimed to provide an empirical characterization of the reasons why and
how Wikipedia cites external sources to comply with its own verifiability
guidelines. First, we constructed a taxonomy of reasons why inline
citations are required by collecting labeled data from editors of multiple
Wikipedia language editions. We then collected a large-scale crowdsourced
dataset of Wikipedia sentences annotated with categories derived from this
taxonomy. Finally, we designed and evaluated algorithmic models to
determine if a statement requires a citation, and to predict the citation
reason based on our taxonomy. We evaluated the robustness of such models
across different classes of Wikipedia articles of varying quality, as well
as on an additional dataset of claims annotated for fact-checking purposes.
Redi, M., Fetahu, B., Morgan, J., & Taraborelli, D. (2019, May). Citation
Needed: A Taxonomy and Algorithmic Assessment of Wikipedia's Verifiability.
In The World Wide Web Conference (pp. 1567-1578). ACM.
https://arxiv.org/abs/1902.11116
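(For readers who want a feel for the task, the toy sketch below is NOT the
models from the paper; it is only a minimal bag-of-words baseline, with
made-up example sentences, for the binary "does this sentence need a
citation?" question.)

    # Illustrative baseline only, not the paper's models.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy labeled data: 1 = citation needed, 0 = no citation needed.
    sentences = [
        "The city's population was 1.2 million in 2015.",
        "The film received generally negative reviews from critics.",
        "A mountain is a large landform that rises above the surrounding land.",
        "Water is a transparent, nearly colorless chemical substance.",
    ]
    labels = [1, 1, 0, 0]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(sentences, labels)
    print(model.predict(["The study reported a 40% increase in readership."]))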
Patrolling on Wikipedia
By Jonathan T. Morgan, Research, Wikimedia Foundation
I will present initial findings from an ongoing research study
<https://meta.wikimedia.org/wiki/Research:Patrolling_on_Wikipedia> of
patrolling workflows on Wikimedia projects. Editors patrol recent pages and
edits to ensure that Wikimedia projects maintain high quality as new
content comes in. Patrollers revert vandalism and review newly-created
articles and article drafts. Patrolling of new pages and edits is vital
work. In addition to making sure that new content conforms to Wikipedia
project policies, patrollers are the first line of defense against
disinformation, copyright infringement, libel and slander, personal
threats, and other forms of vandalism on Wikimedia projects. This research
project is focused on understanding the needs, priorities, and workflows of
editors who patrol new content on Wikimedia projects. The findings of this
research can inform the development of better patrolling tools as well as
non-technological interventions intended to support patrollers and the
activity of patrolling.
--
Janna Layton (she, her)
Administrative Assistant - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi everybody,
as https://www.python.org/doc/sunset-python-2/ says, Python 2 is finally
going EOL on January 1st. We (as the Analytics team) have a lot of packages
deployed on stat/notebook/hadoop hosts via puppet that should be removed,
but before doing so we'd need to know if any of you are currently using
a Python-2-only environment to work/research/test/etc. If so, please
comment in the following task so we can discuss your use case and possibly
find a Python 3 solution: https://phabricator.wikimedia.org/T204737
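If you are not sure whether a notebook or virtualenv of yours is still on
Python 2, a quick check (assuming nothing about your particular setup) is:

    # Print the interpreter version and fail loudly if it is still Python 2.
    import sys

    print("Running on Python", sys.version.split()[0])
    if sys.version_info[0] < 3:
        raise RuntimeError(
            "Still on Python 2; see https://phabricator.wikimedia.org/T204737"
        )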
In the task we are going to add info about common packages that we know of
(Keras, TensorFlow, PyTorch, etc.) to help you migrate to Python 3 as
quickly and painlessly as possible, so if you are interested please
subscribe to the task.
Thanks in advance!
Luca (on behalf of the Analytics team)
Hey folks, I'm helping Rebecca Maung (rmaung@wikimedia.org) distribute this
request. Her words below:
The Wikimedia Foundation is asking for your feedback in the annual
Community Insights survey. We want to know how well we are supporting your
work on- and off-wiki, and how we can change or improve things in the
future. The opinions you share will directly affect the current and future
work of the Wikimedia Foundation.
If you are a volunteer developer, and have contributed code to any pieces
of MediaWiki, gadgets, or tools, please complete the survey. It is
available in various languages and will take between 15 and 25 minutes to
complete.
Follow this link to the survey:
https://wikimedia.qualtrics.com/jfe/form/SV_0pSrrkJAKVRXPpj?Target=dev
If you have seen a similar message elsewhere and have already taken the
Community Insights survey, please do not take it twice.
You can find more information about this survey on the project page and see
how your feedback helps the Wikimedia Foundation support contributors like
you. This survey is hosted by a third-party service and governed by this
privacy statement. Please visit our frequently asked questions page to find
more information about this survey.
If you need additional help, send an email to surveys@wikimedia.org.
Thank you!
//Johan Jönsson
--
(Note: This is only an early heads-up so you can prepare. Google Code-in
has NOT been announced yet, but last year GCI mentors asked for more
advance time to identify tasks to mentor. Here you are. :)
* You have small, self-contained bugs you'd like to see fixed?
* Your documentation needs specific improvements?
* Your user interface has some smaller design issues?
* Your Outreachy/Summer of Code project welcomes small tweaks?
* You'd enjoy helping someone port your template to Lua?
* Your gadget code uses some deprecated API calls?
* You have tasks in mind that welcome some research?
Google Code-in (GCI) is an annual contest for 13-17 year old students.
GCI 2019 has not yet been announced but usually takes place from late
October to December. It is not only about coding: We also need tasks
about design, docs, outreach/research, QA.
Read https://www.mediawiki.org/wiki/Google_Code-in/Mentors , add
your name to the mentors table, and start tagging tasks in Wikimedia
Phabricator by adding the #gci-2019 project tag.
We will need MANY mentors and MANY tasks, otherwise we cannot make it.
Last year, 199 students successfully worked on 765 tasks supported by
39 mentors. For some achievements from the last round, see
https://wikimediafoundation.org/news/2019/02/20/partnerships-make-it-possib…
Note that "beginner tasks" (e.g. "Set up Vagrant") and generic
tasks are very welcome (like "Choose and replace 2 uses of
Linker::link() from the list in T223010" style).
We also have more than 400 unassigned open #good-first-bug tasks:
https://phabricator.wikimedia.org/maniphest/query/3YnDUWYJfXSo/#R
Can and would you mentor some of these tasks in your area?
Please take a moment to find / update [Phabricator etc.] tasks in your
project(s) which would take an experienced contributor 2-3 hours. Read
https://www.mediawiki.org/wiki/Google_Code-in/Mentors
, ask if you have any questions, and add your name to
https://www.mediawiki.org/wiki/Google_Code-in/2019#List_of_Wikimedia_mentors
Thanks (as we will not be able to run this without your help),
andre
--
Andre Klapper (he/him) | Bugwrangler / Developer Advocate
https://blogs.gnome.org/aklapper/