Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
Meta-Wiki:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health"[1] of various communities that make up the
Wikimedia movement:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. It could be used both
by community members, to make decisions about their community's direction,
and by Wikimedia Foundation staff, to point anti-harassment tool development
in the right direction.
We have a set of metrics we are thinking about including in the kit,
ranging from the ratio of active users to active administrators, through
administrator confidence levels, to off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data will
vary.
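As one illustration of how a metric like this might be computed: the public MediaWiki siteinfo statistics already expose active-user and administrator counts. The sketch below is our own, not part of the Metrics Kit; the helper names are made up, but the API endpoint and its `activeusers`/`admins` fields are standard.

```python
# Sketch: deriving one candidate metric (active users per administrator)
# from the public siteinfo statistics endpoint.
import json
import urllib.request

def active_users_per_admin(stats):
    """Ratio of active users to administrators from a siteinfo statistics dict."""
    return stats["activeusers"] / stats["admins"]

def fetch_site_statistics(host="en.wikipedia.org"):
    """Fetch the statistics block from a wiki's action API."""
    url = (f"https://{host}/w/api.php"
           "?action=query&meta=siteinfo&siprop=statistics&format=json")
    with urllib.request.urlopen(url) as resp:
        data = json.load(resp)
    return data["query"]["statistics"]

# Offline example with made-up numbers:
example = {"activeusers": 120000, "admins": 1000}
print(active_users_per_admin(example))  # 120.0
```

A live call would be `active_users_per_admin(fetch_site_statistics("en.wikipedia.org"))`, one wiki at a time.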
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <
https://stats.wikimedia.org/v2/#/all-projects> — how might these tools
coexist?
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you're interested
in direct messaging going forward, please feel free to indicate your
interest by signing up on the consultation page.
Looking forward to reading your thoughts.
best,
Joe
P.S.: Please feel free to CC me in conversations that might happen on this
list!
[1] What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well and where it could improve. This
project aims to provide a variety of useful data points to inform
community decisions that would benefit from objective data.
--
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
Wikimedia Foundation
joesutherland.rocks
Hello everyone!
I'm a CS professor at Carleton College (formerly a PhD student at
Northwestern), and I've collaborated with folks at WMF on WP research in
the past. Most notably, I was the lead author on a paper written with
Jonathan Morgan and Jake Orlowitz evaluating the Wikipedia Adventure
<https://dl.acm.org/citation.cfm?id=2998307>. I hope to continue having
productive collaborations with other people who care about WP, and keep
producing research that supports the future of the project.
A potential research idea I'd like to explore is understanding whether and
how Wikipedia Zero impacted the amount and nature of participation on WP in
the projects that were affected by its rollout. Since the Wikipedia Zero
program ran during a particular time period and then ended, it also sets
up a potentially good avenue for a comparative study.
I was wondering if any of you were aware of any datasets that log
information about the rollout of/participation in Wikipedia Zero.
Specifically, some of the data I'm interested in include:
- Countries that had access to WP Zero, including dates/times that this was
turned on and off
- Any information about whether access through WP Zero meant that you could
only visit/edit particular parts or language editions of Wikipedia (I don't
think this was the case, but I wanted to make sure)
- Edits made to any WP language edition by IP addresses from those
countries during that time period
- Whether those edits were being made using a device that accessed WP
through WP Zero
- The kind of device being used for editing
- Other generic features of the edit (number of characters, namespace,
registered/unregistered etc)
I'm aware that the data may not be recorded exactly along those lines, but
I was still curious to know what data about Wikipedia Zero is out there,
and whether or not it was publicly available.
Thank you for your help!
-Sneha
*Sneha Narayan*
Department of Computer Science
Carleton College
snehanarayan.com
I noticed just now that the Foundation is soliciting applications for a new CTO:
https://www.linkedin.com/feed/update/urn:li:activity:6515003866130505729
Can we please hire a CTO who would prioritize reader privacy above the
interests of any state or non-state actors, whether or not they have
infiltrated staff, contractor, or NDA-signatory ranks, and whether or not
it interferes with reader statistics and analytics?
In particular, I would like to repeat my request that we should not be logging
personally identifiable information which might increase our subpoena
burden or result in privacy violation incidents. Fuzzing geolocation
is okay, but we should not be transmitting IP addresses into logs
across even a LAN, for example, and we certainly shouldn't be
purchasing hardware with backdoor coprocessors wasting electricity and
exposing us to government or similar intrusions:
https://lists.wikimedia.org/pipermail/analytics/2017-January/005696.html
Best regards,
Jim
Hello everyone,
I'm just sending a reminder that the below Showcase will be starting in
half an hour.
-Janna Layton
Hi all,
The next Research Showcase, “Learning How to Correct a Knowledge Base
from the Edit History” and “TableNet: An Approach for Determining
Fine-grained Relations for Wikipedia Tables” will be live-streamed
this Wednesday, March 20, 2019, at 11:30 AM PDT/18:30 UTC (please note
the change in UTC time due to daylight saving changes in the U.S.).
The first presentation is about using edit history to automatically
correct constraint violations in Wikidata, and the second is about
interlinking Wikipedia tables.
YouTube stream: https://www.youtube.com/watch?v=6p62PMhkVNM
As usual, you can join the conversation on IRC at #wikimedia-research.
You can also watch our past research showcases at
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase .
This month's presentations:
Learning How to Correct a Knowledge Base from the Edit History
By Thomas Pellissier Tanon (Télécom ParisTech), Camille Bourgaux (DI
ENS, CNRS, ENS, PSL Univ. & Inria), Fabian Suchanek (Télécom
ParisTech), WWW'19.
The curation of Wikidata (and other knowledge bases) is crucial to
keep the data consistent, to fight vandalism and to correct good faith
mistakes. However, manual curation of the data is costly. In this
work, we propose to take advantage of the edit history of the
knowledge base in order to learn how to correct constraint violations
automatically. Our method is based on rule mining, and uses the edits
that solved violations in the past to infer how to solve similar
violations in the present. For example, our system is able to learn
that the value of the [[d:Property:P21|sex or gender]] property
[[d:Q467|woman]] should be replaced by [[d:Q6581072|female]]. We
provide [https://tools.wmflabs.org/wikidata-game/distributed/#game=43
a Wikidata game] that suggests our corrections to the users in order
to improve Wikidata. Both the evaluation of our method on past
corrections, and the Wikidata game statistics show significant
improvements over baselines.
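A toy sketch (ours, not the authors' code) of the kind of correction rule the paper mines: "if property P21 (sex or gender) has the value Q467 (woman), replace it with Q6581072 (female)". The rule representation here is deliberately simplified.

```python
# Apply simple (property, wrong_value) -> right_value correction rules
# to a flat dict of an item's claims.
def apply_rules(claims, rules):
    """Return a corrected copy of `claims` with all matching rules applied."""
    corrected = dict(claims)
    for (prop, wrong), right in rules.items():
        if corrected.get(prop) == wrong:
            corrected[prop] = right
    return corrected

# Example rule learned from past edits, per the abstract:
rules = {("P21", "Q467"): "Q6581072"}  # sex or gender: woman -> female
item = {"P21": "Q467", "P31": "Q5"}
print(apply_rules(item, rules))  # {'P21': 'Q6581072', 'P31': 'Q5'}
```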
TableNet: An Approach for Determining Fine-grained Relations for
Wikipedia Tables
By Besnik Fetahu
Wikipedia tables represent an important resource, where information is
organized with respect to table schemas consisting of columns. In turn,
each column may contain instance values that point to other Wikipedia
articles or primitive values (e.g. numbers, strings, etc.). In this
work, we focus on the problem of interlinking Wikipedia tables for two
types of table relations: equivalent and subPartOf. Through such
relations, we can further harness semantically related information by
accessing related tables or facts therein. Determining the relation
type of a table pair is not trivial, as it depends on the schemas, the
values therein, and the semantic overlap of the cell values in the
corresponding tables. We propose TableNet, an approach that constructs
a knowledge graph of interlinked tables with subPartOf and equivalent
relations. TableNet consists of two main steps: (i) for any source
table, an efficient algorithm that finds all candidate related tables
with high coverage, and (ii) a neural-based approach that takes into
account the table schemas and the corresponding table data to determine
with high accuracy the relation for a table pair. We perform an
extensive experimental evaluation on the entire Wikipedia, with more
than 3.2 million tables. We show that we retain more than 88% of
relevant candidate table pairs for alignment. Consequently, we are able
to align tables with subPartOf or equivalent relations with an accuracy
of 90%. Comparisons with existing competitors show that TableNet has
superior performance in terms of coverage and alignment accuracy.
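To give a rough flavor of step (i), here is a toy sketch (ours, much simpler than the paper's algorithm) of cheaply filtering candidate table pairs by schema overlap before a more expensive model decides the relation type:

```python
# Filter table pairs by Jaccard overlap of their column-name schemas.
def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def candidate_pairs(tables, threshold=0.5):
    """Return table-name pairs whose schemas overlap at or above threshold."""
    names = list(tables)
    return [(s, t) for i, s in enumerate(names) for t in names[i + 1:]
            if jaccard(tables[s], tables[t]) >= threshold]

tables = {
    "cities": ["name", "country", "population"],
    "capitals": ["name", "country", "population", "since"],
    "rivers": ["name", "length", "basin"],
}
print(candidate_pairs(tables))  # [('cities', 'capitals')]
```

The real system aims for high recall at this stage, so the actual filtering is considerably more sophisticated than a single schema-overlap threshold.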
--
Janna Layton (she, her)
Administrative Assistant - Audiences & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi,
Thanks for the responses.
Is it best to use Talk:Quarry to request assistance or can someone provide
the SQL if I can't work it out?
I'll look at HostBot's query first; that might help, so thanks, Jmorgan.
Can you please cc me on any reply as I'm not part of the list?
Thanks,
RhinosF1
Hi,
Could someone advise how I could do Quarry queries on user activity?
I'd like to monitor the following on en-wiki:
All for unblocked humans
-Number of active, confirmed editors (Done,
https://quarry.wmflabs.org/query/34268)
-Number of active editors with only 1 edit
-Number of active editors with only 1 mainspace edit
-Number of active non-confirmed editors
Definitions:
Active : 1 edit in last 30 days
Human : No bot flag
Confirmed : 4 days / 10 edits
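For the first of the missing counts, something along these lines might work on Quarry. This is a rough, untested sketch against the Wiki Replicas: table and column names (revision, actor, user, user_groups) should be checked against the current schema, and "confirmed" is only approximated here, since autoconfirmed status is computed by MediaWiki rather than stored.

```sql
-- Sketch: active (>= 1 edit in last 30 days), human (no bot flag),
-- "confirmed" (account >= 4 days old, >= 10 edits) users on enwiki.
-- Note: user_registration is NULL for some very old accounts, which
-- this version would exclude.
SELECT COUNT(DISTINCT user_id)
FROM user
JOIN actor ON actor_user = user_id
JOIN revision ON rev_actor = actor_id
WHERE rev_timestamp >= DATE_FORMAT(NOW() - INTERVAL 30 DAY, '%Y%m%d%H%i%s')
  AND user_editcount >= 10
  AND user_registration <= DATE_FORMAT(NOW() - INTERVAL 4 DAY, '%Y%m%d%H%i%s')
  AND user_id NOT IN (SELECT ug_user FROM user_groups WHERE ug_group = 'bot');
```

The other counts should follow by swapping the confirmed conditions for their negations, or by grouping per-user edit counts and keeping those equal to 1 (optionally restricted to namespace 0 via the page table).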
Thank you so much Nuria for your clarifications !
All the best,
Viviana
________________________________
From: Analytics <analytics-bounces(a)lists.wikimedia.org> on behalf of analytics-request(a)lists.wikimedia.org <analytics-request(a)lists.wikimedia.org>
Sent: Monday, 11 March 2019 17:53
To: analytics(a)lists.wikimedia.org
Subject: Analytics Digest, Vol 85, Issue 5
Send Analytics mailing list submissions to
analytics(a)lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/analytics
or, via email, send a message with subject or body 'help' to
analytics-request(a)lists.wikimedia.org
You can reach the person managing the list at
analytics-owner(a)lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Analytics digest..."
Today's Topics:
1. Re: R: Analytics Digest, Vol 85, Issue 3 (Nuria Ruiz)
----------------------------------------------------------------------
Message: 1
Date: Mon, 11 Mar 2019 09:53:32 -0700
From: Nuria Ruiz <nuria(a)wikimedia.org>
To: "A mailing list for the Analytics Team at WMF and everybody who
has an interest in Wikipedia and analytics."
<analytics(a)lists.wikimedia.org>
Subject: Re: [Analytics] R: Analytics Digest, Vol 85, Issue 3
>but I'd like to follow the behaviour flow when a user accesses some
Wikipedia page following a link from my website.
>I don't know if that is possible somehow and if it makes sense for you.
I see. It does make sense, but that is not data we have.
Thanks,
Nuria
On Sat, Mar 9, 2019 at 5:19 AM viviana paga <viviana.paga(a)hotmail.it> wrote:
> Hi Nuria,
> thanks for your reply and tips!
> As you propose, I use Matomo to get data from my client, but I'd like to
> follow the behaviour flow when a user accesses some Wikipedia page
> following a link from my website.
> I don't know if that is possible somehow and if it makes sense for you.
> Many thanks,
> Viviana
>
>
> ------------------------------
> *From:* Analytics <analytics-bounces(a)lists.wikimedia.org> on behalf of
> analytics-request(a)lists.wikimedia.org <
> analytics-request(a)lists.wikimedia.org>
> *Sent:* Friday, 8 March 2019 17:00
> *To:* analytics(a)lists.wikimedia.org
> *Subject:* Analytics Digest, Vol 85, Issue 3
>
>
>
> Today's Topics:
>
> 1. R: Analytics Digest, Vol 85, Issue 2 (viviana paga)
> 2. Re: R: Analytics Digest, Vol 85, Issue 2 (Nuria Ruiz)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 8 Mar 2019 13:21:34 +0000
> From: viviana paga <viviana.paga(a)hotmail.it>
> To: "analytics(a)lists.wikimedia.org" <analytics(a)lists.wikimedia.org>
> Subject: [Analytics] R: Analytics Digest, Vol 85, Issue 2
>
> Hi Dan,
> thanks for your reply!
> I agree with you, and in fact I do that in my front-end, but I think it
> would be interesting to have some general stats from Wikimedia too; in
> particular, to understand what impact my project could have on general
> Wikimedia stats and what the behaviour of users arriving at Wikimedia
> from my site will be (whether it is the expected one or not).
> I thought that having some backend stats by api-user-agent could help me
> understand these points and improve my project in the best way in the
> future. What do you think? Is there a procedure that I can follow to get
> these stats?
> Many thanks,
> Viviana
>
> ________________________________
> From: Analytics <analytics-bounces(a)lists.wikimedia.org> on behalf of
> analytics-request(a)lists.wikimedia.org <
> analytics-request(a)lists.wikimedia.org>
> Sent: Friday, 8 March 2019 13:00
> To: analytics(a)lists.wikimedia.org
> Subject: Analytics Digest, Vol 85, Issue 2
>
>
>
> Today's Topics:
>
> 1. Stats of mediawiki API / Access to non-public data (viviana paga)
> 2. Re: Stats of mediawiki API / Access to non-public data
> (Dan Andreescu)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 7 Mar 2019 14:15:27 +0000
> From: viviana paga <viviana.paga(a)hotmail.it>
> To: "analytics(a)lists.wikimedia.org" <analytics(a)lists.wikimedia.org>
> Subject: [Analytics] Stats of mediawiki API / Access to non-public
> data
>
> Hi all,
>
> I’m working on a project about the sharing of cultural heritage and,
> more generally, about the sharing of open knowledge.
> In particular, I'm developing a web service that uses the MediaWiki API,
> and I'd like to have some stats about the traffic of my API calls to the
> commons.wikimedia.org domain.
>
> More specifically, I'd like to have:
> - the number of GET requests by Api-User-Agent
> - the number of views/edits by Api-User-Agent
> - the stats of Wikipedia traffic from inbound links from a specific domain
> or URL
>
> Is it possible to somehow access these limited non-public data?
> Is there a procedure that I can follow?
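Whatever the answer on the data-access side, per-client counts like these depend on the client sending a distinctive Api-User-Agent header with every request. A minimal sketch; the agent string and the particular endpoint below are illustrative placeholders, not anything Wikimedia-specific to this project:

```python
# Sketch: attaching a distinctive Api-User-Agent header so a client's
# API traffic is identifiable in aggregate request statistics.
import urllib.request

API_URL = ("https://commons.wikimedia.org/w/api.php"
           "?action=query&meta=siteinfo&format=json")
AGENT = "OpenHeritageBeta/0.1 (contact@example.org)"  # placeholder

req = urllib.request.Request(API_URL, headers={"Api-User-Agent": AGENT})
# urllib normalizes stored header keys to "Api-user-agent":
print(req.get_header("Api-user-agent"))  # OpenHeritageBeta/0.1 (contact@example.org)
```

Passing `req` to `urllib.request.urlopen` would then send the header with the request.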
>
> The project is still in development, but next April we will release a beta
> version for a limited range of user-testers.
> The project is completely non-profit, and it would provide maximum freedom,
> independence and privacy for its users.
> That’s why I’d like to have some backend stats by api-user-agent: that
> would guarantee the total privacy of the users while, at the same time,
> giving the project some general stats about its traffic, its utilisation
> and its impact on the general Wikimedia stats.
>
> If someone among you is interested in these issues (open shared cultural
> heritage, open linked data), I’d like to keep in touch and even to propose
> that you participate as a tester in April.
>
> Thank you in advance,
> Kind regards,
> Viviana Paga
> https://www.linkedin.com/in/viviana-paga-42bb8b44/
>
>