<tl;dr>: Your feedback as a technically-minded Wikimedian is welcome on
https://www.mediawiki.org/wiki/Developer_Advocacy/Developer_Portal/Content_…
Hi everyone,
Last year[1], the Developer Advocacy team started working on a single,
central entry point for developers and tech-minded people for
Wikimedia's technical documentation[2].
A central entry point ("developer portal") would cover common technical
use cases to allow existing and future technical contributors and
developers to find the information they need.
Each use case links to its most relevant documentation (i.e. to pages
on wikitech or mediawiki.org).
This is part of a larger initiative to implement an organizational
strategy for key technical documents: understand the challenges of
finding and maintaining docs, identify key docs, and investigate ways
to improve our workflows around documentation.
So far, we have:
* Researched and reviewed existing documentation venues and pages
* Reviewed developer/documentation portals in the broader industry
* Interviewed several engineering teams around technical documentation
workflows, audiences, and key technical docs (the key themes from
these conversations are available, see the link above)
* Created an initial draft for the structure and content of the single
entry point
Now we would like to improve this initial content draft with your help.
1) Please take a look at the initial draft at
https://www.mediawiki.org/wiki/Developer_Advocacy/Developer_Portal/Content_…
2) Then, please help improve it by sharing your thoughts and feedback
by *May 25th* at
https://www.mediawiki.org/wiki/Developer_Advocacy/Developer_Portal/Content_…
Note that this is only a draft of how to structure the content.
It is not a design or layout proposal and it is not an implementation.
For additional future work, see also the Phabricator workboard[3].
Next steps include:
* A session at the remote Hackathon (May 22-23; [4])
* Incorporating content improvements based on your feedback
* Checking the documents linked from the single entry point proposal
for accuracy
* Investigating requirements for the technical implementation
* Investigating improvements to processes around technical documentation
(structure, locations, navigation, stewardship, etc.)
If you want to learn more about the project, please see
https://www.mediawiki.org/wiki/Developer_Advocacy/Developer_Portal
Thanks to everybody who has provided their valuable input to get to
this stage, and thanks in advance to everyone who will!
Cheers,
andre
[1] https://lists.wikimedia.org/pipermail/wikitech-l/2020-August/093773.html
[2] https://www.mediawiki.org/wiki/Developer_Advocacy/Developer_Portal
[3] https://phabricator.wikimedia.org/tag/wikimedia-developer-portal/
[4] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2021
--
Andre Klapper (he/him) | Bugwrangler / Developer Advocate
https://blogs.gnome.org/aklapper/
Hi all,
tl;dr: we'd like to remove the rev_is_revert field from the
mediawiki.revision-create stream to solve a missing event problem.
For years now, we've known that the mediawiki.revision-create stream
<https://stream.wikimedia.org/?doc#/streams/get_v2_stream_mediawiki_revision…>
has
been missing many real revision create events
<https://phabricator.wikimedia.org/T215001> when compared with
MediaWiki's MySQL databases. This makes the stream almost useless for
those who want to use it as a notification mechanism about all MediaWiki
page changes.
The reason for the large number of missing events is that the code that
emits the event subscribes to the wrong MediaWiki hook. This patch
<https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/679353/>
will fix this; however, the correct hook does not give us the information
we need to set the rev_is_revert and rev_revert_details fields. These
fields are relatively new (only added in August 2020
<https://github.com/wikimedia/schemas-event-primary/commit/53b6480cb1045316c…>).
We think that including the missing revisions is more important than the
revert information, which really only records whether or not a user used
the MediaWiki UI to issue a revert.
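For context, here is roughly how a consumer reads this stream and the
field in question today. This is a minimal sketch, assuming the
third-party Python "sseclient" package and the current schema field
names:

  import json
  from sseclient import SSEClient  # pip install sseclient

  url = "https://stream.wikimedia.org/v2/stream/mediawiki.revision-create"
  for event in SSEClient(url):
      if not event.data:
          continue  # skip keep-alive events
      rev = json.loads(event.data)
      # rev_is_revert is the field we propose to remove:
      print(rev["database"], rev["page_title"], rev.get("rev_is_revert"))

Consumers relying on rev_is_revert would need to derive revert
information elsewhere after this change.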
We plan on moving forward with this, but would like feedback before we do.
If you have objections, or other ideas on how we can provide this data
(like maybe including it in mediawiki/revision-tags-change
<https://schema.wikimedia.org/repositories//primary/jsonschema/mediawiki/rev…>
and
making that public?), let us know by replying to this email or in this
ticket: https://phabricator.wikimedia.org/T215001
Thanks!
-Andrew Otto
SRE, Data Engineering, WMF
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
Office hours on 2021-05-04 at 16:00-17:00 UTC (9am PT/6pm CET).
To participate, join the video call via this link [2]. There is no set
agenda; feel free to add your item to the list of topics in the etherpad
[3] (you can do this after you join the meeting, too). Otherwise, you are
welcome to just hang out. More detailed information (e.g. about how to
attend) can be found here [4].
Through these office hours, we aim to make ourselves more available to
answer some of the research-related questions that you as Wikimedia
volunteer editors, organizers, affiliates, staff, and researchers face in
your projects and initiatives. Some example cases we hope to be able to
support you in:
- You have a specific research-related question that you suspect you
should be able to answer with the publicly available data, and you don't
know how to find an answer for it, or you just need some more help with
it. For example: how can I compute the ratio of anonymous to registered
editors in my wiki? (See the sketch after this list for one possible
approach.)
- You run into repetitive or very manual work as part of your Wikimedia
contributions and you wish to find out if there are ways to use machines
to improve your workflows. These types of questions can sometimes be
harder to answer during an office hour; however, discussing them can help
us understand your challenges better, and we may find ways to work with
each other to address them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
does and how we can potentially support you. Specifically for affiliates:
if you are interested in building relationships with the academic
institutions in your country, we would love to talk with you and learn
more. We have a series of programs that aim to expand the network of
Wikimedia researchers globally, and we would love to collaborate more
closely with those of you interested in this space.
- You want to talk with us about one of our existing programs [5].
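As an illustration of the first example question above, here is a
minimal, untested sketch of one possible approach, run from PAWS or
Toolforge against the Wiki Replicas. It assumes the standard MediaWiki
schema, where actor_user is NULL for anonymous editors, and the replica
credentials file documented on wikitech:

  import os
  import pymysql

  conn = pymysql.connect(
      host="eswiki.analytics.db.svc.wikimedia.cloud",  # pick your wiki
      read_default_file=os.path.expanduser("~/.my.cnf"),
      database="eswiki_p",
  )
  with conn.cursor() as cur:
      # SUM over a boolean counts matching rows; on a big wiki you would
      # restrict this, e.g. by rev_timestamp.
      cur.execute("""
          SELECT SUM(actor_user IS NULL), SUM(actor_user IS NOT NULL)
          FROM revision JOIN actor ON actor_id = rev_actor
      """)
      anonymous, registered = cur.fetchone()
  print("anonymous/registered ratio:", anonymous / registered)

Details like table layout and host names may differ per wiki; office
hours are a good place to work this out.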
Hope to see many of you,
Martin on behalf of the WMF Research Team
[1] https://research.wikimedia.org/team.html
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
--
Martin Gerlach
Research Scientist
Wikimedia Foundation
In her recent announcement of her upcoming departure as the Wikimedia
Foundation's CEO, Katherine highlighted a growth in "reader
engagement" by 30% during her tenure (i.e. since 2016).[1] A WMF
board member has since reported in somewhat more detail that this refers
to "~1 billion interactions up 32% in six years".[2]
Are the underlying numbers published somewhere?
Regards, Tilman
[1] https://twitter.com/krmaher/status/1357390962410987520
[2] https://twitter.com/raju/status/1371100758343614471
PS: As some may be aware, a widely read German blogger linked to
Katherine's tweet while singling out the "reader engagement" bit for
some outspoken criticism. Just to clarify, that's not why I'm asking
(in fact I disagree with most of that criticism).
Hi everyone,
We are delighted to announce that Wiki Workshop 2021 will be held
virtually in April 2021 and as part of the Web Conference 2021 [1].
The exact day is still to be finalized; we know it will fall between
April 19 and 23.
In the past years, Wiki Workshop has traveled to Oxford, Montreal,
Cologne, Perth, Lyon, and San Francisco, and (virtually) to Taipei.
Last year, we had more than 120 participants in the workshop, and we
are particularly excited about this year's edition, as we will celebrate
the 20th birthday of Wikipedia.
We encourage contributions by all researchers who study the Wikimedia
projects. We specifically encourage 1-2 page submissions of
preliminary research. You will have the option to publish your work as
part of the proceedings of The Web Conference 2021.
You can read more about the call for papers and the workshop at
http://wikiworkshop.org/2021/#call. Please note that the deadline for
the submissions to be considered for proceedings is January 29. All
other submissions should be received by March 1.
If you have questions about the workshop, please let us know on this
list or at wikiworkshop(a)googlegroups.com.
Looking forward to seeing many of you in this year's edition.
Best,
Miriam Redi, Wikimedia Foundation
Bob West, EPFL
Leila Zia, Wikimedia Foundation
[1] https://www2021.thewebconf.org/
TLDR:
- PAWS can now connect to the new replicas, see News/Wiki Replicas 2020
Redesign#How should I connect to databases in PAWS?
<https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#How_sh…>
for
more info.
- Report issues here: T276284 Establish a working setup for PAWS with
multi-instance wikireplicas <https://phabricator.wikimedia.org/T276284>
Hi!
I'm forwarding this message from the cloud lists, in case you use PAWS and
didn't see the message.
PAWS is now capable of connecting and using the new replicas. For
background on the new replicas, please see News/Wiki_Replicas_2020_Redesign
<https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign>
Here are some resources you can check:
- News/Wiki Replicas 2020 Redesign#How should I connect to databases in
PAWS?
<https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#How_sh…>
- Accessing the new replicas, changes from the previous cluster
<https://public.paws.wmcloud.org/User:JHernandez_(WMF)/Accessing%20the%20new…>
- Using Wikireplicas from PAWS with Python
<https://public.paws.wmcloud.org/User:JHernandez_(WMF)/Accessing%20Wikirepli…>
In summary, due to issues with mysql-proxy and the new architecture,
connecting to the replicas will be more in line with the Toolforge approach.
There is a credentials file in $HOME/.my.cnf that you can use when
connecting, instead of the environment variables. For the host name, you
can use the same ones you would use when connecting from Toolforge ("
{wiki}.{analytics,web}.db.svc.wikimedia.cloud").
To update a notebook, here is an example of the changes needed when
connecting (lines marked "-" are removed, lines marked "+" are added):
- import os
  import pymysql

  conn = pymysql.connect(
-     host = os.environ['MYSQL_HOST'],
+     host = "eswiki.analytics.db.svc.wikimedia.cloud",
-     user = os.environ['MYSQL_USERNAME'],
-     password = os.environ['MYSQL_PASSWORD'],
+     read_default_file = ".my.cnf",
      database = "eswiki_p"
  )
Note that you have to connect to the host name matching the database you
are going to query.
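For reference, a complete minimal notebook cell after the change could
look like this (the wiki and the query are only illustrative):

  import os
  import pymysql

  conn = pymysql.connect(
      host="eswiki.analytics.db.svc.wikimedia.cloud",
      read_default_file=os.path.expanduser("~/.my.cnf"),  # credentials file above
      database="eswiki_p",
  )
  with conn.cursor() as cur:
      cur.execute("SELECT page_title FROM page WHERE page_namespace = 0 LIMIT 5")
      for (title,) in cur.fetchall():
          print(title.decode("utf-8"))  # replica columns come back as bytes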
Existing notebooks remain readable with the output cached, and we are
working on updating the documentation.
In two weeks (April 15), the old cluster will be migrated to new
replication hosts, at which point replication may stop and running PAWS
notebooks connecting to the old cluster may get stale results.
In about four weeks (April 28), the old host names will be redirected to
the new cluster; running notebooks that connect to MYSQL_HOST will stop
working and will need to be updated with the credentials file and the new
DB host names.
If you find any issues or problems or need help, please reach out via IRC
on #wikimedia-cloud, mailing list (cloud(a)lists.wikimedia.org), or in the
phabricator task T276284 Establish a working setup for PAWS with
multi-instance wikireplicas <https://phabricator.wikimedia.org/T276284>
Feel free to forward this as needed to spread the word to PAWS users, thank
you!
--
Joaquin Oltra Hernandez
Developer Advocate - Wikimedia Foundation
In this showcase, Prof. Danielle Bassett will present recent work studying
individual and collective curiosity as network building processes using
Wikipedia.
Date/Time: March 17, 16:30 UTC (9:30am PT / 12:30pm ET / 17:30 CET)
Youtube: https://www.youtube.com/watch?v=jw2s_Y4J2tI
Speaker: Danielle Bassett (University of Pennsylvania)
Title: The curious human
Abstract: The human mind is curious. It is strange, remarkable, and
mystifying; it is eager, probing, questioning. Despite its pervasiveness
and its relevance for our well-being, scientific studies of human curiosity
that bridge both the organ of curiosity and the object of curiosity remain
in their infancy. In this talk, I will integrate historical, philosophical,
and psychological perspectives with techniques from applied mathematics and
statistical physics to study individual and collective curiosity. In the
former, I will evaluate how humans walk on the knowledge network of
Wikipedia during unconstrained browsing. In doing so, we will capture
idiosyncratic forms of curiosity that span multiple millennia, cultures,
languages, and timescales. In the latter, I will consider the fruition of
collective curiosity in the building of scientific knowledge as encoded in
Wikipedia. Throughout, I will make a case for the position that individual
and collective curiosity are both network building processes, providing a
connective counterpoint to the common acquisitional account of curiosity in
humans.
Related papers:
Hunters, busybodies, and the knowledge network building associated with
curiosity. https://doi.org/10.31234/osf.io/undy4
The network structure of scientific revolutions.
http://arxiv.org/abs/2010.08381

https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#March_2021
--
Janna Layton (she/her)
Administrative Associate - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello
Does anyone know the accuracy of the NTP-synchronized clock on the
Varnish servers, and whether it could be used as a forensic clock after
evaluation?
Do you use any third-party, unapproved NTP sources for your time?
Thanks 👍
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
Office hours on 2021-03-16 at 16:00-17:00 UTC (9am PT/5pm CET).
To participate, join the video call via this link [2]. There is no set
agenda; feel free to add your item to the list of topics in the etherpad
[3] (you can do this after you join the meeting, too). Otherwise, you are
welcome to just hang out. More detailed information (e.g. about how to
attend) can be found here [4].
Through these office hours, we aim to make ourselves more available to
answer some of the research-related questions that you as Wikimedia
volunteer editors, organizers, affiliates, staff, and researchers face in
your projects and initiatives. Some example cases we hope to be able to
support you in:
- You have a specific research-related question that you suspect you
should be able to answer with the publicly available data, and you don't
know how to find an answer for it, or you just need some more help with
it. For example: how can I compute the ratio of anonymous to registered
editors in my wiki?
- You run into repetitive or very manual work as part of your Wikimedia
contributions and you wish to find out if there are ways to use machines
to improve your workflows. These types of questions can sometimes be
harder to answer during an office hour; however, discussing them can help
us understand your challenges better, and we may find ways to work with
each other to address them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
does and how we can potentially support you. Specifically for affiliates:
if you are interested in building relationships with the academic
institutions in your country, we would love to talk with you and learn
more. We have a series of programs that aim to expand the network of
Wikimedia researchers globally, and we would love to collaborate more
closely with those of you interested in this space.
- You want to talk with us about one of our existing programs [5].
Hope to see many of you,
Martin (WMF Research Team)
[1] https://research.wikimedia.org/team.html
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
--
Martin Gerlach
Research Scientist
Wikimedia Foundation
Hello everybody,
We are currently working on a Wikipedia visualisation tool (presented
here: http://www.wikimaps.io/). We use several pageview statistics
datasets (pagecounts, pageviews, and pageview_complete) to generate time
series for each page from 2008 to 2020. The last format is great for our
work compared to the previous ones, and we use it for our data from 2016
to 2020 (thanks to the Analytics team for that).
We aggregate redirects into one page, identified by page_id (as is done
in the pageview_complete files).
But when we compare with the Wikimedia API, we see some small
differences.
I think this comes from the fact that the Wikimedia API (and
pageviews.toolforge.org) uses page_title to build the time series, and I
saw that pageview_complete files contain entries where the page_title is
missing (replaced by a "-"). As we use page_id to do the aggregation
whenever possible, we also aggregate these "-" entries, but
pageviews.toolforge.org probably does not.
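For illustration, our aggregation is roughly equivalent to this sketch
(file name as in the example below; the six space-separated fields are
the documented pageview_complete line format):

  import bz2
  from collections import defaultdict

  totals = defaultdict(int)
  with bz2.open("pageviews-20200112-user.bz2", "rt", encoding="utf-8") as f:
      for line in f:
          fields = line.rstrip("\n").split(" ")
          if len(fields) != 6:
              continue  # e.g. lines missing the page_id field
          wiki, title, page_id, access, daily, hourly = fields
          if wiki == "fr.wikipedia" and page_id.isdigit():
              # "-" titles are summed in here too, since they share a page_id
              totals[page_id] += int(daily)

  print(totals["167398"])  # Barack_Obama and its redirects combined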
For example, for the page Barack_Obama on French Wikipedia and the file
`pageviews-20200112-user.bz2`, I get these relevant entries:
fr.wikipedia - 167398 mobile-web 1 B1
fr.wikipedia Barack 167398 mobile-web 1 X1
fr.wikipedia Barack_Hussein_Obama 167398 mobile-web 1 J1
fr.wikipedia Barack_Obama 167398 desktop 748 A18B10C5D8E3F3G8H6I18J36K41L37M35N37O55P76Q65R57S48T29U56V42W23X32
fr.wikipedia Barack_Obama 167398 mobile-app 10 A1L1O1Q1T3U2V1
fr.wikipedia Barack_Obama 167398 mobile-web 1732 A62B38C28D17E24F10G16H43I40J56K65L78M87N100O95P100Q93R127S84T128U124V184W84X49
fr.wikipedia Natasha_Obama 167398 desktop 3 Q1R2
fr.wikipedia Obama 167398 desktop 11 J2K1M1O1Q2R1S1U1W1
fr.wikipedia Obama 167398 mobile-web 2 R1V1
fr.wikipedia Obama_Barack 167398 desktop 3 N1P2
fr.wikipedia Sacha_Obama 167398 desktop 3 J1O2
fr.wikipedia Sacha_Obama 167398 mobile-web 1 C1
fr.wikipedia Barack_Obama mobile-app 29 B1C1H4J1L1M2N3O3P1R3S5V1W2X1
That is 12 entries that use the page_id, and one that does not.
I have two questions about that result.
What kind of query can cause these "-" entries?
Why does the entry "Barack_Obama mobile-app" appear twice?
Sorry for the long introduction and thank you for your time.
Regards,
Ogier