Update:
Due to the big Datacenter Switchover happening next week, we’ve decided to
postpone this a bit. We won’t be doing this downtime on Monday April
17th. Instead, we will do this at 13:30 UTC on Tuesday April 25th.
Thanks all!
On Thu, Apr 6, 2017 at 1:04 PM, Andrew Otto <otto(a)wikimedia.org> wrote:
> Hi all!
>
> As part of our Hadoop Cluster Debian Jessie upgrade, we need to reinstall
> the server that acts as a metadata state store for Hive, Oozie and Druid.
> To be safe, we plan to take those services plus Pivot (which runs on Druid)
> offline while we reinstall this server. We plan to do this on
> Monday April 17th at 13:30 UTC (9:30am US East Coast, 6:30am US West
> Coast). We don’t expect more than an hour or two of downtime, but likely
> less.
>
> Let us know if there are any objections. Please forward this email to any
> relevant folks.
>
> Thanks!
> -Andrew and Luca, your Analytics Opsen.
>
>
>
Following the process described in the Code of Conduct for Wikimedia
technical spaces <https://www.mediawiki.org/wiki/Code_of_Conduct>, the
Wikimedia Foundation’s Technical Collaboration team has selected five
candidates to form the first Code of Conduct Committee and five candidates
to become auxiliary members.
Here are their names in alphabetical order. For details about each
candidate, please check
https://www.mediawiki.org/wiki/Code_of_Conduct/Committee_members
Committee member candidates:
- Amir Sarabadani (Ladsgroup)
- Lucie-Aimée Kaffee (Frimelle)
- Nuria Ruiz (NRuiz-WMF)
- Sébastien Santoro (Dereckson)
- Tony Thomas (01tonythomas)
Auxiliary member candidates:
- Ariel Glenn (ArielGlenn)
- Caroline Becker (Léna)
- Florian Schmidt (Florianschmidtwelzow)
- Huji
- Matanya
This list of candidates is subject to a community review period of two
weeks starting today. If no major objections are presented about any
candidate, they will be appointed in six weeks.
You can provide feedback on these candidates via private email to
techconductcandidates(a)wikimedia.org. This feedback will be received by the
Community Health
<https://meta.wikimedia.org/wiki/Technical_Collaboration/Community_health>
group handling this process, and will be treated confidentially.
We want to thank all the people who considered supporting the Code of
Conduct by participating in this Committee. 77 people were contacted during
the selection process, counting self-nominations and recommendations. Of
these, 21 made it to a shortlist of confirmed candidates who were (by our
estimation) a potentially good fit for the Committee. Selecting the five
candidates for the Committee was hard, as we tried to form a diverse group
that could work together effectively in consolidating the Code of Conduct.
Selecting the five auxiliary members was even harder, and we know that we
have left out candidates who could have contributed just as much. Because
these are the first people assuming these roles, we have leaned a bit
towards more technical profiles with good knowledge of our technical
spaces. We believe that future renewals will offer better chances to other
profiles (less technical and/or less of a Wikimedia veteran), adding
greater diversity and variety of perspectives to the mix.
On Thu, Mar 9, 2017 at 12:30 PM, Quim Gil <qgil(a)wikimedia.org> wrote:
> Dear Wikimedia technical community members,
>
> https://www.mediawiki.org/wiki/Code_of_Conduct
>
> The review of the Code of Conduct for Wikimedia technical spaces has been
> completed and now it is time to bootstrap its first committee. The
> Technical Collaboration team is looking for five candidates to form the
> Committee plus five additional auxiliary members. One of them could be you
> or someone you know!
>
> You can propose yourself as a candidate and you can recommend others
> *privately* at
> techconductcandidates AT wikimedia DOT org
>
> We want to form a very diverse list of candidates reflecting the variety
> of people, activities, and spaces in the Wikimedia technical community. We
> are also open to other candidates with experience in the field. Diversity
> in the Committee is also a way to promote fairness and independence in
> their decisions. This means that no matter who you are, where you come
> from, what you work on, or for how long you have been around, you could
> be a good member of this Committee.
>
> The main requirements to join the Committee are a will to foster an open
> and welcoming community and a commitment to making participation in
> Wikimedia technical projects a respectful and harassment-free experience
> for everyone. The Committee will handle reports of unacceptable behavior,
> analyze the cases, and resolve them according to the Code of
> Conduct. The Committee will also handle proposals to amend the Code of
> Conduct to improve its effectiveness. The term of this
> first Committee will be one year.
>
> Once we have a list of 5 + 5 candidates, we will announce it here for
> review. You can learn more about the Committee and its selection process at
> https://www.mediawiki.org/wiki/Code_of_Conduct/Committee and you can ask
> questions in the related Talk page (preferred) or here.
>
> You can also track the progress of this bootstrapping process at
> https://www.mediawiki.org/wiki/Talk:Code_of_Conduct#Bootstrapping_the_Code_of_Conduct_Committee
>
> PS: We have many technical spaces, and reaching all the people who are
> potentially interested is hard! Please help spread this call.
>
> --
> Quim Gil
> Engineering Community Manager @ Wikimedia Foundation
> http://www.mediawiki.org/wiki/User:Qgil
>
--
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
Hi Everyone,
The next Research Showcase will be live-streamed this Wednesday, April 19,
2017, at 11:30 AM PDT (18:30 UTC).
YouTube stream: https://www.youtube.com/watch?v=_Prf0Vb-k1I
As usual, you can join the conversation on IRC at #wikimedia-research, and
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#April_2017>.
This month's presentations:
Using WikiBrain to visualize Wikipedia's neighborhoods
By *Dr. Shilad Sen <https://www.mediawiki.org/wiki/User:Shilad>*
While Wikipedia serves as the world's most widely used reference for
humans, it also represents the most widely used body of knowledge for
algorithms that must reason about the world. I will provide an overview of
WikiBrain, a software project that serves as a platform for Wikipedia-based
algorithms. I will also demo a brand-new system built on WikiBrain that
visualizes any dataset as a topographic map whose neighborhoods correspond
to related Wikipedia articles. I hope to get feedback about which
directions for these tools are most useful to the Wikipedia research
community.
--
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodlund(a)wikimedia.org
Hi team,
Can you help here?
Is there any API to fetch the most viewed wiki pages by country?
For instance, the most viewed pages in India, the USA, the UK, etc.
Thanks,
Anand
Sent from my iPhone
Begin forwarded message:
> From: Michael Holloway <mholloway(a)wikimedia.org>
> Date: 7 April 2017 at 9:58:22 PM IST
> To: Anandroid Inc <anandrdinc(a)gmail.com>
> Cc: "MediaWiki API announcements & discussion" <mediawiki-api(a)lists.wikimedia.org>
> Subject: Re: [Mediawiki-api] Wikipedia most viewed pages data
>
> Anand,
>
> I don't believe we track per-article pageview counts by country. However, it might be worth asking on analytics(a)lists.wikimedia.org to be sure, as they're the experts on our analytics infrastructure.
>
> Best,
> Michael
>
>> On Fri, Apr 7, 2017 at 12:03 PM, Anandroid Inc <anandrdinc(a)gmail.com> wrote:
>> Thanks Michael, this clears it up.
>> One last question: is there a way to get the most viewed pages by country?
>> For instance, the most viewed pages in India, the USA, the UK, etc.
>>
>> Thanks,
>> Anand
>>
>>> On 07-Apr-2017, at 8:58 PM, Michael Holloway <mholloway(a)wikimedia.org> wrote:
>>>
>>> Whoops! Meant to reply to list.
>>>
>>>> On Fri, Apr 7, 2017 at 10:20 AM, Michael Holloway <mholloway(a)wikimedia.org> wrote:
>>>> Hi Anand,
>>>>
>>>> It looks like the results from those two queries are from different days. The first set is from April 5 and the second is from April 6. For the results from April 5 in the REST API, you'll want https://en.wikipedia.org/api/rest_v1/feed/featured/2017/04/06.
>>>>
>>>> Also please note that the results from the Action API (api.php) aren't sorted in decreasing pageview order as the REST API's are. You'll need to sort them yourself. Depending on the number of results you need, you should also familiarize yourself with query continuation, as well as the per-request result limits on the various modules you're using (for instance, 20 for TextExtracts and 50 for PageImages, as indicated at the bottom of your query's results).
>>>>
>>>> After ensuring you have the same date and sorting the Action API results, you should have the same or very similar results. For instance, looking at the results from April 5, it looks like "Lake Nyos Disaster" is on top with 497638 pageviews in both sets. You still might see slight discrepancies between the two sources due to implementation details, however; for instance, in the REST API endpoint we use a heuristic to attempt to filter out pages with pageview counts likely inflated by bot traffic, and so a few pages here and there that appear in the Action API results wouldn't appear in the REST API list.
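>>>>
>>>> For illustration, a rough (untested) Python sketch of that client-side sort, assuming the PageViewInfo extension's mostviewed generator and pageviews property are enabled (parameter names are my best guess; check the api.php help):
>>>>
>>>> import requests
>>>>
>>>> # Fetch "most viewed" pages along with their per-day view counts.
>>>> resp = requests.get(
>>>>     "https://en.wikipedia.org/w/api.php",
>>>>     params={
>>>>         "action": "query",
>>>>         "format": "json",
>>>>         "generator": "mostviewed",  # from the PageViewInfo extension
>>>>         "gpvimlimit": 50,           # assumed generator limit parameter
>>>>         "prop": "pageviews",
>>>>     },
>>>> ).json()
>>>>
>>>> def latest_views(page):
>>>>     # prop=pageviews returns a {date: count} map per page, with nulls
>>>>     # for days not yet available; use the most recent real count as key.
>>>>     counted = {d: v for d, v in (page.get("pageviews") or {}).items() if v}
>>>>     return counted[max(counted)] if counted else 0
>>>>
>>>> # The pages object is a dict keyed by page ID, in no particular order,
>>>> # so sort it ourselves in decreasing pageview order.
>>>> for page in sorted(resp["query"]["pages"].values(),
>>>>                    key=latest_views, reverse=True):
>>>>     print(page["title"], latest_views(page))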
>>>>
>>>> Best,
>>>> Michael
>>>>
>>>>> On Fri, Apr 7, 2017 at 6:02 AM, Anandroid Inc <anandrdinc(a)gmail.com> wrote:
>>>>> Hi Team,
>>>>>
>>>>> Thanks a lot for your support and quick responses.
>>>>> As per all the suggestions you gave for my query, I found the two APIs below which solve my purpose.
>>>>> But I see that both queries ask for today's most viewed pages on Wikipedia, yet they return different results.
>>>>> Could you please help me with this?
>>>>>
>>>>> 1)
>>>>> https://en.wikipedia.org/w/api.php?action=query&format=json&prop=extracts%7…
>>>>>
>>>>> 2) https://en.wikipedia.org/api/rest_v1/feed/featured/2017/04/07
>>>>>
>>>>>
>>>>>
>>>>>> On 07-Apr-2017, at 3:43 AM, Adam Baso <abaso(a)wikimedia.org> wrote:
>>>>>>
>>>>>> For the action API I think you're looking for an extra property, pageviews. Maybe something like this:
>>>>>>
>>>>>> https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&p…
>>>>>>
>>>>>>> On Thu, Apr 6, 2017 at 4:44 PM, Anandroid Inc <anandrdinc(a)gmail.com> wrote:
>>>>>>> ++Michael
>>>>>>>
>>>>>>>> On 07-Apr-2017, at 3:09 AM, Anandroid Inc <anandrdinc(a)gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Michael,
>>>>>>>>
>>>>>>>> Thanks for your quick response.
>>>>>>>> This is very helpful, exactly what I was looking for.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anand
>>>>>>>>
>>>>>>>>> On 07-Apr-2017, at 12:08 AM, Michael Holloway <mholloway(a)wikimedia.org> wrote:
>>>>>>>>>
>>>>>>>>> Hi Anand,
>>>>>>>>>
>>>>>>>>> It sounds like the REST API's featured feed endpoint provides what you're looking for.
>>>>>>>>>
>>>>>>>>> For example: https://en.wikipedia.org/api/rest_v1/feed/featured/2017/04/06 (see the content under the "mostread" key).
>>>>>>>>>
>>>>>>>>> Under the hood, titles are obtained from the Pageview API and then supplementary information for our desired titles is obtained from the REST API's page summary endpoint. That would be the easiest way to go if you'd like to go in a slightly different direction from what's provided in our featured feed endpoint.
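>>>>>>>>>
>>>>>>>>> As a rough (untested) Python sketch of that route, combining the Pageview API's "top" endpoint with the page summary endpoint (field names as documented for those endpoints):
>>>>>>>>>
>>>>>>>>> import requests
>>>>>>>>>
>>>>>>>>> # Top articles on English Wikipedia for April 6, 2017,
>>>>>>>>> # from the Pageview API.
>>>>>>>>> top = requests.get(
>>>>>>>>>     "https://wikimedia.org/api/rest_v1/metrics/pageviews/"
>>>>>>>>>     "top/en.wikipedia/all-access/2017/04/06"
>>>>>>>>> ).json()["items"][0]["articles"]
>>>>>>>>>
>>>>>>>>> # Enrich the first few titles with an extract and thumbnail from the
>>>>>>>>> # page summary endpoint (titles may need URL-encoding in practice).
>>>>>>>>> for entry in top[:5]:
>>>>>>>>>     summary = requests.get(
>>>>>>>>>         "https://en.wikipedia.org/api/rest_v1/page/summary/"
>>>>>>>>>         + entry["article"]
>>>>>>>>>     ).json()
>>>>>>>>>     print(entry["views"], summary.get("title"),
>>>>>>>>>           (summary.get("thumbnail") or {}).get("source"))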
>>>>>>>>>
>>>>>>>>> If you'd like, you can view the implementation for the "mostread" section of our aggregated feed endpoint here:
>>>>>>>>>
>>>>>>>>> https://phabricator.wikimedia.org/diffusion/GMOA/browse/master/lib/feed/mos…
>>>>>>>>>
>>>>>>>>> Please note that as of now the REST API endpoints are officially designated as unstable.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Michael
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Thu, Apr 6, 2017 at 2:19 PM, Anandroid Inc <anandrdinc(a)gmail.com> wrote:
>>>>>>>>>> Hi Team,
>>>>>>>>>>
>>>>>>>>>> Thanks for the great api provided for all the wiki info.
>>>>>>>>>> I am looking for the right combination of parameters that will return the most viewed pages with the page URL, an article snippet, and a thumbnail image URL.
>>>>>>>>>> Hope you can help with this.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Anand
Hello!
The Analytics team would like to announce that legacy pagecounts are now
available programmatically via an API.
"Pagecount" is the legacy definition of what we now call "pageview".
Pagecounts aggregated per project are available from the API endpoint for
January 2008 through December 2016. The main difference between pagecounts
and the current pageview data is the lack of filtering of self-reported
bots, so automated and human traffic are reported together. You can access
the data overall, but also split by mobile/desktop.
Note that, at this time, we still do not have pagecount data per article;
so far we have loaded only per-project legacy data.
More info and examples of how to query the API can be found here:
https://wikitech.wikimedia.org/wiki/Analytics/AQS/Legacy_Pagecounts
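For a quick taste, here is a minimal (untested) Python sketch of a query,
assuming the endpoint shape documented on that page:

    import requests

    # Monthly pagecounts for the English Wikipedia, all sites combined,
    # for January 2008 (timestamps are YYYYMMDDHH).
    url = ("https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/"
           "aggregate/en.wikipedia/all-sites/monthly/2008010100/2008020100")
    for item in requests.get(url).json()["items"]:
        print(item["timestamp"], item["count"])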
Thanks to Thomas Steiner for promptly updating the pageviews.js Node client
to support pagecounts. The JavaScript client can be found here:
https://github.com/tomayac/pageviews.js
A glimpse of this data:
https://analytics.wikimedia.org/dashboards/reportcard/#pagecounts-dec-2007-…
Thanks,
Nuria
Hi all!
As part of our Hadoop Cluster Debian Jessie upgrade, we need to reinstall
the server that acts as a metadata state store for Hive, Oozie and Druid.
To be safe, we plan to take those services plus Pivot (which runs on Druid)
offline while we reinstall this server. We plan to do this on
Monday April 17th at 13:30 UTC (9:30am US East Coast, 6:30am US West
Coast). We don’t expect more than an hour or two of downtime, but likely
less.
Let us know if there are any objections. Please forward this email to any
relevant folks.
Thanks!
-Andrew and Luca, your Analytics Opsen.