Analytics July 2014

analytics@lists.wikimedia.org

25 participants
32 discussions

Editor Engagement Vital Signs Dashboard
by Nuria Ruiz 24 Jul '14

(sending to the public analytics list plus people with whom we have talked about dashboard technologies in the past) Team: As you know, we are building a dashboard to showcase editor engagement metrics and to explore a replacement for our current dashboarding technology. We have spent time researching tech options for our initial prototype and have decided on our initial tech stack. Technical criteria plus choices are documented here: https://www.mediawiki.org/wiki/Analytics/Editor_Engagement_Vital_Signs/Dash… The set of metrics we are working on (in parallel) can be found here: https://www.mediawiki.org/wiki/Analytics/Editor_Engagement_Vital_Signs Thanks, Nuria

1 0

"analytics" keyword in Bugzilla: Ever looked at?
by Andre Klapper 23 Jul '14

Heja Analytics crew, Wikimedia Bugzilla has an "analytics" keyword (apart from the Analytics product). See https://bugzilla.wikimedia.org/buglist.cgi?keywords=analytics for its 67 open tickets currently. As we're slowly planning for the post-Bugzilla shiny new Phabricator world: Does anybody actually ever look at tickets with that keyword, or can we realistically just drop it (if nobody ever does)? Thank you, andre -- Andre Klapper | Wikimedia Bugwrangler http://blogs.gnome.org/aklapper/

3 4

Analytics Dev Team Showcase
by Kevin Leduc 23 Jul '14

The dev team completed its two-week sprint today and showcased its progress. The slides from the showcase are here: https://docs.google.com/presentation/d/1UtQpfgHW-kIeaeHE_RB_W16ZD3qWvQfzVi0… Following our planning session this coming Thursday, we will announce the user stories the team commits to completing in the next sprint.

1 0

Re: [Analytics] [Wikitech-l] Tech Talk: Hadoop and Beyond. An overview of Analytics infrastructure, Tuesday!
by Pine W 18 Jul '14

Thanks for this. Forwarding to Analytics and Research for others who are curious.

Pine

On Tue, Jul 15, 2014 at 9:29 AM, Rachel Farrand <rfarrand(a)wikimedia.org> wrote:
> This Tech Talk will be starting in 30 minutes. Thanks!
>
> On Fri, Jul 11, 2014 at 3:30 PM, Rachel Farrand <rfarrand(a)wikimedia.org> wrote:
> > Hello!
> >
> > Please join Nuria Ruiz and Andrew Otto next Tuesday, July 15th at 10am SF time/5pm UTC <http://www.timeanddate.com/worldclock/fixedtime.html?msg=Analytics+Tech+Tal…> for a 30 min tech talk. You can join our hangout or follow along on youtube: https://plus.google.com/u/0/b/103470172168784626509/events/c53ho5esd0luccd0… (please note that a link to join the hangout will be posted in the comments of this event just as it starts).
> >
> > You can ask questions on IRC during the talk in #wikimedia-dev.
> >
> > If you are not able to follow along live, a video recording will be posted here <https://plus.google.com/u/0/b/103470172168784626509/103470172168784626509/v…>, to the MediaWiki YouTube channel immediately following the tech talk for you to view at any time.
> >
> > More information about the tech talk:
> >
> > *Hadoop and Beyond. An overview of Analytics infrastructure*
> > In this tech talk we will be presenting the analytics infrastructure that we have recently rolled out in production. By now probably everybody knows that wikimedia hosts an instance of hadoop from which we are going to extract pageview data in the near future. But .. how exactly does the data get there?
> >
> > We will go over the path that webrequest log data takes from varnish to kafka (a distributed log buffer) to hadoop and the challenges of deploying this java-based infrastructure in production. We will also talk about how we can query the data with hive, an SQL-like interface, how you can set up this stack on vagrant to play with and, last but not least, how we used hive recently to provide GLAM folks with image view stats: https://commons.wikimedia.org/wiki/Commons:GLAMwiki_Toolset_Project/NARA_an…
> >
> > Thanks!
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

1 0

Media Viewer User Preference Data
by Fabrice Florin 18 Jul '14

Dear Analytics Team,

As discussed with Toby, Dario and Aaron, we would be grateful for your guidance to collect user preference data for Media Viewer (1), which is being challenged in RfCs on English (2) and German (3) Wikipedias, as well as on Wikimedia Commons (4).

The English RfC closed last week, requesting that we disable Media Viewer right away. The multimedia team, in consultation with community and legal teams, is not comfortable disabling the feature on the basis of this small RfC, which we believe does not accurately represent the views of the millions of users whom we serve.

To address this issue, we would like to conduct a wider outreach to collect more accurate data about user preferences than either this RfC or our optional surveys can provide. We propose to develop a more prominent viewing options panel (5) that would make it very easy to switch quickly to your preferred viewing mode, as shown in this prototype (6). Right after launch of this new feature, all users would be shown this prominent panel and asked to select their favorite viewing option. Once we have collected and analyzed that data over a period of a month or two, we would be able to make more informed decisions with our community on whether or not to keep the feature enabled -- based on actual responses from all users, rather than speculation.

With your help, we have already developed a basic dashboard that tracks Media Viewer opt-in/out events (8), based on clicks on the less prominent ‘Disable’ feature at the bottom of the metadata panel. We propose to modify this dashboard to support this new initiative, as outlined in this ticket (7), with these changes:

a) aim to track total users who enable/disable Media Viewer, rather than just events
b) switch to a 3-state preference setting: enabled / disabled / default
c) try to measure the total number of users in each group (instead of daily events)

Also note that most users would start in the default state, showing them Media Viewer, since it is enabled by default. After 10 or so image views, however, the viewing options panel would appear automatically, asking them to either enable or disable the feature. After they have made their selection, the panel would remain accessible (but much less prominent) and they can still switch state (but can't go back to default).

We could use your guidance on the right way of logging total logged-out users, so we can use them as a basis for percentages: should we log enabled/disabled state on some specified event/action (page load? thumbnail click? first site visit of the day?), or do you recommend another method? We would want to collect this data for a month or two, so we don’t only capture the immediate responses from the most active users, but also those of less frequent visitors.

We would also appreciate your comments on the specific tickets (5) and (6), with any recommendations for improvement.

Keep in mind that we would like to respond quickly to our community’s concerns and have limited resources, so we would prefer to get this work done in the next week or two. And it will take a couple of months to develop, release and collect enough data -- so even if we start tomorrow, we may not have conclusive data to share until September. So time is of the essence. :)

Gergo is spearheading this project, and may have more technical questions for you. But I wanted to give an overview from a product perspective, before we dive into the implementation details.

Once we figure out a practical way to implement this request together, we will present it to community members to discuss its impact on the RfC, and agree on acceptance criteria (e.g. keep Media Viewer enabled by default unless a 51% majority of users disables the feature on Enwiki by September 30?).

Thanks in advance for your advice. We look forward to working with you soon on this important project, which could impact other features now in development.

All the best,
Fabrice

(1) https://www.mediawiki.org/wiki/Multimedia/About_Media_Viewer
(2) https://en.wikipedia.org/wiki/Wikipedia:Media_Viewer/June_2014_RfC
(3) https://de.wikipedia.org/wiki/Wikipedia:Meinungsbilder/Medienbetrachter
(4) https://commons.wikimedia.org/wiki/Commons:Requests_for_comment/Media_Viewe…
(5) https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/787
(6) http://pauginer.github.io/prototypes/media-viewer/desk/disabling-settings/i…
(7) https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/793

_______________________________
Fabrice Florin
Product Manager, Multimedia
Wikimedia Foundation
https://www.mediawiki.org/wiki/User:Fabrice_Florin_(WMF)
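
Editor's sketch: the message above asks for counts of users per preference state rather than counts of opt-in/out events, with a possible 51% threshold as an acceptance criterion. The Python below is one minimal way that counting could work, not the team's implementation. The event tuple layout `(user_id, timestamp, state)`, the function names, and the `all_user_ids` population are all hypothetical and do not reflect any actual logging schema.

```python
# Hypothetical sketch: count users by their latest preference, not by events.
STATES = ("enabled", "disabled", "default")


def users_per_state(events, all_user_ids):
    """Return {state: number_of_users} for the given user population.

    events: iterable of (user_id, timestamp, state) tuples (assumed layout).
    all_user_ids: the population used as the basis for percentages.
    Users with no recorded choice are counted as "default".
    """
    latest = {}
    # Process events in time order so the last choice per user wins.
    for user_id, _timestamp, state in sorted(events, key=lambda e: e[1]):
        if state not in ("enabled", "disabled"):
            # Per the proposal, users cannot switch back to "default".
            raise ValueError(f"unexpected state in event log: {state!r}")
        latest[user_id] = state

    counts = {state: 0 for state in STATES}
    for user_id in all_user_ids:
        counts[latest.get(user_id, "default")] += 1
    return counts


def disabled_share(counts):
    """Fraction of the population whose latest choice is "disabled"."""
    total = sum(counts.values())
    return counts["disabled"] / total if total else 0.0


events = [
    ("a", 1, "disabled"),
    ("a", 2, "enabled"),   # user "a" changed their mind; only this counts
    ("b", 1, "disabled"),
    ("c", 3, "disabled"),
]
counts = users_per_state(events, all_user_ids=["a", "b", "c", "d"])
meets_threshold = disabled_share(counts) >= 0.51
```

In this toy data, user "a" generates two events but is counted once, which is the difference between event-based and user-based tracking. Which population to pass as `all_user_ids` (for example, users who saw the panel versus all visitors) is exactly the open question raised in the message, and the sketch does not settle it.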

11 15

Research and Data Showcase Survey
by Leila Zia 17 Jul '14

Hi all, Starting December 2013, Research and Data has had eight showcases. We would like to hear your feedback about them through this survey*: https://www.surveymonkey.com/s/ResearchandData The deadline for filling out the survey is Wednesday, July 30. We will present the results in the August showcase. Thanks in advance for your participation. Best, Leila <https://www.surveymonkey.com/s/ResearchandData> * Thanks to Abbey Ripstra for designing this survey for us. :-)

1 0

Monthly Research & Data Showcase this Wednesday
by Leila Zia 17 Jul '14

The next Research & Data Showcase will be live-streamed this Wednesday, 7/16 at 11.30 PT. The streaming link will be posted on the lists a few minutes before the showcase starts and as usual, you can join the conversation on IRC at #wikimedia-research.

We look forward to seeing you!

Leila

This month:

*Night Terrors: Day and Night Cycles in Reader and Editor Behaviour*
By Oliver Keyes: Using new geolocation tools, we look at reader and editor behaviour to understand how and when people access and contribute to our content. This is largely exploratory research, but has potential implications for our A/B testing and how we understand both cultural divides between reader and editor groups from different countries, and how we understand the transition of people from consumers to contributors.

*Using Open Data and Stories to Broaden Crowd Content*
By Nathan Matias*: Nathan will share a series of research on gender diversity online and designs for collaborative content creation that foster learning and community. He will also demo a prototype for a system that could leverage open data to attract and support new Wikipedia contributors.

*Bio: Nathan Matias, who does research on cooperation across diversity, is a PhD student at the MIT Center for Civic Media and Fellow at the Berkman Center for Internet and Society, where he co-facilitates the Cooperation working group. He also facilitates #1book140, The Atlantic's Twitter Book Club.

1 1

Re: [Analytics] UniversalLanguageSelector-tofu_7629564 table
by Nuria Ruiz 16 Jul '14

(swapping the internal list for the public list)

On Wed, Jul 16, 2014 at 11:11 AM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
> Team:
>
> Following up on Sean's initial e-mail regarding the size of the UniversalLanguageSelector-tofu_7629564 table.
>
> Per Amir's request we have created a table in the staging database with just two weeks of data to be able to determine font support.
>
> Please let us know if we can delete the UniversalLanguageSelector-tofu table on the main database. The size of that table is about 100G by now.
>
> We are tracking this work as part of this bug: https://bugzilla.wikimedia.org/show_bug.cgi?id=67463
>
> Thanks,
>
> Nuria

1 0

Hadoop and More. An overview of Analytics infrastructure
by Nuria Ruiz 16 Jul '14

Hello everyone, Just an FYI that we gave a talk yesterday about the Hadoop infrastructure we have recently set up in production to receive and store pageview data. The talk is about 25 minutes long, and the recording is available here: https://plus.google.com/u/0/events/c53ho5esd0luccd09a1c30rlrmg Thanks, Nuria

1 0

Team's Sprint Commitments
by Kevin Leduc 16 Jul '14

Greetings, The analytics engineering team is now using ScrumBugs to manage and track its commitments for every sprint (two-week iterations). Our current sprint started Thursday June 26 and ends Tuesday July 8th. You can see the stories (features) and bugs we are presently working on: http://sb.wmflabs.org/t/analytics-developers/2014-06-26/ Note that the third pie chart, "Story User", shows the beneficiaries of the features being implemented.

3 4
