Heja Analytics crew,
Wikimedia Bugzilla has an "analytics" keyword (apart from the Analytics
See https://bugzilla.wikimedia.org/buglist.cgi?keywords=analytics for
its 67 open tickets currently.
As we're slowly planning for the post-Bugzilla shiny new Phabricator
world: Does anybody actually ever look at tickets with that keyword, or
can we realistically just drop it (if nobody ever does)?
Andre Klapper | Wikimedia Bugwrangler
Thanks for this. Forwarding to Analytics and Research for others who are
On Tue, Jul 15, 2014 at 9:29 AM, Rachel Farrand <rfarrand(a)wikimedia.org>
> This Tech Talk will be starting in 30 minutes. Thanks!
> On Fri, Jul 11, 2014 at 3:30 PM, Rachel Farrand <rfarrand(a)wikimedia.org>
> > Hello!
> > Please join Nuria Ruiz and Andrew Otto next Tuesday, July 15th at 10am SF
> > time/5pm UTC
> > for a 30 min tech talk. You can join our hangout or follow along on
> > youtube:
> > (please note that a link to join the hangout will be posted in the
> > of this event just as it starts).
> > You can ask questions on IRC during the talk in #wikimedia-dev.
> > If you are not able to follow along live, a video recording will be
> > here
> > to the MediaWiki YouTube channel immediately following the tech talk for
> > you to view at any time.
> > More information about the tech talk:
> > *Hadoop and Beyond. An overview of Analytics infrastructure* In this tech
> > talk we will be presenting the analytics infrastructure that we have
> > recently rolled out in production. By now probably everybody knows that
> > Wikimedia hosts an instance of Hadoop from which we are going to extract
> > pageview data in the near future. But how exactly does the data get
> > there?
> > We will go over the path that webrequest log data takes from Varnish to
> > Kafka (a distributed log buffer) to Hadoop and the challenges of
> > this Java-based infrastructure in production. We will also talk about how
> > we can query the data with Hive, an SQL-like interface, how you can set
> > this stack up on Vagrant to play with and, last but not least, how we
> > recently used Hive to provide GLAM folks with image view stats:
> > Thanks!
> Wikitech-l mailing list
Dear Analytics Team,
As discussed with Toby, Dario and Aaron, we would be grateful for your guidance to collect user preference data for Media Viewer (1), which is being challenged in RfCs on English (2) and German (3) Wikipedias, as well as on Wikimedia Commons (4).
The English RfC closed last week, requesting that we disable Media Viewer right away. The multimedia team, in consultation with community and legal teams, is not comfortable disabling the feature on the basis of this small RfC, which we believe does not accurately represent the views of the millions of users whom we serve.
To address this issue, we would like to conduct a wider outreach to collect more accurate data about user preferences than either this RfC or our optional surveys can provide. We propose to develop a more prominent viewing options panel (5) that would make it very easy to switch quickly to your preferred viewing mode, as shown in this prototype (6). Right after launch of this new feature, all users would be shown this prominent panel and asked to select their favorite viewing option. Once we have collected and analyzed that data over a period of a month or two, we would be able to make more informed decisions with our community on whether or not to keep the feature enabled -- based on actual responses from all users, rather than speculation.
With your help, we have already developed a basic dashboard that tracks Media Viewer opt-in/out events (8), based on clicks on the less prominent ‘Disable’ feature at the bottom of the metadata panel. We propose to modify this dashboard to support this new initiative, as outlined in this ticket (7), with these changes:
a) aim to track total users who enable/disable Media Viewer, rather than just events
b) switch to a 3-state preference setting: enabled / disabled / default
c) try to measure the total number of users in each group (instead of daily events)
Also note that most users would start in the default state, showing them Media Viewer, since it is enabled by default — but after 10 or so image views, the viewing options panel would appear automatically, asking them to either enable or disable the feature. After they have made their selection, the panel would remain accessible (but much less prominent) and they can still switch state (but can't go back to default).
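To make changes (a) to (c) concrete, here is a minimal sketch of how per-user counts could be derived from preference-change events. It is only an illustration under stated assumptions: the event fields (`user_id`, `timestamp`, `state`), the function names, and the idea of passing in a known user population are all hypothetical and do not describe the existing dashboard or its logging schema.

```python
from typing import Dict, Iterable, NamedTuple

# The three preference states proposed in (b).
STATES = ("default", "enabled", "disabled")


class PrefEvent(NamedTuple):
    """Hypothetical opt-in/opt-out event; field names are assumptions."""
    user_id: str
    timestamp: int  # e.g. seconds since epoch
    state: str      # "enabled" or "disabled" (users cannot return to default)


def users_per_state(events: Iterable[PrefEvent],
                    population: Iterable[str]) -> Dict[str, int]:
    """Count users (not events) in each state, using each user's latest choice."""
    latest: Dict[str, str] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.state not in ("enabled", "disabled"):
            raise ValueError(f"unexpected state: {ev.state!r}")
        latest[ev.user_id] = ev.state  # later events overwrite earlier ones

    counts = {state: 0 for state in STATES}
    # Users with no recorded choice are still in the default state.
    for user_id in set(population) | set(latest):
        counts[latest.get(user_id, "default")] += 1
    return counts


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Express each group as a percentage of all counted users."""
    total = sum(counts.values())
    return {s: (100.0 * n / total if total else 0.0) for s, n in counts.items()}


if __name__ == "__main__":
    sample = [
        PrefEvent("u1", 1, "enabled"),
        PrefEvent("u1", 2, "disabled"),  # u1 changed their mind
        PrefEvent("u2", 3, "enabled"),
    ]
    counts = users_per_state(sample, ["u1", "u2", "u3", "u4"])
    print(counts, percentages(counts))
```

Counting the latest state per user, rather than summing daily events, keeps a user who toggles several times from being counted more than once. How the population of users is established is left open in this sketch.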
We could use your guidance on the right way of logging total logged-out users, so we can use them as a basis for percentages: should we log enabled/disabled state on some specified event/action: page load? thumbnail click? first site visit of the day? — or do you recommend another method? We would want to collect this data for a month or two, so we don’t only capture the immediate responses from the most active users, but also those of less frequent visitors.
We would also appreciate your comments on the specific tickets (5) and (6), with any recommendations for improvement. Keep in mind that we would like to respond quickly to our community’s concerns and have limited resources, so we would prefer to get this work done in the next week or two. And it will take a couple months to develop, release and collect enough data -- so even if we start tomorrow, we may not have conclusive data to share until September. So time is of the essence. :)
Gergo is spearheading this project, and may have more technical questions for you. But I wanted to give an overview from a product perspective, before we dive into the implementation details. Once we figure out a practical way to implement this request together, we will present it to community members to discuss its impact on the RfC, and agree on acceptance criteria (e.g. keep Media Viewer enabled by default unless a 51% majority of users disables the feature on Enwiki by September 30?).
Thanks in advance for your advice. We look forward to working with you soon on this important project, which could impact other features now in development.
All the best,
Product Manager, Multimedia
Since December 2013, Research and Data has held eight showcases. We
would like to hear your feedback about them through this survey*:
The deadline for filling out the survey is Wednesday, July 30. We will
present the results in the August showcase.
Thanks in advance for your participation.
* Thanks to Abbey Ripstra for designing this survey for us. :-)
The next Research & Data Showcase will be live-streamed this Wednesday,
7/16 at 11.30 PT.
The streaming link will be posted on the lists a few minutes before the
showcase starts and as usual, you can join the conversation on IRC at #
We look forward to seeing you!
*Night Terrors: Day and Night Cycles in Reader and Editor Behaviour* By
Oliver Keyes: Using new geolocation tools, we look at reader and editor
behaviour to understand how and when people access and contribute to our
content. This is largely exploratory research, but has potential
implications for our A/B testing and for how we understand both cultural
divides between reader and editor groups from different countries and the
transition of people from consumers to contributors.
*Using Open Data and Stories to Broaden Crowd Content* By Nathan Matias*:
Nathan will share a series of research projects on gender diversity online and
designs for collaborative content creation that foster learning and
community. He will also demo a prototype for a system that could leverage
open data to attract and support new Wikipedia contributors.
*Bio: Nathan Matias, who does research on cooperation across diversity, is
a PhD student at the MIT Center for Civic Media and Fellow at the Berkman
Center for Internet and Society, where he co-facilitates the Cooperation
working group. He also facilitates #1book140, The Atlantic's Twitter Book
(swapping internal by public list)
On Wed, Jul 16, 2014 at 11:11 AM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
> Following up on Sean's initial e-mail regarding the size of the
> UniversalLanguageSelector-tofu_7629564 table.
> Per Amir's request we have created a table in staging database with just
> two weeks of data to be able to determine font support.
> Please let us know if we can delete the UniversalLanguageSelector-tofu table
> on the main database. The size of that table is about 100G by now.
> We are tracking this work as part of this bug:
The analytics engineering team is now using ScrumBugs to manage and track
its commitments for every sprint (two-week iterations).
Our current sprint started Thursday June 26 and ends Tuesday July 8th. You
can see the Stories (features) and bugs we are presently working on:
Note, the third pie chart "Story User" shows the beneficiaries of the
features being implemented.