Adding the analytics list to make sure everyone on the team sees this thread.
>Is there a way to visualize where editors of a specific page come from?
I assume you are asking whether this data is available to the general
public. Judging by the couple of links you posted, the consensus was that
this data is too private to be made public, so it is only accessible
inside WMF or to users with CheckUser rights.
As far as we know, nothing has changed in this regard, so this type of
"geo-data" is not available to the general public.
On Sun, Apr 13, 2014 at 8:15 PM, Federico Leva (Nemo) <nemowiki(a)gmail.com> wrote:
> Recent discussion on the topic:
> * http://lists.wikimedia.org/pipermail/analytics/2013-
> * http://thread.gmane.org/gmane.org.wikimedia.analytics/103
We had our quarterly review with WMF management last week. The minutes
are posted up on meta along with the deck we presented. (Thank you to
Tilman for taking the minutes and helping post the slides.)
Please take a look at the deck and let me know if you have any questions.
In particular, I'd like to highlight our reprioritization of our
projects. We continue to focus on our Editor Engagement Vital Signs project
and have added a couple of new projects, including taking over the Event
Logging system from the Platform Team.
I want to call out the Page View API project specifically. Everyone on the
team wants to work on this but we have prioritized other projects ahead of
it. While this is challenging for everyone, Editor Growth remains the
priority for the Foundation and Analytics needs to support this initiative.
In the meantime, we have worked with Henrik, the maintainer of
stats.grok.se, to help scale out this service. We've purchased a new
machine and the initial performance numbers are very encouraging. We'll
have more updates on this shortly.
In conclusion, now that the team is fully staffed, I'll have more time to
communicate about our projects and how they will interact with the
community. I'm looking forward to it :)
I recently read this post by a researcher at Quora, about determining
relationships between content. While they most likely have a lot more
personally identifiable information about their users than we do, some of
the concepts might be applicable.
*Jared Zimmerman* \\ Director of User Experience \\ Wikimedia Foundation
M: +1 415 609 4043 | Twitter: @JaredZimmerman <https://twitter.com/JaredZimmerman>
EventLogging Schemas are in JSON, and I hate hand-writing JSON. It
feels rather painful. So I whipped up this little Python script
(requires pyyaml) that lets me write the actual schema in YAML
and convert it trivially to JSON for copy-pasting into metawiki. You
can find it at https://gist.github.com/yuvipanda/10481205
My workflow now is:
1. Write YAML file describing the schema
2. Run `python yaml-to-json.py <yaml-file-name> | pbcopy`
3. Paste that into the appropriate metawiki page
(2) and (3) can be further consolidated if we want, but that is for
another day (and I had to write only 4 schemas from scratch, so this
was useful enough).
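For anyone who would rather not open the gist, the conversion step amounts to
roughly the following. This is a minimal sketch and not the gist's actual
contents: the function name `yaml_to_json`, the indent setting and the example
schema fragment are all assumptions made for illustration.

```python
import json

import yaml  # third-party: provided by the pyyaml package


def yaml_to_json(yaml_text):
    """Convert one YAML document to pretty-printed JSON text."""
    # safe_load builds only standard Python types; mappings, lists,
    # strings, numbers and booleans map directly onto JSON.
    data = yaml.safe_load(yaml_text)
    return json.dumps(data, indent=4)


# Hypothetical schema fragment, written the comfortable way.
EXAMPLE_YAML = """
description: Example schema (illustrative only)
properties:
  action:
    type: string
    required: true
"""

print(yaml_to_json(EXAMPLE_YAML))
```

Reading the YAML from a file instead of a string and piping the printed
result to the clipboard gives the workflow described in steps 2 and 3.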
Hope someone finds it useful!
Yuvi Panda T
We finished our sprint on Tuesday and made plans for the next one on
Thursday and I wanted to let you know the updates.
I will also provide updates on our epics and our quarterly review in a
We finished two tasks committed to for this sprint:
- Puppet allows wikimetrics user to write files (WikiMetrics)
- Reports results can be made public in vagrant (WikiMetrics)
We (technically ops) also got our Archiva deployment server up and running.
This is a significant step towards a production Hadoop/Kafka environment
and will also be used by other JVM-based tools such as Search.
We did not finish one task that we committed to:
- UI Changes for Recurrent and Public Reports (WikiMetrics)
We also finished a number of unplanned tasks:
- Camus and Kraken review (Hadoop/Kafka)
- Changing Kraken to Apache for Camus folks (Hadoop/Kafka)
- User Agent discussions (EventLogging)
- Look at X-Analytics change (Wikipedia Zero)
- Meeting with grants team wikimetrics consulting (WikiMetrics)
- Flake8 work due to upgrade in Jenkins (WikiMetrics)
- README changes (WikiMetrics)
- Reworking limn-mobile-data patch from Jan (Limn)
- Repair data and run the 3 reports for the Mobile team (Mobile Metrics)
- Put the results in files and send to mobile-web team (Mobile Metrics)
We fixed the following defects:
- 62922 Wikipedia Zero: Doubled zero tags in varnish logs
- 57371 Limn: SSL-Error for https at
- 62830 Wikimetrics: New reports not working; New cohorts all invalid
For this sprint, we committed to the following tasks to complete two
features for WikiMetrics: Publicly Sharable Report Results and Scheduled
- Verify and apply script to migrate report table
- Finish testing 112165 + puppet changes on staging
- Unit test concatenated recurrent public reports
- Code review 112165
- Code review 122638
- Test in vagrant
- Test in staging
Please let me know if you have any questions.
Minutes and slides from Monday's quarterly review meeting of the
Foundation's Analytics team are now available at
On Wed, Dec 19, 2012 at 6:49 PM, Erik Moeller <erik(a)wikimedia.org> wrote:
> Hi folks,
> to increase accountability and create more opportunities for course
> corrections and resourcing adjustments as necessary, Sue's asked me
> and Howie Fung to set up a quarterly project evaluation process,
> starting with our highest priority initiatives. These are, according
> to Sue's narrowing focus recommendations which were approved by the
> Board:
> - Visual Editor
> - Mobile (mobile contributions + Wikipedia Zero)
> - Editor Engagement (also known as the E2 and E3 teams)
> - Funds Dissemination Committee and expanded grant-making capacity
> I'm proposing the following initial schedule:
> - Editor Engagement Experiments
> - Visual Editor
> - Mobile (Contribs + Zero)
> - Editor Engagement Features (Echo, Flow projects)
> - Funds Dissemination Committee
> We'll try doing this on the same day or adjacent to the monthly
> metrics meetings, since the team(s) will give a presentation on
> their recent progress, which will help set some context that would
> otherwise need to be covered in the quarterly review itself. This will
> also create open opportunities for feedback and questions.
> My goal is to do this in a manner where even though the quarterly
> review meetings themselves are internal, the outcomes are captured as
> meeting minutes and shared publicly, which is why I'm starting this
> discussion on a public list as well. I've created a wiki page here
> which we can use to discuss the concept further:
> The internal review will, at minimum, include:
> Sue Gardner
> Howie Fung
> Team members and relevant director(s)
> Designated minute-taker
> So for example, for Visual Editor, the review team would be the Visual
> Editor / Parsoid teams, Sue, me, Howie, Terry, and a minute-taker.
> I imagine the structure of the review roughly as follows, with a
> duration of about 2 1/2 hours divided into 25-30 minute blocks:
> - Brief team intro and recap of team's activities through the quarter,
> compared with goals
> - Drill into goals and targets: Did we achieve what we said we would?
> - Review of challenges, blockers and successes
> - Discussion of proposed changes (e.g. resourcing, targets) and other
> action items
> - Buffer time, debriefing
> Once again, the primary purpose of these reviews is to create improved
> structures for internal accountability, escalation points in cases
> where serious changes are necessary, and transparency to the world.
> In addition to these priority initiatives, my recommendation would be
> to conduct quarterly reviews for any activity that requires more than
> a set amount of resources (people/dollars). These additional reviews
> may however be conducted in a more lightweight manner and internally
> to the departments. We're slowly getting into that habit in
> As we pilot this process, the format of the high priority reviews can
> help inform and support reviews across the organization.
> Feedback and questions are appreciated.
> All best,
>  https://wikimediafoundation.org/wiki/Vote:Narrowing_Focus
>  https://meta.wikimedia.org/wiki/Metrics_and_activities_meetings
> Erik Möller
> VP of Engineering and Product Development, Wikimedia Foundation
> Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate
Senior Operations Analyst (Movement Communications)
IRC (Freenode): HaeB
Hi Niklas -- I have no idea. Forwarding to the general list.
On Tue, Apr 1, 2014 at 2:10 AM, Niklas Laxström <nlaxstrom(a)wikimedia.org> wrote:
> While trying to figure out whether I can accomplish the following [1],
> I noticed that the user_properties table is not available in the toollabs
> database replicas.
> [1] Gather daily statistics of the numbers of users who have activated
> beta feature X, for multiple wikis, possibly excluding users who
> auto-roll to all beta features.
> Is there a way to access the user_properties table? Or is there
> someone already gathering this kind of stats?
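
If read access to user_properties were available, the count Niklas describes
could be sketched roughly as below. Everything here is an assumption made for
illustration: the in-memory SQLite tables stand in for the real per-wiki
databases, `beta-feature-x` is a placeholder preference name, enabled
preferences are assumed to be stored with the value '1', and the auto-enroll
preference name should be checked against the BetaFeatures extension before
relying on it.

```python
import sqlite3

# Assumed name of the preference set by "automatically enable all new beta
# features"; verify against the BetaFeatures extension before relying on it.
AUTO_ENROLL_PREF = "betafeatures-auto-enroll"

COUNT_SQL = """
SELECT COUNT(DISTINCT f.up_user)
FROM user_properties AS f
LEFT JOIN user_properties AS a
       ON a.up_user = f.up_user
      AND a.up_property = :auto_pref
      AND a.up_value = '1'
WHERE f.up_property = :feature
  AND f.up_value = '1'
  AND (:include_auto = 1 OR a.up_user IS NULL)
"""


def count_feature_users(conn, feature, exclude_auto_enrolled=True):
    """Count users on one wiki who have the given beta feature switched on."""
    params = {
        "feature": feature,
        "auto_pref": AUTO_ENROLL_PREF,
        "include_auto": 0 if exclude_auto_enrolled else 1,
    }
    return conn.execute(COUNT_SQL, params).fetchone()[0]


def daily_counts(connections, feature):
    """Map wiki name -> count, given a dict of wiki name -> DB connection."""
    return {wiki: count_feature_users(conn, feature)
            for wiki, conn in connections.items()}


def make_demo_wiki(rows):
    """Build an in-memory stand-in for one wiki's user_properties table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_properties "
                 "(up_user INTEGER, up_property TEXT, up_value TEXT)")
    conn.executemany("INSERT INTO user_properties VALUES (?, ?, ?)", rows)
    return conn


demo = {
    "wiki_a": make_demo_wiki([
        (1, "beta-feature-x", "1"),   # opted in by hand
        (2, "beta-feature-x", "1"),   # opted in ...
        (2, AUTO_ENROLL_PREF, "1"),   # ... but through auto-enroll
        (3, "beta-feature-x", "0"),   # switched off
    ]),
    "wiki_b": make_demo_wiki([(7, "beta-feature-x", "1")]),
}
print(daily_counts(demo, "beta-feature-x"))
```

Running `daily_counts` once a day and storing the result with the date would
give the daily series; the open question in the thread, access to the table
itself, is not solved by this sketch.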
For your information. The thread about our Gerrit review queue metrics is
continuing in wikitech-l under
---------- Forwarded message ----------
From: *Quim Gil* <qgil(a)wikimedia.org>
Date: Thursday, April 3, 2014
Subject: Data to improve our code review queue
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Please have a look at some graphs visualizing interesting data from our
code review queues in Gerrit, focusing on the key Wikimedia software
The queue of open changesets keeps growing. We have open changesets
submitted in every month since March 2012. However, since last December we
must be doing something right, because the median times to update and
resolve submissions are decreasing.
Looking at http://korma.wmflabs.org/browser/scr.html , one reason for this
improvement might be that the volume of new changesets has also decreased
during the same period. Maybe newer patches get faster reviews? Any ideas?
We need to dig further.
We have created a "hall of shame" (add your preferred smiley here) to shine
a light on the repositories whose open changesets have gone the longest
without any activity. The principle is simple: you don't want
to see one of your repos appearing in the top 10.
Many of the _leading_ repos have a couple of open changesets only, and our
hope is that by showing up there, the maintainers will act on them quickly
(e.g. OpenStackManager, fluoride, commons, UserMerge, TorBlock, Vipscaler,
luasandbox...). This will leave the fight for the pole position to the
projects that actually have a real problem dealing with patches received
(Donationinterface, GuidedTour, UploadWizard...)
Who knows, perhaps we should organize "patch days", in the same way that we
have organized bug days in the past (which we want to revive now). We also
want to look at ways to promote the oldest inactive requests. For instance,
what about directing new volunteers there, asking them to submit their code
revisions? For a patch that has been waiting in silence for over a year,
any feedback will be better than no feedback.
One last detail. Our initial motivation to look at the age of open
changesets by affiliation was to check whether submissions from WMF
employees and independent developers were treated equally. Interestingly,
there are no big differences between these groups. However, there are big
differences between the median age of open WMDE changesets (16.5 days) and
open Wikia changesets (almost 283 days). All this is according to our
estimation of the origin of patches (domain of the submitter's email +
affiliation submitted by the developers who filled in our survey).
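
To make the comparison concrete, here is a minimal sketch of how a median age
of open changesets per affiliation could be computed. The records, the group
labels and the reference date are made up for illustration; they are not the
data behind the graphs, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_open_age_by_affiliation(changesets, today):
    """Return {affiliation: median age in days} for open changesets.

    `changesets` is an iterable of (affiliation, date_opened) pairs.
    """
    ages = defaultdict(list)
    for affiliation, opened_on in changesets:
        ages[affiliation].append((today - opened_on).days)
    return {aff: median(days) for aff, days in ages.items()}


# Made-up sample: two groups with a handful of open changesets each.
sample = [
    ("group-a", date(2014, 3, 24)),
    ("group-a", date(2014, 3, 4)),
    ("group-b", date(2014, 3, 14)),
    ("group-b", date(2014, 2, 2)),
    ("group-b", date(2014, 4, 1)),
]
print(median_open_age_by_affiliation(sample, today=date(2014, 4, 3)))
```

The median is used rather than the mean so that a few very old changesets do
not dominate a group's figure.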
Your feedback about these metrics is welcome. Please reply here or file
Bugzilla reports directly to Analytics > Tech Community metrics
(Short link just in case: http://bit.ly/1q0itsl )
Engineering Community Manager @ Wikimedia Foundation
In another thread Oliver asked about the progress of One Machine To Rule
Them All :-)
In fact it looks like it will now be two machines to rule them all, or
rather, two machines to cooperatively rule them all in roughly equal
capacity. I know, it doesn't have the same ring to it...
I posted the following to RT 6383, but who knows who reads RT, so here it
is:
-- quote --
An update on this. Some Analytics folk have probably already heard bits and
pieces via mailing lists, but my fellow Opsen on RT duty rightly begin to
wonder about this ticket.
We have procured dbstore100, both of which will be replicating shards
S[1-7] into a single MariaDB 10 instance each, using the new multi-source
replication. The boxes are still being set up (because recombining the
shards requires full dump/reload, plus getting all seven in sync, plus
compressing tables -- slow going). The x1 shard and event logging will
replicate to dbstore too, but that's pending RT 7081. Analytics will have
direct, but read-only, access to dbstore1002.
db1047, the current s1-analytics-slave, has the required disk space so it
will likely become a slave to dbstore1002, or else make use of the MariaDB
10 CONNECT engine to access the data (like federation, but better than
FEDERATED engine was, thanks to ECP: engine-condition-pushdown). As ever,
Analytics will have read/write access to db1047 with scratch space.
The situation will result in:
- cross joins/unions on any wiki on either db1047 or dbstore1002
- ability to spread load across both boxes with a single SQL query
- less likely to block others due to locking
- less likely to cause replag
I'm happy to go into more technical detail if anyone is interested.
When will it be ready, you ask? :-) Not until after the Ops meet in
Athens, which realistically means: in May.
-- endquote --
The bit about spreading load across two machines with one query will
require people to be a bit careful in designing the SQL. Alternatively you
guys might simply choose to dictate which box should run expensive queries,
to avoid tripping each other up.
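
With every shard replicated into one instance, a cross-wiki query only needs
database-qualified table names. A rough sketch of building such a query
follows. The helper name, the wiki list and the use of the `revision` table
are illustrative assumptions, and nothing here reflects how dbstore1002 or
db1047 will actually be configured.

```python
import re

# Database names are interpolated into the SQL text, so restrict them to a
# conservative character set instead of trusting the caller.
_WIKI_DB_NAME = re.compile(r"[a-z0-9_]+")


def revision_counts_sql(wikis):
    """Build one UNION ALL query counting revisions on each listed wiki."""
    selects = []
    for wiki in wikis:
        if not _WIKI_DB_NAME.fullmatch(wiki):
            raise ValueError(f"unexpected wiki database name: {wiki!r}")
        selects.append(
            f"SELECT '{wiki}' AS wiki, COUNT(*) AS revisions "
            f"FROM {wiki}.revision"
        )
    return "\nUNION ALL\n".join(selects)


print(revision_counts_sql(["enwiki", "dewiki"]))
```

A full COUNT(*) over a large wiki is exactly the kind of expensive query the
paragraph above warns about, so treat the SELECT as a placeholder for
whatever per-wiki query is actually needed.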
Incidentally, MariaDB 10 has the Cassandra storage engine which might be of
some use to you guys in time. But so far I've only been trialing
DBA @ WMF