> [Steven] Considering the pain and suffering Limn causes us, this seems like an interesting

> [Steven] avenue to explore for internal dashboard needs.

So true. It sure causes me pain and suffering seeing every js library known to mankind being used there. :)

We will definitely take a look at the tool as part of the dashboard research we are doing. By all means send us everything that catches your eye.

One of the major problems with Limn is "dashboard discovery". That is not a visualization problem but an information architecture one. Tessera does not aim to solve that either. Our other "hard" problem, which Tessera does not currently solve, is data retrieval: we serve data from many datasources.

> [Steven] To back up... As a consumer of numerous dashboards and someone

> [Steven] who has to decide when/how to request creation of them

> [Steven] I care about getting a readable new dashboard set up and maintained to run indefinitely

> [Steven] with as little developer or researcher time as possible.

Understood. Anything we come up with we will run by other developers to make sure it is easy for them to use. We have enlisted Yuvi as our guinea pig in residence.

Now, to set expectations right, our first dashboard is only going to solve the issues regarding editor metrics, while laying groundwork that others can build on to roll out new visualizations. That's it. We are not solving the whole dashboard problem quite yet.

Please take a look at the prototype of the editor vital signs dashboard, as it shows what we are doing in the near term:

http://pauginer.github.io/prototypes/analytics-dashboard/index.html

>[Dario] I think the best investment of our time would be to:

>[Dario] (1) give Wikimetrics and EventLogging a standard interface to plug the data and metadata into any arbitrary dashboard/visualization frontend – whether custom-built, off-the-shelf or even hosted

>[Dario] (2) start solving the visualization problem incrementally, moving from the most urgent customer needs and

>[Dario] evaluating visualization solutions against these priorities.

We are doing #2 now, but only for editor vital signs metrics using public data. The data will be served via Wikimetrics as JSON files, available for anyone to graph. Temporary data is available now in staging:

https://metrics-staging.wmflabs.org/static/public/datafiles/NewlyRegistered/

As for EventLogging, its data is still private; that might change in the mid term, but not in the short term.

On Thu, Jul 10, 2014 at 3:56 PM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:

Much as I love the idea of adding charting capability in MediaWiki (especially if it were to be integrated with a data namespace and version controlled JSON annotations) – I agree with Steven that this seems to solve a different problem.

The biggest pain points of using Limn to me (on top of the usability issues mentioned in this thread [1]) are its poor information architecture and its limited support for data documentation/metadata. We know that it’s hard at the moment for people to find the data they are looking for or to navigate a large set of dashboards intuitively. For example: the first metric we modeled for the vital signs project (newly registered users), when combined with a single breakdown by platform (desktop site, mobile site, apps), would result in ~2.5K data series. I can’t quite figure out what these series would look like or how they would be discoverable on Limn.

I think the best investment of our time would be to:

(1) give Wikimetrics and EventLogging a standard interface to plug the data and metadata into any arbitrary dashboard/visualization frontend – whether custom-built, off-the-shelf or even hosted

(2) start solving the visualization problem incrementally, moving from the most urgent customer needs and evaluating visualization solutions against these priorities.

That would give us ample time to bring data (and immediate value) to the users, while testing the best approach for visualizing it and supporting more sophisticated requirements for presenting and rendering the data (we could abandon the first frontend when it stops serving our needs and migrate to something more sophisticated).

I like the look and feel of Tessera and the fact that it can easily consume Graphite data, but I share Dan’s concerns about storage.
Dan, I think it would be valuable to put your thoughts on a wiki page, if you have bandwidth to do so.

Dario

[1] I also want to add that whatever solution we settle on, it needs to be mobile friendly.

On Jul 9, 2014, at 11:55 PM, Dan Andreescu <dandreescu@wikimedia.org> wrote:

On Wed, Jul 9, 2014 at 4:23 PM, Steven Walling <swalling@wikimedia.org> wrote:

On Wed, Jul 9, 2014 at 1:01 PM, Dan Andreescu <dandreescu@wikimedia.org> wrote:

By the way, if this at all sounds like I'm proposing a "new" monster codebase, that is not at all the case. Most of the hard problems will be outsourced to promising projects. For example, Vega is the front-runner to handle the visualizations themselves, and the dashboarding around it will be very simple but will solve problems we've encountered with Limn. But again, very early days.

Yeah to be honest I'm pretty skeptical of such a plan.

To back up... As a consumer of numerous dashboards and someone who has to decide when/how to request creation of them, I care about getting a readable new dashboard set up and maintained to run indefinitely with as little developer or researcher time as possible.

Agreed, Limn fails at this pretty miserably, and it's definitely one of our top problems to solve.

The main problem with Limn is that to set up a suite of dashboards takes a very large initial investment.

There are many other problems, a few relevant examples: discovery of dashboards, documentation of visualization capabilities, lack of annotations, ease of contributing to the code base

I'm not really sure how shoehorning a dashboard service on top of MediaWiki really solves this problem better than just setting up one of the many existing solutions out there. I don't care about transparent versioning and authentication, which seem to be the two things that MediaWiki is really good at in this context.

I'm not sure this is true. You may not care about it, but storage needs to happen, and I'd rather outsource that problem. Limn's idea of using file-backed storage made it very inefficient and clumsy to work with. A custom database, like Tessera is using, is much better but also requires someone to maintain it and manage access, etc. So more ops burden but less up-front development. And the definitions would be "further" from our community. Meaning, for example, if someone defaces a graph, we'd have to build a "watch this page" mechanism to help us deal with it. I started where you're starting with Tessera, and as I thought of these problems I slowly migrated to MediaWiki. But I'll try to explain below why I don't think this is a big undertaking at all. MediaWiki is really easy to use as a service.

Building a custom tool from scratch is also part of what got us in this mess with Limn to begin with.

I see that I have caused a bit of a misunderstanding. So, Limn is well over 10,000 lines of Coco. This is a dense language that transpiles to roughly 20,000 lines of Javascript. The tool I'm proposing here is basically ignoring 90% of the problems that Limn dealt with. Visualization is the main problem, and that is solved by Vega JS [1]. Of the remaining problems, we're ignoring about half of them by making this server-less. So let's examine the points I made above:

* Getting a Dashboard up Quickly. EventLogging is well liked, so I figured if we did something simpler than that, we'd be starting off on the right foot. A dashboard could be a simple JSON document on mediawiki, rendered the way EventLogging schemas are. A page called Dashboard:Growth could have something like { graphs: [ {name: 'example', data-url: 'http://.../'} ] }. This would be viewable at http://dashiki.wmflabs.org/dashboards/growth. The JSON could be created from the dashboarding tool itself, but we can start bare-bones. If this seems weird or not as fast as you'd like, please describe an ideal scenario and let's talk about it.

* Dashboard discovery. Pau is designing the beginning of a solution to this. Tessera says they have not dealt with this yet, and I don't see how this is a generic problem (but I'd be glad to be wrong). But this is not some giant project that we'd have to implement. It would organize the data available in a friendly intuitive way, 'cause Pau is amazing as we all know. If there's a generic solution out there for this, I haven't found it.

* Documentation of visualization capabilities. This is very well documented on the Vega wiki, so it should take almost no effort if we integrate it well. Most other tools we've tried to use are too limiting. Timeseries only, maps only, etc. I think we would end up stitching a few out-of-the-box solutions together and that seems more headache than it's worth.

* Lack of annotations. I think it makes sense to store annotations in MediaWiki, in a JSON document that's tied to a datafile. This way everyone using that datafile can share the annotations, and anyone interested in the history of the document can take advantage of MediaWiki's revision history. In most other solutions I've seen, annotation is not a social activity, but something that researchers do to explain their data. In our case, I've heard many many people ask for something much richer when they talk about annotations.

* Ease of contributing to the code base. I don't like big code bases. If a simple dashboard layout built around getting metadata from mediawiki via json and rendering Vega graphs gets anywhere *near* Limn's size and incomprehensibility, you can burn me at the stake. I'll light the fire. The other thing about Coco is that like 100 people in the world can read it so this would be purely Javascript. And in gerrit.
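
As a minimal sketch of the dashboard-definition idea in the first point above (an illustration only, not part of any existing tool): the inline example is not strict JSON, so this assumes the page would store quoted keys, including "data-url". The function name parse_dashboard and the validation rules are hypothetical, and the data URL is left elided exactly as in the example.

```python
import json

# Hypothetical content of a page such as Dashboard:Growth, written as strict JSON.
# Key names follow the inline example; the URL stays elided as in the original.
DASHBOARD_JSON = """
{
  "graphs": [
    {"name": "example", "data-url": "http://.../"}
  ]
}
"""


def parse_dashboard(text):
    """Parse a dashboard definition and return a list of (name, data_url) pairs."""
    doc = json.loads(text)
    if not isinstance(doc, dict) or not isinstance(doc.get("graphs"), list):
        raise ValueError("dashboard definition needs a 'graphs' list")
    pairs = []
    for index, graph in enumerate(doc["graphs"]):
        if not isinstance(graph, dict):
            raise ValueError("graph %d is not an object" % index)
        # Each graph needs a display name and a URL to fetch its data from.
        missing = [key for key in ("name", "data-url") if key not in graph]
        if missing:
            raise ValueError("graph %d is missing: %s" % (index, ", ".join(missing)))
        pairs.append((graph["name"], graph["data-url"]))
    return pairs


if __name__ == "__main__":
    for name, url in parse_dashboard(DASHBOARD_JSON):
        print(name, url)
```

A frontend could hand each (name, data_url) pair to whatever renders the graph; rejecting malformed definitions early is one way to keep a hand-edited wiki page from silently producing an empty dashboard.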

[1] trifacta.github.io/vega/

p.s. Seriously, I'm not trying to reinvent the wheel here, please let me know if you think out-of-the-box solves more problems than I think. The whole idea behind this project is to offload as much as possible, I have a miles-long backlog I can busy myself with, so I don't need to invent work. And all of what I said above I'm making up on the spot, so I'm thinking out loud much more than trying to sell a solution. Would a wiki page explaining the pros and cons of this decision be a good use of our time?

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
