On Wed, Jul 9, 2014 at 4:23 PM, Steven Walling swalling@wikimedia.org wrote:
On Wed, Jul 9, 2014 at 1:01 PM, Dan Andreescu dandreescu@wikimedia.org wrote:
By the way, if this at all sounds like I'm proposing a "new" monster codebase, that is not at all the case. Most of the hard problems will be out-sourced to promising projects. For example, Vega is the top contender to handle the visualizations themselves, and the dashboarding around it will be very simple but will solve problems we've encountered with Limn. But again, very early days.
Yeah to be honest I'm pretty skeptical of such a plan.
To back up... As a consumer of numerous dashboards and someone who has to decide when/how to request creation of them, I care about getting a readable new dashboard set up and maintained to run indefinitely with as little developer or researcher time as possible.
Agreed, Limn fails at this pretty miserably, and it's definitely one of our top problems to solve.
The main problem with Limn is that setting up a suite of dashboards takes a very large initial investment.
There are many other problems; a few relevant examples: discovery of dashboards, documentation of visualization capabilities, lack of annotations, and ease of contributing to the code base.
I'm not really sure how shoehorning a dashboard service on top of MediaWiki solves this problem better than just setting up one of the many existing solutions out there. I don't care about transparent versioning and authentication, which seem to be the two things that MediaWiki is really good at in this context.
I'm not sure this is true. You may not care about it, but storage needs to happen, and I'd rather outsource that problem. Limn's idea of using file-backed storage made it very inefficient and clumsy to work with. A custom database, like Tessera is using, is much better, but it also requires someone to maintain it and manage access, etc. So: more ops burden, less up-front development, and the definitions would be "further" from our community. Meaning, for example, if someone defaces a graph, we'd have to build a "watch this page" mechanism to help us deal with it. I started where you're starting, with Tessera, and as I thought through these problems I slowly migrated toward MediaWiki. But I'll try to explain below why I don't think this is a big undertaking at all. MediaWiki is really easy to use as a service.
Building a custom tool from scratch is also part of what got us in this mess with Limn to begin with.
I see that I have caused a bit of a misunderstanding. So, Limn is well over 10,000 lines of Coco. This is a dense language that transpiles to roughly 20,000 lines of Javascript. The tool I'm proposing here is basically ignoring 90% of the problems that Limn dealt with. Visualization is the main problem, and that is solved by Vega JS [1]. Of the remaining problems, we're ignoring about half of them by making this server-less. So let's examine the points I made above:
* Getting a Dashboard up Quickly. EventLogging is well liked, so I figured if we did something simpler than that, we'd be starting off on the right foot. A dashboard could be a simple JSON document on MediaWiki, rendered the way EventLogging schemas are. A page called Dashboard:Growth could have something like { graphs: [ {name: 'example', data-url: 'http://...'} ] }. This would be viewable at http://dashiki.wmflabs.org/dashboards/growth. The JSON could be created from the dashboarding tool itself, but we can start bare-bones. If this seems weird or not as fast as you'd like, please describe an ideal scenario and let's talk about it.
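To make this concrete, here's a rough sketch of how the dashboard tool could read a definition off a wiki page using MediaWiki's standard query API. The page title, the `graphs` shape, and the `renderGraphs` helper are all hypothetical, just following the example above; the API parameters (`action=query`, `prop=revisions`, `rvprop=content`) are the normal MediaWiki ones.

```javascript
// Pull the raw content of a page's latest revision out of a MediaWiki
// action=query API response and parse it as a dashboard definition.
// With format=json (pre-formatversion=2), the revision text lives
// under the '*' key.
function parseDashboard(apiResponse) {
  var pages = apiResponse.query.pages;
  var pageId = Object.keys(pages)[0];
  var content = pages[pageId].revisions[0]['*'];
  return JSON.parse(content);
}

// Hypothetical usage against a live wiki (assumes a fetch-capable
// environment and a renderGraphs function elsewhere in the tool):
//
// fetch('https://meta.wikimedia.org/w/api.php' +
//       '?action=query&prop=revisions&rvprop=content&format=json' +
//       '&titles=' + encodeURIComponent('Dashboard:Growth'))
//   .then(function (r) { return r.json(); })
//   .then(function (resp) { renderGraphs(parseDashboard(resp).graphs); });
```

The point is just that "storage" here is one GET against an API that already exists, already has revision history, and already has watchlists.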
* Dashboard discovery. Pau is designing the beginning of a solution to this. Tessera says they have not dealt with this yet, and I don't see how this is a generic problem (but I'd be glad to be wrong). But this is not some giant project that we'd have to implement. It would organize the available data in a friendly, intuitive way, 'cause Pau is amazing, as we all know. If there's a generic solution out there for this, I haven't found it.
* Documentation of visualization capabilities. This is very well documented on the Vega wiki, so it should take almost no effort if we integrate it well. Most other tools we've tried are too limiting: timeseries only, maps only, etc. With those, I think we would end up stitching a few out-of-the-box solutions together, and that seems like more headache than it's worth.
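For a sense of what "a graph is just a declarative spec" looks like, here's a bar chart roughly in the shape of the bar-chart example from the Vega documentation. Treat the details as approximate (the spec format has changed across Vega versions); the data values are made up.

```json
{
  "width": 300,
  "height": 150,
  "data": [
    {"name": "table", "values": [
      {"month": "Jan", "count": 10},
      {"month": "Feb", "count": 22}
    ]}
  ],
  "scales": [
    {"name": "x", "type": "ordinal", "range": "width",
     "domain": {"data": "table", "field": "data.month"}},
    {"name": "y", "range": "height", "nice": true,
     "domain": {"data": "table", "field": "data.count"}}
  ],
  "axes": [
    {"type": "x", "scale": "x"},
    {"type": "y", "scale": "y"}
  ],
  "marks": [
    {"type": "rect", "from": {"data": "table"},
     "properties": {"enter": {
       "x": {"scale": "x", "field": "data.month"},
       "width": {"scale": "x", "band": true, "offset": -1},
       "y": {"scale": "y", "field": "data.count"},
       "y2": {"scale": "y", "value": 0}
     }}}
  ]
}
```

Since the whole graph is JSON, it could live on a wiki page right next to the dashboard definition.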
* Lack of annotations. I think it makes sense to store annotations in MediaWiki, in a JSON document that's tied to a datafile. This way everyone using that datafile can share the annotations, and anyone interested in the history of the document can take advantage of MediaWiki's revision history. In most other solutions I've seen, annotation is not a social activity, but something that researchers do to explain their data. In our case, I've heard many many people ask for something much richer when they talk about annotations.
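A sketch of what such an annotation document might look like, to make the "tied to a datafile, shared by everyone using it" idea concrete. Every field name here is hypothetical; the event is invented for illustration.

```json
{
  "datafile": "Dashboard:Growth/newly-registered.csv",
  "annotations": [
    {
      "date": "2014-04-24",
      "note": "Deployment that may explain the spike in this series",
      "author": "Example_User"
    }
  ]
}
```

Because this lives in MediaWiki, edits to annotations get revision history and watchlists for free, which is exactly the "social" richness people keep asking for.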
* Ease of contributing to the code base. I don't like big code bases. If a simple dashboard layout built around getting metadata from MediaWiki via JSON and rendering Vega graphs gets anywhere *near* Limn's size and incomprehensibility, you can burn me at the stake. I'll light the fire. The other thing about Coco is that maybe 100 people in the world can read it, so this would be pure JavaScript. And in Gerrit.
[1] trifacta.github.io/vega/
p.s. Seriously, I'm not trying to reinvent the wheel here, please let me know if you think out-of-the-box solves more problems than I think. The whole idea behind this project is to offload as much as possible, I have a miles-long backlog I can busy myself with, so I don't need to invent work. And all of what I said above I'm making up on the spot, so I'm thinking out loud much more than trying to sell a solution. Would a wiki page explaining the pros and cons of this decision be a good use of our time?