On Wed, Jul 9, 2014 at 4:23 PM, Steven Walling <swalling(a)wikimedia.org>
wrote:
On Wed, Jul 9, 2014 at 1:01 PM, Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
By the way, if this at all sounds like I'm
proposing a "new" monster
codebase, that is not at all the case. Most of the hard problems will be
out-sourced to promising existing projects. For example, Vega is the top
candidate to handle the visualizations themselves, and the dashboarding
around it will be very simple but will solve problems we've encountered with Limn. But again,
very early days.
Yeah to be honest I'm pretty skeptical of such a plan.
To back up... As a consumer of numerous dashboards and someone who has to
decide when/how to request creation of them, I care about getting a
readable new dashboard set up and maintained to run indefinitely with as
little developer or researcher time as possible.
Agreed, Limn fails at this pretty miserably, and it's definitely one of our
top problems to solve.
The main problem with Limn is that to set up a suite of dashboards takes a
very large initial investment.
There are many other problems; a few relevant examples: discovery of
dashboards, documentation of visualization capabilities, lack of
annotations, and ease of contributing to the code base.
I'm not really sure how shoehorning a dashboard
service on top of
MediaWiki really solves this problem better than just setting up one of the
many existing solutions out there. I don't care about transparent
versioning and authentication, which seem to be the two things that
MediaWiki is really good at in this context.
I'm not sure this is true. You may not care about it, but storage needs to
happen, and I'd rather outsource that problem. Limn's idea of using
file-backed storage made it very inefficient and clumsy to work with. A
custom database, like Tessera is using, is much better but also requires
someone to maintain it and manage access, etc. So more ops burden but less
up-front development. And the definitions would be "further" from our
community. Meaning, for example, if someone defaces a graph, we'd have to
build a "watch this page" mechanism to help us deal with it. I started
where you're starting, with Tessera, and as I thought through these problems I
slowly migrated toward MediaWiki. But I'll try to explain below why I don't
think this is a big undertaking at all. MediaWiki is really easy to use as
a service.
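To make "easy to use as a service" concrete, here is a rough sketch of how a
dashboard could read a page straight from a standard MediaWiki api.php
endpoint. The wiki host is an assumption for illustration; only the API
parameters (action=query, prop=revisions, rvprop=content) are standard
MediaWiki, and the response-unwrapping mirrors the API's usual shape:

```javascript
// Sketch only: fetch a dashboard definition page from a MediaWiki API.
// The host name passed in is hypothetical; the query parameters are the
// standard MediaWiki Action API way to get a page's latest revision text.

// Build the api.php URL that returns the raw content of a page's
// latest revision as JSON.
function dashboardApiUrl(wikiHost, pageTitle) {
  var params = [
    'action=query',
    'prop=revisions',
    'rvprop=content',
    'format=json',
    'titles=' + encodeURIComponent(pageTitle)
  ];
  return 'https://' + wikiHost + '/w/api.php?' + params.join('&');
}

// Pull the page text out of the API's nested response shape:
// query.pages is keyed by page id, and each page has a revisions array
// whose entries hold the content under the '*' key.
function extractContent(apiResponse) {
  var pages = apiResponse.query.pages;
  var pageId = Object.keys(pages)[0];
  return pages[pageId].revisions[0]['*'];
}

console.log(dashboardApiUrl('meta.wikimedia.org', 'Dashboard:Growth'));
```

So the client-side tool never needs its own storage layer: one GET request and
a couple of lines of unwrapping, and MediaWiki handles versioning, access, and
watchlists for free.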
Building a custom tool from scratch is also part of
what got us in this
mess with Limn to begin with.
I see that I have caused a bit of a misunderstanding. So, Limn is well
over 10,000 lines of Coco. This is a dense language that transpiles to
roughly 20,000 lines of JavaScript. The tool I'm proposing here is
basically ignoring 90% of the problems that Limn dealt with. Visualization
is the main problem, and that is solved by Vega JS [1]. Of the remaining
problems, we're ignoring about half of them by making this server-less. So
let's examine the points I made above:
* Getting a Dashboard up Quickly. EventLogging is well liked, so I figured
if we did something simpler than that, we'd be starting off on the right
foot. A dashboard could be a simple JSON document on MediaWiki, rendered
the way EventLogging schemas are. A page called Dashboard:Growth could
have something like { graphs: [ {name: 'example', data-url:
'http://.../'}
] }. This would be viewable at
http://dashiki.wmflabs.org/dashboards/growth.
The JSON could be created from the dashboarding tool itself, but we can
start bare-bones. If this seems weird or not as fast as you'd like, please
describe an ideal scenario and let's talk about it.
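Spelling the Dashboard:Growth idea out as strict JSON, it might look
something like the following. The graph name and data URL here are made up
for illustration, and none of the field names are a settled schema, they just
mirror the example above:

```javascript
// Hypothetical Dashboard:Growth page body, as strict JSON. Everything in
// it (graph name, data URL) is an illustrative placeholder.
var dashboardPageText = JSON.stringify({
  "graphs": [
    {
      "name": "example",                         // hypothetical graph name
      "data-url": "http://example.org/growth.csv" // assumed data location
    }
  ]
});

// The dashboarding tool would parse the wiki page body and render one
// Vega graph per entry in the graphs array.
var config = JSON.parse(dashboardPageText);
config.graphs.forEach(function (graph) {
  console.log('would render', graph.name, 'from', graph['data-url']);
});
```

The point is that the "dashboard" itself is nothing but this document; the
rendering tool stays a thin, stateless client.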
* Dashboard discovery. Pau is designing the beginning of a solution to
this. Tessera says they have not dealt with this yet, and I don't see how
this is a generic problem (but I'd be glad to be wrong). But this is not
some giant project that we'd have to implement. It would organize the
available data in a friendly, intuitive way, 'cause Pau is amazing, as we all
know. If there's a generic solution out there for this, I haven't found it.
* Documentation of visualization capabilities. This is very well
documented on the Vega wiki, so it should take almost no effort if we
integrate it well. Most other tools we've tried to use are too limiting.
Timeseries only, maps only, etc. I think we would end up stitching a few
out-of-the-box solutions together and that seems more headache than it's
worth.
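For a sense of what "integrating Vega well" means, here is roughly what a
minimal Vega 1.x bar-chart specification looks like, going from my reading of
the Vega wiki; the data values are invented, and the exact property names
should be checked against the Vega docs before relying on them:

```javascript
// Rough sketch of a minimal Vega 1.x bar chart spec. Data values are
// placeholders; a dashboard would swap in a "url" data source instead.
var spec = {
  "width": 400,
  "height": 200,
  "data": [
    { "name": "table",
      "values": [{ "x": 1, "y": 28 }, { "x": 2, "y": 55 }, { "x": 3, "y": 43 }] }
  ],
  "scales": [
    { "name": "x", "type": "ordinal", "range": "width",
      "domain": { "data": "table", "field": "data.x" } },
    { "name": "y", "range": "height", "nice": true,
      "domain": { "data": "table", "field": "data.y" } }
  ],
  "axes": [
    { "type": "x", "scale": "x" },
    { "type": "y", "scale": "y" }
  ],
  "marks": [
    { "type": "rect",
      "from": { "data": "table" },
      "properties": {
        "enter": {
          "x": { "scale": "x", "field": "data.x" },
          "width": { "scale": "x", "band": true, "offset": -1 },
          "y": { "scale": "y", "field": "data.y" },
          "y2": { "scale": "y", "value": 0 }
        }
      } }
  ]
};
```

Because the whole chart is one declarative JSON document like this, "what can
the tool visualize" reduces to "what the Vega spec language supports", and
that question is already answered on the Vega wiki.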
* Lack of annotations. I think it makes sense to store annotations in
MediaWiki, in a JSON document that's tied to a datafile. This way everyone
using that datafile can share the annotations, and anyone interested in the
history of the document can take advantage of MediaWiki's revision history.
In most other solutions I've seen, annotation is not a social activity,
but something that researchers do to explain their data. In our case, I've
heard many, many people ask for something much richer when they talk about
annotations.
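A hedged sketch of what such a shared annotation document could look like: a
JSON page keyed to a datafile, so every dashboard rendering that datafile
picks up the same notes. The datafile name, field names, and the example
event are all invented for illustration:

```javascript
// Hypothetical annotation page stored on MediaWiki, tied to one datafile.
// Every field here is an illustrative placeholder, not a real schema.
var annotationPage = {
  "datafile": "growth.csv",              // assumed datafile identifier
  "annotations": [
    { "date": "2014-06-05",
      "note": "Hypothetical deployment that changed the trend",
      "author": "ExampleUser" }
  ]
};

// A dashboard rendering a datafile would overlay every matching note;
// MediaWiki's revision history covers who added or changed what.
function annotationsFor(page, datafile) {
  return page.datafile === datafile ? page.annotations : [];
}
```

Since the page is an ordinary wiki document, anyone can add a note, watch it,
or inspect its history, which is exactly the "social" flavor of annotation
people keep asking for.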
* Ease of contributing to the code base. I don't like big code bases. If
a simple dashboard layout built around getting metadata from mediawiki via
json and rendering Vega graphs gets anywhere *near* Limn's size and
incomprehensibility, you can burn me at the stake. I'll light the fire.
The other thing about Coco is that maybe 100 people in the world can read
it, so this would be purely JavaScript. And in Gerrit.
[1] trifacta.github.io/vega/
p.s. Seriously, I'm not trying to reinvent the wheel here, please let me
know if you think out-of-the-box solves more problems than I think. The
whole idea behind this project is to offload as much as possible, I have a
miles-long backlog I can busy myself with, so I don't need to invent work.
And all of what I said above I'm making up on the spot, so I'm thinking
out loud much more than trying to sell a solution. Would a wiki page
explaining the pros and cons of this decision be a good use of our time?