If we skip the DB and dump the data into Hadoop, it could probably handle the load. I have no idea whether this is a good idea right now; it's just a thought.
---------- Forwarded message ----------
From: Gilles Dubuc <gilles@wikimedia.org>
Date: Tue, May 20, 2014 at 5:21 AM
Subject: Re: [Analytics] [Multimedia] Media Viewer Dashboards
To: Wikimedia Foundation Multimedia Team <multimedia@lists.wikimedia.org>
Cc: Analytics Team List <analytics@lists.wikimedia.org>
Media Viewer's usage of EventLogging grew considerably because of all the tracking we're doing (see http://lists.wikimedia.org/pipermail/analytics/2014-May/002053.html), and Nuria asked us to reduce the rate.
Because of the global scale we're dealing with, instead of logging every action on every site, we'll now have to measure a sample and extrapolate an estimate. As a quick fix, last Friday Gergo introduced sampling of actions (one in every thousand actions is now recorded, instead of every action). As a result, all figures on the actions graph were divided by 1000 overnight, making the line appear to drop to 0. If you hover over recent days and look at the left sidebar, you'll see that there are figures (they are kind of useless, though; more on that below).
We're now working on improvements and fixing the graphs: https://wikimedia.mingle.thoughtworks.com/projects/multimedia/cards/619 The general gist of it is that the figures will be compensated according to the sampling and that the sampling factor will be fine-tuned to only apply to metrics that were responsible for the high traffic.
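To make the compensation idea concrete, here is a minimal sketch in Python. It is not the actual Media Viewer or dashboard code: the metric names, the factor values, and the helper names `should_log` and `estimate_totals` are all hypothetical, chosen only to illustrate per-metric sampling with matching scale-up.

```python
import random
from typing import Dict

# Hypothetical per-metric sampling factors (assumption, not the real config):
# 1 means every event is logged, 1000 means roughly one event in a thousand.
SAMPLING_FACTORS: Dict[str, int] = {
    "image_view": 1000,  # assumed high-traffic metric, heavily sampled
    "fullscreen": 1,     # assumed low-traffic metric, not sampled
}


def should_log(factor: int, rng: random.Random) -> bool:
    """Return True for roughly one call in `factor` (always True when factor is 1)."""
    return rng.randrange(factor) == 0


def estimate_totals(
    sampled_counts: Dict[str, int],
    factors: Dict[str, int],
    default_factor: int = 1,
) -> Dict[str, int]:
    """Scale each sampled count by its metric's sampling factor to estimate the true total."""
    return {
        metric: count * factors.get(metric, default_factor)
        for metric, count in sampled_counts.items()
    }
```

With these assumed factors, `estimate_totals({"image_view": 3, "fullscreen": 7}, SAMPLING_FACTORS)` scales only the sampled metric and leaves the unsampled one as recorded.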
Unfortunately, it looks like the 1:1000 sampling in place since last Friday was too extreme and destroys information, even for the most numerous actions. We knew that such a high sampling factor was going to destroy information for small wikis or metrics with low figures, but even the huge metrics in the millions have become unreliable. I say that because multiplying even the largest figures by 1000 still doesn't give an estimate close to what it was before the change. This means the actions graph probably won't be fixable for the period between last Friday and the point when my fixes make it through. Even after compensating for the sampling (by multiplying the figures by 1000), the line would jump up and down every day for each metric.
Graphs other than actions are unaffected (they were already sampled). The duration log was also affected, but that one doesn't have graphs yet, as the task to create them has been given low priority in the cycle.
On Mon, May 19, 2014 at 8:43 PM, Fabrice Florin <fflorin@wikimedia.org> wrote:
Hi guys,
Does anyone know why the Media Viewer metrics dashboards seem to be stuck with old data from Friday?
Is there anything we could fiddle with to get the new data to show up?
Thanks for any insights :)
Fabrice
Fabrice Florin
Product Manager
Wikimedia Foundation