(cc-ing Amir)

>> [sean] Is there a plan for purging old data from this one?

>[christian] Just to make expectations explicit:
>[christian] Since in a different part of this thread you are asking more for
>[christian] expected growth bounds, I assume that the table can stay at that size
>[christian] until discussion with Language about the way forward produced concrete
>[christian] next steps, and you do not expect us to prune data right away.

So that we are all on the same page: the table has a lot of data because the i18n team was not aware that logging was happening until we notified them. As Amir mentioned, the bug that prompted the logging has been fixed. As Dario said, we definitely do not need that much data.
I confirmed last week that we only need 2 weeks of data to analyze; the data is just a short "survey" of which fonts our users have available. So, yes, we could delete a bunch of the data, and I believe Amir was about to ask us to do so.

Since I do not have permission to create tables, could we create a temporary table that holds the last two weeks of data? We could use that for our analysis and get rid of the other table once the bugfix is in production and logging has stopped.
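
To make the request concrete, here is a rough sketch of the copy I have in mind. It runs against SQLite only so that it is self-contained; the real table lives in MySQL, so the statements would need adapting. The target table name and the assumption that rows carry a `timestamp` column in YYYYMMDDHHMMSS form are mine and should be checked against the actual schema.

```python
import sqlite3
from datetime import datetime, timedelta

SOURCE = "UniversalLanguageSelector-tofu_7629564"  # table name from this thread
TARGET = "tofu_last_two_weeks"                      # hypothetical name


def copy_recent_rows(conn, now, days=14):
    """Copy rows from the last `days` days into TARGET and return the row count.

    Assumes a text `timestamp` column formatted as YYYYMMDDHHMMSS, so that
    string comparison matches chronological order.
    """
    cutoff = (now - timedelta(days=days)).strftime("%Y%m%d%H%M%S")
    conn.execute(f'DROP TABLE IF EXISTS "{TARGET}"')
    # Create an empty table with the same columns, then fill it.
    conn.execute(f'CREATE TABLE "{TARGET}" AS SELECT * FROM "{SOURCE}" WHERE 0')
    conn.execute(
        f'INSERT INTO "{TARGET}" SELECT * FROM "{SOURCE}" WHERE timestamp >= ?',
        (cutoff,),
    )
    conn.commit()
    return conn.execute(f'SELECT COUNT(*) FROM "{TARGET}"').fetchone()[0]


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real table, with made-up rows.
    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE "{SOURCE}" (uuid TEXT, timestamp TEXT)')
    conn.executemany(
        f'INSERT INTO "{SOURCE}" VALUES (?, ?)',
        [("a", "20140601000000"), ("b", "20140625120000")],
    )
    print(copy_recent_rows(conn, datetime(2014, 7, 3, 12, 0, 0)))
```

Once the analysis is done against the smaller table, the large one could be dropped as described above.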


>[sean] I'm interested in identifying the expected growth bounds rather than limiting tables arbitrarily.

This is definitely in our court: we need to determine those bounds and throttle when they are exceeded.
We currently have no throttling on record creation. We detect the higher data throughput, but that's about it.

I have created a backlog item to this effect:
https://bugzilla.wikimedia.org/show_bug.cgi?id=67470
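
As a starting point for that item, this is one possible shape for per-schema throttling: a fixed-window counter that rejects events once a schema exceeds its bound. It is only a sketch and not how EventLogging works today; the class name, the limits and the schema key are all hypothetical, and the real bounds still need to be determined.

```python
import time


class SchemaThrottle:
    """Fixed-window rate limiter keyed by schema name (illustrative only)."""

    def __init__(self, max_events, window_seconds, clock=time.monotonic):
        self.max_events = max_events          # hypothetical per-window bound
        self.window_seconds = window_seconds
        self.clock = clock                    # injectable for testing
        self._windows = {}                    # schema -> (window_start, count)

    def allow(self, schema):
        """Return True if one more event for `schema` fits in the current window."""
        now = self.clock()
        start, count = self._windows.get(schema, (now, 0))
        if now - start >= self.window_seconds:
            # The previous window has expired; start a fresh one.
            start, count = now, 0
        if count >= self.max_events:
            self._windows[schema] = (start, count)
            return False
        self._windows[schema] = (start, count + 1)
        return True


if __name__ == "__main__":
    throttle = SchemaThrottle(max_events=2, window_seconds=60)
    for _ in range(3):
        print(throttle.allow("UniversalLanguageSelector-tofu"))
```

A fixed window is the simplest option; a sliding window or token bucket would smooth out bursts at window boundaries if that turns out to matter.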

On Thu, Jul 3, 2014 at 11:02 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Sean,

On Thu, Jul 03, 2014 at 12:21:34PM +1000, Sean Pringle wrote:
> The following table is easily the largest in eventlogging and growing
> fastest:
>
> 114G     UniversalLanguageSelector-tofu_7629564

thanks for the heads up!

We are aware of UniversalLanguageSelector-tofu producing too much data
since 2014-06-25 ([1], [2]), and Nuria is on it.

As I could not find a corresponding bug, I created one to track the
issue at:
  https://bugzilla.wikimedia.org/show_bug.cgi?id=67463


> Is there a plan for purging old data from this one?

Just to make expectations explicit:
Since in a different part of this thread you are asking more for
expected growth bounds, I assume that the table can stay at that size
until discussion with Language about the way forward produced concrete
next steps, and you do not expect us to prune data right away.

> There is a duplicate table called UniversalLanguageSelecTor-tofu_7629564 --
> note the uppercase T -- with a single row. Is that needed?

I noted that too when looking at the issue last week, but decided
against calling it out, since it's just a single small table.
I expect we see these artifacts from time to time. Do they get in the
way somehow, or is it ok to just keep them around?

Thanks,
Christian


[1] http://lists.wikimedia.org/pipermail/analytics/2014-June/002260.html
[2] search for “tofu” on
  http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics/20140625.txt



--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  christian@quelltextlich.at
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics