(cc-ing Amir)
[sean] Is there a plan for purging old data from this one?
[christian] Just to make expectations explicit: since in a different part of this thread you are asking more for expected growth bounds, I assume that the table can stay at that size until discussion with Language about the way forward has produced concrete next steps, and that you do not expect us to prune data right away.
So that we are all on the same page: the table has a lot of data because the i18n team was not aware that logging was happening until we notified them. As Amir mentioned, the bug that prompted the logging has been fixed. As Dario said, we definitely do not need that much data. I confirmed last week that we only need 2 weeks of data for the analysis; the data is just a short "survey" of which fonts our users have available. So yes, we could delete a large part of the data, and I believe Amir was about to ask us to do so.
Since I do not have permission to create tables, could we create a temporary table that holds the last two weeks of data? We could use it for our analysis and drop the original table once the bugfix is in production and logging has stopped.
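A minimal sketch of the temporary-table idea, for illustration only. It uses Python's built-in sqlite3 module as a stand-in for the production database and assumes the table has a text column named "timestamp" holding 14-digit YYYYMMDDHHMMSS timestamps, so that string comparison follows time order. The target table name tofu_last_two_weeks and the helper functions are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Table name as reported in this thread; the hyphen means it must be quoted in SQL.
SOURCE_TABLE = "UniversalLanguageSelector-tofu_7629564"
# Hypothetical name for the temporary analysis table.
TARGET_TABLE = "tofu_last_two_weeks"


def mw_timestamp(dt: datetime) -> str:
    """Format a datetime as a 14-digit timestamp (YYYYMMDDHHMMSS)."""
    return dt.strftime("%Y%m%d%H%M%S")


def copy_recent_rows(conn: sqlite3.Connection, now: datetime, days: int = 14) -> int:
    """Copy rows from the last `days` days before `now` into TARGET_TABLE.

    Assumes (hypothetically) a text column "timestamp" with 14-digit values.
    Returns the number of rows copied.
    """
    cutoff = mw_timestamp(now - timedelta(days=days))
    cur = conn.cursor()
    # Create an empty table with the same columns as the source table.
    cur.execute(f'CREATE TABLE "{TARGET_TABLE}" AS SELECT * FROM "{SOURCE_TABLE}" WHERE 0')
    # Copy only the rows at or after the cutoff.
    cur.execute(
        f'INSERT INTO "{TARGET_TABLE}" SELECT * FROM "{SOURCE_TABLE}" WHERE "timestamp" >= ?',
        (cutoff,),
    )
    copied = cur.rowcount
    conn.commit()
    return copied


def build_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the real table with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f'CREATE TABLE "{SOURCE_TABLE}" (id INTEGER PRIMARY KEY, "timestamp" TEXT)')
    conn.executemany(
        f'INSERT INTO "{SOURCE_TABLE}" (id, "timestamp") VALUES (?, ?)',
        [
            (1, "20140601000000"),
            (2, "20140618235959"),
            (3, "20140619000000"),
            (4, "20140702120000"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    # Cutoff is 2014-06-19 00:00:00, so rows 3 and 4 are copied.
    print(copy_recent_rows(demo, datetime(2014, 7, 3, tzinfo=timezone.utc)))
```

Dropping the original table afterwards would be a separate, deliberate step once logging has stopped.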
[sean] I'm interested in identifying the expected growth bounds rather than limiting tables arbitrarily.
This is definitely in our court: we need to determine those bounds and throttle when they are exceeded. Right now there is no throttling on record creation; we detect the higher data throughput, but that's about it.
I have created a backlog item for this: https://bugzilla.wikimedia.org/show_bug.cgi?id=67470
On Thu, Jul 3, 2014 at 11:02 AM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Sean,
On Thu, Jul 03, 2014 at 12:21:34PM +1000, Sean Pringle wrote:
The following table is easily the largest in eventlogging and growing fastest:
114G UniversalLanguageSelector-tofu_7629564
thanks for the heads up!
We are aware of UniversalLanguageSelector-tofu producing too much data since 2014-06-25 ([1], [2]), and Nuria is on it.
As I could not find a corresponding bug, I created one to track the issue at: https://bugzilla.wikimedia.org/show_bug.cgi?id=67463
Is there a plan for purging old data from this one?
Just to make expectations explicit: since in a different part of this thread you are asking more for expected growth bounds, I assume that the table can stay at that size until discussion with Language about the way forward has produced concrete next steps, and that you do not expect us to prune data right away.
There is a duplicate table called UniversalLanguageSelecTor-tofu_7629564 (note the uppercase T) with a single row. Is that needed?
I noticed that too when looking at the issue last week, but decided against calling it out, since it's just a single small table. I expect we will see such artifacts from time to time. Do they get in the way somehow, or is it OK to just keep them around?
Thanks, Christian
[1] http://lists.wikimedia.org/pipermail/analytics/2014-June/002260.html
[2] search for “tofu” on http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics/20140625.txt
--
---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3
4293 Gutau, Austria
Email: christian@quelltextlich.at
Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics