from the mysql inserter seems like the right thing to do.
I agree. We should implement partial auto-purging in Hadoop though. In the
Echo schema some fields should still be purged.
Right, being able to move all this data to Hadoop is contingent on having a
purging strategy. Filed a ticket to do this:
On Wed, Dec 16, 2015 at 6:55 AM, Marcel Ruiz Forns <mforns(a)wikimedia.org>
wrote:
Sure, it doesn't have space problems, but the problem remains that with a
table this large, it's impossible to query and get results in our lifetime.
I see, makes sense.
I think in this case moving all of the data to Hadoop and blacklisting it
from the mysql inserter seems like the right thing to do.
I agree. We should implement partial auto-purging in Hadoop though. In the
Echo schema some fields should still be purged.
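
For illustration, partial purging could be sketched roughly as below. This is
only a sketch under assumptions, not an agreed implementation: the field names
(userId, clientIp), the "timestamp" key, and the retention window passed in by
the caller are all hypothetical, and the real list of fields to purge would
come from the schema's talk page. The idea is that events older than the
window keep their non-sensitive fields and have the sensitive ones nulled.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field names; the real list would come from the schema's talk page.
FIELDS_TO_PURGE = frozenset({"userId", "clientIp"})


def purge_event(event, now, retention_days, fields_to_purge=FIELDS_TO_PURGE):
    """Return a copy of `event`, nulling sensitive fields once it is too old.

    `event` is assumed to be a dict with a timezone-aware "timestamp" value.
    Events inside the retention window are returned unchanged (as a copy).
    """
    cutoff = now - timedelta(days=retention_days)
    if event["timestamp"] >= cutoff:
        return dict(event)
    # Partial purge: keep the row, drop only the sensitive values.
    return {
        key: (None if key in fields_to_purge else value)
        for key, value in event.items()
    }


if __name__ == "__main__":
    now = datetime(2015, 12, 16, tzinfo=timezone.utc)
    old_event = {
        "timestamp": datetime(2015, 1, 1, tzinfo=timezone.utc),
        "userId": 42,
        "eventType": "mention",
    }
    # The 90-day window here is an arbitrary example value.
    print(purge_event(old_event, now, retention_days=90))
```

In Hadoop the same rule would presumably run as a periodic batch job over the
stored events rather than row by row, but the per-event logic would look
similar.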
On Wed, Dec 16, 2015 at 3:07 PM, Dan Andreescu <dandreescu(a)wikimedia.org>
wrote:
Just spoke with Jaime Crespo and he confirmed that:
- m4-master (master EL database) only holds events for the last 45 days
  to avoid space problems. That's for all tables including Echo.
- analytics-storage is the replica that keeps the historical data and is
  meant to apply the specific purging strategy agreed in the schema's
  talk page. This database does not have space problems (yet).
Sure, it doesn't have space problems, but the problem remains that with a
table this large, it's impossible to query and get results in our lifetime.
So we need to come up with better solutions for cases where we have these
huge volumes of valuable data. I think in this case moving all of the data
to Hadoop and blacklisting it from the mysql inserter seems like the right
thing to do. The only reason for data to exist in mysql should be if we're
querying it on a frequent, periodic basis and taking actions based on the
results of those queries. Otherwise it's a waste of resources and we should
allocate that disk space to something else.
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
*Marcel Ruiz Forns*
Analytics Developer
Wikimedia Foundation