Thanks! Always appreciate the work to improve the analytics infrastructure.
I did a quick unscientific speed comparison by running some comparable
queries in parallel on stat1004 and stat1002 (using beeline), and
didn't observe a clear difference. But maybe those were times when the
load on stat1002 was low anyway. I guess that in that case, the
execution time will mostly be determined by the database server that
both are connecting to (analytics1003.eqiad.wmnet).
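For anyone who wants to repeat this kind of rough comparison, here is a minimal sketch of a timing helper. It is not the procedure I used, just one way it could be done: the helper names (time_command, summarize) and the number of runs are illustrative assumptions, and the command to time is whatever you would normally run on each host.

```python
import statistics
import subprocess
import time


def time_command(command, runs=3):
    """Run `command` (a list of arguments) `runs` times.

    Returns the wall-clock duration of each run in seconds.
    Output of the command is discarded; a failing run raises an error.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to a few comparable numbers."""
    return {
        "runs": len(durations),
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }
```

Running the same script on stat1002 and stat1004 at roughly the same time, with something like time_command(["beeline", "-e", "<your query>"]) (the query being a placeholder), and comparing the medians would give a slightly less anecdotal picture. It still would not separate load on the client host from load on the database server.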
On Mon, May 2, 2016 at 11:50 AM, Andrew Otto <otto(a)wikimedia.org> wrote:
Hi all!
For years now, y’all have been accessing the Analytics Hadoop Cluster using
stat1002. This works just fine, but others use stat1002 for number
crunching outside of Hadoop as well. At times stat1002 can get pretty
overloaded, which can make accessing Hadoop via this one box a little
annoying.
But fret no longer! stat1004 is here! stat1004 can now be accessed by
anyone in the analytics-privatedata-users and analytics-users groups. If
you previously had access to stat1002 AND used it to talk to Hive and
Hadoop, you may now also do this from stat1004. You don’t have to do
anything new to get access to stat1004 if you already had Hadoop accounts.
stat1002 will remain useable as is. If you are looking for a more dedicated
place from which to interact with Hadoop services, use stat1004 instead.
You don’t have to do anything to get access.
I’ve updated the wikitech documentation accordingly. Let us know if you
have any questions!
-Andrew
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB