Thanks! Always appreciate the work to improve the analytics infrastructure.
I did a quick, unscientific speed comparison by running some comparable queries in parallel on stat1004 and stat1002 (using beeline), and didn't observe a clear difference. But maybe those were times when the load on stat1002 was low anyway. I guess that in that case, the execution time will mostly be determined by the database server that both are connecting to (analytics1003.eqiad.wmnet).
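In case anyone wants to repeat this kind of check, here is a rough Python sketch of how such a side-by-side timing could be scripted. It is not the exact procedure used above. It assumes you can ssh to both hosts from wherever the script runs and that beeline on each host is already configured to connect to Hive; the helper names (time_command, compare, beeline_over_ssh) and the sample query are made up for illustration.

```python
import shlex
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def time_command(argv):
    """Run argv once and return (wall-clock seconds, exit code)."""
    start = time.perf_counter()
    result = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.perf_counter() - start, result.returncode


def compare(commands):
    """Run all labelled commands at the same time.

    commands maps a label to an argv list; the return value maps each
    label to (elapsed_seconds, exit_code).
    """
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        futures = {
            label: pool.submit(time_command, argv)
            for label, argv in commands.items()
        }
        return {label: future.result() for label, future in futures.items()}


def beeline_over_ssh(host, query):
    """Build an argv that runs one query through beeline on a remote host.

    Assumption: beeline on the host needs no extra connection options.
    """
    return ["ssh", host, "beeline -e " + shlex.quote(query)]


if __name__ == "__main__":
    query = "SELECT 1"  # placeholder; substitute a comparable real query
    timings = compare(
        {host: beeline_over_ssh(host, query) for host in ("stat1002", "stat1004")}
    )
    for host, (seconds, code) in timings.items():
        print(f"{host}: {seconds:.1f}s (exit code {code})")
```

A single run like this only says something about the load at that moment, so it would need repeating at different times of day before drawing any conclusion.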
On Mon, May 2, 2016 at 11:50 AM, Andrew Otto <otto@wikimedia.org> wrote:
Hi all!
For years now, y’all have been accessing the Analytics Hadoop Cluster using stat1002. This works just fine, but others use stat1002 for number crunching outside of Hadoop as well. At times stat1002 can get pretty overloaded, which can make accessing Hadoop via this one box a little annoying.
But fret no longer! stat1004 is here! stat1004 can now be accessed by anyone in the analytics-privatedata-users and analytics-users groups. If you previously had access to stat1002 AND used it to talk to Hive and Hadoop, you may now also do this from stat1004. You don’t have to do anything new to get access to stat1004 if you already had Hadoop accounts.
stat1002 will remain useable as is. If you are looking for a more dedicated place from which to interact with Hadoop services, use stat1004 instead.
You don’t have to do anything to get access.
I’ve updated the wikitech documentation accordingly. Let us know if you have any questions!
-Andrew
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics