Hi,
TL;DR: If your Hive queries are currently taking longer than usual, please find qchris on IRC. If he is unresponsive, please ask someone with root on stat1002 (like Ops) to kill the process
java -Dproc_balancer -Xmx1000m [...]
-----------------------------------------------------
Data in the Analytics cluster is not evenly distributed. Some data nodes are >90% full, while others are half empty.
Data nodes that are >90% full are considered unhealthy and no longer contribute to the pool of available resources. In particular, their memory no longer counts toward the total memory available in the cluster.
There are other reasons too, but this alone is reason enough to keep the data nodes balanced and hence healthy.
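To make the effect concrete, here is a minimal Python sketch of the rule described above, assuming a simple model in which a data node more than 90% full drops out of the memory pool entirely and each healthy node contributes a fixed amount of memory. The node names, sizes and per-node memory are hypothetical illustration values, not measurements from the Analytics cluster, and `idealized_rebalance` is a made-up helper that models only an ideal end state, not how the HDFS balancer actually moves blocks.

```python
from dataclasses import dataclass

# Threshold from the email: data nodes more than 90% full count as unhealthy.
UNHEALTHY_FULLNESS = 0.90


@dataclass
class DataNode:
    name: str          # hypothetical node name, for illustration only
    used_gb: int       # HDFS data stored on the node
    capacity_gb: int   # HDFS capacity of the node
    memory_gb: int     # memory the node contributes while it is healthy (assumed)

    @property
    def fullness(self) -> float:
        return self.used_gb / self.capacity_gb

    @property
    def healthy(self) -> bool:
        # Only nodes *above* 90% are unhealthy; exactly 90% still counts as healthy.
        return self.fullness <= UNHEALTHY_FULLNESS


def available_memory_gb(nodes: list[DataNode]) -> int:
    """Total memory in the pool: unhealthy nodes contribute nothing."""
    return sum(n.memory_gb for n in nodes if n.healthy)


def idealized_rebalance(nodes: list[DataNode]) -> list[DataNode]:
    """Spread the same total data across nodes in proportion to their capacity.

    Idealized end state only (integer GB, rounded down); it does not model
    how a real balancer moves blocks over time.
    """
    total_used = sum(n.used_gb for n in nodes)
    total_capacity = sum(n.capacity_gb for n in nodes)
    return [
        DataNode(n.name, total_used * n.capacity_gb // total_capacity,
                 n.capacity_gb, n.memory_gb)
        for n in nodes
    ]


# Hypothetical cluster: two nodes above 90%, two roughly half empty.
nodes = [
    DataNode("node-a", used_gb=950, capacity_gb=1000, memory_gb=48),
    DataNode("node-b", used_gb=920, capacity_gb=1000, memory_gb=48),
    DataNode("node-c", used_gb=500, capacity_gb=1000, memory_gb=48),
    DataNode("node-d", used_gb=450, capacity_gb=1000, memory_gb=48),
]

print(available_memory_gb(nodes))                       # 96: node-a and node-b drop out
print(available_memory_gb(idealized_rebalance(nodes)))  # 192: every node ends up at 70.5%
```

The cluster stores the same amount of data in both cases; only the balanced layout keeps every node's memory in the pool.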
Rebalancing has been running since 2015-02-26, but the situation is getting worse faster than the rebalancer can fix it.
We've been up to 5 unhealthy nodes. Since we're missing their memory, I decided that we should rebalance more aggressively. Hence, I bumped the rebalancer's capacity, and nodes are recovering and getting healthy again.
I am closely monitoring the rebalancer with its increased capacity, but in case it is blocking you without me noticing, please find me on IRC and let me know, so I can turn the rebalancer's capacity down. If I am unresponsive, please find someone with root on stat1002 (like Ops) and ask them to kill the process
java -Dproc_balancer -Xmx1000m [...]
on stat1002.
Have fun, Christian
Hi,
On Sat, Mar 07, 2015 at 10:47:50AM +0100, Christian Aistleitner wrote:
We've been up to 5 unhealthy nodes. Since we're missing their memory, [...]
We're back to 0 unhealthy nodes \o/
So the cluster can again use its full amount of memory.
But since HDFS is still heavily imbalanced, I am leaving the balancer running for now.
If the balancer gets in the way, please find me on IRC and let me know, or find someone with root on stat1002 (like Ops) and ask them to kill the process
java -Dproc_balancer -Xmx1000m [...]
on stat1002.
Have fun, Christian
Hi,
On Sun, Mar 08, 2015 at 10:18:40AM +0100, Christian Aistleitner wrote:
But since HDFS is still heavily imbalanced, I am leaving the balancer running for now.
Balancing has now finished. HDFS is balanced again.
I did a short write-up about HDFS Balancing at
https://wikitech.wikimedia.org/w/index.php?title=Analytics/Cluster/Hadoop/Ad...
Have fun, Christian
Hi Analytics Dev team,
Just a heads-up about unbalanced HDFS. While productionizing “HDFS balancing” was discussed some weeks back, it seems other tasks took priority. Getting balancing in place before the weekend might spare you from nodes becoming unhealthy over the weekend (and all the issues that come with that).
Have fun, Christian
P.S.: https://wikitech.wikimedia.org/w/index.php?title=Analytics/Cluster/Hadoop/Ad...
I have filed a placeholder task for this: https://phabricator.wikimedia.org/T94933
Let's prioritize accordingly.
Thanks,
Nuria
On Thu, Apr 2, 2015 at 4:09 PM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Analytics Dev team,
Just a heads-up about unbalanced HDFS. While productionizing “HDFS balancing” was discussed some weeks back, it seems other tasks took priority. Getting balancing in place before the weekend might spare you from nodes becoming unhealthy over the weekend (and all the issues that come with that).
Have fun, Christian
P.S.: https://wikitech.wikimedia.org/w/index.php?title=Analytics/Cluster/Hadoop/Ad...
--
---- quelltextlich e.U. ---- \ ---- Christian Aistleitner ----
Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3
4293 Gutau, Austria
Email: christian@quelltextlich.at
Phone: +43 7946 / 20 5 81
Fax: +43 7946 / 20 5 81
Homepage: http://quelltextlich.at/
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Thanks Christian, I’ve launched a balancer now. I should def automate this.
On Apr 2, 2015, at 19:21, Nuria Ruiz <nuria@wikimedia.org> wrote:
I have filed a placeholder task for this: https://phabricator.wikimedia.org/T94933
Let's prioritize accordingly.
Thanks,
Nuria
On Thu, Apr 2, 2015 at 4:09 PM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Analytics Dev team,
Just a heads-up about unbalanced HDFS. While productionizing “HDFS balancing” was discussed some weeks back, it seems other tasks took priority. Getting balancing in place before the weekend might spare you from nodes becoming unhealthy over the weekend (and all the issues that come with that).
Have fun, Christian
P.S.: https://wikitech.wikimedia.org/w/index.php?title=Analytics/Cluster/Hadoop/Administration&oldid=148246#Balancing_HDFS
Thanks Christian and all.
On Fri, Apr 3, 2015 at 8:29 AM, Andrew Otto <aotto@wikimedia.org> wrote:
Thanks Christian, I’ve launched a balancer now. I should def automate this.