New subject: [Ops] [Research-Internal] Fwd: [wmfresearch] Hadoop Cluster Downtime

4 May 2015


Phew, ok, things did go wrong!  We ran into a couple of bugs recently introduced in Yarn and in Hive, and it took us a while to find workarounds.  Jobs are flowing through the cluster again.  However, jobs have been lagging behind since they haven’t been able to run all day.  They should eventually catch up.  For now, the cluster is back open for business, but I’d appreciate it if no one ran any heavy jobs until tomorrow.
Also, it is still possible we may run into other issues we haven’t yet seen, so I can’t guarantee that I won’t have to restart things again.
Anyway, aside from those hiccups, CDH 5.4.0 is now installed, and Hive 1.1 and Spark 1.3.0 are now available, weeeeee!
-Ao
...
On May 4, 2015, at 11:05, Andrew Otto aotto@wikimedia.org wrote:
Hi all, as a reminder, I will be doing this upgrade today.  Within the next hour I will turn off the Hadoop cluster.  Please do not attempt to use it until I notify you again.
Thanks!
-AO
...
On Apr 29, 2015, at 14:57, Robert West west@cs.stanford.edu wrote:
All good!
On Wed, Apr 29, 2015 at 11:35 AM, Aaron Halfaker
ahalfaker@wikimedia.org wrote:
...

the right research list  (Andrew, remove wmfresearch@ from your contact list :P )
All looks good to me.  Thanks. :)
On Wed, Apr 29, 2015 at 1:11 PM, Leila Zia leila@wikimedia.org wrote:
...
FYI
Ashwin, Bob, Ellery, I don't anticipate this having negative impact on our
workflow. If you see possible issues, please communicate with Andrew (cc-ing
me), or let me know and I will communicate. Thanks!
---------- Forwarded message ----------
From: Andrew Otto aotto@wikimedia.org
Date: Wed, Apr 29, 2015 at 11:05 AM
Subject: [wmfresearch] Hadoop Cluster Downtime
To: Operations Engineers ops@lists.wikimedia.org, "A mailing list for
the Analytics Team at WMF and everybody who has an interest in Wikipedia and
analytics." analytics@lists.wikimedia.org,
"wmfresearch@lists.wikimedia.org Research" wmfresearch@lists.wikimedia.org
Hi all!
CDH 5.4 is out[1] and we’d like to upgrade.  We are doing this now, rather
than later, because there is an important Parquet/Hive related bug that has
been fixed in this version[2].  This upgrade will include Spark 1.3, which
should at least make one researcher happy.
To do this upgrade, I need to schedule some downtime for Hadoop.  I’d like
to do this on Monday May 4th.  I expect the upgrade to take me no more than
an hour or two, but just to be safe I’d like to schedule the downtime for
the whole day.
If anyone has critical things that they absolutely have to run on Monday,
let me know now and I will find another day.
Thanks!
-Ao
[1]
http://blog.cloudera.com/blog/2015/04/cloudera-enterprise-5-4-is-released/
[2] https://issues.apache.org/jira/browse/HIVE-9482

wmfresearch mailing list
wmfresearch@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wmfresearch

Research-Internal mailing list
Research-Internal@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/research-internal

-- 
Up for a little language game? -- http://www.unfun.me

Ops mailing list
Ops@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/ops

Re: [Analytics] [Ops] [Research-Internal] Fwd: [wmfresearch] Hadoop Cluster Downtime