Hi everybody, here's a quick summary of notes / take-aways from the Analytics (Kraken) security review meeting.

* analytics1001 has been wiped and reimaged (restoring /home from backup)
* All proxies and externally-facing services have been disabled.
* Work is under way to bring everything that was puppetized under the analytics1001 puppetmaster into operations-puppet after proper review. Andrew is working closely with a number of people in ops to make this happen.
* All future deployments to the cluster will be puppetized and go through normal code review. Other than performance testing, these puppet confs will be tested in labs.
* The rest of the cluster will be wiped and reimaged out of puppet; data in HDFS will be preserved. This can be a rolling process, allowing work to proceed while it's under way.
* Schedule an Architectural Review meeting sometime during the SF Ops hackathon, including a look at additional services and auth methods that provide access to internal dashboards like Hue and such.
* Ensure all current "application" code (stuff written by WMF) gets reviewed:
** Cron doing HDFS import from Kafka
** Pig UDFs and other data tools used in processing
** Future: Storm ETL layer


We all agreed the overall goal is to get to an acceptable security state. During that process, the Analytics team still needs to continue to meet stakeholder needs and deliver on promises. We decided to keep a "minimum viable cluster" running while reimaging boxes and civilizing cluster configuration:

* Wall off some portion of the boxes to continue receiving data and running jobs; all other boxes can be wiped (preserving HDFS partitions). Boxes would be incrementally removed from the "unsanitary" cluster, reimaged, and then added to the "sanitary" cluster. Stupid bathroom-related jokes to be avoided.
* Team Analytics to enumerate data processing jobs that will be running in the intermediate period; their configurations and tooling will be reviewed.
* Analytics and Ops engineers continue to have shell access. Jobs can be submitted and managed using the CLI tools; internal dashboards can be accessed via SSH tunnelling. Analysts working on the cluster will be approved for shell access on a case-by-case basis (afaik, just Evan Rosen (full-time analyst for Grantmaking & Programs) and Stefan Petrea (contractor for Analytics)). If more analysts need access in the interim, we can work it out on a case-by-case basis.
* No public, external access to any box in either zone (including proxies, dashboards like Hue, or even static files) that hasn't gone through review.
* Analytics and Ops will work together to find a simple, acceptable mechanism for data export.
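For anyone who hasn't done the SSH-tunnel dance before, dashboard access would look something like the following. (Hostnames and ports here are illustrative only, not the actual cluster values; Hue's default port is 8888, and you'd hop through whatever bastion your SSH config already uses.)

```shell
# Forward local port 8888 to the Hue dashboard on an analytics box,
# jumping through a bastion host. All host names below are examples.
ssh -N -L 8888:analytics1001.eqiad.wmnet:8888 bast1001.wikimedia.org

# Then point a local browser at http://localhost:8888/
```

The `-N` flag keeps the session open for forwarding only, without running a remote shell.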


== Next Steps ==

* Analytics puppet manifests fully reviewed and merged into the master operations-puppet repository
** Andrew to come pow-wow before the SF Ops hackathon and buddy it up with ops to plow through some of this.
* Schedule Architecture Review
* Rolling reimaging of all analytics boxes (including Hadoop data nodes, but preserving data), implementing this "minimum viable cluster" plan.


Questions very welcome! It's entirely possible I've missed things or mistranslated them.

Cheers,
Team Analytics


--
David Schoonover
dsc@wikimedia.org