I see three ways for data to get into the cluster:
1. Request stream. This is already handled, and we're working on ways to pump the data back out through APIs.
2. Event Logging. We're making this scale arbitrarily by moving it to Kafka. Once that's done, we should be able to instrument pretty much anything with Event Logging.
3. Piwik. There is a small but growing effort to stand up our own Piwik instance so we can get basic canned reports out of the box and not have to reinvent the wheel for every single feature we're trying to instrument and learn about. This could replace a lot of the use cases for Event Logging and free it up for more free-form research rather than cookie-cutter web analytics.
Answers inline: