On 10 June 2015 at 10:53, Dan Andreescu <dandreescu@wikimedia.org> wrote:
> I see three ways for data to get into the cluster:
>
> 1. request stream, handled already, we're working on ways to pump the data
> back out through APIs
Awesome, and it'd end up in the Hadoop cluster in a table? How...do we
kick that off most easily?
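
(For concreteness, the easiest path I can picture today is shelling a Hive
query out from a stat box; every name below (wmf.webrequest, uri_path, the
partition fields) is a guess on my part, not a confirmed schema.)

    # Sketch only: assumes the request stream lands in a Hive table
    # partitioned by year/month/day, and that the Hive CLI is on PATH.
    import subprocess

    QUERY = """
        SELECT uri_path, COUNT(*) AS hits
        FROM wmf.webrequest
        WHERE year = 2015 AND month = 6 AND day = 9
        GROUP BY uri_path
        ORDER BY hits DESC
        LIMIT 10;
    """

    # "hive -e" runs a single query and prints the result to stdout.
    result = subprocess.run(
        ["hive", "-e", QUERY],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)
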
An API, Dan ;)

>> Second: what are best practices for this? What resources are available?
>> If I'm starting a service on Labs that provides data to third-parties,
>
>
> What exactly do you mean here? That's a loaded term, and possibly against
> the Labs privacy policy depending on what you mean.
>
>>
>> what would analytics recommend as my easiest path to getting request
>> logs into Hadoop?
>
>
> On balance, right now I'd say adding your name to the piwik supporters
> list. So far, off the top of my head, that list is:
>
> * wikimedia store
> * annual report
> * the entire reading vertical
> * Russian Wikimedia chapter (most likely all the other chapters would
> chime in to support it)
> * a bunch of labs projects (including wikimetrics, vital signs, various
> dashboards, etc.)
>
How is piwik linked to Hadoop? I'm not asking "how do we visualise the
data"; I'm asking how we get it into the cluster in the first place.