Oliver, you are our end user, guide us!
> On Feb 23, 2015, at 15:13, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
>
> Neat! And those can then be accessed with say,
> geocoded_data['country_code'] in hive?
>
> Are there any plans to integrate the connection type binary? (Sorry to
> ask endless questions, but this is my jam :D)
>
> On 23 February 2015 at 15:00, Joseph Allemandou
> <jallemandou(a)wikimedia.org> wrote:
>> Oops sorry, I forgot to answer this question :)
>> A new map field named "geocoded_data" will contain, when available:
>>
>> continent
>> country
>> country_code
>> subdivision
>> postal_code
>> city
>> timezone
>> latitude
>> longitude
>>
>> For instance:
>>
>> {"city":"Mukilteo","country_code":"US","longitude":"-122.3042","subdivision":"Washington","timezone":"America/Los_Angeles","postal_code":"98275","continent":"North America","latitude":"47.913","country":"United States"}
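To illustrate the map field described above, here is a minimal Python sketch that loads the example record and reads keys from it, in the spirit of a lookup like geocoded_data['country_code']. The helper names (get_geo_field, coordinates) are hypothetical, and treating a missing key as "not available" is an assumption based on the "when available" wording. It is not the Hive implementation.

```python
import json

# Example record copied from the message above; note that every value,
# including latitude and longitude, is a string.
EXAMPLE = (
    '{"city":"Mukilteo","country_code":"US","longitude":"-122.3042",'
    '"subdivision":"Washington","timezone":"America/Los_Angeles",'
    '"postal_code":"98275","continent":"North America",'
    '"latitude":"47.913","country":"United States"}'
)

geocoded_data = json.loads(EXAMPLE)


def get_geo_field(record, key):
    """Return the value stored under key, or None when it is not available."""
    return record.get(key)


def coordinates(record):
    """Return (latitude, longitude) as floats, or None if either is missing."""
    lat = record.get("latitude")
    lon = record.get("longitude")
    if lat is None or lon is None:
        return None
    return float(lat), float(lon)


if __name__ == "__main__":
    print(get_geo_field(geocoded_data, "country_code"))
    print(coordinates(geocoded_data))
```

Because the values are strings, numeric fields such as latitude and longitude need an explicit conversion before any arithmetic.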
>>
>> Cheers
>> Joseph
>>
>> On Mon, Feb 23, 2015 at 8:24 PM, Oliver Keyes <okeyes(a)wikimedia.org> wrote:
>>>
>>> Gotcha. So, for transparency...what are we calculating? Country? City? :D
>>>
>>> On 23 February 2015 at 13:59, Joseph Allemandou
>>> <jallemandou(a)wikimedia.org> wrote:
>>>> As per the IRC discussion, we won't recompute historical data, but will
>>>> start computing new values from the deploy time onward.
>>>> A new "version" field and associated documentation will also be provided,
>>>> making it possible to follow changes over time.
>>>> Thanks for your input!
>>>> Best
>>>> Best
>>>>
>>>>
>>>> On Mon, Feb 23, 2015 at 4:58 PM, Oliver Keyes <okeyes(a)wikimedia.org>
>>>> wrote:
>>>>>
>>>>> I think it should be fine-ish; it depends what we're calculating. When
>>>>> you say "geocoded information", what do you mean? Country? City? I
>>>>> wouldn't expect country to move about a lot in 60 days (which is the
>>>>> range of our data): I would expect city to.
>>>>>
>>>>> What's the status on getting an oozie job or similar to compute going
>>>>> forward? To me that's more of a priority than historical data.
>>>>>
>>>>> On 23 February 2015 at 10:53, Joseph Allemandou
>>>>> <jallemandou(a)wikimedia.org> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> As part of my first assignment, I'll recompute our historical webrequest
>>>>>> dataset, adding client_ip and geocoded information.
>>>>>>
>>>>>> While it seems correct to compute historical client_ip based on the
>>>>>> existing ip and the x_forwarded_for, using the current state of the
>>>>>> MaxMind geocoding library to compute historical data is more error-prone.
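The thread does not say how client_ip is derived from ip and x_forwarded_for. As a rough sketch of one common approach (an assumption, not the method used here): if the connecting ip is a trusted proxy, walk the X-Forwarded-For list from right to left and take the first address that is not itself a trusted proxy. The TRUSTED_PROXIES range and the function names below are hypothetical.

```python
import ipaddress

# Hypothetical trusted proxy ranges, for illustration only.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def is_trusted(addr):
    """Return True if the parsed address falls inside a trusted proxy range."""
    return any(addr in net for net in TRUSTED_PROXIES)


def client_ip(ip, x_forwarded_for):
    """Guess the client address from the connecting ip and the XFF header.

    Falls back to ip whenever the header is empty, unparseable, or the
    connecting ip is not a trusted proxy (so the header cannot be trusted).
    """
    try:
        connecting = ipaddress.ip_address(ip)
    except ValueError:
        return ip
    if not is_trusted(connecting):
        return ip
    if not x_forwarded_for or x_forwarded_for.strip() == "-":
        return ip

    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # The rightmost entries were appended by the proxies closest to us.
    for hop in reversed(hops):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            return ip  # malformed entry: stop trusting the header
        if not is_trusted(addr):
            return hop
    return ip


if __name__ == "__main__":
    print(client_ip("10.0.0.5", "198.51.100.1, 10.0.0.9"))
```

Entries further left in the header are supplied by the client and can be forged, which is why the walk stops at the first untrusted hop rather than taking the leftmost value.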
>>>>>>
>>>>>> I can either compute it anyway, knowing that there'll be some errors,
>>>>>> or put null values for data older than a given point in time.
>>>>>>
>>>>>> I'll launch the script to recompute the data as soon as max(a consensus
>>>>>> is found on this matter, operations gives me the right to run the script)
>>>>>> :)
>>>>>>
>>>>>> Thanks
>>>>>> --
>>>>>> Joseph Allemandou
>>>>>> Data Engineer @ Wikimedia Foundation
>>>>>> IRC: joal
>>>>>>
>>>>>> _______________________________________________
>>>>>> Analytics mailing list
>>>>>> Analytics(a)lists.wikimedia.org
>>>>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Oliver Keyes
>>>>> Research Analyst
>>>>> Wikimedia Foundation