As per the IRC discussion, we won't recompute historical data; we'll start computing new values from deploy time onward.
A new "version" field and associated documentation will also be provided, making it possible to track changes over time.
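For anyone curious about the client_ip derivation discussed below: the usual approach is to take the first public address from X-Forwarded-For and fall back to the raw socket ip otherwise. This is only a rough sketch under that assumption (the function name and the exact precedence rules are hypothetical, not the actual refinery logic):

```python
# Hypothetical sketch: pick the client IP from ip + x_forwarded_for.
# Assumption: the leftmost *public* address in XFF is the real client;
# private/internal addresses (caches, proxies) are skipped.
import ipaddress

def resolve_client_ip(ip, x_forwarded_for):
    """Return the first public address in x_forwarded_for, else ip."""
    if x_forwarded_for:
        for candidate in x_forwarded_for.split(","):
            candidate = candidate.strip()
            try:
                addr = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # malformed entry, skip it
            if addr.is_global:  # publicly routable address
                return candidate
    return ip

# Example: an internal proxy hop followed by the real client address
print(resolve_client_ip("10.0.0.5", "10.64.0.1, 208.80.154.224"))
```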
Thanks for your input!
Best


On Mon, Feb 23, 2015 at 4:58 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
I think it should be fine-ish; it depends what we're calculating. When
you say "geocoded information", what do you mean? Country? City? I
wouldn't expect country to move about a lot in 60 days (which is the
range of our data): I would expect city to.

What's the status on getting an oozie job or similar to compute going
forward? To me that's more of a priority than historical data.

On 23 February 2015 at 10:53, Joseph Allemandou
<jallemandou@wikimedia.org> wrote:
> Hi,
>
> As part of my first assignment, I'll recompute our historical webrequest
> dataset, adding client_ip and geocoded information.
>
> While it seems safe to compute historical client_ip from the existing
> ip and x_forwarded_for fields, using the current state of the MaxMind
> geocoding database to compute historical data is more error-prone.
>
> I can either compute it anyway, knowing that there'll be some errors, or put
> null values for data older than a given point in time.
>
> I'll launch the script to recompute the data as soon as max(a consensus is
> found on this matter, operations gives me the rights to run the script) :)
>
> Thanks
> --
> Joseph Allemandou
> Data Engineer @ Wikimedia Foundation
> IRC: joal
>
> _______________________________________________
> Analytics mailing list
> Analytics@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>



--
Oliver Keyes
Research Analyst
Wikimedia Foundation




--
Joseph Allemandou
Data Engineer @ Wikimedia Foundation
IRC: joal