As per the IRC discussion, we won't recompute historical data, but will start computing new values from deploy time onward. A new "version" field and associated documentation will also be provided, allowing changes to be followed over time. Thanks for your input! Best
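For illustration, here is a minimal sketch of how a downstream consumer might branch on such a version field; the field name, values, and semantics below are assumptions, not the deployed schema:

    # Hypothetical consumer-side handling of the planned "version" field.
    # Rows computed before the deploy would carry no version at all.
    def describe_row(row):
        version = row.get("version")
        if version is None:
            # Pre-deploy row: derived fields were never computed.
            return "legacy row, no derived fields"
        return "row computed under schema version %d" % version

    print(describe_row({"version": 1}))          # new data
    print(describe_row({"ip": "203.0.113.7"}))   # historical data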
On Mon, Feb 23, 2015 at 4:58 PM, Oliver Keyes okeyes@wikimedia.org wrote:
I think it should be fine-ish; it depends on what we're calculating. When you say "geocoded information", what do you mean? Country? City? I wouldn't expect country to move about a lot in 60 days (which is the range of our data): I would expect city to.
What's the status on getting an Oozie job or similar to compute this going forward? To me that's more of a priority than historical data.
On 23 February 2015 at 10:53, Joseph Allemandou jallemandou@wikimedia.org wrote:
Hi,
As part of my first assignment, I'll recompute our historical webrequest dataset, adding client_ip and geocoded information.
While it seems correct to compute the historical client_ip from the existing ip and x_forwarded_for fields, using the current state of the MaxMind geocoding database to compute historical data is more error-prone.
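For concreteness, a minimal sketch of that client_ip derivation, assuming the left-most X-Forwarded-For entry is the original client and the connecting ip is the fallback; the actual trust rules for internal proxies may well differ:

    # Sketch of deriving client_ip from ip + x_forwarded_for.
    # Assumes the left-most XFF entry is the client; production logic
    # would also validate entries and skip known internal proxies.
    def compute_client_ip(ip, x_forwarded_for):
        if x_forwarded_for and x_forwarded_for != "-":
            # XFF is a comma-separated chain: client, proxy1, proxy2, ...
            first = x_forwarded_for.split(",")[0].strip()
            if first:
                return first
        return ip

    # A request relayed through one proxy layer:
    assert compute_client_ip("10.64.0.1", "198.51.100.4, 10.64.0.1") == "198.51.100.4"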
I can either compute the geocoded data anyway, knowing that there'll be some errors, or put null values for data older than a given point in time.
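A minimal sketch of the second option, with a hypothetical cutoff date and whatever function wraps the MaxMind lookup passed in as a parameter:

    from datetime import datetime

    # Hypothetical cutoff: the actual date would come out of the consensus.
    GEOCODING_CUTOFF = datetime(2015, 1, 1)

    def geocode_or_null(client_ip, request_dt, geocode):
        # Older rows get NULL rather than a geocode from today's MaxMind
        # state, which may no longer match where the IP pointed back then.
        if request_dt < GEOCODING_CUTOFF:
            return None
        return geocode(client_ip)

    # Usage with a stub lookup standing in for the MaxMind call:
    lookup = lambda ip: {"country": "XX"}
    print(geocode_or_null("203.0.113.7", datetime(2014, 6, 1), lookup))  # None
    print(geocode_or_null("203.0.113.7", datetime(2015, 2, 1), lookup))  # {'country': 'XX'}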
I'll launch the script to recompute the data as soon as max(a consensus is found on this matter, operations gives me the right to run the script) :)
Thanks
Joseph Allemandou Data Engineer @ Wikimedia Foundation IRC: joal
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
-- Oliver Keyes Research Analyst Wikimedia Foundation