If I remember correctly, Chris had the maxmind db on github with a script that updates it and commits the changes, making it possible to "play back time" and get the state of the db as it was when the data was calculated.
I think Dan has that script & cron running in his homedir. If we could productionize this, or at least document it on wikitech, it would be great.
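To make the "play back time" idea concrete, here is a minimal sketch of picking the db snapshot that was current on a given date. The snapshot list, its ids and the function name are all hypothetical; in practice the dates and ids would come from the repository's commit log, and this is not the script mentioned above.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical snapshot index: (commit time, snapshot id) pairs, sorted by time.
# In a real setup these would be read from the repository history.
SNAPSHOTS = [
    (datetime(2014, 11, 4), "snap-2014-11"),
    (datetime(2014, 12, 2), "snap-2014-12"),
    (datetime(2015, 1, 6), "snap-2015-01"),
    (datetime(2015, 2, 3), "snap-2015-02"),
]


def snapshot_for(when, snapshots=SNAPSHOTS):
    """Return the id of the latest snapshot committed at or before `when`.

    Returns None if `when` is older than every known snapshot.
    """
    times = [t for t, _ in snapshots]
    # bisect_right gives the number of snapshots with commit time <= when.
    i = bisect_right(times, when)
    return snapshots[i - 1][1] if i else None
```

A request logged on 2015-01-20 would then be geocoded with "snap-2015-01", while anything older than the first commit gets no snapshot at all.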
Thanks,
Nuria
On Mon, Feb 23, 2015 at 7:53 AM, Joseph Allemandou <jallemandou@wikimedia.org> wrote:
Hi,
As part of my first assignment, I'll recompute our historical webrequest dataset, adding client_ip and geocoded information.
While it seems correct to compute the historical client_ip from the existing ip and x_forwarded_for fields, using the current state of the maxmind geocoding db to compute historical data is more error-prone, since IP-to-location mappings change over time.
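For illustration only, one possible way to derive client_ip from those two fields is sketched below. The rule used here (take the leftmost public address in x_forwarded_for, otherwise fall back to ip) is an assumption, not necessarily the logic the recompute script uses, and the function name is made up.

```python
import ipaddress


def client_ip(ip, x_forwarded_for):
    """Guess the client address from `ip` and an X-Forwarded-For string.

    Assumed rule: return the leftmost X-Forwarded-For entry that parses as a
    public address; if there is none, return `ip` unchanged.
    """
    if x_forwarded_for:
        for part in x_forwarded_for.split(","):
            candidate = part.strip()
            try:
                addr = ipaddress.ip_address(candidate)
            except ValueError:
                # Skip entries such as "unknown" or "-" that are not addresses.
                continue
            if not (addr.is_private or addr.is_loopback or addr.is_link_local):
                return candidate
    return ip
```

Since x_forwarded_for can be set by the client, a real implementation would probably only trust it when ip belongs to a known proxy.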
I can either compute it anyway, knowing that there'll be some errors, or put null values for data older than a given point in time.
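The second option could look roughly like this. The cutoff date, the function name and the `lookup` callable are placeholders for illustration; the actual cutoff is exactly what needs agreeing on.

```python
from datetime import datetime

# Hypothetical cutoff: requests older than this get no geocoded data.
GEO_CUTOFF = datetime(2015, 1, 1)


def geocode_or_null(client_ip, request_time, lookup, cutoff=GEO_CUTOFF):
    """Return lookup(client_ip) for recent requests, None for older ones.

    `lookup` is any callable mapping an IP string to geocoded data.
    """
    if request_time < cutoff:
        return None
    return lookup(client_ip)
```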
I'll launch the script to recompute the data as soon as max(a consensus is found on this matter, operations gives me the right to run the script) :)
Thanks
*Joseph Allemandou* Data Engineer @ Wikimedia Foundation IRC: joal
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics