Diederik:
I think that two issues get mixed up here:
geocoding of readers and geocoding of editors.
Not sure why you say that. I mention editors as well, not readers.
I don't think we should get too hung up on the
specific format right now, I am really not sure if a composite field is the best
implementation and at what level we want to geocode.
That was the format we more or less settled on in July 2012. I am just reiterating. Of course
this is not cast in stone.
The idea then was to provide all these subfields, if available, on all traffic. That is, if
performance allows; again, performance is more of an issue for state, with its irregular
boundaries, than for city, where MaxMind probably just does simple arithmetic, calculating
the distance from the nearest city center.
As for which data to provide: other people will want to see data broken down differently.
We got requests to analyze data from India and compare major cities. By doing geo-IP on
all traffic and providing all the geo data we can supply efficiently, we do this just once
for all stakeholders.
Erik
From: analytics-bounces(a)lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org]
On Behalf Of James Hare
Sent: Tuesday, August 13, 2013 4:54 PM
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Cc: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: Re: [Analytics] U.S. state-level editor retention data
On Aug 13, 2013, at 9:23 AM, Diederik van Liere <dvanliere(a)wikimedia.org> wrote:
On Mon, Aug 12, 2013 at 6:46 PM, Erik Zachte <ezachte(a)wikimedia.org> wrote:
Some thoughts on this:
We have been discussing adding new geo data for a long time.
I lost track of the current status and latest decisions, but FWIW a year ago this was the
idea for the squid log:
We thought of replacing the IP address with a composite field (using a different delimiter
than the field delimiter).
The field could look like this:
4|hash code|CL||Santiago|-33.5,-70.5
6|hash code|US|CA|San Francisco|37.5,-122.5
Where 4 or 6 indicates the IP version (IPv4 or IPv6).
Hash code is the anonymized IP address.
Country code as used by MaxMind (
http://dev.maxmind.com/geoip/legacy/codes/iso3166/ )
Region/state when available or else empty string (*)
City name when available or else empty string (
http://www.maxmind.com/GeoIPCity-534-Location.csv )
Lastly follow latitude/longitude, rounded on purpose. This gives a resolution of at best 55
km or 30 mi, depending on latitude, to ensure anonymization, particularly for
edits. Otherwise a very active editor in a sparsely populated region of, say, China could
easily be matched with edit timestamps from dumps.
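A minimal sketch, in Python, of how a composite field like the one described above could be
built and parsed. All function names, the salted SHA-256 hash, and the salt value are
assumptions made for illustration; the thread does not say how the hash code is computed.
The leading 4 or 6 is taken here to be the IP version. cell_size_km shows roughly where the
55 km figure comes from: half a degree of latitude at about 111 km per degree.

```python
import hashlib
import ipaddress
import math

SUBFIELD_SEP = "|"  # sub-field delimiter, distinct from the log's own field delimiter


def round_half_degree(value):
    """Round a coordinate to the nearest 0.5 degree."""
    return round(value * 2) / 2


def anonymize_ip(ip, salt):
    """Placeholder anonymization (assumption): truncated salted SHA-256."""
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()[:16]


def composite_geo_field(ip, country, region, city, lat, lon, salt="example-salt"):
    """Build 'version|hash|country|region|city|lat,lon'; missing parts stay empty."""
    version = ipaddress.ip_address(ip).version  # 4 or 6
    coords = f"{round_half_degree(lat)},{round_half_degree(lon)}"
    parts = [str(version), anonymize_ip(ip, salt), country, region or "", city or "", coords]
    return SUBFIELD_SEP.join(parts)


def parse_composite_geo_field(field):
    """Split a composite field back into its named parts."""
    version, hash_code, country, region, city, coords = field.split(SUBFIELD_SEP)
    lat, lon = (float(x) for x in coords.split(","))
    return {
        "ip_version": int(version),
        "hash": hash_code,
        "country": country,
        "region": region,
        "city": city,
        "lat": lat,
        "lon": lon,
    }


def cell_size_km(lat, step=0.5):
    """Approximate (north-south, east-west) size in km of a rounding cell at a latitude."""
    km_per_degree = 111.32  # rough length of one degree; varies slightly over the globe
    north_south = step * km_per_degree
    east_west = step * km_per_degree * math.cos(math.radians(lat))
    return north_south, east_west
```

For example, composite_geo_field("200.1.2.3", "CL", "", "Santiago", -33.45, -70.67) yields a
field ending in CL||Santiago|-33.5,-70.5, with the empty region preserved between the two
delimiters.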
I don't think we should get too hung up on the specific format right now, I am really
not sure if a composite field is the best implementation and at what level we want to
geocode. But more importantly, I think that two issues get mixed up here: geocoding of
readers and geocoding of editors.
It was my understanding that the original request pertained to geocoding of editors (if
that's not the case then my advance apologies).
@James: can you confirm that we are talking about geocoding of editors?
D
That is correct. Also, if it helps, I don't necessarily need *city*-level information,
just state. (For the purposes of this discussion, DC is a state since its stats would not
be aggregated with any other state's.)
James
* Caveat:
Supplying region code requires 'external lookup' as MaxMind puts it. (
http://www.maxmind.com/en/city )
This is probably a costly operation.
Erik
From: analytics-bounces(a)lists.wikimedia.org [mailto:analytics-bounces@lists.wikimedia.org]
On Behalf Of James Hare
Sent: Sunday, August 11, 2013 1:55 PM
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in
Wikipedia and analytics.
Subject: Re: [Analytics] U.S. state-level editor retention data
That will work. Cheers!
On Aug 10, 2013, at 9:21 AM, Toby Negrin wrote:
Hi James,
We can take a look at this -- the next step for WikiMetrics is to expand the reporting
capabilities. The developer with the most context is out until Wednesday; we should be
able to get back to you by the end of the week with an estimate of how difficult it would
be to implement this change.
Will that work?
-Toby
On Sat, Aug 10, 2013 at 4:07 AM, Wikimedia DC <james.hare(a)wikidc.org> wrote:
Greetings,
I am James Hare, president of the Washington, DC chapter. At Wikimania I have been
learning about the editor retention data the Wikimedia Foundation has been collecting and
analyzing. I was discussing it with Ryan Kaldari and he noted that while the data was
available at the national level, it was not yet available at the state level.
How difficult would it be to implement state-level analysis? Would it just be a matter of
simply changing the geolocation lookup code, or would it be a very expensive change that
would benefit relatively few people? For Wikimedia DC's sake I am interested in data
for the District of Columbia, Maryland, Delaware, Virginia, and West Virginia (our defined
chapter region).
Regards,
James Hare
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics