Yes, it would be easy to link user names on a sparsely edited wiki to geo
info (or, for that matter, the name of a very active editor on a very busy
wiki).
So the point is that the geo info should be inherently vague. Since MaxMind
only has city names for places with hundreds of thousands, if not millions,
of inhabitants, a city name is not a give-away. The same goes for the
rounded lat/long.
Erik
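To put a rough number on how vague rounded coordinates are, here is a back-of-the-envelope sketch. It assumes the half-degree rounding implied by the example values in the proposal further down the thread (e.g. -33.5,-70.5) and a spherical Earth with a mean radius of 6371 km. The function name `cell_size_km` is hypothetical and not part of any proposed pipeline.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation
KM_PER_DEGREE = EARTH_RADIUS_KM * math.pi / 180  # ~111.2 km per degree of latitude


def cell_size_km(lat_deg: float, step_deg: float = 0.5) -> tuple[float, float]:
    """Approximate (north-south, east-west) size in km of one lat/long grid cell.

    step_deg is the rounding step; 0.5 degree is an assumption.
    """
    north_south = step_deg * KM_PER_DEGREE
    # Meridians converge towards the poles, so a longitude step gets narrower.
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


for lat in (0, 38, 60):
    ns, ew = cell_size_km(lat)
    print(f"lat {lat:>2}: {ns:.1f} km N-S x {ew:.1f} km E-W")
```

Under these assumptions a cell is about 55.6 km north-south everywhere, while its east-west width drops to about 27.8 km at 60 degrees latitude.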
From: analytics-bounces(a)lists.wikimedia.org
[mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of Adam Hyland
Sent: Monday, August 12, 2013 7:06 PM
To: A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics.
Subject: Re: [Analytics] U.S. state-level editor retention data
Forgive me if I'm misunderstanding, but wouldn't a setup like this (even
anonymized as proposed) allow someone to recover the location of an
individual editor on sparsely edited wikis?
If we're just looking to provide a convenient lookup for IP editors, what is
the advantage of doing this over requiring researchers to use publicly
available IP databases to perform geolocation?
Adam Hyland
Developer at Bocoup
On Aug 12, 2013 12:46 PM, "Erik Zachte" <ezachte(a)wikimedia.org> wrote:
Some thoughts on this:
We have been discussing adding new geo data for a long time.
I have lost track of the current status and latest decisions, but FWIW, a
year ago this was the idea for the squid logs:
We thought of replacing the IP address with a composite field (using a
delimiter different from the log's field delimiter).
The field could look like this:
4|hash code|CL||Santiago|-33.5,-70.5
6|hash code|US|CA|San Francisco|37.5,-122.5
Here 4 or 6 is the IP version (IPv4 or IPv6).
The hash code is the anonymized IP address.
The country code is as used by MaxMind
(http://dev.maxmind.com/geoip/legacy/codes/iso3166/).
The region/state is given when available, or else an empty string (*).
The city name is given when available, or else an empty string
(http://www.maxmind.com/GeoIPCity-534-Location.csv).
Lastly come the latitude and longitude, deliberately rounded (to 0.5-degree
steps in the examples above). This gives a resolution of roughly 55 km
(34 mi) north-south, with a narrower east-west spacing away from the
equator. The coarse resolution is meant to ensure anonymization,
particularly for edits. Otherwise, a very active editor in a sparsely
populated region of, say, China could easily be matched with edit
timestamps from the dumps.
* Caveat: supplying the region code requires an 'external lookup', as
MaxMind puts it (http://www.maxmind.com/en/city).
This is probably a costly operation.
Erik
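A minimal sketch of producing and reading the composite field described above. Several details are assumptions not fixed by the proposal: `|` as the sub-field delimiter (as in the examples), HMAC-SHA256 with a private key as the "hash code" (a plain unkeyed hash of an IPv4 address could be reversed by brute force over the roughly 4.3 billion possible addresses), and rounding to the nearest half degree. `SECRET_KEY` and all function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical private key; it would have to be kept secret (and ideally rotated).
SECRET_KEY = b"replace-with-a-private-key"
SEP = "|"  # sub-field delimiter, assumed to differ from the log's field delimiter


def anonymize_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Keyed hash of the IP address (HMAC-SHA256, truncated to 16 hex chars)."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()[:16]


def round_half_degree(x: float) -> float:
    """Round a coordinate to the nearest 0.5 degree (exact ties go to even)."""
    return round(x * 2) / 2


def build_geo_field(ip: str, country: str, region: str, city: str,
                    lat: float, lon: float) -> str:
    version = ipaddress.ip_address(ip).version  # 4 or 6
    latlon = f"{round_half_degree(lat)},{round_half_degree(lon)}"
    # A missing region or city becomes an empty string, as in the proposal.
    return SEP.join([str(version), anonymize_ip(ip), country,
                     region or "", city or "", latlon])


def parse_geo_field(field: str) -> dict:
    version, ip_hash, country, region, city, latlon = field.split(SEP)
    lat, lon = (float(v) for v in latlon.split(","))
    return {"ip_version": int(version), "ip_hash": ip_hash, "country": country,
            "region": region, "city": city, "lat": lat, "lon": lon}


field = build_geo_field("192.0.2.1", "CL", "", "Santiago", -33.45, -70.67)
print(field)
print(parse_geo_field(field))
```

Which rounding rule (nearest, floor, truncation) the proposal intended is not stated; the choice shifts the grid but not its resolution.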
From: analytics-bounces(a)lists.wikimedia.org
[mailto:analytics-bounces@lists.wikimedia.org] On Behalf Of James Hare
Sent: Sunday, August 11, 2013 1:55 PM
To: A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics.
Subject: Re: [Analytics] U.S. state-level editor retention data
That will work. Cheers!
On Aug 10, 2013, at 9:21 AM, Toby Negrin wrote:
Hi James,
We can take a look at this; the next step for WikiMetrics is to expand its
reporting capabilities. The developer with the most context is out until
Wednesday, so we should be able to get back to you by the end of the week
with an estimate of how difficult it would be to implement these changes.
Will that work?
-Toby
On Sat, Aug 10, 2013 at 4:07 AM, Wikimedia DC <james.hare(a)wikidc.org> wrote:
Greetings,
I am James Hare, president of the Washington, DC chapter. At Wikimania, I
have been learning about the editor retention data the Wikimedia Foundation
has been collecting and analyzing. I was discussing it with Ryan Kaldari,
and he noted that while the data was available at the national level, it
was not yet available at the state level.
How difficult would it be to implement state-level analysis? Would it just
be a matter of changing the geolocation lookup code, or would it be a very
expensive change that would benefit relatively few people? For Wikimedia
DC's purposes, I am interested in data for the District of Columbia,
Maryland, Delaware, Virginia, and West Virginia (our defined chapter
region).
Regards,
James Hare
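If the composite geo field proposed above were available (country and region codes as its third and fourth sub-fields), a state-level breakdown for the chapter region could in principle be a simple filter and group-by. A minimal sketch, assuming US region codes are two-letter state abbreviations (as in the `US|CA` example) and counting distinct hashed IPs per state as a stand-in for whatever per-editor metric the retention analysis actually uses. `CHAPTER_REGION`, `distinct_hashes_by_state`, and the records are hypothetical, made-up illustrations.

```python
from collections import defaultdict

# Wikimedia DC's chapter region, as listed in the message above.
CHAPTER_REGION = {"DC", "MD", "DE", "VA", "WV"}


def distinct_hashes_by_state(fields, states=CHAPTER_REGION, sep="|"):
    """Count distinct hashed IPs per US state, keeping only `states`."""
    seen = defaultdict(set)
    for field in fields:
        _version, ip_hash, country, region, _city, _latlon = field.split(sep)
        if country == "US" and region in states:
            seen[region].add(ip_hash)
    return {state: len(hashes) for state, hashes in seen.items()}


# Made-up records for illustration only.
records = [
    "4|a1|US|VA|Arlington|39.0,-77.0",
    "4|a1|US|VA|Arlington|39.0,-77.0",      # same hashed IP, counted once
    "4|b2|US|DC|Washington|39.0,-77.0",
    "4|c3|US|CA|San Francisco|38.0,-122.5",  # outside the chapter region
    "4|d4|CL||Santiago|-33.5,-70.5",         # not in the US
]
print(distinct_hashes_by_state(records))
```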
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics