Hi Pine,

I thought that was specified in either the Privacy Policy or Terms of Use but I can't find the specific reference, and that bothers me.

This is specified in the data retention guidelines:
https://meta.wikimedia.org/wiki/Data_retention_guidelines

Cheers!

On Fri, Nov 11, 2016 at 4:11 PM, James Salsman <jsalsman@gmail.com> wrote:
Pine wrote:
>
> I tend to think that checkusers will need the plain IP addresses....

I am not suggesting removing the IP addresses or proxy information from POST requests as checkuser requires.

We need to anonymize both IP addresses and proxy information with a secure hash if we want to keep each GET request's geolocation and remain compliant with the Privacy Policy. The Privacy Policy is the most prominent policy, linked at the far left of the footer of every page served by every editable project, and it says explicitly that consent is required for the use of geolocation. The Privacy Policy and other policies make it clear that POST requests and Visual Editor submissions aren't going to be anonymized.
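
As an illustration of the hashing step, here is a minimal sketch in Python. It assumes that "secure hash" means a keyed hash (HMAC-SHA256) rather than a plain digest, since the IPv4 space has only 2^32 addresses and an unkeyed hash could be reversed by enumerating them. The function name anonymize_ip and the example key are hypothetical and not taken from any existing Wikimedia code.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ip(ip: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest (hex) of an IPv4 or IPv6 address."""
    # Normalize first so equivalent spellings of one address hash identically.
    packed = ipaddress.ip_address(ip).packed
    return hmac.new(key, packed, hashlib.sha256).hexdigest()


# Hypothetical key; a real one would be generated randomly and kept secret.
key = b"example-secret-key"
token = anonymize_ip("192.0.2.17", key)
```

With a fixed key, the same address always maps to the same token, so requests can still be grouped per client without storing the address itself. Rotating the key would break that linkage across rotation periods.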

However, geolocations for POST edits and Visual Editor submissions still require explicit consent, which we have no way to obtain at present. Editors' geolocations as they edit are very useful for research, but by the same token they raise the most serious privacy concerns. Obtaining consent to store geolocation seems like it would interfere with, complicate, and disrupt editing. If geolocation is stored with anonymized IP addresses for GETs but not for POSTs or Visual Editor submissions, both could easily be recovered, because simultaneously interleaved GET and POST requests for the same article are unavoidable.

Do we have any privacy experts on staff who can give these issues a thorough analysis in light of all the issues raised in https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006 ?

If Ops needs IP addresses, they should be able to use synthetic POST requests, as far as I can tell. If they anticipate a need for non-anonymous GET requests, then perhaps some kind of debugging switch would solve it: an IP range or mask could be entered on a short-term basis so that matching addresses are logged non-anonymously, with the setting expiring after an hour?
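
The debugging switch described above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the names DebugWindow and open_window are hypothetical, the range is given in CIDR notation, and the current time is passed in explicitly so the expiry logic is easy to check.

```python
import ipaddress
from dataclasses import dataclass
from typing import Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class DebugWindow:
    """A temporary exception that lets matching addresses be logged in the clear."""
    network: Network
    expires_at: float  # seconds since the epoch

    def allows(self, ip: str, now: float) -> bool:
        # Once the window has expired, nothing is logged non-anonymously.
        if now >= self.expires_at:
            return False
        return ipaddress.ip_address(ip) in self.network


def open_window(cidr: str, now: float, ttl_seconds: float = 3600.0) -> DebugWindow:
    """Open a window for the given range that expires after ttl_seconds (default one hour)."""
    # strict=False accepts ranges written with host bits set, e.g. 198.51.100.7/24.
    return DebugWindow(ipaddress.ip_network(cidr, strict=False), now + ttl_seconds)


window = open_window("198.51.100.0/24", now=1000.0)
```

A logger would call window.allows(ip, now) for each GET request and fall back to the anonymized form whenever it returns False.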

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics




--
Marcel Ruiz Forns
Analytics Developer
Wikimedia Foundation