Do the advantages of keeping unanonymized IP reader logs for potential
debugging needs outweigh the privacy disadvantages?
Judging from prior postings to this list, community members' interest in the
correctness of pageview data, pageview tools and the pageview API far
outweighs the concerns about a 60-day retention of raw IPs.
Again, repeating myself: could we make this 60-day interval slightly
shorter? Yes, probably, a bit. Could we do without short-term retention of
raw IPs? No, not really.
On Fri, Nov 11, 2016 at 9:07 AM, James Salsman <jsalsman(a)gmail.com> wrote:
Nuria Ruiz wrote:
.... on our end we need buffer time that allows us to know that
should there be a bug we can reprocess pageviews if needed (this does
happen). That buffer time is now 60 days and perhaps it could be a bit
smaller, but it is still going to be a matter of weeks, not days, for which
the raw data needs to be available.
Do the advantages of keeping unanonymized IP reader logs for potential
debugging needs outweigh the privacy disadvantages?
What are the outcomes impacting users of the hypothetical loss of
pageviews data compared to a PII leak?
On Fri, Nov 11, 2016 at 8:11 AM, James Salsman <jsalsman(a)gmail.com> wrote:
Pine wrote:
I tend to think that checkusers will need the plain IP addresses....
I am not suggesting removing the IP addresses or proxy information from
POST requests, which checkuser requires.
We need to anonymize both IP addresses and proxy information with a secure
hash if we want to keep each GET request's geolocation, to be compliant with
the Privacy Policy. The Privacy Policy is the most prominent policy on the
far left on the footer of every page served by every editable project, and
says explicitly that consent is required for the use of geolocation. The
Privacy and other policies make it clear that POST requests and Visual
Editor submissions aren't going to be anonymized.
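
To make the "secure hash" idea concrete, here is a minimal Python sketch. It assumes a keyed hash (HMAC-SHA256), which is my choice for illustration and not something specified in this thread: an unkeyed hash of an IPv4 address can be reversed by simply hashing all 2^32 possible addresses. The function name anonymize_ip and the key value are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ip(raw_ip: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of a normalized IP address."""
    # Normalize first so equivalent spellings of one address hash identically.
    normalized = ipaddress.ip_address(raw_ip).compressed
    digest = hmac.new(secret_key, normalized.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical key; in practice it would be kept secret and rotated.
key = b"hypothetical-secret-rotated-regularly"

# Two spellings of the same IPv6 address map to the same token.
token_a = anonymize_ip("2001:db8::1", key)
token_b = anonymize_ip("2001:0db8:0000:0000:0000:0000:0000:0001", key)
```

With a scheme like this, requests from one address can still be grouped by token while the key is in use, but whoever holds the key can still test candidate addresses, so key handling and rotation matter as much as the hash itself.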
However, geolocations for POST edit and Visual Editor submissions still
require explicit consent, which we have no way to obtain at present. Editors'
geolocations as they edit are very useful for research, but by the same
token have the most serious privacy concerns. Obtaining consent to store
geolocation seems like it would interfere with, complicate, and disrupt
editing. If geolocation is stored with anonymized IP addresses for GETs but
not POSTs or Visual Editor submissions, both could easily be recovered
because simultaneously interleaved GET and POST requests for the same
article are unavoidable.
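
The interleaving concern can be illustrated with a small Python sketch. The record layouts, field names, and the five-second window below are all hypothetical and chosen only for illustration; they do not describe the actual log schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GetRecord:
    timestamp: float
    article: str
    ip_hash: str      # anonymized address
    geolocation: str  # stored alongside the hash


@dataclass(frozen=True)
class PostRecord:
    timestamp: float
    article: str
    raw_ip: str       # kept in the clear for checkuser


def link_posts_to_geolocation(gets, posts, window_seconds=5.0):
    """Map each raw POST IP to a (hash, geolocation) pair when unambiguous."""
    linked = {}
    for post in posts:
        # GETs for the same article close in time to the POST.
        candidates = {
            (g.ip_hash, g.geolocation)
            for g in gets
            if g.article == post.article
            and abs(g.timestamp - post.timestamp) <= window_seconds
        }
        # A single candidate ties the raw IP to the hash and its geolocation.
        if len(candidates) == 1:
            linked[post.raw_ip] = next(iter(candidates))
    return linked
```

On a low-traffic article, one editor's GETs and POST would often be the only requests in the window, which is why the join tends to be unambiguous there.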
Do we have any privacy experts on staff who can give these issues a thorough
analysis in light of all the issues raised in
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006 ?
If Ops needs IP addresses, they should be able to use synthetic POST
requests, as far as I can tell. If they anticipate a need for non-anonymous
GET requests, then perhaps some kind of a debugging switch which could be
used on a short-term basis, where an IP range or mask could be entered to
allow matching addresses to log non-anonymously before expiring in an hour,
would solve any anticipated need?
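
As a rough Python sketch of what such a switch might look like: the class name DebugLogSwitch, its parameters, and the injectable clock are hypothetical and not part of any existing logging system. The sketch only covers the matching and expiry logic, not how it would be wired into the request pipeline.

```python
import ipaddress
import time


class DebugLogSwitch:
    """Allow raw-IP logging for one network until a time limit passes."""

    def __init__(self, network: str, ttl_seconds: float = 3600.0,
                 clock=time.monotonic):
        # strict=False accepts a host address with a mask, e.g. 192.0.2.5/24.
        self._network = ipaddress.ip_network(network, strict=False)
        self._clock = clock
        self._expires_at = clock() + ttl_seconds

    def should_log_raw(self, raw_ip: str) -> bool:
        # After expiry the switch is off for every address.
        if self._clock() >= self._expires_at:
            return False
        addr = ipaddress.ip_address(raw_ip)
        # IPv4 addresses never match an IPv6 network, and vice versa.
        return addr.version == self._network.version and addr in self._network
```

A request handler would call should_log_raw for each GET and fall back to the anonymized form whenever it returns False, so forgetting to turn the switch off cannot extend raw logging past the time limit.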
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics