Dario Taraborelli wrote, in reply to my question:
...
Is there any legitimate research or any other need to save IP
addresses associated with HTTP GET web logs to disk prior to
creating a secure hash of them?
these are considerations that the analytics / ops team are best suited to
answer, I encourage you to relay them to analytics-l if you want to have a
more technical discussion.
I asked there, and there have been two detailed answers:
https://lists.wikimedia.org/pipermail/analytics/2016-November/005506.html
https://lists.wikimedia.org/pipermail/analytics/2016-November/005508.html
Since the analytics team considers the justification for storing
personally identifying information such as IP address, proxy
information, and geolocation (which we apparently perform on every
reader request) to be based on the needs of Research and Ops, I would
like to ask three further questions in light of this recent news
article:
https://www.washingtonpost.com/news/the-switch/wp/2016/10/11/facebook-twitt…
1. What are the advantages and disadvantages of storing each reader
request's geolocation?
2. Has Ops ever actually used reader GET request IP addresses to solve
a problem which could not have been solved, for example, with POST
requests for debugging?
3. If a research partner with access to the raw IP addresses, proxy
information, and geolocation of our readers' requests were served with
a subpoena by a US or overseas law enforcement organization, a
national security letter, or were blackmailed or bribed, would the
Foundation have any way to know?
I repeat my request that the IP and proxy information be anonymized
with a secure cryptographic hash before being stored to nonvolatile
media, and suggest that storing the geolocation of every reader
request is not within the letter or the spirit of the Foundation's
privacy policy, which explicitly requires consent for the use of
geolocation:
"Some features we offer work better if we know what area you are in.
But it's completely up to you whether or not you want us to use
geolocation tools to make some features available to you. If you
consent, we can use GPS (and other technologies commonly used to
determine location) to show you more relevant content."
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006 may be
helpful for understanding my motivations about this issue.