Dario Taraborelli wrote, in reply to my question:
... Is there any legitimate research or any other need to save IP addresses associated with HTTP GET web logs to disk prior to creating a secure hash of them?
these are considerations that the analytics / ops team are best suited to answer, I encourage you to relay them to analytics-l if you want to have a more technical discussion.
I asked there, and there have been two detailed answers:
https://lists.wikimedia.org/pipermail/analytics/2016-November/005506.html
https://lists.wikimedia.org/pipermail/analytics/2016-November/005508.html
Since the analytics team considers the justification for storing personally identifying information such as IP addresses, proxy information, and geolocation (which we apparently perform on every reader request) to be based on the needs of Research and Ops, I would like to ask three further questions in light of this recent news article:
https://www.washingtonpost.com/news/the-switch/wp/2016/10/11/facebook-twitte...
1. What are the advantages and disadvantages of storing each reader request's geolocation?
2. Has Ops ever actually used reader GET request IP addresses to solve a problem which could not have been solved, for example, with POST requests for debugging?
3. If a research partner with access to the raw IP addresses, proxy information, and geolocation of our readers' requests were served with a subpoena by a US or overseas law enforcement organization, a national security letter, or were blackmailed or bribed, would the Foundation have any way to know?
I repeat my request that the IP and proxy information be anonymized with a secure cryptographic hash before being stored to nonvolatile media, and suggest that storing the geolocation of every reader request is not within the letter or the spirit of the Foundation's privacy policy, which explicitly requires consent for the use of geolocation:
"Some features we offer work better if we know what area you are in. But it's completely up to you whether or not you want us to use geolocation tools to make some features available to you. If you consent, we can use GPS (and other technologies commonly used to determine location) to show you more relevant content."
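To make the hashing request above concrete, here is a minimal sketch of what I have in mind. It is not a description of any existing Wikimedia pipeline; the names anonymize_ip and SECRET_KEY are hypothetical, and I am assuming a keyed hash (HMAC-SHA256) rather than a bare hash, because the IPv4 address space is small enough that unkeyed digests can be reversed by simply hashing every possible address.

```python
import hashlib
import hmac
import ipaddress
import secrets

# Hypothetical key for illustration: assumed to be held only in memory
# and rotated periodically, never written next to the logs.
SECRET_KEY = secrets.token_bytes(32)


def anonymize_ip(raw_ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a hex HMAC-SHA256 token for an IPv4 or IPv6 address."""
    # Parse first so that equivalent spellings of the same address
    # (for example compressed and expanded IPv6) map to the same token.
    packed = ipaddress.ip_address(raw_ip.strip()).packed
    return hmac.new(key, packed, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # The token, not the address, is what would be written to disk.
    print(anonymize_ip("192.0.2.17"))
```

Under these assumptions, requests from the same address still map to the same token for as long as the key is kept, so counting and deduplication remain possible, while discarding the key makes the stored tokens impractical to link back to addresses.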
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1450006 may be helpful for understanding my motivations about this issue.