Re: [Wikimedia-l] DEITYBOUNCE and reader logs (was Re: Introducing Victoria Coleman, WMF Chief Technology Officer)

8 Nov 2016

Leila Zia wrote:
 ... we are not aware of any reader logs being shipped out of the
 WMF servers.

Page 20 of http://infolab.stanford.edu/~west1/pubs/West_Dissertation-2016.pdf
says, "We have access to Wikimedia’s full server logs, containing all
HTTP requests to Wikimedia projects." Page 19 indicates that this
information includes the "IP address, proxy information, and user agent."

At https://youtu.be/jQ0NPhT-fsE&t=25m40s Dr. West says, "we have
the complete ... server logs from Wikipedia ... about 14 terabytes of
raw logs per month."

If this does not imply that the logs are copied from Foundation
servers, that is certainly better than what the language appears
to mean. But I question whether recording the personally
identifying data in the first place is wise.

I understand that there are currently two other university research
laboratories which have similar access. Is that correct?

Would anyone in the Foundation have any way to know whether any
of the researchers with access are subject to National Security
Letters, a subpoena from a US or foreign law enforcement agency,
or blackmail, extortion, or bribery, for that matter?

Is creating the MD5 hash described on page 19 of Dr. West's
dissertation, after filtering bots from the user agents and
discarding the IP address before ever storing the log files to
disk, an appropriate solution to this problem?

Should SHA-512 be used instead of MD5?
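
To make the last two questions concrete, here is a minimal sketch in Python of what I have in mind. It is not the procedure from the dissertation or from any Foundation system. It assumes the hash is taken over the IP address joined with the user agent, and the bot markers, field names, and function names are all hypothetical.

```python
import hashlib

# Hypothetical substrings used to spot bots; a real filter would be far more thorough.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_bot(user_agent: str) -> bool:
    """Return True if the user agent contains one of the assumed bot markers."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def pseudonymize(ip: str, user_agent: str, algorithm: str = "md5"):
    """Return a hex digest standing in for the reader, or None for bots.

    Assumption: the digest covers "ip|user_agent". The algorithm name is
    passed to hashlib.new, so "md5" and "sha512" are both accepted.
    """
    if is_bot(user_agent):
        return None
    material = f"{ip}|{user_agent}".encode("utf-8")
    return hashlib.new(algorithm, material).hexdigest()


def scrub_record(record: dict, algorithm: str = "md5"):
    """Build the record that would be written to disk, without the raw IP.

    Returns None for bot traffic so the caller can drop the request entirely.
    """
    reader_id = pseudonymize(record["ip"], record["user_agent"], algorithm)
    if reader_id is None:
        return None
    # Only the digest and the requested path are kept; the IP address and
    # user agent never reach the stored record.
    return {"reader_id": reader_id, "path": record["path"]}
```

Calling scrub_record(request, algorithm="sha512") gives a 128-character hex digest instead of the 32-character one from MD5. Either way, an unkeyed hash over an input space as small as the IPv4 address range can be recomputed by anyone who can guess the inputs, so the choice of hash function alone may not settle the question.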
