[Foundation-l] Release of squid log data

Tim Starling tstarling at wikimedia.org
Fri Sep 14 12:46:05 UTC 2007


Mathias Schindler wrote:
> On 9/14/07, Tim Starling <tstarling at wikimedia.org> wrote:
> 
>>For a while now, we've been releasing squid log data,
> 
> 
> Is there a public url for accessing that data?

Which data? Do you mean information about the project? The data itself is 
only available as a UDP stream; there is no URL.
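
For illustration only, here is a minimal sketch of what consuming a UDP log stream might look like on the receiving side. Everything specific in it is an assumption, not a description of the actual setup: the newline-delimited text payload, the loopback address, and the helper names open_listener and receive_log_lines are all hypothetical.

```python
import socket


def open_listener(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the OS pick a free port (assumed values)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_log_lines(sock, max_datagrams):
    """Yield non-empty text lines from up to max_datagrams datagrams.

    Assumes each datagram carries one or more newline-delimited log lines.
    """
    for _ in range(max_datagrams):
        data, _addr = sock.recvfrom(65535)
        for line in data.decode("utf-8", errors="replace").splitlines():
            if line:
                yield line
```

Because UDP is connectionless and unacknowledged, a receiver like this only sees whatever arrives while it is listening; there is nothing to fetch after the fact, which is why a URL does not apply.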

> And just two questions: Do they need the actual IP address, or would
> a distinct number that tells different IP addresses apart be sufficient?

I wouldn't recommend hashed IP addresses to anyone involved in 
academic work. I've worked in the academic sector, and I know how important 
it is for data to be above any criticism. Any data using unique IP 
addresses as an estimate of individual user population would be severely 
skewed by proxies and NAT.
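
A small sketch of the skew, under made-up assumptions: the people, the addresses (taken from the documentation-only IP ranges), the salt, and the helper names are all hypothetical. It shows that hashing an address changes its label but not the fact that several people behind one proxy or NAT gateway collapse into a single identifier.

```python
import hashlib


def hash_ip(ip, salt="example-salt"):
    """Return a stable pseudonym for an IP address (the salt is a made-up value)."""
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()[:16]


# Hypothetical requests as (person, source IP). Alice, Bob and Carol share
# one NAT gateway address; Dave has an address of his own.
REQUESTS = [
    ("alice", "203.0.113.7"),
    ("bob", "203.0.113.7"),
    ("carol", "203.0.113.7"),
    ("dave", "198.51.100.2"),
    ("alice", "203.0.113.7"),
]


def distinct_people(requests):
    """Count the real individuals (known here only because the data is invented)."""
    return len({person for person, _ip in requests})


def distinct_hashed_ips(requests):
    """Count distinct hashed addresses, which is all an analyst would see."""
    return len({hash_ip(ip) for _person, ip in requests})
```

In this toy data there are four people but only two distinct hashed addresses, so a population estimate based on the hashes would be off by a factor of two.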

> When you say stripped of personally identifying information, does this
> include information such as search queries to our site that might, to a
> certain degree, be used to identify persons? People digging into the
> AOL-data did not need IP addresses to identify individual people.

Yes, it includes search queries, user page queries, etc., but they're all 
mixed in together in a homogeneous stream. There is no referrer data or 
user agent data. So there is no way to correlate requests.

Also, we are only sending them 1 in every 10 requests. You can't tell 
much about a person from one tenth of their requests, uniformly mixed in 
with requests from 100 million other people.
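
To make the 1-in-10 figure concrete, here is a minimal sketch of a counter-based sampler. Whether the real sampling is counter-based or random is not stated here, so treat the method, the function name sample_one_in_n, and the parameter n as assumptions.

```python
def sample_one_in_n(requests, n=10):
    """Yield every n-th item of a request stream (counter-based sampling).

    This is an assumed technique for illustration; a random sampler with
    probability 1/n would give the same rate on average.
    """
    for index, request in enumerate(requests, start=1):
        if index % n == 0:
            yield request
```

Applied to a stream of 100 requests, this keeps 10 of them. Since requests from everyone are interleaved before sampling, the few that survive from any one person are scattered among everyone else's.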

-- Tim Starling



