[Foundation-l] Release of squid log data
Tim Starling
tstarling at wikimedia.org
Fri Sep 14 12:46:05 UTC 2007
Mathias Schindler wrote:
> On 9/14/07, Tim Starling <tstarling at wikimedia.org> wrote:
>
>>For a while now, we've been releasing squid log data,
>
>
> Is there a public url for accessing that data?
What data? You mean information about the project? The data itself is
only available as a UDP stream; there's no URL.
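
For readers unfamiliar with consuming a UDP stream, here is a minimal Python sketch of what a receiver might look like. The port number (8420), the UTF-8 encoding and the one-log-line-per-newline layout are all assumptions made for illustration; none of them are stated in this thread, and the function names are hypothetical.

```python
import socket
from typing import Iterator, List


def split_datagram(payload: bytes) -> List[str]:
    """Decode one datagram and return its non-empty log lines.

    Assumes newline-separated UTF-8 text (an assumption, not a
    documented property of the real stream).
    """
    text = payload.decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if line.strip()]


def listen(host: str = "0.0.0.0", port: int = 8420) -> Iterator[str]:
    """Yield log lines from a UDP socket until interrupted.

    The default port is a placeholder chosen for this example.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    try:
        while True:
            # 65535 is large enough for any single UDP datagram.
            payload, _sender = sock.recvfrom(65535)
            yield from split_datagram(payload)
    finally:
        sock.close()
```

Because UDP is connectionless, a receiver like this sees only what arrives while it is listening; there is no archive to fetch later, which is why no URL exists.
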
> And just two questions: Do they need the actual IP address, or would
> just a distinct number to tell different IP addresses apart be sufficient?
I wouldn't recommend using a hashed IP address to anyone involved in
academic work. I've worked in the academic sector, and I know how important
it is for data to be above any criticism. Any data using unique IP
addresses as an estimate of individual user population would be severely
skewed by proxies and NAT.
> When you say stripped of personally identifying information, does this
> include information such as search queries to our site that might, to a
> certain degree, be used to identify persons? People digging into the
> AOL data did not need IP addresses to identify individual people.
Yes it includes search queries, user page queries, etc., but they're all
mixed in together in a homogeneous stream. There is no referrer data or
user agent data. So there is no way to correlate requests.
Also, we are only sending them 1 in every 10 requests. You can't tell
much about a person from one tenth of their requests, uniformly mixed in
with requests from 100 million other people.
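
To make the 1-in-10 figure concrete, here is a minimal Python sketch of counter-based sampling. Whether the real sampling is counter-based or random is not stated in this message, so treat the method as an assumption; the function name `sample_every_nth` is hypothetical.

```python
from typing import Iterable, Iterator


def sample_every_nth(requests: Iterable[str], n: int = 10) -> Iterator[str]:
    """Yield every n-th request from a stream and drop the rest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for index, request in enumerate(requests, start=1):
        # Keep requests number n, 2n, 3n, ... and discard the others.
        if index % n == 0:
            yield request


# Thirty requests in, three out.
kept = list(sample_every_nth((f"req-{i}" for i in range(1, 31)), n=10))
print(kept)
```

Since requests from all users are interleaved in one stream, the survivors of such a filter are drawn from whoever happened to land on the sampled positions, not from any one person's session.
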
-- Tim Starling