Mathias Schindler wrote:
On 9/14/07, Tim Starling <tstarling(a)wikimedia.org> wrote:
For a while now, we've been releasing squid log data,
Is there a public url for accessing that data?
What data? You mean information about the project? The data itself is
only available as a UDP stream; there's no URL.
And just two questions: Do they need the actual IP address, or would
just a distinct number to tell different IP addresses apart be sufficient?
I wouldn't recommend using a hashed IP address to anyone involved in
academic work. I've worked in the academic sector, and I know how important
it is for data to be above any criticism. Any data using unique IP
addresses as an estimate of individual user population would be severely
skewed by proxies and NAT.
When you say stripped of personally identifying information, does this
include information such as search queries to our site that might to a
certain degree be used to identify persons? People digging into the
AOL data did not need IP addresses to identify individual people.
Yes it includes search queries, user page queries, etc., but they're all
mixed in together in a homogeneous stream. There is no referrer data or
user agent data. So there is no way to correlate requests.
Also, we are only sending them 1 in every 10 requests. You can't tell
much about a person from one tenth of their requests, uniformly mixed in
with requests from 100 million other people.
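
As a rough illustration of what 1-in-10 sampling does to a request stream,
here is a minimal Python sketch. It assumes simple counter-based sampling
(keep every 10th line); the actual sampling mechanism is not described
here, and the function name and the log lines are made up for the example.

```python
from itertools import islice


def sample_one_in_n(requests, n=10):
    """Keep every n-th item of a request stream (hypothetical helper)."""
    # Start at index n-1 and step by n, so items n, 2n, 3n, ... survive.
    return islice(requests, n - 1, None, n)


# Made-up stand-in for a stream of log lines.
lines = [f"request {i}" for i in range(1, 101)]
sampled = list(sample_one_in_n(lines))
print(len(sampled))  # 10 of the 100 lines are kept
```

Whatever one user does, only about a tenth of it survives, and the
surviving lines carry nothing that links them to each other.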
-- Tim Starling