Re: [Wikitech-l] request; access to (anonymized) apache log data for a recommendation system

27 Jul 2005


      tpryor@media.mit.edu wrote:
...
Hello,
Apologies for this being a repeat; I was just informed that the original ended
up being read as part of an existing thread.
I'd like to request access to anonymized Apache logs of Wikipedia user
data.
I am creating a browsing interface for Wikipedia that requires clustered user
data (in that sense it is akin to finding articles using the Amazon
recommendation system or the earlier MovieLens recommendation system).
For this I need access to user page requests over time, preferably stored in a
database. I can provide a script that will translate users' IP addresses to a
unique signature so that the users themselves remain anonymous, stuff the data
into a reasonably size-efficient MySQL table, etc.
I was told that I might need to talk to Kate about the feasibility of doing
this. Are there any existing objections to retaining anonymized apache log data
for research purposes?
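The script offered above is not shown in the thread. Below is a minimal sketch of what such an IP-to-signature translation might look like. It assumes a keyed hash (HMAC-SHA256) whose secret key stays with whoever holds the raw logs. An unkeyed hash of an IPv4 address could be reversed simply by hashing all 2**32 addresses. The function names, the key handling and the (ip, timestamp, page) record layout are all hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical: a random secret key kept only by the holder of the raw logs.
# An unkeyed hash of an IPv4 address could be reversed by hashing all 2**32
# addresses; mixing in a secret key prevents that brute-force lookup.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable, opaque signature for an IP address (HMAC-SHA256, truncated)."""
    digest = hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
    return digest[:16]  # 64 bits of the digest, compact enough for a table column


def anonymize_requests(records, key: bytes = SECRET_KEY):
    """Replace the IP in each hypothetical (ip, timestamp, page) record with its signature."""
    return [(pseudonymize_ip(ip, key), ts, page) for ip, ts, page in records]
```

The mapping is stable, so every request from one address ends up under the same signature. That is what clustering needs. It also means the signature works as a persistent pseudonym rather than as full anonymization.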
Using publicly available data you can find out the set of pages edited 
by each username.  Then it is possible, with some degree of uncertainty, 
to link some usernames to one or more "unique signature"s (from your 
quoted text above), by matching sets of user page requests to sets of 
pages edited.  Thus some of the data we would release to you is 
bordering on, if not definitely, personally identifiable data which is 
not already publicly available.  The privacy policy [1] says that 
"personally identifiable data collected in the server logs will not be 
released by the developers who have access to it," except under certain 
circumstances, none of which cover this case.
[1] http://wikimediafoundation.org/wiki/Privacy_policy
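Here is a rough sketch of the linkage described above. It matches each username's public set of edited pages against each signature's set of requested pages. Jaccard similarity is assumed as the matching score, but that is only one possible choice and the thread does not specify one. The names, sample data and threshold are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two page sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_users(
    edited: Dict[str, Set[str]],     # public: username -> pages edited
    requested: Dict[str, Set[str]],  # released logs: signature -> pages requested
    threshold: float = 0.3,          # hypothetical cut-off for reporting a link
) -> List[Tuple[str, str, float]]:
    """For each username, find the signature whose requests best match its edits."""
    links = []
    if not requested:
        return links
    for user, edits in edited.items():
        # Highest-scoring signature for this user (ties broken by signature string).
        score, sig = max((jaccard(edits, reqs), s) for s, reqs in requested.items())
        if score >= threshold:
            links.append((user, sig, score))
    return links
```

Readers request far more pages than they edit, so on real logs a match like this gives a likelihood, not a certainty. That is the "some degree of uncertainty" mentioned above.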
...
Tony Pryor
---
en:user:jeronim
