[Foundation-l] Wikipedia Web Logs for scientific research
Mirco Nanni
mirco.nanni at isti.cnr.it
Tue May 10 15:56:58 UTC 2005
Dori wrote:
>>[Addendum: the sensitive information in web logs is
>>essentially located in the "client IP" field ("who visited
>>that page"). However, for our research purposes that field
>>is not strictly needed, as an encrypted version of it would
>>be enough, thus avoiding most of the privacy issues.]
>
> The problem is that if you substitute the IP with a unique number and still
> show accesses to user pages, you can probably identify the logged-in users.
> I'd be OK if the IPs were masked AND accesses to non-article namespace pages
> were not given out.
Well, our objective is not to make web accesses public,
but to apply analysis techniques to them and possibly publish
some selected results (something like the Webalizer system
now used to build the Wikipedia usage statistics, but a bit
more sophisticated and specific).
However, you are right: masking IPs does not solve the
privacy problems once and for all. I agree with restricting
the data to web traffic related to articles and discarding
personal pages and the like. In any case, those pages are not
very interesting for our research purposes.
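
To make the two measures concrete, here is a minimal sketch of how a log
could be prepared before analysis. It is only an illustration under stated
assumptions, not a description of any existing Wikipedia tooling: the record
layout (an IP and a page title per access), the prefix list, and the function
names are all hypothetical, and a keyed hash (HMAC-SHA256) stands in for the
"encrypted version" of the IP mentioned above.

```python
import hashlib
import hmac

# Assumption: an illustrative, incomplete list of non-article namespace prefixes.
NON_ARTICLE_PREFIXES = ("User:", "User_talk:", "Talk:", "Wikipedia:", "Special:")


def pseudonymize_ip(ip: str, key: bytes) -> str:
    """Replace an IP with a keyed hash; the same IP always maps to the same token."""
    digest = hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened token, still stable for a given key


def is_article(title: str) -> bool:
    """Treat any title without a known namespace prefix as an article."""
    return not title.startswith(NON_ARTICLE_PREFIXES)


def sanitize(records, key: bytes):
    """Drop non-article accesses and mask the IP of the remaining ones.

    `records` is assumed to be an iterable of (ip, page_title) pairs.
    """
    return [
        (pseudonymize_ip(ip, key), title)
        for ip, title in records
        if is_article(title)
    ]


if __name__ == "__main__":
    sample = [
        ("10.0.0.1", "Pisa"),
        ("10.0.0.1", "User:Example"),  # dropped: user page
        ("10.0.0.2", "Rome"),
    ]
    for token, title in sanitize(sample, key=b"secret-kept-by-the-log-owner"):
        print(token, title)
```

The key would stay with whoever owns the logs, so the tokens cannot be mapped
back to IPs by the people running the analysis. As noted above, this alone
does not remove every re-identification risk, which is why the namespace
filter is applied as well.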
- Mirco
====================================
http://ercolino.isti.cnr.it/mirco
====================================