I'm working on a tool/skin/interface for Wikipedia that uses the pages a
user has visited to drive a statistical (centroid-cluster) recommendation
system.
I would like to use the Wikipedia Apache logs to infer interest clusters. I
certainly don't need the users' IP addresses as they appear in the logs, and
I would be willing to write a script that encrypts the addresses in each log,
using standard encryption technology, before the log is made available, which
would leave users distinct but anonymous. Also, I do not need to know the
order or even the day on which the pages are accessed; an accumulated
week or month without dates is still useful, as long as I can be sure that the
data is valid. This data could perhaps also be made available to others who
are doing statistical research.
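
Such a script might look roughly like the following Python sketch. It is only
an illustration under several assumptions: that the logs are in Apache Common
Log Format, that a keyed hash (HMAC-SHA256) is an acceptable stand-in for
"encryption" since the addresses never need to be recovered, and that the
secret key stays with whoever runs the script. The names LOG_LINE,
pseudonymize, and accumulate are hypothetical.

```python
import hashlib
import hmac
import re
from collections import defaultdict

# Assumption: lines follow the Apache Common Log Format, e.g.
# 1.2.3.4 - - [10/Oct/2003:13:55:36 -0700] "GET /wiki/Foo HTTP/1.0" 200 2326
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (\S+)[^"]*"')


def pseudonymize(ip, secret_key):
    """Map an IP address to a stable token that is hard to link back
    to the address without the secret key (keyed hash, not reversible)."""
    digest = hmac.new(secret_key, ip.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def accumulate(lines, secret_key):
    """Return {token: set of requested paths}.

    IP addresses, timestamps, and request order are all discarded;
    only which pages each anonymous user touched is kept.
    """
    pages_by_user = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not parse
        ip, path = match.groups()
        pages_by_user[pseudonymize(ip, secret_key)].add(path)
    return dict(pages_by_user)
```

Running accumulate over a week or month of log lines with one key would give
one set of pages per distinct (but anonymous) user; using a fresh key for each
period would keep users from being linked across periods.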
Would this be acceptable use?
Thanks,
Tony Pryor