Hi John,
The fact that you are using public data for your analysis does NOT mean that the analysis is compliant with the policy.
In fact, this policy was put into place precisely to make clear that, even when public data is used, making an analysis available may STILL constitute a privacy violation. From the Toolserver policy page:
"analysis of publicly available data (data mining) may well lead to information that compromises the privacy of individuals (profiling). The fact that anyone could in theory perform this analysis does not justify the publication of such information."
Making this kind of information available to a closed circle of users entrusted by the community with special powers, such as admins with CheckUser privileges, is one thing. Making it available to the public is quite another.
Please make sure that your tools do not make available any analysis that allows insight into people's habits or lifestyle beyond what is easily and directly visible on Wikipedia itself.
Regards, Daniel
On 01.04.2011 12:52, John wrote:
He has not. The data collected via user-compare is gathered solely from the API and is used almost exclusively for SPI (http://en.wikipedia.org/wiki/Wikipedia:Sockpuppet_investigations), where gathering and analyzing this data is standard practice. Had I been using non-public data (anything generated from the SQL databases that normal users do not have access to), I would agree that there may be privacy issues; however, every piece of data used by that tool comes from the en.wikipedia.org API.