I appreciate that Amir is acknowledging that as neat as this tool sounds, its use is fraught with risk. The comparison that immediately jumped to my mind is predictive algorithms used in the criminal justice system to assess the risk of bail jumping or criminal recidivism. These algorithms have been largely secret, their use hidden, their conclusions non-public. The more we learn about them, the clearer it becomes how deeply flawed they are. Obviously the real-world consequences of those tools are more severe, in that they directly lead to the incarceration of many people, but I think the comparison is illustrative of the risks. It also suggests the type of ongoing, comprehensive review that should be involved in making this tool available to users.
The potential misuse to be concerned about here is by amateurs with antisocial intent, or by intended users who are reckless or ignorant of the risks. Major governments have the resources to build this themselves easily, and if they care enough about fingerprinting Wikipedians, they likely already have.
I think if the tool is useful and there's demand for it, everything about it - how it works, who uses it, what conclusions and actions are taken as a result of its use, etc. - should be made public. That's the only way we'll discover the many ways in which it will surely be misused eventually. SPI has been using these 'techniques' in a manual way, or with unsophisticated tools, for many years. But as with any such tool, the data fed into it can train the system incorrectly. The results it returns can be misunderstood or intentionally misused. Knowledge of its existence will lead the most sophisticated actors to evade it or intentionally misdirect it. People who are innocent of any violation of our norms will be harmed by its use. Please establish the proper cultural and procedural safeguards to limit the harm as much as possible.