I appreciate that Amir is acknowledging that as neat as this tool sounds,
its use is fraught with risk. The comparison that immediately jumped to my
mind is predictive algorithms used in the criminal justice system to assess
risk of bail jumping or criminal recidivism. These algorithms have been
largely secret, their use hidden, their conclusions non-public. The more we
learn about them, the clearer it becomes how deeply flawed they are. Obviously the
real-world consequences of these tools are more severe in that they
directly lead to the incarceration of many people, but I think the
comparison is illustrative of the risks. It also suggests the type of
ongoing comprehensive review that should be involved in making this tool
available to users.
The potential misuse to be concerned about here is by amateurs with
antisocial intent, or by intended users who are reckless or ignorant of
the risks. Major governments have the resources to easily build this
themselves, and if they care enough about fingerprinting Wikipedians they
likely already have.
I think if the tool is useful and there's a demand for it, everything about
it - how it works, who uses it, what conclusions and actions are taken as a
result of its use, etc. - should be made public. That's the only way we'll
discover the multiple ways in which it will surely eventually be misused.
SPI has been using these 'techniques' in a manual way, or with
unsophisticated tools, for many years. But like any such tool, it can be
trained incorrectly by the data fed into it. The results it returns can be
misunderstood or intentionally misused. Knowledge of its existence will
lead the most sophisticated to beat it, or intentionally misdirect it.
People who are innocent of any violation of our norms will be harmed by its
use. Please establish the proper cultural and procedural safeguards to
limit the harm as much as possible.