On Sat, Aug 8, 2020 at 7:43 PM Amir Sarabadani ladsgroup@gmail.com wrote:
* By closed source, I don't mean it will be only accessible to me. It's already accessible to another CU and one WMF staff member, and I would gladly share the code with anyone who has signed the NDA; they are of course more than welcome to change it. GitHub has a really low limit on the number of people who can access a private repo, but I would be fine with any means to fix this.
Closed source is commonly understood to mean the code is not under an OSI-approved open-source license (such code is banned from Toolforge). Contrary to a common misconception, many OSI-approved open-source licenses (such as the GPL) allow keeping the code private, as long as the software itself is also kept private. IMO it would be less confusing to use the "public"/"private" terminology here: yes, the code should be open-sourced, but that's mostly orthogonal to the concerns discussed here.
* It has been pointed out by people on the checkuser mailing list that there's no point in logging access to this tool, since the code is accessible to CUs (if they want it), so they can download it and run it on their own computers without any logging anyway.
There's a significant difference between your actions not being logged vs. your actions being logged unless you actively circumvent the logging (in ways which would probably seem malicious). Clear red lines work well in a community project even when there's nothing physically stopping people from stepping over them.
* There is a huge difference between CU and this AI tool in matters of privacy. While both are privacy-sensitive, CU reveals much more: as a CU, I know where lots of people are living or studying because they showed up in my CUs (...), but this tool only reveals a connection between accounts if one of them is linked to a public identity and the other is not. I wholeheartedly agree that is not great, but it's not on the same level as seeing people's IPs.
On the other hand, IP checks are very unreliable. A hypothetical tool that is reliable would be a bigger privacy concern, since it would be used more often and more successfully to extract private details. (On the other other hand, as a Wikipedia editor I have a reasonable expectation of privacy of the site not telling its administrators where I live. Do I have a reasonable expectation of privacy for not telling them what my alt accounts are? Arguably not.)
Also, how much help would such a tool be in off-wiki stylometry? If it can be used (on its own or with additional tooling) to connect wiki accounts to other online accounts, that would subjectively seem to me to have a significantly larger privacy impact than IP addresses.