This interesting bot showed up on Hacker News today: https://news.ycombinator.com/item?id=8018284
While in this instance access to anonymous editors' IP addresses is definitely useful for identifying edits with a probable conflict of interest, it makes me wonder about the history behind identifying anonymous editors by their IP addresses on WMF-hosted wikis.
IP addresses are closely guarded for registered users, so why shouldn't anonymous users be identified by a hash of their IP address to protect their privacy as well? The exact same ability to see all edits by a given anonymous IP would still exist; the IP itself just wouldn't be publicly available, and would be protected by the same access rights as registered users' IPs.
The "use case" that makes me think of this is someone living in a totalitarian regime who makes a sensitive edit and forgets that they're logged out, or who simply doesn't realize that being anonymous on the wiki doesn't mean their local authorities can't figure out who they are from the IP address and the time of the edit. Understanding that they're somewhat protected when logged in and not when logged out requires a certain level of technical understanding. The easy way out of this argument is to say that these users should be using Tor or something similar. But I still wonder why we have this double standard of protecting registered users' privacy with regard to IP addresses and not doing the same for anonymous users, when simple hashing would do the job.
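Gilles's "simple hashing" idea can be sketched with a keyed hash (HMAC-SHA256) rather than a bare hash, since an unkeyed hash of an IPv4 address can be reversed just by hashing all 2^32 possible addresses. This is a minimal sketch under assumptions: `SECRET_KEY`, the `Anon-` prefix and the 12-hex-digit truncation are hypothetical choices, not MediaWiki behavior. It also exhibits the property Gilles raises later in the thread: the same IP always yields the same identifier.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical server-side secret; it would live in configuration,
# never in page histories or public database tables.
SECRET_KEY = b"example-secret-key"


def anon_id(ip: str, key: bytes = SECRET_KEY) -> str:
    """Map an IP address to a stable, opaque display name for an anonymous editor."""
    # Parse first so that different spellings of one address
    # (e.g. "2001:DB8::1" and "2001:db8:0:0::1") give the same identifier.
    packed = ipaddress.ip_address(ip).packed
    digest = hmac.new(key, packed, hashlib.sha256).hexdigest()
    return "Anon-" + digest[:12]  # truncated to keep it readable
```

Calling `anon_id("192.0.2.10")` twice returns the same string, so "all edits by this anonymous editor" still works as a query, while the address itself never appears in the name.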
I agree that it’s a double standard, but looking at the bright side, it becomes a big encouragement to anonymous users to register and log in. The Account Creation Experience Team (or whoever the hell is in charge of that) can correct me, but I would imagine that we would see a big drop in registered accounts if IPs were hashed.
Also, it’d be really annoying to have hashes as usernames, so we’d have to think of an alternative scheme that makes things more readable. -- Tyler Romeo 0x405D34A7C86B42DF
From: Gilles Dubuc <gilles@wikimedia.org>
Reply: Wikimedia developers <wikitech-l@lists.wikimedia.org>
Date: July 11, 2014 at 9:34:18
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
Subject: [Wikitech-l] Anonymous editors & IP addresses
> I would imagine that we would see a big drop in registered accounts if IPs were hashed.
Why? Most casual web users don't even know what an IP address is, let alone what their own address is. In fact, browsers are evolving to hide even the URL. This is the sort of technical information that an ever-shrinking portion of web users knows about these days.
> an alternative scheme that makes things more readable
A hash can take many forms. In fact it could be formatted just like an IP address. Even if the hash format mixes letters and numbers, as long as the length is similar, I don't see how IP addresses are superior in terms of readability.
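Gilles's point that a hash "could be formatted just like an IP address" can be illustrated by rendering digest bytes as a dotted quad, or by grouping hex digits the way IPv6 addresses are written. This is a sketch assuming a keyed-hash setup; `SECRET_KEY`, `ip_shaped_id` and `grouped_hex_id` are hypothetical names.

```python
import hashlib
import hmac
import ipaddress

SECRET_KEY = b"example-secret-key"  # hypothetical configuration secret


def _digest(ip: str, key: bytes) -> bytes:
    """Keyed hash of the packed (binary) form of the address."""
    return hmac.new(key, ipaddress.ip_address(ip).packed, hashlib.sha256).digest()


def ip_shaped_id(ip: str, key: bytes = SECRET_KEY) -> str:
    """Render the first four digest bytes as four dotted 'octets'."""
    return ".".join(str(b) for b in _digest(ip, key)[:4])


def grouped_hex_id(ip: str, key: bytes = SECRET_KEY) -> str:
    """Render 16 hex digits (64 bits) in colon-separated groups of four."""
    hexd = _digest(ip, key).hex()
    return ":".join(hexd[i:i + 4] for i in range(0, 16, 4))
```

One trade-off worth noting: a 32-bit IP-shaped name gives a 50% chance of at least one collision after roughly 77,000 distinct addresses (the birthday bound, about 1.18 * sqrt(2^32)), and it could be mistaken for a real address. The 64-bit grouped form avoids both at the cost of being slightly longer.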
This is one of those perennial proposals that never quite seems to take off; I can remember having some version of this discussion back in 2008, and I know that some of our earliest edits show a partially obscured IP address, not the whole thing. It might take Brion or Tim or someone else with that much experience to explain the original thinking.
Some of the "pros" of keeping the IP address as the "username" for unregistered users:
- Even in this day and age, there are plenty of people with stable IPs; they choose to edit as unregistered users for philosophical reasons, and their IP's edit history is essentially their own editing history.
- Especially on smaller projects (but also big ones), range blocks are usually calculated and applied by administrators, not checkusers/stewards.
Some of the "cons" of publishing the IP address as the username:
- Privacy: IPv6 addresses in particular are including more and more very specific information that could be used to link a RealLife Name with the edits. (My own ISP now gives enough information in many cases to narrow geolocation down to a one-block radius, a big change from 2 years ago when geolocation was about an 800-mile radius.)
- Privacy: more and more jurisdictions consider a person's IP address to be "private" information. Our page histories could be considered one gigantic privacy violation.
- Increasingly dynamic IP addresses, often rotating within very large ranges that no longer link with any certainty to geolocation.
- Freaked-out new users who didn't really get that their IP address was going to be very publicly displayed.
I'm pretty sure there are a whole pile more pros and cons that we can pull out of the archives from various mailing lists, and I know that there have periodically been discussions amongst developers and the rest of the engineering team to try to come up with a "better way" - but like many other interesting, good and even potentially necessary ideas, it's never made it to the top of the priority heap.
Putting on my checkuser hat for just a minute... it's essential information for having any chance at all of identifying multiple accounts or pattern editing; however, the tables used by checkusers are non-public, so checkusers continuing to have access to IP data should not be an issue.
Risker/Anne
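Risker notes that range blocks are calculated by administrators, which is one reason they need to see raw addresses. A minimal sketch of that calculation with Python's standard `ipaddress` module: find the smallest single CIDR block covering a set of observed addresses. The function name is hypothetical and the addresses come from documentation ranges, not real editors.

```python
import ipaddress


def smallest_covering_range(ips):
    """Return the smallest single CIDR block that contains every address in ips."""
    addrs = [ipaddress.ip_address(ip) for ip in ips]
    if len({a.version for a in addrs}) != 1:
        raise ValueError("cannot mix IPv4 and IPv6 addresses")
    lo, hi = min(addrs), max(addrs)
    # Bits where the lowest and highest address differ must be host bits;
    # everything above the highest differing bit is the shared network prefix.
    diff = int(lo) ^ int(hi)
    prefix = lo.max_prefixlen - diff.bit_length()
    # strict=False zeroes the host bits below the prefix.
    return ipaddress.ip_network(f"{lo}/{prefix}", strict=False)
```

For example, `198.51.100.7`, `198.51.100.200` and `198.51.101.3` yield `198.51.100.0/23`. A hashed or encrypted identifier has no such prefix structure, so this calculation only works on raw IPs.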
> Even in this day and age, there are plenty of people with stable IPs
With hashing, a given IP would always give the same hash. So this uniqueness property would remain for people with stable IPs.
As a quick implementation note, we would not be using a hash for the IP address.
Most likely, we would encrypt the IP with AES or something similar using a configuration-based secret key. That way checkusers can still decrypt the value back into a normal IP address without having to store the mapping in the database.
-- Tyler Romeo 0x405D34A7C86B42DF
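Tyler's reversible alternative needs a keyed, invertible mapping. Python's standard library has no AES, so this sketch swaps in a toy 4-round Feistel network whose round function is HMAC-SHA256. It illustrates "encrypt with a secret key, decrypt for checkusers"; it is not a vetted cipher, and a real deployment would use an established block or format-preserving cipher. The function names are hypothetical.

```python
import hashlib
import hmac
import ipaddress

ROUNDS = 4


def _round(key: bytes, i: int, half: int) -> int:
    """Round function: 16 bits of HMAC-SHA256 over (round number, 16-bit half)."""
    mac = hmac.new(key, bytes([i]) + half.to_bytes(2, "big"), hashlib.sha256).digest()
    return int.from_bytes(mac[:2], "big")


def encrypt_ipv4(ip: str, key: bytes) -> int:
    """Map an IPv4 address to a 32-bit token with a keyed, invertible permutation."""
    n = int(ipaddress.IPv4Address(ip))
    left, right = n >> 16, n & 0xFFFF
    for i in range(ROUNDS):
        left, right = right, left ^ _round(key, i, right)
    return (left << 16) | right


def decrypt_ipv4(token: int, key: bytes) -> str:
    """Undo encrypt_ipv4 by running the rounds backwards with the same key."""
    left, right = token >> 16, token & 0xFFFF
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, i, left), left
    return str(ipaddress.IPv4Address((left << 16) | right))
```

Because the mapping is a permutation of the 32-bit space, every token decrypts to exactly one IPv4 address, and anyone who holds the key can do the decrypting, which is the crux of the objection raised in reply.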
On 11.07.2014 17:19, Tyler Romeo wrote:
> Most likely, we would encrypt the IP with AES or something using a configuration-based secret key. That way checkusers can still reverse the hash back into normal IP addresses without having to store the mapping in the database.
There are two problems with this, I think.
1) No forward secrecy. If that key is ever leaked, all IPs become "plain". And it will be, sooner or later. This would probably not be obvious, so this feature would instill a false sense of security.
2) No range blocks. It's often quite useful to be able to block a range of IPs. This is an important tool in the fight against spammers, taking it away would be a problem.
-- daniel
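Daniel's forward-secrecy concern (and Chris's follow-up about HMAC) comes down to the small size of the input space: once the key is known, an attacker does not need to invert anything, only to enumerate candidate addresses and compare. A sketch, assuming a plain HMAC-based pseudonym; the function names, key and ranges are hypothetical. It scans a single /24 (256 addresses) to stay fast; the full IPv4 space is 2^32, about 4.3 billion candidates, a feasible amount of work for optimized native code even though this pure-Python loop would be slow.

```python
import hashlib
import hmac
import ipaddress


def anon_id(ip: str, key: bytes) -> str:
    """Keyed pseudonym for an IP address (a plain HMAC scheme)."""
    return hmac.new(key, ipaddress.ip_address(ip).packed, hashlib.sha256).hexdigest()[:12]


def recover_ip(target_id: str, key: bytes, candidate_range: str):
    """With the key in hand, try every address in a range until one matches."""
    for addr in ipaddress.ip_network(candidate_range):  # iterates all addresses
        if anon_id(str(addr), key) == target_id:
            return str(addr)
    return None
```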
On Friday, July 11, 2014, Daniel Kinzler daniel@brightbyte.de wrote:
> - No forward secrecy. If that key is ever leaked, all IPs become "plain". And it will be, sooner or later. This would probably not be obvious, so this feature would instill a false sense of security.
This is probably the biggest issue. Even if we HMAC it, it's trivial to brute-force the entire IPv4 range (and, with intelligent assumptions about address generation, most of the IPv6 range) in seconds if the key is ever known.
> - No range blocks. It's often quite useful to be able to block a range of IPs. This is an important tool in the fight against spammers, taking it away would be a problem.
Range blocks, I imagine, would continue working the same way they do now. Someone would have to identify the correct range (which is very difficult when administrators can't see IPs), but on submission we have the IP address to check against the blocks. (Unless someone proposes to store block ranges as hashes; that would definitely get rid of range blocks.)
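Chris's point that range blocks keep working rests on the check happening against the raw request IP, before any hashing. A minimal sketch with the standard `ipaddress` module; `BLOCKED_RANGES` and `is_blocked` are hypothetical names, and the ranges are documentation prefixes.

```python
import ipaddress

# Hypothetical active range blocks, stored as ordinary CIDR ranges.
BLOCKED_RANGES = [
    ipaddress.ip_network("198.51.100.0/23"),
    ipaddress.ip_network("2001:db8:abcd::/48"),
]


def is_blocked(request_ip: str) -> bool:
    """Check the raw request IP against range blocks before it is hashed or encrypted."""
    addr = ipaddress.ip_address(request_ip)
    # Membership is simply False when IPv4 and IPv6 versions differ.
    return any(addr in net for net in BLOCKED_RANGES)
```

Storing the ranges as hashes would indeed break this, since a hash of `198.51.100.0/23` says nothing about which addresses fall inside it.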
++ the EFF for more ideas; they are actively doing great work on so-called perfect forward secrecy.
There are simple things we could do to achieve a better balance between privacy and sockpantsing, such as cryptolog [1], in which IP addresses are hashed using a salt that changes every day. In theory, nobody can reverse the function to reveal the IP, but you can still correlate all of an address's edits for the day, week, or whatever, making CheckUser possible.
IP range blocking obviously needs to happen up-front, before the IP is mangled. I have no suggestions, but maybe browser and preferences fingerprinting would be more effective anyway, since: tor.
-Adam
[1] https://git.eff.org/?p=cryptolog.git;a=summary
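The cryptolog idea Adam describes, hashing with a salt that changes every day, can be sketched as a small class. This is not cryptolog's code (see [1] for that); it is a hypothetical class that generates a random salt per day and discards the previous one, so tokens correlate within a day but not across days. The day is passed in explicitly to keep the example testable.

```python
import datetime
import hashlib
import hmac
import secrets


class DailySaltHasher:
    """Hash IPs with a random salt that is replaced, and forgotten, when the day changes."""

    def __init__(self) -> None:
        self._day = None
        self._salt = b""

    def _salt_for(self, day: datetime.date) -> bytes:
        if day != self._day:
            # Rotate: the old salt is dropped, so earlier tokens cannot be
            # recomputed afterwards, even by the server.
            self._day = day
            self._salt = secrets.token_bytes(32)
        return self._salt

    def token(self, ip: str, day: datetime.date) -> str:
        return hmac.new(self._salt_for(day), ip.encode(), hashlib.sha256).hexdigest()[:16]
```

While a day's salt is still held in memory, the enumeration attack Chris describes applies to that day's tokens; the protection comes from forgetting the salt once the day ends.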
> but maybe browser and preferences fingerprinting would be more effective anyway, since: tor.
Probably not as effective as straight-up blocking Tor as we do now? :P (Although seriously, I would love it if we didn't block Tor like we do now. However, you can't abuse the site with Tor when you can't use Tor at all.)
I'm somewhat doubtful about fingerprinting (without having done any research on it, so I may be out of tune here). We have millions of users, mostly using commodity software. I'm doubtful we would be able to get a fingerprint specific enough to uniquely identify a single user. Not to mention that a sophisticated attacker would probably be able to easily modify their fingerprint, especially if the fingerprint criteria are open source. [OTOH, a sophisticated attacker can get around an IP block too.]
The cryptolog approach: this has the property that there's a specific time when all anon identifiers suddenly change (e.g. midnight every day in the setup cryptolog uses). Having an arbitrary point in time when identifiers suddenly shift is probably an unwanted property (although maybe it doesn't matter that much in practice? Someone who actually deals with abuse on-wiki would be better able to answer that).
I suppose a related approach could be something like:
* If this is the IP's first edit (recently), make a (pseudo?) random salt for that IP and throw it in memcached with an expiry time of a week.
* Hash the IP with the salt.
* The next time the IP edits, if the salt can be accessed from memcached, use it and update the expiry time so that it expires a week from this edit; otherwise, start over with a new salt.
This would have the property that if an IP is continuously editing, its identifier doesn't change, but if it stops editing for a week, the identifier switches. It still has the downside that to make an effective range block, someone would need checkuser rights (although perhaps one could create a "checkuser-lite" right that just exposes the IPs of anons, which normal admins would get). It would also be much harder for admins to notice patterns, such as a specific subnet dealing out similar abuse, or a specific IP having been blocked once a month for the last 2 years.
--bawolff
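bawolff's variant, a per-IP random salt kept in memcached with an expiry that slides forward on every edit, might look like the following. A plain dict stands in for memcached and the clock is passed in as seconds; both are assumptions made so the sketch runs on its own, and `SlidingSaltStore` and `anon_token` are hypothetical names. (Real memcached may also evict entries early, which would just cause an extra identifier change.)

```python
import hashlib
import hmac
import secrets

WEEK = 7 * 24 * 3600  # expiry window in seconds


class SlidingSaltStore:
    """In-memory stand-in for a memcached-backed per-IP salt with a sliding expiry."""

    def __init__(self, ttl: int = WEEK) -> None:
        self.ttl = ttl
        self._entries = {}  # ip -> (salt, expires_at)

    def anon_token(self, ip: str, now: float) -> str:
        entry = self._entries.get(ip)
        if entry is None or entry[1] <= now:
            # First edit (recently), or the salt expired: start over with a new salt.
            salt = secrets.token_bytes(16)
        else:
            salt = entry[0]
        # Refresh the expiry so it lapses one week after *this* edit.
        self._entries[ip] = (salt, now + self.ttl)
        return hmac.new(salt, ip.encode(), hashlib.sha256).hexdigest()[:16]
```

An IP that keeps editing at least once a week keeps the same token; after a full week of silence its next edit gets a fresh salt and therefore a new token, matching the property described above.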
On Friday, July 11, 2014, Risker risker.wp@gmail.com wrote:
> This is one of those perennial proposals that never quite seems to take off; I can remember having some version of this discussion back in 2008, and I know that some of our earliest edits show a partially obscured IP address, not the whole thing. It might require Brion or Tim or someone else of that length of experience to explain the original thinking.
As I recall, UseModWiki (the Perl-based wiki software we used before switching to a custom solution which evolved into MediaWiki) obscured the last octet of the IP address, which still left you with enough information in most cases to track down an ISP or school/business/govt institution. I think UseMod also exposed the IP addresses of logged-in users, but the way logins worked was very different, and it was possible to set your name to someone else's name, or some such oddities...
I'm not sure offhand if there was explicit discussion of switching to not obscuring the last octet in the PHP software/nascent MediaWiki... But this was back in 2001 when the internet was a little younger and everybody was spewing their IP addresses all over their email and newsgroup posts too. Folks are a lot more paranoid about that today.
In general I favor migrating away from publicly exposing IP addresses, but not sure to what exactly would be best... I kinda like the idea of an anonymous-but-consistent "proto-account" that can be transformed into a named login if desired, but it needs to be thought out in more detail to resolve potential difficulties.
-- brion
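Brion's recollection of UseModWiki hiding the last octet is easy to illustrate. This sketch is not UseModWiki's code, and the `xxx` placeholder is an assumed display format; it shows that the remaining three octets still pin down the surrounding /24 network, which is why the truncated form was usually enough to find an ISP or institution. Both function names are hypothetical.

```python
import ipaddress


def obscure_last_octet(ip: str) -> str:
    """Display an IPv4 address with its final octet hidden."""
    addr = ipaddress.IPv4Address(ip)  # raises AddressValueError for non-IPv4 input
    first_three = str(addr).rsplit(".", 1)[0]
    return first_three + ".xxx"


def visible_network(ip: str) -> ipaddress.IPv4Network:
    """The /24 block that the obscured form still reveals."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)
```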
> I kinda like the idea of an anonymous-but-consistent "proto-account" that can be transformed into a named login if desired, but it needs to be thought out in more detail to resolve potential difficulties.
One could automatically create a pseudo-account ("Anonymous #12345") upon the first edit, and that account would always be authenticated automatically for future edits coming from the same IP address. I don't think it should be possible to turn those pseudo-accounts into proper accounts, though; they'd be marked as anonymous pseudo-accounts forever. Otherwise, a way to upgrade to a proper account while keeping edits that were potentially written by other people could get hairy, especially from a legal standpoint.
Or is it a cookie-based approach you had in mind, where we automatically create an account tied to the user agent? That would mitigate, but not completely eliminate, the issue of converting a pseudo-account that might have been shared between several people into a proper account.
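Gilles's pseudo-account idea, a numbered "Anonymous #N" account created on the first edit and reused for later edits from the same IP, can be sketched as a simple server-side mapping. The class name, the starting number and the in-memory dict are hypothetical; a wiki would persist this and keep the IP column private, as with checkuser data.

```python
import itertools


class PseudoAccounts:
    """Assign each new IP a permanent, numbered anonymous pseudo-account."""

    def __init__(self, start: int = 1) -> None:
        self._next_id = itertools.count(start)
        self._by_ip = {}  # server-side only; never shown publicly

    def account_for(self, ip: str) -> str:
        # Later edits from the same IP are attributed to the same pseudo-account.
        if ip not in self._by_ip:
            self._by_ip[ip] = f"Anonymous #{next(self._next_id)}"
        return self._by_ip[ip]
```

The sketch also makes HM's objection concrete: everyone behind one shared library IP lands on the same pseudo-account, which is why converting it into a personal account is risky.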
On 11 July 2014 17:10, Gilles Dubuc gilles@wikimedia.org wrote:
> Maybe it's a cookie-based approach you had in mind? Where we automatically create an account tied to the user agent. That would mitigate the issue of converting a pseudo-account that might have been shared between several people to a proper account, but not completely get rid of it.
I'd have thought the chain of events "go to a library computer, do some edits, decide to upgrade to a real account, do so, realise you've inadvertently swept up all the insalubrious penis vandalism that has been made on that computer previously" would be unacceptably common.
--HM
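A rough sketch of the cookie-based proto-account idea, written to make the shared-computer problem concrete. Everything here is hypothetical (the class, its methods, and the in-memory storage); it is not how MediaWiki works, only an illustration of "an account tied to the user agent" and of what happens when that user agent is shared.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ProtoAccountStore:
    """Hypothetical in-memory map from a cookie token to the edits made under it."""

    edits: dict[str, list[str]] = field(default_factory=dict)

    def new_token(self) -> str:
        # Unguessable random value, to be sent to the browser as a cookie.
        token = secrets.token_urlsafe(16)
        self.edits[token] = []
        return token

    def record_edit(self, token: str, edit_id: str) -> None:
        self.edits.setdefault(token, []).append(edit_id)

    def convert_to_account(self, token: str, username: str) -> tuple[str, list[str]]:
        # Every edit recorded under this cookie moves to the named account,
        # including edits by anyone else who used the same browser profile.
        return username, self.edits.pop(token, [])


store = ProtoAccountStore()
library_pc = store.new_token()                  # one shared browser profile
store.record_edit(library_pc, "edit-by-vandal")
store.record_edit(library_pc, "edit-by-newcomer")
print(store.convert_to_account(library_pc, "Newcomer"))
# ('Newcomer', ['edit-by-vandal', 'edit-by-newcomer'])
```

A cookie narrows the group compared with an IP (one browser profile rather than a whole network), but as the library example shows, it does not guarantee one person per proto-account.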
On 07/11/2014 11:45 AM, Brion Vibber wrote:
As I recall, UseModWiki (the Perl-based wiki software we used before switching to a custom solution which evolved into MediaWiki) obscured the last octet of the IP address, which still left you with enough information in most cases to track down an ISP or school/business/govt institution. I think UseMod also exposed the IP addresses of logged-in users, but the way logins worked was very different, and it was possible to set your name to someone else's name, or some such oddities...
Yeah, the main benefit to the current setup (which probably doesn't really require the last octet in most cases) is detecting casual abuse, which includes (but is not limited to) both blatant vandalism and conflict of interest edits. (People have lunch breaks, and I don't claim every edit from an organizational IP is a conflict of interest, but many true COI edits have been caught this way.)
If we look into something like proto-accounts or hashing or such, it would be good to try to maintain this benefit (do the lookup on the server, and expose who the IP block belongs to?), but I don't know if it's possible to have it both ways.
Matt Flaschen
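One way the "lookup on the server" idea could look, as a sketch only: the server derives a stable public label from a keyed hash (HMAC-SHA256 with a secret only the server knows) and appends the owner of the address block. The secret, the owner table, the `Anon-` prefix, and the 10-character truncation are all hypothetical; the networks used are the RFC 5737 documentation ranges, and real owner data would have to come from WHOIS/RIR records.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical server-side secret; it would have to be long, random and private.
SECRET_KEY = b"replace-with-a-long-random-secret"

# Hypothetical owner table; real data would have to come from WHOIS/RIR records.
BLOCK_OWNERS = {
    ipaddress.ip_network("192.0.2.0/24"): "Example Corp",
}


def public_label(ip: str) -> str:
    """Stable public name for an anonymous editor, plus the owner of their block."""
    addr = ipaddress.ip_address(ip)
    # Keyed hash: same IP -> same label, but outsiders cannot recompute it.
    digest = hmac.new(SECRET_KEY, addr.packed, hashlib.sha256).hexdigest()
    owner = next(
        (name for net, name in BLOCK_OWNERS.items() if addr in net),
        "unknown network",
    )
    return f"Anon-{digest[:10]} ({owner})"


print(public_label("192.0.2.15"))    # Anon-<10 hex chars> (Example Corp)
print(public_label("198.51.100.7"))  # Anon-<10 hex chars> (unknown network)
```

This keeps the COI signal (the organization is visible) while hiding the exact address, but it also shows the trade-off: whoever is behind a small, single-organization block is still largely identifiable.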
To my knowledge, there are currently six of these Twitter bots (Canada, Denmark, France, Sweden, UK, US). I have collected them in a Twitter list: https://twitter.com/palnatoke/lists/wikiedit
Please speak up if you notice more, so I can include them in the list, too.
Regards, Ole
On Fri, 11 Jul 2014, at 23:34, Gilles Dubuc wrote:
IP addresses are closely guarded for registered users, why wouldn't anonymous users be identified by a hash of their IP address in order to protect their privacy as well?
While I don't horribly mind some changes in the direction you're writing, I think that:
1) Privacy is defined as "The state of being free from unsanctioned intrusion". An IP, as a fundamental identifier, has as much to do with privacy as a car number you see on a street. (In my area, anyone can look up a name by car number, and I expect that to be common.)
Firefox folks are, iirc, considering providing IP-based links in the new tab in one of the next releases. These links would include local shops and restaurants. I've seen some argue that such a decision goes against "privacy", but I think it's the wrong term.
2) There are other, nicer things we could enable for anonymous readers that would make their editing experience more efficient. Such things include enabling some preferences and features for these contributors, which may be useful to a group of people editing from one IP:
https://meta.wikimedia.org/wiki/Musings_about_unregistered_contributors#Exam...
Gryllida.
(a little off topic diversion)
On Tue, Jul 15, 2014 at 06:22:17PM +1000, Gryllida wrote:
An IP, as a fundamental identifier, has as much to do with privacy as a car number you see on a street. (In my area, anyone can look up a name by car number, and I expect that to be common.)
Actually, numberplates were originally conceived as a privacy-enhancing technology. The first numberplates had people's names on them, but that was considered too intrusive.
The CC BY-SA license, used on most WMF projects, requires /attribution/. Attribution for edits made by unregistered/logged-out users is done exclusively by means of their IP address. By clicking the 'Save' button, they agreed to release their edits under CC BY-SA, and that their IP address would be the only form of attribution of their changes to them. While we can assume that there aren't any collisions between hashes of IP addresses, and we could change the attribution requirements for new edits, hiding or modifying the way IP addresses /of unregistered users who edited before that change/ are shown would be a substantial CC BY-SA infringement, as would changing registered users' names without their consent and without public logs of that change.
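The "no collisions" assumption can be checked with the standard birthday bound, p ≈ 1 − exp(−n(n−1)/2^(b+1)) for n distinct inputs mapped to b-bit labels. A quick sketch, assuming labels might be shortened to keep them readable as usernames (the 10-hex-character, i.e. 40-bit, length and the one-million-address figure are illustrative assumptions, not measurements):

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound estimate that n distinct inputs share at least one b-bit label."""
    # p ~= 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision for tiny p.
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


print(collision_probability(2**32, 256))  # full SHA-256 digest over all of IPv4: effectively 0
print(collision_probability(2**32, 40))   # 10 hex chars over all of IPv4: 1.0
print(collision_probability(10**6, 40))   # a million addresses, 10 hex chars: about 0.37
```

So the assumption is safe for full-length digests, but a shortened, human-friendly label would need collision handling (or a longer label) before it could serve as the sole attribution for an edit.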
On 7/18/14, Ricordisamoa ricordisamoa@openmailbox.org wrote:
The CC BY-SA license, used on most WMF projects, requires /attribution/. Attribution for edits made by unregistered/logged-out users is done exclusively by means of their IP address. By clicking the 'Save' button, they agreed to release their edits under CC BY-SA, and that their IP address would be the only form of attribution of their changes to them. While we can assume that there aren't any collisions between hashes of IP addresses, and we could change the attribution requirements for new edits, hiding or modifying the way IP addresses /of unregistered users who edited before that change/ are shown would be a substantial CC BY-SA infringement, as would changing registered users' names without their consent and without public logs of that change.
Additionally, if we used the same hash function as for new edits, it would make it pretty trivial to figure out what most of the hashes are. I think it's safe to say we wouldn't modify old edits. After all, you can still look at https://en.wikipedia.org/wiki/Special:Contributions/216.143.215.xxx despite us not using that scheme anymore.
--bawolff
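A small illustration of why an unkeyed hash offers little protection: if the public label is just a hash of the address, anyone who can guess the input can recompute it. The sketch assumes a plain SHA-256 of the dotted-quad string (a hypothetical scheme; the thread does not specify one) and shows that a masked `216.143.215.xxx` address leaves only 256 candidates to try. The whole IPv4 space is only 2^32 addresses, so exhaustive enumeration is feasible as well.

```python
import hashlib
from typing import Optional


def naive_label(ip: str) -> str:
    # Hypothetical unkeyed scheme: plain SHA-256 of the dotted-quad string.
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def recover_ip(label: str, masked: str) -> Optional[str]:
    """Try every completion of a masked 'a.b.c.xxx' address against a label."""
    prefix = masked.rsplit(".", 1)[0]  # '216.143.215.xxx' -> '216.143.215'
    for last in range(256):
        candidate = f"{prefix}.{last}"
        if naive_label(candidate) == label:
            return candidate
    return None


leaked = naive_label("216.143.215.37")           # example address
print(recover_ip(leaked, "216.143.215.xxx"))     # 216.143.215.37
```

A keyed hash (for example HMAC with a secret held only by the server) defeats this kind of enumeration, as long as the secret never leaks.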