Summary: It's no longer practical for developers to look up the IP addresses of logged-in users, so I want to implement a method that will make it much easier. Comments?
------------------------------------------------------------------------
Developers used to be able to look up the IP addresses of logged-in users by searching the Apache logs. However, over the past year, the increased complexity of our cluster setup and the increased traffic have made this more and more difficult. For example, it took me 3 or 4 hours to determine the identities of several sock puppets when the arbitration committee asked me to do so in relation to the Wik case. The basic difficulties are:
1. Clock lag. The time in the squid logs is not necessarily the same as the time in the database. Under heavy load, the difference can vary from edit to edit by 30 s or so.
2. Edit conflicts and previews are indistinguishable from saves.
3. The large amount of data makes searching slow.
4. Logs for recent edits, less than 1 to 1.5 days old, are spread across 3 squid servers.
The first two items combine to make it difficult to assign IP addresses to users when those users are involved in a rapid edit war, because a single save can plausibly match requests from more than one address.
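To illustrate why clock lag and indistinguishable POST requests defeat log matching, here is a minimal sketch. It assumes log entries have already been parsed into simple records; the `SquidHit` and `Edit` types and the `candidate_ips` helper are hypothetical, and the real squid log format is not modelled. The 30 s tolerance comes from the clock-lag figure above.

```python
from dataclasses import dataclass


@dataclass
class SquidHit:
    timestamp: int  # seconds, as recorded by the squid server's clock
    ip: str
    is_post: bool   # True for any form submission: save, preview or edit conflict alike


@dataclass
class Edit:
    timestamp: int  # seconds, as recorded in the database
    user: str


def candidate_ips(edit, hits, max_skew=30):
    """Return every IP whose POST request falls within max_skew seconds of the edit.

    Because squid and database clocks can disagree by up to max_skew, and
    previews look the same as saves, all of these IPs are plausible editors.
    """
    return sorted({
        h.ip
        for h in hits
        if h.is_post and abs(h.timestamp - edit.timestamp) <= max_skew
    })
```

In an edit war where two users submit within a few seconds of each other, `candidate_ips` returns both addresses for each save, which is exactly the ambiguity described above.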
We recently had some difficulty merging and splitting the squid logs, and a large amount of recent log data was lost when the logs nearly filled the disk on zwinger. Given this and the points listed above, I'm inclined to declare the task of determining the IP address of a logged-in user impractical. It could take hours to check whether a given user is a sock puppet, and even after spending those hours, I couldn't guarantee any useful information, even if the users were using the same IP address.
The reason we don't log IP addresses in a readable format is privacy: the idea is to discourage "casual snooping" by developers. Perhaps this was appropriate 6 months ago, but the current setup makes any kind of IP lookup impractical.
I'm of the opinion that checking the identity of sock puppets or trolls is important for preserving the mental health of our regular contributors. Hence I am proposing to store IP addresses of logged-in users in the recentchanges table. This table has one entry per edit, and older entries are automatically deleted. It is not available for public download. This will make it easy for developers to match a username to an IP address, for anyone who has edited the wiki recently.
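A minimal sketch of the proposal, using SQLite as a stand-in for the wiki database. The table layout, the column names (`rc_user_text`, `rc_ip`, `rc_timestamp`), the Unix-seconds timestamps and the 7-day `MAX_AGE` retention period are all illustrative assumptions, not the actual recentchanges schema or expiry setting.

```python
import sqlite3
import time

# Hypothetical, simplified stand-in for the recentchanges table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS recentchanges (
    rc_id        INTEGER PRIMARY KEY,
    rc_user_text TEXT    NOT NULL,
    rc_ip        TEXT    NOT NULL,
    rc_timestamp INTEGER NOT NULL  -- Unix seconds (assumption)
)
"""

MAX_AGE = 7 * 24 * 3600  # assumed retention period; the real expiry may differ


def record_edit(db, user, ip, now=None):
    """Store one row per saved edit, including the editor's IP address."""
    now = int(time.time()) if now is None else now
    db.execute(
        "INSERT INTO recentchanges (rc_user_text, rc_ip, rc_timestamp) VALUES (?, ?, ?)",
        (user, ip, now),
    )


def purge_old(db, now=None):
    """Delete rows older than MAX_AGE, mirroring the table's automatic expiry."""
    now = int(time.time()) if now is None else now
    db.execute("DELETE FROM recentchanges WHERE rc_timestamp < ?", (now - MAX_AGE,))


def ips_for_user(db, user):
    """Distinct IP addresses a user has recently edited from."""
    rows = db.execute(
        "SELECT DISTINCT rc_ip FROM recentchanges WHERE rc_user_text = ? ORDER BY rc_ip",
        (user,),
    )
    return [ip for (ip,) in rows]


def users_for_ip(db, ip):
    """Distinct usernames that have recently edited from an IP address."""
    rows = db.execute(
        "SELECT DISTINCT rc_user_text FROM recentchanges WHERE rc_ip = ? ORDER BY rc_user_text",
        (ip,),
    )
    return [u for (u,) in rows]
```

Because expired rows are purged along with the rest of the table, a lookup only works for users who have edited within the retention window, which matches the limited scope of the proposal.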
Since this constitutes a weakening of the effective level of privacy for contributors, it may be that some people will be opposed to this feature. That's why I'm writing this post -- to find out what everyone thinks of this. Is determining the identity of trolls important? Or would you prefer to have IP addresses unlogged? Please speak up.
-- Tim Starling
Tim wrote:
I think there are three possible circumstances to consider here.
1: The "pure" anonymous wiki. The current arrangement works just fine.
2: The big wiki (i.e., Wikipedia). Practicality demands a change much as you suggest. More to the point, this is in fact a *non*-change: it simply preserves the status quo (developers can examine IP addresses if need be) despite the changes brought by higher traffic and hardware additions.
3: Special-purpose wikis. These may well be better served by a more permanent IP logging system.
I'm of the opinion that checking the identity of sock puppets or trolls is important for preserving the mental health of our regular contributors.
Absolutely. Full support on this.
I appreciate the difficulties involved, and I do not feel particularly concerned about my privacy on Wikipedia, but I understand that there are others who may feel differently.
One suggestion might be to do it only for flagged users. All users banned for more than 24 hours would be flagged, and a special database of their edit history from that moment on (including their AC appeals correspondence) would be created. If there is a complaint that User:New Troll is a reincarnation of User:Banned Troll, then New Troll will also be flagged. Of course, this will only be effective for New Troll's edits after that moment, but that's OK: if he makes no edits after he has been flagged, it only means that he is observing the ban, which was the intent in the first place.
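The flagging scheme suggested above can be sketched as follows. The `FlaggedLog` class and its method names are hypothetical; only the 24-hour ban threshold comes from the suggestion, and the appeals correspondence is not modelled. The key property is that nothing is recorded for a user until he is flagged.

```python
from dataclasses import dataclass, field

BAN_THRESHOLD_HOURS = 24  # bans longer than this trigger flagging


@dataclass
class FlaggedLog:
    """Hypothetical store that keeps IP history only for flagged users."""
    flagged: set = field(default_factory=set)
    history: dict = field(default_factory=dict)  # user -> list of (timestamp, ip)

    def on_ban(self, user, hours):
        # Only long bans cause a user to be flagged.
        if hours > BAN_THRESHOLD_HOURS:
            self.flag(user)

    def flag(self, user):
        # Flagging is not retroactive: history starts from this point on.
        self.flagged.add(user)
        self.history.setdefault(user, [])

    def on_edit(self, user, ip, timestamp):
        # Edits by unflagged users are not recorded at all.
        if user in self.flagged:
            self.history[user].append((timestamp, ip))

    def shared_ips(self, a, b):
        """IP addresses seen for both flagged users, e.g. a suspected reincarnation."""
        ips_a = {ip for _, ip in self.history.get(a, [])}
        ips_b = {ip for _, ip in self.history.get(b, [])}
        return sorted(ips_a & ips_b)
```

An edit made by New Troll before the complaint leaves no trace; only edits after `flag("New Troll")` can be compared against Banned Troll's history.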
Ec
On Monday 14 June 2004 08:50, Tim Starling wrote:
Summary: It's no longer practical for developers to look up the IP addresses of logged in users, so I want to implement a method which will make it much easier. Comments?
I think that the IP address of every edit should be logged, but made available only to the most trustworthy users. phpBB, for example, does this.
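A minimal sketch of "log everything, restrict who can read it". The `EditIpLog` and `User` types are hypothetical, the choice of `"developer"` as the trusted group is an assumption, and this does not reproduce phpBB's actual mechanism.

```python
from dataclasses import dataclass, field

TRUSTED_GROUPS = {"developer"}  # assumption: which groups count as "most trustworthy"


@dataclass
class User:
    name: str
    groups: set = field(default_factory=set)


class EditIpLog:
    """Records the IP address of every edit; reading requires a trusted group."""

    def __init__(self):
        self._entries = []  # (editor, ip) for every edit, kept indefinitely

    def log_edit(self, editor, ip):
        self._entries.append((editor, ip))

    def lookup(self, viewer, editor):
        # Access control sits on the read path; writing is unconditional.
        if not (viewer.groups & TRUSTED_GROUPS):
            raise PermissionError(f"{viewer.name} may not view IP addresses")
        return sorted({ip for name, ip in self._entries if name == editor})
```

The design choice here is that privacy is enforced at lookup time rather than by not logging, so the data exists but is visible only to the chosen group.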
wikitech-l@lists.wikimedia.org