Summary: It's no longer practical for developers to look up the IP
addresses of logged-in users, so I want to implement a method which will
make it much easier. Comments?
------------------------------------------------------------------------
Developers used to be able to look up the IP addresses of logged-in
users by searching the Apache logs. However, over the past year, the
increased complexity of our cluster setup and the increased traffic have
made this more and more difficult. For example, it took me 3 or 4 hours
to determine the identities of several sock puppets when asked to do so
by the arbitration committee in relation to the Wik case.
The basic difficulties are:
1. Clock lag. The time in the squid logs is not necessarily the same as
the time in the database. Under heavy load, the difference can change
from edit to edit by 30s or so.
2. Edit conflicts and previews are indistinguishable from saves in the
logs.
3. The large amount of data makes searching slow.
4. Recent edits, less than 1-1.5 days old, have logs spread across 3
squid servers.
The first two items combine to make it difficult to assign IP addresses
to users when those users are involved in a rapid edit war.
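
To illustrate why those two items combine so badly, here is a minimal
Python sketch. Everything in it is an assumption made for illustration:
the timestamps, usernames and IP addresses are invented, the record
layout is much simpler than a real squid log, and candidate_ips is a
hypothetical helper, not part of any existing tool.

```python
from datetime import datetime, timedelta

# Invented sample data. Each log entry is (log time, client IP); it may be
# a save, a preview or an edit conflict, and the log does not say which.
log_requests = [
    (datetime(2004, 5, 1, 12, 0, 5), "192.0.2.1"),
    (datetime(2004, 5, 1, 12, 0, 20), "192.0.2.2"),
    (datetime(2004, 5, 1, 12, 0, 40), "192.0.2.1"),
    (datetime(2004, 5, 1, 12, 10, 10), "192.0.2.3"),
]

# Each edit is (database time, username).
edits = [
    (datetime(2004, 5, 1, 12, 0, 0), "UserA"),
    (datetime(2004, 5, 1, 12, 0, 30), "UserB"),
    (datetime(2004, 5, 1, 12, 10, 0), "UserC"),
]


def candidate_ips(edit_time, requests, skew=timedelta(seconds=30)):
    """Return every IP with a logged request within `skew` of the edit time.

    The 30 second default mirrors the clock lag described above.
    """
    return sorted({ip for ts, ip in requests if abs(ts - edit_time) <= skew})


for edit_time, user in edits:
    print(user, candidate_ips(edit_time, log_requests))
```

In this made-up data, UserA and UserB edit within 30 seconds of each
other, so both of them match both 192.0.2.1 and 192.0.2.2 and neither can
be pinned to a single address. Only UserC, who edits ten minutes later,
gets a unique match.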
We had some difficulty recently merging and splitting the squid logs. A
large amount of recent log data was lost when the logs almost filled up
the disk on zwinger. With this and the points listed above, I'm inclined
to declare the task of determining the IP address of a logged-in user
impractical. It could potentially take hours to check if a given user is
a sock puppet, and even after hours of work I couldn't guarantee any
useful information, even if the accounts were using the same IP address.
The reason we don't log IP addresses in a readable format is privacy.
The idea is to discourage "casual snooping" by developers.
Perhaps this was appropriate 6 months ago, but the current setup makes
any kind of IP lookup impractical.
I'm of the opinion that checking the identity of sock puppets or trolls
is important for preserving the mental health of our regular
contributors. Hence I am proposing to store IP addresses of logged-in
users in the recentchanges table. This table has one entry per edit, and
older entries are automatically deleted. It is not available for public
download. This will make it easy for developers to match a username to
an IP address, for anyone who has edited the wiki recently.
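
As a rough sketch of how this could work, the following Python example
uses an in-memory SQLite database as a stand-in for the real one. The
column names (rc_timestamp, rc_user_text, rc_ip), the helper functions
and the sample rows are all assumptions made for illustration; the
actual schema and pruning mechanism may look quite different.

```python
import sqlite3

# In-memory stand-in for the real database; the column names are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE recentchanges (
           rc_id        INTEGER PRIMARY KEY,
           rc_timestamp TEXT NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS'
           rc_user_text TEXT NOT NULL,
           rc_ip        TEXT NOT NULL
       )"""
)


def record_edit(conn, timestamp, user, ip):
    """Store one row per saved edit, including the editor's IP address."""
    conn.execute(
        "INSERT INTO recentchanges (rc_timestamp, rc_user_text, rc_ip) "
        "VALUES (?, ?, ?)",
        (timestamp, user, ip),
    )


def ips_for_user(conn, user):
    """List the distinct IP addresses a username has recently edited from."""
    rows = conn.execute(
        "SELECT DISTINCT rc_ip FROM recentchanges "
        "WHERE rc_user_text = ? ORDER BY rc_ip",
        (user,),
    )
    return [row[0] for row in rows]


def users_for_ip(conn, ip):
    """List the distinct usernames that have recently edited from an IP."""
    rows = conn.execute(
        "SELECT DISTINCT rc_user_text FROM recentchanges "
        "WHERE rc_ip = ? ORDER BY rc_user_text",
        (ip,),
    )
    return [row[0] for row in rows]


def purge_before(conn, cutoff):
    """Delete entries older than the cutoff, as happens to old rows anyway."""
    conn.execute("DELETE FROM recentchanges WHERE rc_timestamp < ?", (cutoff,))


# Invented sample edits.
record_edit(conn, "2004-05-01 12:00:00", "UserA", "192.0.2.1")
record_edit(conn, "2004-05-08 09:30:00", "UserB", "192.0.2.1")
record_edit(conn, "2004-05-08 09:31:00", "UserC", "192.0.2.7")

print(users_for_ip(conn, "192.0.2.1"))
```

Because only saved edits are recorded, and each row carries its own
username, neither clock lag nor previews can confuse the match. Once
purge_before removes the older rows, the corresponding IP addresses are
gone as well, which limits how far back anyone can look.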
Since this constitutes a weakening of the effective level of privacy for
contributors, it may be that some people will be opposed to this
feature. That's why I'm writing this post -- to find out what everyone
thinks of this. Is determining the identity of trolls important? Or
would you prefer to have IP addresses unlogged? Please speak up.
-- Tim Starling