On 25/06/05, Jake Waskett jake@waskett.org wrote:
Of those, only 20 have proxy or cache in the name.
Thoughts on how useful this sort of data would be, given the reasonably sized sample above?
Ok, so of 126 addresses, we have about 20 proxies. So about 16% of anonymous Wikipedia users are recognised as being behind a proxy, using this scheme. I don't know the answer to this question, but does anybody know roughly what proportion of web users go through a proxy server? Is it close to 16%? If so, we've got a pretty good scheme here.
I've spoken to a friend working at one of the larger ISPs; the answer is that it varies quite a bit. Only a minority of ISPs use them, but those tend to be large ISPs (the canonical example is all the AOL proxies you see around).
The upside is that most people are pragmatic, and call their proxy servers things like "proxy-43765". So it looks like this is a fairly effective way of identifying *most* proxies.
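As a rough sketch of what that keyword check could look like (Python used purely for illustration; the function name and keyword list are my own assumptions, not anything in the existing software):

```python
# Keywords taken from this thread; the list is an assumption and easy to extend.
PROXY_KEYWORDS = ("proxy", "cache")


def looks_like_proxy(hostname: str) -> bool:
    """Return True if a reverse-DNS name contains one of the proxy keywords."""
    name = hostname.lower()
    return any(keyword in name for keyword in PROXY_KEYWORDS)


if __name__ == "__main__":
    for host in ("proxy-43765.example.net", "473a.residence.some.edu"):
        print(host, looks_like_proxy(host))
```

This is a plain substring match, so it only catches proxies whose operators named them that way, and anyone who controls their own reverse DNS can trigger it on purpose.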
[He notes that there's also a "forwarded" header through most ISP caches, which contains the "original" originating IP; I don't know if this is accessible in this context or not, but it's useful to know it exists]
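I'm assuming the header he means is the de facto "X-Forwarded-For" one, which carries a comma-separated list with the original client first. If so, pulling the address out would look roughly like this (a sketch in Python; the function name is hypothetical, and whether the software can see the header at all is still an open question):

```python
def original_client_ip(forwarded_for):
    """Return the left-most address in an X-Forwarded-For value, or None.

    The value is a comma-separated list; by convention the first entry is
    the client the first proxy saw. It is supplied by the proxy chain and
    can be forged, so it should be treated as a hint rather than proof.
    """
    if not forwarded_for:
        return None
    first = forwarded_for.split(",")[0].strip()
    return first or None


if __name__ == "__main__":
    # Documentation-range addresses, used here only as sample input.
    print(original_client_ip("203.0.113.7, 198.51.100.2"))
```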
[I also did another test on a larger sample - this brought it down to ~10% having "proxy" or "cache" in them. I may do further tests as resources and tuits permit.]
Of course, a determined user could create a sub-domain with 'proxy' or 'cache' in the title, which would fool a simple software implementation, but perhaps not a human.
In reply to geni's comment, we're talking about a minor change to the software anyway, so all that's needed is to present the admin with this information at the time that he or she chooses to block a user.
Ideally, the software could give the admin a "no IP block" option, to exercise at his or her discretion (the software may already do this; I don't know).
I'm not an admin, so I can't really comment on how the process actually works. Can I just check I have the mechanism right here? User:XYZ goes and vandalises an article; an admin bans them; the system then automatically slaps a short ban on the associated IP address, to prevent them logging out and trying again?
It looks like in 80%+ of cases, telling people what the IP resolves to won't make any difference; it'll just be extra noise (with some occasional amusement, as when you notice a .gov domain). How does this sound -
a) Admin goes to block a user. System does a check on IP address, resolves it to 473a.residence.some.edu, doesn't flag it as a proxy, keeps quiet, IP blocked.
b) Admin goes to block a user. System does a check on IP address, resolves it to usercache.admin.some.edu, and flags it because it contains *cache*. Puts up a warning to the admin - "The associated IP address identifies as USERCACHE.admin.some.edu, and blocking it may affect multiple people. Do you wish to block it anyway?". Admin makes the call.
This would leave us with the functionality we have now, but give an option for a simple override when it's likely the IP address isn't "personal". The fact that the display only comes up when it contains one of the keywords means that the privacy implications are low - and if you want it trimmed further, you can have it say that "...identifies as USERCACHE.admin.*.*" or the like. It also limits the amount of time wasted by admins, since it seems to be the case that without one of the keywords, in most cases, a cache/proxy server won't be apparent from the address alone.
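To make (a) and (b) concrete, here is a minimal sketch of the check in Python. All the names are made up for illustration, and I'm assuming a plain reverse-DNS lookup is available at block time; the real software may do this quite differently.

```python
import socket

# Keywords from this thread; an assumption, not an existing setting.
PROXY_KEYWORDS = ("proxy", "cache")


def reverse_lookup(ip):
    """Resolve an IP to a hostname, or return None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def mask_hostname(hostname, keep=2):
    """Keep the first `keep` labels and replace the rest with '*'."""
    labels = hostname.split(".")
    return ".".join(labels[:keep] + ["*"] * max(len(labels) - keep, 0))


def block_warning(ip, resolver=reverse_lookup):
    """Return a warning for the admin in case (b), or None in case (a)."""
    hostname = resolver(ip)
    if hostname is None:
        return None  # nothing to go on, so keep quiet
    if not any(keyword in hostname.lower() for keyword in PROXY_KEYWORDS):
        return None  # case (a): no keyword, block proceeds as it does now
    # case (b): show only the trimmed name and let the admin decide
    return (
        "The associated IP address identifies as %s, and blocking it may "
        "affect multiple people. Do you wish to block it anyway?"
        % mask_hostname(hostname)
    )


if __name__ == "__main__":
    # Stub resolver so the example runs without a network connection.
    print(block_warning("192.0.2.1", resolver=lambda ip: "usercache.admin.some.edu"))
```

The resolver is passed in as a parameter only so the logic can be tried with made-up hostnames; the trimmed form it displays is the "usercache.admin.*.*" variant suggested above.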
Thoughts?