We have a long-standing problem with AOL, which is that they insist on being a single giant cluster of anonymizing proxies. Should we consider sending a cookie to AOL browsers which issue edit requests, to give them some kind of identity? This would, of course, mean some loss of privacy, but no more than that of any other IP user who is not behind an anonymizing proxy.
We could simply give them a random number, generated from a high-quality PRNG, and send it to them as a reasonably long-lived cookie when they make their first edit request. This could then be used in lieu of an IP address. So, we would have three types of name:
* IP addresses (in dotted-quad or IPv6 notation) for normal anons
* Logged-in users (names starting with a capital letter)
* Anons with cookies (dotless strings starting with a digit, say, generated from a _hash_ of the cookie we sent)
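The three kinds of name could be told apart mechanically. Below is a minimal sketch, assuming the naming rules exactly as listed. The function name `classify_name` and its return labels are hypothetical. IPv6 addresses (e.g. 2001:db8::1) are also dotless and can start with a digit, so the sketch tries IP parsing first:

```python
import ipaddress


def classify_name(name: str) -> str:
    """Classify an editor name under the proposed three-way scheme (illustrative only)."""
    # Try IP parsing first: IPv6 addresses are dotless and may start with a digit,
    # so the cookie-anon rule alone would misclassify them.
    try:
        ipaddress.ip_address(name)
        return "ip"
    except ValueError:
        pass
    if name[:1].isupper():
        return "user"          # logged-in user: starts with a capital letter
    if name[:1].isdigit() and "." not in name:
        return "cookie-anon"   # dotless, digit-led pseudonym derived from the cookie
    return "unknown"
```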
Note that we only display a hash of the cookie contents. This allows us to verify that a cookie is a genuine one sent by us, making spoofing very hard. Verification could be as simple as keeping a table of valid cookies; alternatively, some digital-signature scheme could be used to remove the need for a database lookup. Because only the hash is shown, mischievous users also cannot impersonate AOL users by reading their displayed name and forging a matching cookie.
All of this could be done with very little change to the code, if I understand correctly how it works. This would let us watch and block AOL users in much the same way as logged-in or IP users. The downside is that, to get the full benefit from this policy, we would probably have to prevent AOL users without cookies from editing. We could easily send them a message: "Dear AOL user: you currently have cookies disabled; you will need to enable cookies to edit this page. See here for more information...".
Benefits:
* we can track AOL users for vandalism, at last
* they can still browse without needing cookies set
* no need for extra user interaction, if they have cookies set (which they do by default)
* no other anons need to have cookies set at all
* this scheme can be extended to other totally anonymising ISPs, if needed, including schools/colleges with proxy servers
Downside:
* AOL users lose a bit of anonymity (but, hey, that's the upside, too!)
* highly clueful AOL users could still work around this somewhat by technical means, but re-read the first clause of this sentence: it will still deal with 99% of the problem
Note that they are still _pseudonymous_: there is no way of tracing them to their real identities except through the AOL abuse department, so we are still protecting their privacy.
So, this provides a nice tier between 'open' and 'blocked' that should go a long way towards preventing the need for indiscriminate range-blocks.
How about it?
-- N.
Neil Harris wrote:
> We have a long-standing problem with AOL, which is that they insist on being a single giant cluster of anonymizing proxies. Should we consider sending a cookie to AOL browsers which issue edit requests, to give them some kind of identity?
Some comments to make this clearer:
* we detect AOL users by IP address range, not browser user-agent
* when I say "hash", I mean "keyed hash with a secret key"
We can also construct the cookie values neatly as
"seed" + hash(key1 + "seed" + key1)
where + is string concatenation, key1 is a secret key known only to our server, and "seed" is a random string.
Given a good hash algorithm, it is hard for anyone else to discover key1, but we (who know key1) can trivially verify whether a cookie is genuine without doing a database lookup, which will scale nicely for the long-term future. Making the secret keys and seeds at least 512 bits (64 bytes) long and using a good hash will effectively frustrate brute-force attacks on the key.
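A minimal sketch of issuing and checking such cookies follows. Assumptions: SHA-512 as the "good hash", a 64-character hex seed, and a hypothetical in-memory `KEY1`; the helper names are illustrative. It mirrors the hash(key1 + seed + key1) envelope described above. The standard keyed-hash construction is HMAC (`hmac.new(key, msg, hashlib.sha512)`), which would be a reasonable substitute.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical server-side secret (512 bits); in practice it never leaves the server.
KEY1 = secrets.token_bytes(64)

SEED_LEN = 64   # hex characters in the seed (256 random bits)
TAG_LEN = 128   # hex characters in a SHA-512 digest (512 bits)


def envelope_hash(key: bytes, seed: str) -> str:
    """hash(key + seed + key), the keyed hash from the proposal, using SHA-512."""
    return hashlib.sha512(key + seed.encode("utf-8") + key).hexdigest()


def make_cookie(key1: bytes = KEY1) -> str:
    """Cookie value = seed + hash(key1 + seed + key1), with a fresh random seed."""
    seed = secrets.token_hex(SEED_LEN // 2)
    return seed + envelope_hash(key1, seed)


def verify_cookie(cookie: str, key1: bytes = KEY1) -> Optional[str]:
    """Return the seed if we issued this cookie, else None; no database lookup needed."""
    if len(cookie) != SEED_LEN + TAG_LEN:
        return None
    seed, tag = cookie[:SEED_LEN], cookie[SEED_LEN:]
    expected = envelope_hash(key1, seed)
    # Constant-time comparison, so timing does not reveal how much of the tag matched.
    if hmac.compare_digest(tag.encode("utf-8"), expected.encode("ascii")):
        return seed
    return None
```

A cookie from `make_cookie()` passes `verify_cookie()`. Altering any character, or checking against a different key, makes it fail.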
* we can re-use the blocking subsystem to do this, by giving it a variety of policies which can be enforced in the blocking table, e.g. "block" or "force-anon-cookie"
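Per-range policies in the blocking table might be consulted roughly as below. This is a sketch only: the table layout, the `check_edit` helper, and the netblocks (IETF documentation ranges standing in for real AOL blocks) are all hypothetical. The policy names "block" and "force-anon-cookie" come from the text.

```python
import ipaddress
from enum import Enum
from typing import Optional, Tuple


class Policy(Enum):
    BLOCK = "block"
    FORCE_ANON_COOKIE = "force-anon-cookie"


# Hypothetical blocking table; documentation ranges stand in for real netblocks.
BLOCK_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): Policy.FORCE_ANON_COOKIE,
    ipaddress.ip_network("203.0.113.0/24"): Policy.BLOCK,
}

COOKIE_MESSAGE = ("Dear AOL user: you currently have cookies disabled; "
                  "you will need to enable cookies to edit this page.")


def lookup_policy(ip: str) -> Optional[Policy]:
    """Return the policy of the first netblock containing `ip`, if any."""
    addr = ipaddress.ip_address(ip)
    for net, policy in BLOCK_TABLE.items():
        if addr in net:
            return policy
    return None


def check_edit(ip: str, has_valid_cookie: bool) -> Tuple[bool, str]:
    """Decide whether an anonymous edit may proceed; returns (allowed, reason)."""
    policy = lookup_policy(ip)
    if policy is Policy.BLOCK:
        return False, "blocked"
    if policy is Policy.FORCE_ANON_COOKIE and not has_valid_cookie:
        return False, COOKIE_MESSAGE
    return True, "ok"
```

Anons outside any listed range never need a cookie, which matches the "no other anons need to have cookies set" point.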
To generate the user pseudonyms, after checking that
hash(key1 + "seed" + key1) is the same as the second part of the cookie
we then generate
hash(key2 + "seed" + key2)
where key2 is another secret key, and truncate the result to some reasonable length. Note that we don't need the authenticating hash from the cookie string here: it depends only on the seed, and adds no further security once it has been verified.
A nicer way of showing the hashes of the cookie strings might be as strings with a leading hash sign, thus: #ekls9fjr5i39 (base-36 encoded using the characters [a-z][0-9]). To reveal a little more, we could even present it as #aol:ekls9fjr5i39, so that the originating ISP information is not lost.
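Deriving the displayed pseudonym from a verified seed might look like this. It is a sketch under stated assumptions: SHA-512 as the hash, a hypothetical `KEY2`, and "a" standing for digit 0 in the [a-z][0-9] alphabet. Rather than literally taking the leading characters of the base-36 expansion (whose first digit is never "a"), it keeps the value modulo 36^length, which gives exactly `length` evenly distributed characters.

```python
import hashlib
import string

# Hypothetical second secret key, distinct from the cookie-signing key1.
KEY2 = b"hypothetical-key2-use-64-random-bytes-in-practice"

# Base-36 digits drawn from [a-z][0-9]; mapping "a" to 0 is an arbitrary choice.
ALPHABET = string.ascii_lowercase + string.digits


def pseudonym(seed: str, isp: str = "aol", length: int = 12, key2: bytes = KEY2) -> str:
    """Derive a public name like "#aol:ekls9fjr5i39" from a verified cookie seed."""
    digest = hashlib.sha512(key2 + seed.encode("utf-8") + key2).digest()
    # Keep `length` base-36 digits of the 512-bit value.
    n = int.from_bytes(digest, "big") % (36 ** length)
    chars = []
    for _ in range(length):
        n, r = divmod(n, 36)
        chars.append(ALPHABET[r])
    return f"#{isp}:" + "".join(reversed(chars))
```

The same seed always yields the same pseudonym, so an AOL user keeps one identity across changing proxy IPs for as long as the cookie lives.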
Note that even though the full keyed hash of the cookie string may be 512 bits long, we need only use the first 8 base-36 characters as a user pseudonym to have less than one chance in 2 * 10^12 that any two given pseudonyms collide by accident (36^8 is about 2.8 * 10^12), so using 12, as shown above, is plenty.
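The figure above is a per-pair probability. Across many users the birthday bound applies, and this is where the extra characters earn their keep. A small sketch of both calculations follows; the user count passed in is a hypothetical input, not a measured figure.

```python
import math

ALPHABET_SIZE = 36


def pair_collision_probability(length: int) -> float:
    """Chance that two given pseudonyms of `length` base-36 characters coincide."""
    return 1 / ALPHABET_SIZE ** length


def any_collision_probability(length: int, n_users: int) -> float:
    """Birthday approximation: chance of at least one collision among n_users names."""
    space = ALPHABET_SIZE ** length
    return -math.expm1(-n_users * (n_users - 1) / (2 * space))
```

With a hypothetical million pseudonymous editors, 8 characters gives roughly a 16% chance that some pair collides, while 12 characters brings it to about one in ten million.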
So the cost of doing this is mostly the cost of hashing: two hashes per edit, one to verify the cookie and one to generate the pseudonym string. On a modern processor, each hash should take no more than 2 or 3 milliseconds (and probably much less), which is trivial at Wikipedia's current edit rate of one edit every 2 or 3 seconds. (We could also cache verified cookie -> name lookups in memcached to speed things up further, so that hashing is needed only the first time a cookie is seen, but that's probably too much complexity for too little gain.)
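The per-hash cost is easy to check on a given machine. Here is a rough measurement sketch, assuming SHA-512 over a cookie-sized input with a 64-byte random stand-in key. Results depend on hardware and language runtime (the proposal would run in PHP, not Python), so treat the output only as an order-of-magnitude check.

```python
import hashlib
import os
import timeit


def seconds_per_hash(rounds: int = 10_000) -> float:
    """Average wall-clock time of one keyed SHA-512 hash over a cookie-sized input."""
    key = os.urandom(64)                         # stand-in for key1 or key2
    seed = os.urandom(32).hex().encode("ascii")  # 64-character seed, as in the cookie
    total = timeit.timeit(lambda: hashlib.sha512(key + seed + key).digest(), number=rounds)
    return total / rounds


if __name__ == "__main__":
    per_hash = seconds_per_hash()
    print(f"{per_hash * 1e6:.2f} microseconds per hash, {2 * per_hash * 1e6:.2f} per edit")
```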
-- N.
Before resorting to far-fetched solutions, we should first agree that there actually is a problem. Is there any actual evidence that AOL users, editing from ever-changing IP addresses, are vandalising us in such amounts that it becomes bothersome?
Andre Engels
Andre Engels wrote:
> Before resorting to far-fetched solutions, we should first agree that there actually is a problem. Is there any actual evidence that AOL users, editing from ever-changing IP addresses, are vandalising us in such amounts that it becomes bothersome?
In a word: yes. I speak from personal experience here.
Because of the way AOL runs its network, even a single user in a single session will appear to edit from many different IPs. Moreover, AOL deliberately hides its customers' IP addresses by not revealing them in the proxy headers. Even over HTTPS, you will still get a proxied connection. In effect, AOL is a giant anonymizing server farm for its users, and some of them know it.
AOL hosts quite a lot of good users, as well as a fair few relatively simple-minded vandals. However, it also hosts some seriously badly behaved users who respond very badly to feedback: not only have they never been exposed to the idea of netiquette, but they believe that once they've bought the AOL service, they own the Internet. They then behave in ways that would result in instant banning (edit warring, personal abuse, threats) were they from any other ISP. This wastes a lot of admin time, as admins struggle to clean up the mess generated by these by-now incorrigible users. I've seen this scenario played out many times.
This makes it much more difficult to track down their edits: you have to memorize the multiple AOL netblocks, grep recent changes for edits by IP addresses within them, and then inspect the edits one by one to try to tell whether they are by the vandal or by legitimate AOL users (whose edits may be coming via the same IP). As a result, wrangling a determined AOL vandal is easily ten times harder than dealing with a normal anon IP.
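The mechanical half of that triage, picking out recent changes made from a set of netblocks, can be sketched as follows. The (editor, page) tuple format and the netblocks (IETF documentation ranges in place of real AOL blocks) are hypothetical. The judgment of which edits belong to the vandal still has to be made by hand, which is exactly the cost the cookie scheme aims to remove.

```python
import ipaddress
from typing import Iterable, List, Tuple

# Hypothetical netblocks; documentation ranges stand in for real AOL proxy blocks.
PROXY_NETBLOCKS = [ipaddress.ip_network(n) for n in ("198.51.100.0/24", "203.0.113.0/24")]


def edits_from_netblocks(changes: Iterable[Tuple[str, str]],
                         netblocks=PROXY_NETBLOCKS) -> List[Tuple[str, str]]:
    """Keep (editor, page) entries whose editor is an IP inside any of the netblocks."""
    hits = []
    for editor, page in changes:
        try:
            addr = ipaddress.ip_address(editor)
        except ValueError:
            continue  # a logged-in user name, not an IP address
        if any(addr in net for net in netblocks):
            hits.append((editor, page))
    return hits
```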
The same is true of recent cases of edits from school networks, where individual good editors are accompanied by vandals, and where the only alternatives are to check edits one by one or to block the proxy or the whole netblock.
Implementing a scheme such as the one suggested (probably about 200 lines of code, assuming PHP already has native support for something like SHA-1) would greatly reduce the admin time and effort taken up in dealing with these problems.
We could also do similar forced cookie identification for networks containing persistent vandals on dial-up or cable networks with dynamic addressing.
-- N.
Neil Harris (usenet@tonal.clara.co.uk) [050122 06:06]:
> Because of the way AOL runs its network, even a single user in a single session will appear to edit from many different IPs. Moreover, AOL deliberately hides its customers' IP addresses by not revealing them in the proxy headers. Even over HTTPS, you will still get a proxied connection. In effect, AOL is a giant anonymizing server farm for its users, and some of them know it.
>
> AOL hosts quite a lot of good users, as well as a fair few relatively
The real problem is not its anonymising nature (we have several known special-case ISP proxies which are effectively unblockable to avoid collateral damage, including one in a current en: Arbitration Committee case) but its sheer size. AOL has 30% of all Internet users. That's one hell of a special case!
- d.
Andre Engels wrote:
> Before resorting to far-fetched solutions, we should first agree that there actually is a problem. Is there any actual evidence that AOL users, editing from ever-changing IP addresses, are vandalising us in such amounts that it becomes bothersome?
Yes. I would say that the primary *personal* motivation danny on the English Wikipedia had for "mentoring" Michael (a formerly notorious vandal who has since reformed) was that danny kept finding himself blocked from editing because people were blocking Michael.
In this case, there has been a happy ending. But in the general case, legitimate editors who are AOL customers have an ongoing problem, which we deal with partly by being reluctant to block AOL IP addresses (which, of course, is bad for other reasons).
--Jimbo
wikitech-l@lists.wikimedia.org