[Foundation-l] One week later and I am still blocked, nobody is doing a fucking thing

Mon Feb 19 02:37:50 UTC 2007

>From: Tim Starling <tstarling at wikimedia.org>
>Subject: Re: [Foundation-l] One week later and I am still blocked,
>	nobody is doing a fucking thing
>To: foundation-l at lists.wikimedia.org
>Message-ID: <eraa1e$9ht$1 at sea.gmane.org>
>Content-Type: text/plain; charset=ISO-8859-1
>
>
>Cascaded open proxies? Do you mean to say that you are blocking these
>secure ISP proxies because there is an open proxy behind them, on a
>customer computer?

Tim,

I think my scanning method is a bit more sophisticated than just checking
for the presence or absence of a VIA or FORWARDED FOR header.

My scanner requests a page from a server (my own) via the suspect proxy.
The program is set up to recognise that page, which returns the complete
header info. The following cases can occur:

* The header being returned includes the IP of the proxy (in REMOTE_HOST or
REMOTE_ADDR) and my own (in HTTP_VIA or HTTP_X_FORWARDED_FOR): transparent
proxy - only reactive blocking after vandalism
* The header includes only the IP of the proxy: elite (or anonymous) proxy -
same as above
* The header includes a totally different IP (in REMOTE_HOST or
REMOTE_ADDR): high anonymous proxy.
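
As a rough sketch of the three cases above (not the actual scanner code), the
classification could look like the following. The function name
classify_proxy and the dictionary env holding the echoed CGI-style variables
are assumptions made for illustration; the labels follow the terminology used
in the list.

```python
import re

def classify_proxy(env, proxy_ip, my_ip):
    """Classify a proxy from the variables echoed back by the test page.

    env      -- dict of CGI-style variables as seen by the test server
    proxy_ip -- the proxy the scanner connected through
    my_ip    -- the scanner's own address
    """
    remote = {env.get(k) for k in ("REMOTE_ADDR", "REMOTE_HOST")} - {None}
    if not remote:
        raise ValueError("echo page returned no REMOTE_ADDR/REMOTE_HOST")

    # Collect every token from the forwarding headers, split on commas/spaces.
    forwarded = set()
    for name in ("HTTP_VIA", "HTTP_X_FORWARDED_FOR"):
        forwarded.update(re.split(r"[,\s]+", env.get(name, "")))

    if proxy_ip not in remote:
        # A different address reached the server: something sits behind the proxy.
        return "high anonymous"
    # The chosen proxy is also the exit; it either reveals the client or not.
    return "transparent" if my_ip in forwarded else "anonymous"
```

Only the first two results would lead to reactive blocking; the third is the
case discussed next.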

The last case means that the proxy, as chosen by the program, is followed
by at least one other proxy, being the one that shows up in the header (a
cascade more or less similar to TOR). What sits in between, I don't know.

The exit node IP is added to a separate table of exit nodes. The program
then retries building a connection via the same (entrant) proxy to find out
whether that proxy uses other exit nodes. The loop ends when new==previous,
and the program continues with the next proxy (actually many instances run
in parallel).
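
The retry loop described above can be sketched roughly as follows. This is
an illustration only: probe is a hypothetical callable that sends one request
through the entrant proxy and returns the exit IP seen by the test server,
and the max_tries safety cap is an assumption that is not part of the
description.

```python
def collect_exit_nodes(entrant, probe, max_tries=50):
    """Probe repeatedly through one entrant proxy; stop when the exit repeats."""
    exits = set()
    previous = None
    for _ in range(max_tries):          # assumed cap so the loop always ends
        current = probe(entrant)        # exit IP seen by the server, or None
        if current is None or current == previous:
            break                       # new == previous: done with this proxy
        exits.add(current)
        previous = current
    return exits
```

Running many instances in parallel would then amount to calling this
function for different entrant proxies at the same time.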

In this way some 1000 exit proxies have been found so far. These are fed
by over 5000 entrant open proxies (and whatever might be in between). The
champion serves more than 800 different entrant proxies; some 250 serve 2 or
more. The other way around: one entrant IP feeds 32 exit servers (a kind of
mini-TOR) and 600 entrant IPs feed 2 or more exit servers (status as of
yesterday, continuously growing).
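
Figures like these can be derived from a table of (entrant, exit) pairs. A
minimal sketch, assuming the table is available as a list of such pairs; the
function name fan_counts and the input layout are made up for illustration.

```python
from collections import defaultdict

def fan_counts(pairs):
    """Return (entrants per exit, exits per entrant) as two count dicts."""
    entrants_per_exit = defaultdict(set)
    exits_per_entrant = defaultdict(set)
    for entrant, exit_ip in pairs:
        entrants_per_exit[exit_ip].add(entrant)
        exits_per_entrant[entrant].add(exit_ip)
    return (
        {ip: len(s) for ip, s in entrants_per_exit.items()},
        {ip: len(s) for ip, s in exits_per_entrant.items()},
    )
```

The "champion" is then the exit with the highest count in the first
dictionary, and the number of entrants feeding 2 or more exits is the number
of entries with a count of at least 2 in the second.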

The exit nodes, which might also be normal open proxies, are today's issue.
Because the originating IP cannot be traced and the exit nodes may change,
these are blocked proactively.

After reading the XFF article, an improvement I could think of is testing
the VIA/FORWARDED headers of the exit server as well (which I don't look at
at the moment), and subsequently checking whether the entrant node (and/or
VIA) falls in a similar IP range as the exit node, or more loosely in the
same country. Depending on the result, the blocking of the exit node would be
hard or soft, regardless of whether it is a trusted proxy.
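
A possible shape for that check, as a sketch only. Several things here are
assumptions and not part of the idea as stated: that "similar IP range" means
a shared /16 prefix, that a match leads to a soft block and a mismatch to a
hard block, and that country data comes from a hypothetical country_of
mapping of IP to country code.

```python
import ipaddress

def block_kind(entrant_ip, exit_ip, country_of=None, prefix=16):
    """Return "soft" if entrant and exit look related, else "hard"."""
    # Network containing the exit node, at the assumed prefix length.
    net = ipaddress.ip_network(f"{exit_ip}/{prefix}", strict=False)
    if ipaddress.ip_address(entrant_ip) in net:
        return "soft"
    # Looser test: same country, if a lookup table was supplied (assumption).
    if country_of is not None:
        a, b = country_of.get(entrant_ip), country_of.get(exit_ip)
        if a is not None and a == b:
            return "soft"
    return "hard"
```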

I just had a look at the table (in human-readable format; IPs are stored as
longs) and sampled some records. Some combinations are within the same
subnet, others are weirder (e.g. a US exit fed by IPs from US, CA, FR,
GB, SE, AE).
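
For readers unfamiliar with storing IPs as longs, here is a small sketch of
the conversion and of a same-subnet comparison on the integer form. The
helper names and the /24 default are illustrative assumptions, and the
addresses are documentation examples, not records from the table.

```python
import ipaddress

def ip_to_long(ip):
    """Dotted-quad IPv4 string -> 32-bit integer."""
    return int(ipaddress.IPv4Address(ip))

def long_to_ip(value):
    """32-bit integer -> dotted-quad IPv4 string."""
    return str(ipaddress.IPv4Address(value))

def same_subnet(a, b, prefix=24):
    """Compare two integer IPs under a netmask of the given prefix length."""
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    return (a & mask) == (b & mask)

print(ip_to_long("192.0.2.1"))   # 3221225985
print(long_to_ip(3221225985))    # 192.0.2.1
```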

I'll send you info on trusted proxies which appear to have an open proxy
behind them, according to my list.

Rgds Ronald