Hi
I found three sites live mirroring Portuguese Wikipedia:
http://www.ebaita.com/enciclopedia.asp,title,Especial:Recentchanges
http://enciclopedia.tiosam.com/enciclopedia/enciclopedia.asp?title=Especial:...
http://www.contestado.com.br/wiki/Especial:Recentchanges
Is this the best place to report them?
Tiny question: isn't it possible to block any non-Wikimedia site from live mirroring, instead of blocking them one by one?
On 1/27/07, Luiz Augusto lugusto@gmail.com wrote:
Tiny question: isn't it possible to block any non-Wikimedia site from live mirroring, instead of blocking them one by one?
I suppose we could try to record massive request rates from the same IP, but then you'd probably catch some large-scale proxies too. So, not really: they just request a page like anyone else, and such a request is essentially impossible to distinguish reliably from one made by an ordinary user.
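[A rough sketch of the per-IP rate tracking mentioned above, purely for illustration; the window, threshold, and flag_suspect() hook are hypothetical, and as the message says, a busy proxy would trip such a check just as easily as a live mirror would.]

import time
from collections import defaultdict, deque

WINDOW = 60          # look-back window in seconds (hypothetical)
THRESHOLD = 300      # requests per window before an IP gets flagged (hypothetical)

recent_hits = defaultdict(deque)   # ip -> timestamps of recent requests

def flag_suspect(ip):
    # placeholder: a real setup might log this or feed a review queue
    print("possible live mirror or heavy proxy:", ip)

def record_request(ip, now=None):
    now = time.time() if now is None else now
    hits = recent_hits[ip]
    hits.append(now)
    # discard timestamps that have fallen outside the window
    while hits and hits[0] < now - WINDOW:
        hits.popleft()
    if len(hits) > THRESHOLD:
        flag_suspect(ip)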
On Sat, 27 Jan 2007 19:24:13 -0500, Simetrical wrote:
I suppose we could try to record massive request rates from the same IP, but then you'd probably catch some large-scale proxies too. So, not really: they just request a page like anyone else, and such a request is essentially impossible to distinguish reliably from one made by an ordinary user.
There is one difference, in that proxy traffic would be generated by humans who use the proxies, while mirror traffic would presumably come from the web crawlers that are indexing the mirrors, and should thus be much more predictable.
Steve Sanbeg wrote:
There is one difference, in that proxy traffic would be generated by humans who use the proxies, while mirror traffic would presumably come from the web crawlers that are indexing the mirrors, and should thus be much more predictable.
Some remote loaders use a rotating open proxy list, for example tramadol.tfres.net (warning, site contains some nasty javascript). This makes them slow, but apparently fast enough for the search engines to index them. What can we do about that, besides DoS the website?
-- Tim Starling
On 1/29/07, Tim Starling tstarling@wikimedia.org wrote:
Some remote loaders use a rotating open proxy list, for example tramadol.tfres.net (warning, site contains some nasty javascript). This makes them slow, but apparently fast enough for the search engines to index them. What can we do about that, besides DoS the website?
At some point a layer-8 approach becomes the most reasonable, although technical folks tend to shy away from interactions outside of their domain of expertise. ;)
Tim Starling wrote:
Some remote loaders use a rotating open proxy list, for example tramadol.tfres.net (warning, site contains some nasty javascript). This makes them slow, but apparently fast enough for the search engines to index them. What can we do about that, besides DoS the website?
-- Tim Starling
That would actually be quite a good way of finding actively exploited open proxies: serve up fictitious (and steganographically identified) hook pages that aren't distributed through any legitimate channel such as backups or live feeds, and then match the IPs that fetch them against their appearance on the live-loading sites.
Details on request...
-- Neil
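[A rough sketch of the hook-page idea Neil describes, purely illustrative; the secret key, the in-memory token table, and the HTML-comment marker are all invented for the example. The point is only that a keyed per-IP token ties the copy that later appears on a live-loading site back to the proxy or crawler IP that fetched it.]

import hmac
import hashlib
import time

SECRET_KEY = b"replace-with-a-real-secret"   # hypothetical
served_tokens = {}                           # token -> (requesting ip, timestamp)

def make_token(ip):
    # short keyed hash so tokens can't be forged or enumerated by outsiders
    return hmac.new(SECRET_KEY, ip.encode(), hashlib.sha256).hexdigest()[:16]

def serve_hook_page(ip):
    # decoy article that never appears in dumps or live feeds; the marker
    # is invisible to readers but survives a straight HTML copy
    token = make_token(ip)
    served_tokens[token] = (ip, time.time())
    return "<html><body><p>Decoy article text.</p><!-- %s --></body></html>" % token

def identify_fetcher(mirrored_html):
    # given HTML scraped from a live-loading site, recover the IP that fetched it
    for token, (ip, ts) in served_tokens.items():
        if token in mirrored_html:
            return ip, ts
    return None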
Tim Starling wrote:
Some remote loaders use a rotating open proxy list, for example tramadol.tfres.net (warning, site contains some nasty javascript).
2002 popups blocked :O
This makes them slow, but apparently fast enough for the search engines to index them. What can we do about that, besides DoS the website?
Make a script that sends a special request to autoblock it. Run it daily or hourly. They *can't* update their proxy list as fast as a bot can request pages. You can even route the requests through their own proxy list, so they can't simply filter out the server's IP.
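[A sketch of the kind of probe bot Platonides is suggesting, purely illustrative: the remote-loader URL layout, the access-log location, and the block_ip() stub are all hypothetical, and a real bot would need a privileged account to call MediaWiki's block API. The idea is that whichever IP relays the bait title to the Wikimedia servers must be one of the loader's open proxies.]

import uuid
import urllib.request

REMOTE_LOADER = "http://tramadol.tfres.net/wiki/"   # hypothetical URL layout
ACCESS_LOG = "/var/log/squid/access.log"            # hypothetical log path

def bait_request():
    # ask the remote loader for a page title nobody else would ever request
    marker = "Autoblock_probe_" + uuid.uuid4().hex
    try:
        urllib.request.urlopen(REMOTE_LOADER + marker, timeout=30)
    except OSError:
        pass   # the loader may error out; its upstream fetch still happens
    return marker

def find_proxy_ip(marker):
    # whichever client IP requested the bait title is one of their proxies
    with open(ACCESS_LOG) as log:
        for line in log:
            if marker in line:
                return line.split()[0]   # assumes client IP is the first field
    return None

def block_ip(ip):
    print("would block %s via the block API" % ip)   # stub

if __name__ == "__main__":
    marker = bait_request()
    proxy = find_proxy_ip(marker)
    if proxy:
        block_ip(proxy)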
Platonides wrote:
Make a script that sends a special request to autoblock it. Run it daily or hourly. They *can't* update their proxy list as fast as a bot can request pages. You can even route the requests through their own proxy list, so they can't simply filter out the server's IP.
That's all very well, until they use a proxy that we don't want to block. It's been our policy so far to block anonymous editing from open proxies and p2p anonymity networks, but to allow page views. If they were blocked at the squid level, neither would be possible.
Gregory Maxwell wrote:
At some point a layer-8 approach becomes the most reasonable, although technical folks tend to shy away from interactions outside of their domain of expertise. ;)
Their money apparently comes from an affiliate scheme with DriveCleaner, a scam in which the "free version" of a purported spyware/virus checker reports a threat and recommends that the user buy the "full version" of the software in order to remove it. It's been known to Symantec since June 2006, and what have the authorities done in that time to stop it? You can still go to drivecleaner.com and pick up a copy. There's a lack of international coordination, a lack of political will, and a lack of funding.
And if they can't stop a blatant scam that defrauds people of money every day, what makes you think they could be bothered to stop a borderline abuse of service?
As for vigilante action, we know from anti-spam activities that a technical attack could well be met with revenge in kind, and we know that we could easily find ourselves outgunned. I don't really want to spend my time dealing with multi-gigabit DDoS attacks against Wikipedia once every week or two.
Civil action may be the only remaining option, but of course it is expensive. The hosting providers, ISPs and banks will defend their clients' right to privacy every step of the way.
Welcome to the Internet.
-- Tim Starling
On Mon, 29 Jan 2007 21:30:01 +0000, Tim Starling wrote:
And if they can't stop a blatant scam that defrauds people of money every day, what makes you think they could be bothered to stop a borderline abuse of service?
There is still the case of the disposable websites that just want an easy source of content, so they can quickly create a new content-rich site whenever the old one gets blocked.
Search engines may be more responsive than the government. Interestingly, it seems like only Google indexes this site; I couldn't find anything on Ask or Yahoo.
Tim Starling wrote:
That's all very well, until they use a proxy that we don't want to block. It's been our policy so far to block anonymous editing from open proxies and p2p anonymity networks, but to allow page views. If they were blocked at the squid level, neither would be possible.
Well, if we don't want to block it, it's easier not to block it ;) At least, if they use one of those proxies, it will also be carrying legitimate queries and helping to balance the load on the WMF servers.