I just wanted to give folks a heads-up that, in response to a few traffic storms in the Beta Cluster (deployment-prep Cloud VPS project), we have started using the very coarse protection of blocking IP ranges. These blocks are being applied at the Beta Cluster CDN edge, where we have Varnish configuration that can discard traffic based on a list of CIDR ranges.
The ranges blocked at any point in time should be visible in the deployment-prep project's Hiera configuration that is logged in the cloud/instance-puppet.git repo. [0]
The hardly scientific process of choosing what to block has so far followed steps like the ones documented at https://phabricator.wikimedia.org/T392003. Hashar came up with a shell one-liner that counts requests by IP address or IP address prefix, depending on the regex provided. We then take the top addresses produced by that log filtering and perform a `whois` lookup to find the associated IP address allocation. The CIDR blocks associated with the allocation are then put into the Hiera config, a Puppet run is forced, and Varnish is restarted. Repeat as necessary until a reasonable rate of requests is passing through Varnish to the backing MediaWiki instances, where we are examining the logs.
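As a rough illustration of the triage step (not the actual one-liner; the log format and field positions below are invented for the example), the counting looks something like:

```python
# Bucket access-log requests by client IP, or by IP prefix, and rank them.
# The sample log lines are made up; real Varnish logs will differ.
from collections import Counter

sample_log = """\
203.0.113.7 - - [15/Apr/2025:10:00:01 +0000] "GET /wiki/Main_Page HTTP/1.1" 200
203.0.113.9 - - [15/Apr/2025:10:00:02 +0000] "GET /wiki/Special:Random HTTP/1.1" 200
198.51.100.4 - - [15/Apr/2025:10:00:03 +0000] "GET /w/api.php HTTP/1.1" 200
203.0.113.7 - - [15/Apr/2025:10:00:04 +0000] "GET /wiki/Main_Page HTTP/1.1" 200
"""

def top_talkers(lines, prefix_octets=4):
    """Count requests grouped by the first prefix_octets octets of the client IP."""
    counts = Counter(
        ".".join(line.split()[0].split(".")[:prefix_octets])
        for line in lines
        if line.strip()
    )
    return counts.most_common()

# Per-address counts, then coarser per-prefix counts:
print(top_talkers(sample_log.splitlines()))
print(top_talkers(sample_log.splitlines(), prefix_octets=2))
```

The top entries from a pass like this are what then feed the `whois` lookup step described above.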
If you feel that you have legitimate traffic for the Beta Cluster to handle that has gotten swept up in one of these blocks, please reach out by filing a task on the #beta-cluster-infrastructure Phabricator board. [1]
If you think working to make this process of blocking easier or unnecessary sounds like a fun project, I would love to chat more. Hit me up via email, libera.chat IRC, or on-wiki with your ideas.
[0]: https://gerrit.wikimedia.org/r/plugins/gitiles/cloud/instance-puppet/+/refs/... [1]: https://phabricator.wikimedia.org/tag/beta-cluster-infrastructure/
Bryan
On Tue, Apr 15, 2025 at 2:27 PM Bryan Davis bd808@wikimedia.org wrote:
A week goes by and we find ourselves back in the same "beta crushed by bot traffic" place again. [2] I tried blocking selectively at first [3], but I was not making much progress in lowering the load. After noticing that a lot of the traffic was coming from ranges assigned to orgs in Brazil, I tried blocking a lot of Class B networks (X.Y.0.0/16) that were listed on https://ipnetinfo.com/country/BR and showing traffic in the logs. [4] This helped a bit, but things were still looking pretty bad.
I got frustrated and decided to see if blocking Class A networks (X.0.0.0/8) would do anything. I wrote a delightfully horrible script that buckets the last 50,000 requests by Class A network and outputs a cut-and-paste-ready list of all of them with more than 500 requests. [5] I blocked these IP ranges, waited a bit to see what happened, and repeated a few times.
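The actual script is linked in [5]; the core of the idea is just bucketing by first octet and thresholding. A simplified sketch (the 500-request threshold matches the description above, everything else is an assumption):

```python
# Simplified reconstruction of the /8 bucketing idea: group client IPs by
# first octet and emit ready-to-paste CIDRs for any Class A network seen
# more than `threshold` times. Not the actual script from [5].
from collections import Counter

def noisy_class_a(ips, threshold=500):
    """Return X.0.0.0/8 CIDRs for first octets appearing more than threshold times."""
    counts = Counter(ip.split(".")[0] for ip in ips)
    return [f"{octet}.0.0.0/8" for octet, n in counts.most_common() if n > threshold]

# Toy demo with a lowered threshold:
demo_ips = ["45.1.2.3"] * 6 + ["187.9.8.7"] * 4 + ["198.51.100.4"]
print(noisy_class_a(demo_ips, threshold=5))  # → ['45.0.0.0/8']
```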
This seems to have worked so far, but does not make me very happy. The blocks are really wide and almost certain to sweep up legitimate traffic sooner or later if we keep doing things this way. We have some newer tools in use with the production networks that might make it easier for us to rate limit aggressively at the edge rather than applying outright blocks to large ranges.
[2]: https://phabricator.wikimedia.org/T392534 [3]: https://phabricator.wikimedia.org/T392534#10763059 [4]: https://phabricator.wikimedia.org/T392534#10763134 [5]: https://phabricator.wikimedia.org/T392534#10763235
Bryan
Have you tried looking up associated IPs, i.e. all IPs from the same provider, and blocking those? Have we looked at what kinds of IPs are being used: colocation hosts, VPNs, or random private IPs?
On Wed, Apr 23, 2025 at 5:47 PM John phoenixoverride@gmail.com wrote:
Have you tried looking up associated IPs, i.e. all IPs from the same provider, and blocking those?
Casually, yes. A better workflow for doing this is really needed to make it a consistently viable option.
Have we looked at what kinds of IPs are being used: colocation hosts, VPNs, or random private IPs?
I think this is getting at what I vaguely described as "newer tools in use with the production networks" in my last message. https://wikitech.wikimedia.org/wiki/Requestctl has some interesting capabilities to store and use predefined ranges we source from elsewhere.
Bryan
The Brazil IP addresses are expected to be a bunch of compromised TV set-top boxes being used by an AI scraper. They are difficult to block, and being on a residential ISP makes collateral damage all but certain.
https://anubis.techaro.lol/ is currently being deployed by a number of other sites, small and large, from the Arch Wiki to UNESCO. It is MIT licensed, sits between a front proxy and the appserver, and uses a proof-of-work CAPTCHA to prevent bots. It is a blunt hammer, but it's probably better than IP blocking. There is some ability to allow acceptable bots: https://anubis.techaro.lol/docs/admin/policies/
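For anyone who hasn't seen these tools, the underlying proof-of-work idea is simple. A toy sketch (not Anubis's actual protocol, hash choice, or difficulty settings):

```python
# Toy proof-of-work sketch: the client must find a nonce such that
# sha256(challenge + nonce) starts with `difficulty` zero hex digits
# before the proxy lets the request through. Real tools wrap this in a
# browser challenge page; all parameters here are illustrative only.
import hashlib

def _digest(challenge: str, nonce: int) -> str:
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()

def solve(challenge: str, difficulty: int = 3) -> int:
    """Brute-force a valid nonce: the work the client pays."""
    nonce = 0
    while not _digest(challenge, nonce).startswith("0" * difficulty):
        nonce += 1
    return nonce

def verify(challenge: str, nonce: int, difficulty: int = 3) -> bool:
    """Cheap server-side check of the submitted nonce."""
    return _digest(challenge, nonce).startswith("0" * difficulty)

nonce = solve("example-challenge")
print(verify("example-challenge", nonce))  # → True
```

The asymmetry is the point: verification is one hash, while solving costs roughly 16x more per extra hex digit of difficulty, which adds up for a scraper making millions of requests but is barely noticeable to a single human visitor.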
https://git.gammaspectra.live/git/go-away is a similar project with more configuration available, but I haven't heard of as many folks deploying it.
I don't like advocating for these measures. I'm not sure there are any other reasonable options for resource-limited projects.
On Wed, Apr 23, 2025 at 5:59 PM AntiCompositeNumber acn@anticomposite.net wrote:
Anubis is on my list of potential tricks to try. I agree that proof of work proxies are not an ideal solution, but maybe they are slightly less terrible than outright blocks on 12% of the internet as I have done today.
Bryan
As an aside, can I say thank you for the suggestion in this public setting.
I know it's not necessarily a MediaWiki-specific topic, but can we mention it on MW.org somewhere? I know a few MW site owners who've been getting absolutely hammered by the same problem and, like Bryan, don't want to end up blocking potentially legitimate traffic or using third-party services.
I've rolled Anubis out today on ShoutWiki and initial results are looking positive - even if the documentation left a bit to be desired.
-- Lewis Cawte
Some of us in the MediaWiki Stakeholders’ Group created a Matrix/Element chat as a joint task force to discuss this specific topic further.
You can join with this link: https://matrix.to/#/%23WART%3Amatrix.org
We were going to discuss options like Anubis, HAProxy, Cloudflare, etc., but let’s all pool together our experiences so we don’t have to figure these things out on our own.
yours, jeffrey
On Thu, Apr 24, 2025 at 12:13 PM Lewis Cawte via Wikitech-l wikitech-l@lists.wikimedia.org wrote:
I think it would be great to see some project on mediawiki.org that tries to collect options and think about best practices for defending MediaWiki deployments against aggressive traffic patterns. I imagine most of the ideas will end up being generally applicable to various HTTP services, as MediaWiki itself is a relatively heavy stack to use for edge traffic defense. That is no reason not to think about things from the perspective of our technical community, however, where there tend to be a lot of shared values about protecting privacy and preserving open solutions as much as possible.
Bryan
Collateral damage isn't really a concern with the Beta Cluster; it's meant specifically for Wikimedia developers.
Maybe we could just put the whole thing behind idp.wikimedia.org? I think that can be done at the Apache level. Making it more different from production is not great, but having to spend constant human effort on IP blocks is less great.
On Thu, Apr 24, 2025 at 2:48 PM Gergo Tisza gtisza@wikimedia.org wrote:
I do agree that collateral damage in this testing/proving environment is not as impactful as preventing people from accessing our open knowledge projects. To my knowledge there is no automated authorization workflow for idp.wikimedia.org (or any other deployment of the CAS-SSO service). [0] I don't think it is reasonable at this point to block all automated uses of Beta Cluster.
[0]: https://phabricator.wikimedia.org/T377372
Bryan
Collateral damage isn't really a concern with the Beta Cluster, it's meant specifically for Wikimedia developers.
And quality assurance. My team's QA engineer uses a VPN, and he had to request to be unblocked.
Note that this exercise of IP range whack-a-mole is nothing new to VPS tools. I maintain two VPS projects (XTools, WS Export) that constantly suffer from aggressive web crawlers and disruptive automation. We've been doing the manual IP block thing for years :(
I suggest the IP denylist be applied to all of WMCS <https://phabricator.wikimedia.org/T226688>. We're able to get by for XTools and WS Export because XFF headers were specially enabled for this counter-abuse purpose. However, most VPS tools and all of Toolforge don't have such a luxury. If there are bots pounding away, there's currently no means to stop them (unless they are good bots with an identifiable UA). Even if we could detect them, it seems better to reduce the repetitive effort and give all of WMCS the same treatment.
I'll also note that some farms of web crawlers can't feasibly be blocked whack-a-mole style. This is the situation we're currently dealing with over at https://phabricator.wikimedia.org/T384711#10759017.
~ MA
On Thu, Apr 24, 2025 at 3:16 PM MusikAnimal musikanimal@gmail.com wrote:
Note that this exercise of IP range whack-a-mole is nothing new to VPS tools. I maintain two VPS projects (XTools, WS Export) that constantly suffer from aggressive web crawlers and disruptive automation. We've been doing the manual IP block thing for years :(
An interesting aspect of both of those Cloud VPS projects is that they are directly linked to from a number of content wikis. I think this greatly extends their exposure to crawler traffic in general.
I suggest the IP denylist be applied to all of WMCS https://phabricator.wikimedia.org/T226688. We're able to get by for XTools and WS Export because XFF headers were specially enabled for this counter-abuse purpose. However most VPS tools and all of Toolforge don't have such luxury. If there are bots pounding away, there's no means to stop them currently (unless they are good bots with an identifiable UA). Even if we could detect them, it seems better to reduce the repetitive effort and give all of WMCS the same treatment.
You are talking about three completely separate HTTP edges at this point. They all live on the same core Cloud VPS infrastructure, but there is no common HTTPS connection between the *.toolforge.org proxy, the *.wmcloud.org proxy, and the Beta Cluster CDN. The first two share some nginx stack configuration, but in practice are very different deployments with independent public IP addresses. The third is fundamentally a partial clone of the production wikis' CDN edge, although scaled down and missing some newer components that nobody has yet done the work to introduce.
I'll also note that some farms of web crawlers can't feasibly be blocked whack-a-mole style. This is the situation we're currently dealing with over at https://phabricator.wikimedia.org/T384711#10759017.
Truly distributed attack patterns (bot net traffic) are really hard to defend against with just an Apache2 instance. This is actually a place where someone could try experimenting with some filtering proxy like Anubis [0], go-away [1], or openappsec [2]. Having some experience with these tools could then lead us into better discussions about deploying them more widely or making them easier to use in targeted projects.
[0]: https://anubis.techaro.lol/ [1]: https://git.gammaspectra.live/git/go-away [2]: https://github.com/openappsec/openappsec
Bryan
I started a page [0] to track the problem and some of the emerging solutions. I'm still transferring information from a private wiki over, but it would be great to get others to document what they've been using. I'll start expanding on the tools I know about to give more information about the tradeoffs when using them.
Thanks for all the great info so far! I know this has been consuming a lot of web admins' time over the last few months.
[0]: https://www.mediawiki.org/wiki/Handling_web_crawlers