Hi!
Forgive my ignorance. I don't know much about the infrastructure of WDQS and how it works; I just want to mention how application servers do it. In appservers, there are dedicated nodes for both apache and the replica database, so if a bot overdoes things in Wikipedia (which happens quite a lot), users won't feel anything but the other bots take the hit. Routing based on UA seems hard though while it's easy in mediawiki (if you hit api.php, we assume it's a bot).
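To make the MediaWiki-style heuristic concrete, here is a tiny sketch of routing on the request path alone: anything hitting api.php is assumed to be a bot and sent to a dedicated pool. The backend names and the `route_request` helper are made up for illustration, not real infrastructure config.

```python
# Hypothetical backend pools; names are illustrative only.
BACKENDS = {
    "bot": "appserver-api",   # dedicated nodes for api.php traffic
    "user": "appserver-web",  # nodes serving interactive users
}

def route_request(path: str) -> str:
    """Pick a backend pool from the request path alone:
    api.php traffic is assumed to come from bots."""
    if path.startswith("/w/api.php"):
        return BACKENDS["bot"]
    return BACKENDS["user"]
```

The point is that the signal (the path) is unambiguous and cheap to check at the load balancer, which is exactly what a user-agent string is not.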
We have two clusters, public and internal, with the latter serving only Wikimedia tasks and thus isolated from outside traffic. However, we currently have no practical way to separate bot from non-bot traffic, and I don't think we have the resources for another cluster right now.
Routing based on UA seems hard though while it's easy in mediawiki
I don't think our current LB setup can route based on user agent. A gateway in front could do that, but given that we don't have the resources for another cluster, it doesn't seem worth spending time developing something like that for now.
Even if we did separate browser and bot traffic, we'd still have the same problem on the bot cluster: most bots are benign and low-traffic, and we want to do our best to enable them to function smoothly. But for this to work, we need ways to weed out the outliers that consume too many resources. In a way, the bucketing policy is a version of what you described: if you use proper identification, you are judged on your own traffic; if you use generic identification, you are bucketed with other generic agents, and thus may be denied if that bucket is full. This is not the best final solution, but experience so far shows it has reduced the incidence of problems. Further ideas on how to improve it are of course welcome.