Hi!
Forgive my ignorance. I don't know much about the infrastructure of WDQS and how it works; I just want to mention how application servers do it. In appservers, there are dedicated nodes for both apache and the replica database, so if a bot overdoes things in Wikipedia (which happens quite a lot), users won't feel anything but the other bots take the hit. Routing based on UA seems hard though while it's easy in mediawiki (if you hit api.php, we assume it's a bot).
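To make the MediaWiki-style heuristic concrete, here is a tiny sketch of routing on the request path alone: anything hitting api.php is assumed to be a bot and sent to a dedicated pool. The backend names and the `route_request` helper are made up for illustration, not real infrastructure config.

```python
# Hypothetical backend pools; names are illustrative only.
BACKENDS = {
    "bot": "appserver-api",   # dedicated nodes for api.php traffic
    "user": "appserver-web",  # nodes serving interactive users
}

def route_request(path: str) -> str:
    """Pick a backend pool from the request path alone:
    api.php traffic is assumed to come from bots."""
    if path.startswith("/w/api.php"):
        return BACKENDS["bot"]
    return BACKENDS["user"]
```

The point is that the signal (the path) is unambiguous and cheap to check at the load balancer, which is exactly what a user-agent string is not.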
We have two clusters, public and internal, with the latter serving only Wikimedia tasks and thus isolated from outside traffic. However, we currently have no practical way to separate bot from non-bot traffic, and I don't think we have the resources for another cluster right now.
Routing based on UA seems hard though while it's easy in mediawiki
I don't think our current LB setup can route based on user agent. A gateway in front could do that, but given that we don't have the resources for another cluster, it doesn't seem worth spending time developing something like that for now.
Even if we did separate browser and bot traffic, we'd still have the same problem on the bot cluster: most bots are benign and low-traffic, and we want to do our best to enable them to function smoothly. But for this to work, we need ways to weed out the outliers that consume too many resources. In a way, the bucketing policy is a version of what you described: if you use proper identification, you are judged on your own traffic; if you use generic identification, you are bucketed with other generic agents, and thus may be denied if that bucket is full. This is not the best final solution, but experience so far shows it has reduced the incidence of problems. Further ideas on how to improve it are of course welcome.