(cc'ing the discovery mailing list, as that team owns both the
implementation and operation of search.)
I can partially answer this as one of the people responsible for search,
but I have to defer to others about API, bots, and such.
This would be a noticeable portion of our traffic. For reference:
action=opensearch (and generator variants): 1.5k RPS
action=query&list=search (and generator variants): 600 RPS
all api: 8k RPS (might be a bit higher, this is averaged over an hour)
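As a rough back-of-the-envelope check, the figures above can be compared with the 900 RPS estimate from the original message. This is only a sketch: it assumes the hourly averages quoted here and ignores whatever the client-side cache would absorb. The helper name is just illustrative.

```python
# Rates quoted in this thread, in requests per second (hourly averages).
OPENSEARCH_RPS = 1500
FULLTEXT_RPS = 600
ALL_API_RPS = 8000
PROPOSED_RPS = 900  # estimate from the original message, before any caching


def share(extra: float, base: float) -> float:
    """Return `extra` expressed as a percentage of `base`."""
    return 100.0 * extra / base


if __name__ == "__main__":
    print(f"relative to opensearch traffic: {share(PROPOSED_RPS, OPENSEARCH_RPS):.2f}%")
    print(f"relative to full-text traffic:  {share(PROPOSED_RPS, FULLTEXT_RPS):.2f}%")
    print(f"relative to all API traffic:    {share(PROPOSED_RPS, ALL_API_RPS):.2f}%")
```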
opensearch is relatively cheap: the p95 to our search servers is ~30ms,
with p50 at 7ms. So 600 RPS of opensearch traffic wouldn't be too hard on
our search cluster. Using action=query&list=search is going to be too heavy,
since full text searches are computationally more expensive to serve.
Might I ask, which wiki(s) would you be querying against? opensearch
traffic is spread across our search cluster, but individual wikis only hit
portions of it. For example, opensearch on en.wikipedia.org is served by
~40% of the cluster, but zh.wikipedia.org (Chinese) is only served by ~13%.
If you are going to send heavy traffic to zh, I might need to adjust those
numbers to spread the load to more servers (easy enough, I just need to know).
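To illustrate why the target wiki matters, here is a small sketch of how much more concentrated the same traffic is on a smaller slice of the cluster. It assumes requests are spread evenly over the servers that handle a given wiki, which is a simplification and not a description of how the cluster actually balances load.

```python
def relative_per_server_load(fraction_small: float, fraction_large: float) -> float:
    """Ratio of per-server load when identical traffic is served by
    `fraction_small` of the cluster instead of `fraction_large`.

    Assumes traffic is spread evenly over the servers involved.
    """
    if not (0 < fraction_small <= 1 and 0 < fraction_large <= 1):
        raise ValueError("fractions must be in the range (0, 1]")
    return fraction_large / fraction_small


if __name__ == "__main__":
    # Approximate shares quoted above: zh ~13% of the cluster, en ~40%.
    ratio = relative_per_server_load(0.13, 0.40)
    print(f"same traffic on zh is ~{ratio:.1f}x heavier per server than on en")
```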
Additionally, you mentioned descriptions and keywords. These would not be
provided directly by the opensearch api so you might be thinking of using
the generator version of it (action=query&generator=prefixsearch) to get
the results augmented with that data.
I'm not personally sure how expensive that is; someone else would have to
weigh in.
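For comparison, the two request shapes discussed here could be built roughly as follows. This is a sketch only: the helper names are made up, the /w/api.php path is the usual Wikimedia endpoint, and the choice of prop=description for the extra data is an assumption (which prop module best fits your descriptions and keywords is for the API folks to confirm). The code only builds URLs and sends nothing.

```python
from urllib.parse import urlencode


def opensearch_url(host: str, text: str, limit: int = 10) -> str:
    """Build a plain action=opensearch request URL (titles only)."""
    params = {
        "action": "opensearch",
        "search": text,
        "limit": limit,
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


def prefixsearch_generator_url(host: str, text: str, limit: int = 10,
                               prop: str = "description") -> str:
    """Build an action=query request that uses prefixsearch as a generator.

    `prop` selects the extra per-page data; "description" is an assumption.
    """
    params = {
        "action": "query",
        "generator": "prefixsearch",
        "gpssearch": text,   # generator-prefixed form of pssearch
        "gpslimit": limit,   # generator-prefixed form of pslimit
        "prop": prop,
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


if __name__ == "__main__":
    print(opensearch_url("zh.wikipedia.org", "Taipei"))
    print(prefixsearch_generator_url("zh.wikipedia.org", "Taipei"))
```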
So, from a computational point of view and only with respect to the search
portion of our cluster, this seems plausible as long as we coordinate so
that we know the traffic is coming. Others will have to chime in about the
API side.
On Mon, Nov 14, 2016 at 4:40 PM, Eric Kuo <erickuo(a)yahoo-inc.com> wrote:
This is Eric from Yahoo. My team develops mobile apps for Taiwan and Hong
Kong users. We want to provide wiki description on keywords in our
contents, and we consider using MediaWiki API:OpenSearch and/or API:Query
to achieve this. Our estimated RPS is 900, and we will cache the query
result on our side. We would like to know if there is any concern with
respect to our RPS, and if so, what is the best practice.
Any comments and suggestions are welcome. Thank you for your time.
Mediawiki-api mailing list