Wikidata Query Service and SPARQL endpoint usage limits


Deborah Tankersley

9 Aug 2017

6:20 p.m.

Due to continued overuse of the Wikidata Query Service and the SPARQL endpoint, we recently implemented a throttling feature to prevent users and bots from using too many resources on the servers.

Here are the new limits:

* each user, identified by IP address and User-Agent, can use the service for 60 seconds of query time per minute (bursting to 120 seconds per minute)
* each user's queries can generate up to 30 errors per minute (bursting to 60 errors per minute)
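The announcement does not say how these limits are enforced, but the "burst" figures read naturally as a token bucket: a budget that refills at the sustained rate and can hold up to the burst amount. The Python sketch below models both limits under that assumption; the `TokenBucket` class is hypothetical, not the service's actual code. Because query time is only known once a query finishes, the sketch admits a request while the balance is positive and charges the elapsed time afterwards, which is also an assumption.

```python
import time
from typing import Optional


class TokenBucket:
    """Hypothetical token bucket: refills `rate` units per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so a fresh client can burst
        self.last = time.monotonic() if now is None else now

    def _refill(self, now: float) -> None:
        # Add the budget earned since the last update, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def allow(self, now: float) -> bool:
        # Admit a new request only while some budget remains.
        self._refill(now)
        return self.tokens > 0

    def charge(self, amount: float, now: float) -> None:
        # Deduct the cost once it is known (e.g. when a query finishes).
        # The balance may go negative, which refuses requests until it refills.
        self._refill(now)
        self.tokens -= amount


# Announced limits, per client (IP + User-Agent):
# 60 s of query time per minute, bursting to 120 s.
query_time = TokenBucket(rate=60 / 60, capacity=120, now=0.0)
# 30 errors per minute, bursting to 60.
errors = TokenBucket(rate=30 / 60, capacity=60, now=0.0)
```

For example, three 50-second queries run in parallel from a fresh client leave the time bucket at -30 s, so in this model new requests are refused for about 30 seconds while the budget refills at one second per second.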

Please let us know if you have questions or concerns about the new usage limits; we can fine-tune them if they cause problems for reasonable use cases.

Thanks for your understanding!

--
deb tankersley
irc: debt
Product Manager, Discovery
Wikimedia Foundation


Gerard Meijssen

9 Aug 2017

8:11 p.m.


Hoi, you ask for understanding, but there is so little to go on. When you say overuse, what does that mean? Obviously it was to be expected that there would be growth in the use of the Wikidata Query Service and the SPARQL endpoint. They were added to Wikidata recently and have proven to be really popular.

What is the expectation of the growth of the Wikidata Query Service, and the SPARQL endpoint? What preparations have been made to accommodate this? What will happen when this growth is 1000 times bigger than what you expect?

The reason why I am asking is that there are organisations that would like to make use of Wikidata. They will rely on Wikidata for their service and they are considering how they can be of service to our editors and readers.

As a statement of fact with a validity of, say, a week, your message is fine. After this week it is really important to know and experience what you deliver to make the existing growth happen without resorting to throttling the current use of our services. Thanks, GerardM

On 9 August 2017 at 18:20, Deborah Tankersley dtankersley@wikimedia.org wrote:

...


Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata

Stas Malyshev

8:30 p.m.

Hi!

...

> You ask for understanding but there is so little to go on. When you say overuse, what does that mean? Obviously it was to be expected that there

That means people (not frequently, but occasionally) were running tons of heavy queries that clogged the service and made it unusable for others. We want to ensure maximum availability for everyone, but the resources are finite. So somebody who runs heavy queries will have to pace them so that others can use the service too. Throttling doesn't block heavy queries outright, but it doesn't allow anyone to produce so many of them that service availability suffers.
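On the client side, pacing can be as simple as backing off when the service refuses a request. The standard-library Python sketch below assumes, although the announcement does not say so, that throttled requests receive HTTP 429 Too Many Requests, optionally with a `Retry-After` header in seconds. Because limits are tracked per IP and User-Agent, it sends a descriptive User-Agent; the `USER_AGENT` value and the helper names `retry_delay` and `run_query` are placeholders. Other errors are raised rather than retried, since repeatedly failing queries would also count against the error limit.

```python
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

ENDPOINT = "https://query.wikidata.org/sparql"
# Placeholder: limits are tracked per IP + User-Agent, so identify your own tool.
USER_AGENT = "ExampleBot/0.1 (https://example.org/contact)"


def retry_delay(status: int, retry_after: Optional[str], attempt: int) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error is not throttling.

    Assumes throttling is signalled by HTTP 429. Only a Retry-After value in
    whole seconds is parsed; anything else (e.g. an HTTP-date) falls back to
    exponential backoff capped at 60 seconds.
    """
    if status != 429:
        return None
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    return float(min(2 ** attempt, 60))


def run_query(sparql: str, max_attempts: int = 5) -> bytes:
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": sparql})
    request = urllib.request.Request(url, headers={
        "User-Agent": USER_AGENT,
        "Accept": "application/sparql-results+json",
    })
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(request, timeout=120) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            delay = retry_delay(err.code, err.headers.get("Retry-After"), attempt)
            if delay is None or attempt == max_attempts - 1:
                raise  # not throttling, or out of attempts: surface the error
            time.sleep(delay)  # pace ourselves instead of hammering the service
    raise ValueError("max_attempts must be at least 1")
```

For instance, `run_query("SELECT ?item WHERE { ?item wdt:P31 wd:Q146 } LIMIT 10")` would request up to ten items that are instances of house cat (Q146), waiting and retrying only if the request is throttled.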

...

> What is the expectation of the growth of the Wikidata Query Service, and the SPARQL endpoint? What preparations have been made to accommodate this? What will happen when this growth is 1000 times bigger than what you expect?

Frankly, as of now, I do not know the full answer to that. But you are right: it is something that needs to be thought about, and we are thinking about it. However, this particular issue is not about that; it is about ensuring that existing resources are shared fairly among users. I don't know of any service in existence that has infinite resources; they all have usage limits, explicit or implicit. We are not the exception.

If your use case requires more resources than the default allocation allows, please talk to us and we can tune it or seek some alternative solution.

...

> As a statement of fact with a validity of say a week your message is fine. After this week it is really important to know and experience what you deliver to make the existing growth happen without resorting to the throttling the current use of our services.

As I said, I don't think it is possible to run a public service without at least *some* kind of limits. We are just making those limits explicit and allocating them in a way that allows fair sharing of existing resources. If the current limits look too strict, they are configurable, and we can learn from experience and re-allocate them.

If it turns out that current resources are not enough, we'll request more, but that is not the case now: we are completely OK with regular workloads. Unfortunately, occasionally (most probably due to coding mistakes) we get clients that produce exceptionally heavy workloads that block the service for other users. Throttling is meant to reduce the effect of such workloads and ensure availability for all clients. It is not our purpose to block any legitimate workload, and if that happens, please tell us and we'll look into adjusting the limits.
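Since runaway workloads are most likely coding mistakes, a client can also guard itself against a loop of failing queries. The Python sketch below assumes a sliding 60-second window; the announcement gives only per-minute figures, not the window type. The hypothetical `ErrorGuard` class stops the client from sending further queries once it has recorded `max_errors` failures within the last `window` seconds. Its defaults mirror the announced 30 errors per minute, though a cautious client would likely choose a lower threshold.

```python
from collections import deque
from typing import Deque


class ErrorGuard:
    """Hypothetical client-side guard: refuse to send more queries once
    `max_errors` failures occurred within the last `window` seconds."""

    def __init__(self, max_errors: int = 30, window: float = 60.0) -> None:
        self.max_errors = max_errors
        self.window = window
        self.failures: Deque[float] = deque()  # timestamps of recent failures

    def _expire(self, now: float) -> None:
        # Drop failures that have fallen out of the sliding window.
        while self.failures and now - self.failures[0] >= self.window:
            self.failures.popleft()

    def record_failure(self, now: float) -> None:
        self._expire(now)
        self.failures.append(now)

    def may_send(self, now: float) -> bool:
        self._expire(now)
        return len(self.failures) < self.max_errors
```

A query loop would call `may_send(time.monotonic())` before each request and `record_failure(...)` after each error, pausing or aborting when the guard refuses, rather than letting a bug keep generating errors against the shared service.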

Thanks,

--
Stas Malyshev
smalyshev@wikimedia.org
