Hello all!
We are currently dealing with a bot overloading the Wikidata Query Service. This bot does not look actively malicious, but it does create enough load to disrupt the service. As a stopgap measure, we had to deny access to all bots using the python-requests user agent.
As a reminder, any bot should use a user agent that allows it to be identified [1]. If you have trouble accessing WDQS, please check that you are following those guidelines.
More information and a proper incident report will be communicated as soon as we are on top of things again.
Thanks for your understanding!
Guillaume
[1] https://meta.wikimedia.org/wiki/User-Agent_policy
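For tool authors, here is a minimal sketch of an identifying User-Agent. It assumes the shape of the example in the User-Agent policy [1]: client name and version, contact details in parentheses, then the library and its version. The tool name, URL, e-mail address and library version below are hypothetical placeholders. The `looks_identifiable` check is only an illustrative heuristic; it is not the policy's wording and not the filter WDQS actually applies.

```python
import re


def make_user_agent(tool: str, version: str, contact: str, library: str) -> str:
    """Build a '<tool>/<version> (<contact>) <library>/<version>' style User-Agent."""
    return f"{tool}/{version} ({contact}) {library}"


def looks_identifiable(user_agent: str) -> bool:
    """Illustrative heuristic (an assumption, not the policy text): require a
    parenthesised contact section containing a URL or an e-mail address."""
    match = re.search(r"\(([^)]*)\)", user_agent)
    if match is None:
        return False
    contact = match.group(1)
    return "@" in contact or "http" in contact


# Hypothetical tool name, contact details and library version; replace with your own.
UA = make_user_agent(
    "ExampleBot",
    "0.1",
    "https://example.org/examplebot; examplebot@example.org",
    "python-requests/2.22.0",
)
```

A bare library string such as `python-requests/2.22.0`, the kind of generic agent that was blocked, fails this check, while `UA` passes it.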
Hello all!
We now have an incident report [1] describing this overload of the Wikidata Query Service in more detail. The ban on python-requests is still in effect and will remain so until we have a throttling solution in place for generic user agents.
Thanks all for your patience!
Guillaume
[1] https://wikitech.wikimedia.org/wiki/Incident_documentation/20190613-wdqs
On Thu, Jun 13, 2019 at 7:52 PM Guillaume Lederrey glederrey@wikimedia.org wrote:
-- Guillaume Lederrey Engineering Manager, Search Platform Wikimedia Foundation UTC+2 / CEST
Hi, does this mean that the retrieval of data has priority over updates? Thanks, GerardM
On Mon, 17 Jun 2019 at 14:52, Guillaume Lederrey glederrey@wikimedia.org wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
No, there isn't any prioritization. Updates are guaranteed, because they stay in the update queue if they could not be written, but both reads and writes are impacted by resource saturation.
On Mon, 17 Jun 2019, 15:35 Gerard Meijssen, gerard.meijssen@gmail.com wrote:
Hi!
To add to this, we had this trouble because two events that WDQS currently does not handle well coincided:
1. An edit bot was making 200+ edits per minute. This is too much; more than 60 edits per minute is almost always too much. Also, if your bot makes multiple changes (e.g. adds multiple statements), consider doing them in one call instead of several, since WDQS currently performs a separate update for each change, and this may be expensive. We're looking into various improvements here, but this is the current state.
2. Several bots have been flooding the service's query endpoint with requests. Recently there has been a growth in bots that a) completely ignore both regular limits and throttling hints, b) do not have a proper identifying user agent, and c) use distributed hosts, so our throttling system has trouble dealing with them automatically. We intend to crack down more and more on such clients, because they look a lot like a DDoS and ruin the service experience for everyone.
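Both suggestions in point 1 can be sketched in Python. `EditPacer` spaces edits so a bot stays at or below roughly 60 per minute. `batch_claims` gathers several item-valued statements into one `data` payload, assuming the Wikibase JSON claim format and the `wbeditentity` API module, so that a single call replaces several per-statement calls. The property and item IDs are example values, the clock/sleep injection exists only to make the pacer testable, and this is not the code of any particular bot framework; check the Wikibase API documentation before relying on the payload shape.

```python
import json
import time

# Assumed ceiling taken from the advice above: stay at or below ~60 edits per minute.
MAX_EDITS_PER_MINUTE = 60


class EditPacer:
    """Sleeps as needed so consecutive edits are at least 60 / max_per_minute seconds apart."""

    def __init__(self, max_per_minute=MAX_EDITS_PER_MINUTE, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.last = None  # time of the previous edit, if any

    def wait(self):
        """Call once before each edit."""
        if self.last is not None:
            remaining = self.interval - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


def batch_claims(statements):
    """Turn [(property_id, item_numeric_id), ...] into one wbeditentity `data` JSON string,
    so several statements are added in a single edit (payload shape is an assumption)."""
    claims = [
        {
            "type": "statement",
            "rank": "normal",
            "mainsnak": {
                "snaktype": "value",
                "property": prop,
                "datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "numeric-id": numeric_id},
                },
            },
        }
        for prop, numeric_id in statements
    ]
    return json.dumps({"claims": claims})


# Example values: P31 (instance of) -> Q5 (human), P21 (sex or gender) -> Q6581097 (male).
payload = batch_claims([("P31", 5), ("P21", 6581097)])
```

In a bot loop, `pacer.wait()` would be called right before each API request, and `payload` sent as the `data` parameter of one edit instead of issuing two separate edits.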
I will probably write down more detailed rules a bit later, but for now, these apply: https://www.mediawiki.org/wiki/Wikidata_Query_Service/Implementation#Usage_c... Additionally, if you're running a bot, having a distinct User-Agent is a good idea.
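As an illustration of honoring throttling hints, here is a minimal sketch assuming the service signals throttling with HTTP 429 and a `Retry-After` header; the usage constraints page linked above remains the authoritative source for the actual rules. `send` is a hypothetical stand-in for whatever function performs your HTTP request, and headers are assumed to be a plain dict. A 403 is treated as a hard stop, so a blocked bot does not keep fetching 403s for days.

```python
import email.utils
import time
from datetime import datetime, timezone


def retry_after_seconds(value, default):
    """Parse a Retry-After value given as delta-seconds or as an HTTP-date."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def send_with_backoff(send, max_attempts=5, sleep=time.sleep, base_delay=1.0):
    """Call send() -> (status, headers, body), backing off while throttled (429).

    `send` is a hypothetical stand-in for your HTTP call; `headers` is assumed to be a
    plain dict here (real header objects are usually case-insensitive).
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status == 403:
            # Blocked: retrying will not help. Stop, check the User-Agent, and ask for help.
            raise PermissionError("403 from the query service; do not keep retrying")
        if status != 429:
            return status, body
        if attempt == max_attempts:
            break
        sleep(retry_after_seconds(headers.get("Retry-After"), default=delay))
        delay *= 2  # fallback wait grows exponentially when no usable hint is sent
    raise RuntimeError(f"still throttled after {max_attempts} attempts")
```

The design choice is to prefer the server's own hint and fall back to exponential backoff only when no usable `Retry-After` value is present.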
And for people who think it's a good idea to launch a max-requests-I-can-stuff-into-the-pipe bot, put it on several Amazon machines so that throttling has a hard time detecting it, and then, once throttling does detect it, neglect to check for a week that all the bot is doing is fetching 403s from the service and wasting everybody's time: please think again. If you want to do something non-trivial with WDQS and the limits get in the way, please talk to us. (And if you know somebody who isn't reading this list but is considering writing a bot that interfaces with WDQS, please educate them and refer them to us for help; we much prefer helping to banning.) Otherwise, we'll be forced to put more limitations on the service, which will affect everyone.
Hi, I use the SourceMD environment. It is well behaved: it allows for throttling, and when I have multiple jobs it runs only one at a time. I understand that my jobs are put on hold when the situation warrants it; I even put them on hold myself when I think of it.
When someone else puts my jobs on hold, I cannot release them at a better time, and I now have seven jobs doing nothing, while a new job progresses normally. The point is that managing the load is fine, but since what I do is well behaved, I expect my jobs to run and, when held, to be released at a later time. When I cannot depend on jobs finishing, my work is not finished, and I do not know whether I should run more jobs, or which ones, to get the data to a finished state. Thanks, GerardM
On Tue, 18 Jun 2019 at 06:35, Stas Malyshev smalyshev@wikimedia.org wrote:
-- Stas Malyshev smalyshev@wikimedia.org
Hello all,
Since we recently received several messages about tools being unable to access the Query Service, I wanted to remind you about this email from Guillaume, especially this part:
"As a reminder, any bot should use a user agent that allows to identify it: https://meta.wikimedia.org/wiki/User-Agent_policy."
If you're a tool builder and are encountering issues with WDQS at the moment, please check that your tool complies with those guidelines.
Thanks for your understanding, Léa
---------- Forwarded message ---------
From: Guillaume Lederrey glederrey@wikimedia.org
Date: Thu, 13 Jun 2019 at 19:53
Subject: [Wikidata] Overload of query.wikidata.org
To: Discussion list for the Wikidata project. wikidata@lists.wikimedia.org
Python tools can also use the nifty toolforge library https://wikitech.wikimedia.org/wiki/User:Legoktm/toolforge_library to generate a suitable user agent automatically.
Cheers, Lucas
On Thu, 4 July 2019 at 15:54, Léa Lacroix <lea.lacroix@wikimedia.de> wrote:
-- Léa Lacroix Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
Thank you very much, Léa and Lucas! This is something I have to apply to my bot, and the nifty toolforge library seems very useful for simplifying it.
Regards, Iván
On 4/7/19 15:32, Lucas Werkmeister wrote:
-- Lucas Werkmeister (he/er) Full Stack Developer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin Phone: +49 (0)30 219 158 26-0 https://wikimedia.de
Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us to achieve our vision! https://spenden.wikimedia.de
I like and use the WDQS code generator.
As far as I can see, it can generate two kinds of Python code, but I don't see any "user agent" in the generated code.
Is it possible to add an example "user agent" to the generated code? (For example, defaulting to the logged-in user name or my IP address.)
from SPARQLWrapper import SPARQLWrapper, JSON
endpoint_url = "https://query.wikidata.org/sparql"
sparql_user_agent = "myWikidataUserName"  # see User-Agent policy
query = """#Cats
.... }"""
thanks in advance, Imre
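Until the generated code includes one, a query can set its own User-Agent. With SPARQLWrapper, the constructor accepts an `agent` argument, e.g. `SPARQLWrapper(endpoint_url, agent=sparql_user_agent)`. The sketch below does the same with only the Python standard library. The script name and contact details are hypothetical, and sending both `format=json` and the `Accept` header is an assumption about how to request SPARQL JSON results from the endpoint.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT_URL = "https://query.wikidata.org/sparql"
# Hypothetical identifying User-Agent; replace the name, version and contact with your own.
USER_AGENT = "ExampleCatsScript/0.1 (https://example.org/cats; cats@example.org) urllib"


def build_query_request(query, user_agent=USER_AGENT, endpoint=ENDPOINT_URL):
    """Prepare (but do not send) a GET request for a SPARQL query with JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(url, headers={
        "User-Agent": user_agent,
        "Accept": "application/sparql-results+json",
    })


def run_query(query):
    """Send the request and decode the SPARQL JSON results (performs network I/O)."""
    with urllib.request.urlopen(build_query_request(query), timeout=60) as response:
        return json.load(response)
```

`run_query(...)` performs the network call and returns the parsed SPARQL JSON, with the result rows under `["results"]["bindings"]`.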
On Thu, 4 Jul 2019 at 15:53, Léa Lacroix lea.lacroix@wikimedia.de wrote:
Yes, that would be T226709 https://phabricator.wikimedia.org/T226709.
Cheers, Lucas
On 04.07.19 19:10, Imre Samu wrote: