Hi,
Thanks for pointing out the confusion. I didn't remember the picturesque wording on the API:Etiquette page :D
But I think that page is more about what MediaWiki can do in terms of rate-limiting, and indeed MediaWiki doesn't do rate-limiting on reads.
What the WMF infrastructure does, OTOH, is different. So maybe it's a good idea to add a {{Note}} at the top of the etiquette page clarifying that these are general MediaWiki-related rules, and that for the Foundation's infrastructure people should refer to the Robot policy.
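For concreteness, here is a rough sketch of what a client that follows those etiquette rules might look like. This assumes Python with the requests library; the bot name, contact details, and endpoint are just placeholders:

    # A minimal sketch of a "polite" Action API client, assuming Python's
    # `requests` library. Bot name, contact info, and endpoint are placeholders.
    import time
    import requests

    HEADERS = {
        # Descriptive User-Agent with contact info, per the User-Agent policy.
        "User-Agent": "ExampleBot/1.0 (https://example.org/bot; ops@example.org)",
    }
    API = "https://en.wikipedia.org/w/api.php"

    def query(params):
        # maxlag asks MediaWiki to refuse the request when replication lag
        # is high; the polite response is to back off and retry.
        params = {**params, "format": "json", "maxlag": 5}
        while True:
            resp = requests.get(API, params=params, headers=HEADERS)
            data = resp.json()
            if data.get("error", {}).get("code") == "maxlag":
                time.sleep(5)  # wait before retrying, as the API suggests
                continue
            return data

    # Requests run serially, not in parallel, as API:Etiquette asks.
    print(query({"action": "query", "meta": "siteinfo"}))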
Do you think that would help?
Cheers,
Giuseppe
On Tue, Apr 8, 2025 at 8:38 PM Novem Linguae <novemlinguae@gmail.com> wrote:
Hi Giuseppe,
Thanks for updating the robots policy. I do see some overlap between https://wikitech.wikimedia.org/wiki/Robot_policy#Action_API_rules_(i.e._http...) and https://www.mediawiki.org/wiki/API:Etiquette, so it may be worth thinking about whether one or both of those pages need an update to keep everything in sync. For example, API:Etiquette doesn't link to the Robot Policy.
Speaking anecdotally, I didn’t know the Robot Policy existed and I assumed API Etiquette was the canonical page for this kind of thing.
Hope this helps.
Novem Linguae
novemlinguae@gmail.com
From: Giuseppe Lavagetto <glavagetto@wikimedia.org>
Sent: Tuesday, April 8, 2025 8:08 AM
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
Subject: [Wikitech-l] Updates to the Robot policy
Hi all,
I’ve updated our Robot Policy[0], which was vastly outdated; its last major revision dated from 2009.
The new policy isn’t more restrictive than the old one for general crawling of the site or the API; on the contrary, we allow higher limits than previously stated. But the new policy clarifies a few points and covers quite a few systems the old policy didn’t… because they didn’t exist at the time.
My intention is to keep this page relevant and to update it as our infrastructure evolves, trying to direct more and more web spiders and high-volume scrapers toward the access patterns it describes and reduce their impact on the infrastructure.
This update is part of a coordinated effort[1] to guarantee fairer use of our very limited hardware resources for our technical community and users, so we will progressively start enforcing these rules for non-community users[2] who currently violate these guidelines copiously.
If you have suggestions on how to improve the policy, please use the talk page to provide feedback.
Cheers,
Giuseppe
[0] https://wikitech.wikimedia.org/wiki/Robot_policy
[1] See the draft of the annual plan objective here: https://w.wiki/DkD4
[2] While the general guidelines of the policy apply to any user, the goal is not to place restrictions on our community, or on any other research/community crawler whose behaviour is in line with the aforementioned guidelines. In fact, any bot running in Toolforge or Cloud VPS is already part of an allow-list that excludes this traffic from the rate limiting we apply at the CDN.
--
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation