That's a good point. I actually need a W* media dump for my work right now (incl. usage + all captions). If you do too, perhaps we should see how effectively we can compile one via a WMC tool. Alternatively, if WME can offer the same for a fee, I would be glad to pay for it.
S.
🌍🌏🌎🌑
On Thu, Apr 10, 2025, 2:57 PM Federico Leva (Nemo) nemowiki@gmail.com wrote:
On 08/04/25 18:08, Giuseppe Lavagetto wrote:
I’ve updated our Robot Policy[0], which was vastly outdated, the main revision being from 2009.
Thanks for working on an update! It seems there was a misalignment of expectations, which is in itself a problem to fix.
The new policy isn’t more restrictive than the older one for general crawling of the site or the API; on the contrary we allow higher limits than previously stated.
I find this hard to believe, considering this new sentence for upload.wikimedia.org: «Always keep a total concurrency of at most 2, and limit your total download speed to 25 Mbps (as measured over 10 second intervals).»
This is a ridiculously low limit. It's a speed which is easy to breach in casual browsing of Wikimedia Commons categories, let alone with any kind of media-related bots.
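For illustration only, a client trying to stay within the quoted limit could be sketched roughly as follows. Nothing here comes from the policy text or from any Wikimedia tool: `WindowThrottle` and its parameters are hypothetical names, the fixed-window accounting is just one assumed way to approximate "measured over 10 second intervals", and 1 Mbps is taken to mean 1,000,000 bits per second.

```python
import threading
import time

# Values taken from the sentence quoted above.
MAX_CONCURRENCY = 2      # total concurrent requests
MAX_MBPS = 25            # megabits per second (assumed 1 Mbit = 1,000,000 bits)
WINDOW_SECONDS = 10      # measurement interval

# Bytes allowed per window: 25 Mbit/s * 10 s / 8 bits per byte.
WINDOW_BUDGET_BYTES = MAX_MBPS * 1_000_000 * WINDOW_SECONDS // 8

# At most two downloads may hold a slot at the same time.
download_slots = threading.BoundedSemaphore(MAX_CONCURRENCY)


class WindowThrottle:
    """Hypothetical helper: caps the bytes consumed per fixed time window.

    Assumption: fixed (not sliding) windows. A burst straddling a window
    boundary could still look too fast to a sliding measurement.
    """

    def __init__(self, budget=WINDOW_BUDGET_BYTES, window=WINDOW_SECONDS,
                 clock=time.monotonic, sleep=time.sleep):
        self._budget = budget
        self._window = window
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._window_start = clock()
        self._used = 0

    def consume(self, nbytes):
        """Account for nbytes, sleeping until the next window if needed."""
        with self._lock:
            now = self._clock()
            if now - self._window_start >= self._window:
                # The previous window has expired: start a fresh one.
                self._window_start = now
                self._used = 0
            if self._used + nbytes > self._budget:
                # Budget exhausted: wait for the current window to end.
                self._sleep(self._window_start + self._window - now)
                self._window_start = self._clock()
                self._used = 0
            self._used += nbytes
```

A downloader would acquire `download_slots` around each request and call `consume(len(chunk))` for every chunk it reads, keeping chunks much smaller than the window budget. The lock is held while sleeping on purpose, so that all threads share the same pause.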
At the suggested speed, it would take over 150 years for a person to download Wikimedia Commons files alone.
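The arithmetic behind an estimate of this kind can be sketched as below. The message does not state the total size it assumes, so `total_bytes` is left as a parameter; the function names, the 365.25-day year, and the reading of 1 Mbps as 1,000,000 bits per second are assumptions made for the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed average year length


def bytes_per_year(mbps):
    """Bytes transferred in one year at a sustained rate of `mbps` megabits/s."""
    return mbps * 1_000_000 / 8 * SECONDS_PER_YEAR


def years_to_download(total_bytes, mbps=25):
    """Years needed to fetch `total_bytes` at a sustained `mbps` megabits/s."""
    return total_bytes / bytes_per_year(mbps)


def bytes_for_years(years, mbps=25):
    """Inverse view: how many bytes a given number of years corresponds to."""
    return years * bytes_per_year(mbps)
```

Since the rate enters linearly, the same download at a sustained 100 Mbps would take a quarter of the time needed at 25 Mbps.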
Needless to say, I breached such a threshold all the time when I compiled the https://archive.org/details/wikimediacommons collection. I typically aimed to saturate my upload bandwidth at all times when updating it, so I must have tried to download at about 100 Mbps, and it still took me months. (I used to run those scripts from my home in Milan, downloading the files to an external HDD. I stopped updating the collection after 2016 in part because I don't have FTTH in Helsinki, and the daily downloads were far too big for any storage in Wikimedia Cloud.)
I appreciate that some exceptions for Wikimedia Cloud bots were added after the discussion at https://phabricator.wikimedia.org/T391020#10716478 , but the fact remains that this comes off as a big change.
On 09/04/25 19:10, AntiCompositeNumber wrote:
I'll just note that both API:Etiquette and the Robot Policy have been incorporated by reference into the Terms of Use: https://foundation.wikimedia.org/wiki/Policy:Terms_of_Use/en#12._API_Terms
Undiscussed changes to the Terms of Use should be avoided.
This is a good point.
There are parts of the terms of use which assume the [[m:Right to fork]] is upheld by the availability of mirrored dumps. But the media tarballs have not been updated since 2012. Now in effect the WMF is explicitly saying that no mirrors are allowed for media, unless by gracious exemption to individual requesters.
Best,
Federico
_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/