Dear Wikimedia Cloud support,

What storage options does the Wikimedia cloud have? Can external developers (i.e., people not employed by the Wikimedia Foundation) write to Cinder and/or Swift? If so, from Toolforge, from Cloud VPS, or both?

See below for context. (Actually, is this the right list, or should I ask elsewhere?)

For Wikidata QRank [https://qrank.toolforge.org/], I run a cronjob on the Toolforge Kubernetes cluster. The cronjob mainly works on Wikidata dumps and anonymized Wikimedia access logs, which it reads from the NFS-mounted /public/dumps/public directory. Currently, the job produces 40 internal files with a total size of 21G; these files need to be preserved between cronjob runs. (In a forthcoming version of the cronjob, this will grow to ~200 files with a total size of ~40G.) Cinder might be a good solution for storing these intermediate files. However, as far as I know, Cinder isn’t available on Toolforge, so I’m currently storing the intermediate files in the account’s home directory on NFS. I suspect (though I’m only speculating, because I’ve seen NFS servers crumble elsewhere) that Wikimedia’s NFS server is easily overloaded; in any case, it seems to protect itself by throttling access. Because of this throttling, the cronjob is slow whenever it works with its intermediate files.
* Will Cinder be made available to Toolforge users? If so, when?
* Or should I move from Toolforge to Cloud VPS so I can store my intermediate files on Cinder?
* Or should I store my intermediate files in some object storage? Swift? Ceph? Something else?
* Is access to Cinder and Swift subject to the same throttling as NFS? Or would moving away from NFS increase the available I/O throughput?

The final output of the QRank system is a single file, currently ~100M in size but eventually growing to ~1G. When the cronjob has computed a fresh version of its output, it deletes older outputs from previous runs, except for the two most recent previous versions, which it keeps internally for debugging. Typical users are other bots or external pipelines that need a signal for prioritizing Wikidata entities, not end users on the web. They typically check for updates with HTTP HEAD or with conditional HTTP GET requests (using the standard If-Modified-Since and If-None-Match headers). Currently, I’m serving the output file with a custom-written HTTP server that runs as a web service on Toolforge, behind Toolforge’s nginx instance. My server reads its content from the NFS-mounted home directory that the cronjob populates. Serving large data files from NFS isn’t exactly a great idea, but as far as I know it’s the only option available in the Wikimedia cloud, at least for Toolforge users. Of course, I might be wrong.
* Should I move from Toolforge to Cloud VPS so I can serve my final output files from Cinder instead of NFS?
* Or should I store my final output files in object storage instead? Swift? Ceph? Something else?
* Or is NFS just fine, even if the size of my data grows from 100M to 1G+?
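To make the expected traffic more concrete: answering such a conditional request only needs the file’s ETag and modification time, so a request for an unchanged file can be answered with a cheap 304 Not Modified without reading the file contents. Below is a minimal sketch of that check, written for this email and not taken from my actual server; the function name `should_send_304`, the plain header dictionary (looked up with exact-case header names), and the ETag format are made up for illustration. It follows the RFC 9110 rule that If-None-Match takes precedence over If-Modified-Since, and it splits ETag lists naively on commas.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def _strip_weak(tag: str) -> str:
    """Drop surrounding whitespace and a W/ prefix (weak ETag comparison)."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def should_send_304(headers: dict[str, str], etag: str, last_modified: datetime) -> bool:
    """Decide whether a GET/HEAD request may be answered with 304 Not Modified.

    Hypothetical helper: `headers` is a plain dict with exact-case names,
    `last_modified` must be a timezone-aware datetime.
    """
    if_none_match = headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match wins; If-Modified-Since is then ignored entirely.
        if if_none_match.strip() == "*":
            return True
        candidates = [_strip_weak(t) for t in if_none_match.split(",")]
        return _strip_weak(etag) in candidates

    if_modified_since = headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparsable date: ignore the header
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        return last_modified.replace(microsecond=0) <= since
    return False
```

With the file’s modification time formatted via `format_datetime(last_modified, usegmt=True)` as the Last-Modified header, a client echoing that value back in If-Modified-Since gets a 304.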

The cronjob also uses ~5G of temporary files in /tmp, which it deletes towards the end of each run. The temp files are used for external sorting, so all access is sequential. I’m not sure where these temporary files actually end up when the job runs on Toolforge Kubernetes. Given their volume, I presume that the tmpfs of the Kubernetes nodes will eventually run out of memory and then fall back to disk, but I wouldn’t know how to verify this. _If_ the backing store for tmpfs ends up on NFS, that sounds wasteful for the poor NFS server, especially since the files get deleted at job completion. In that case, I’d love to save common resources by using a local disk. (It doesn’t have to be an SSD; a spinning hard drive would be fine, given the sequential access pattern.) But I’m not sure how to set this up on Toolforge Kubernetes, and I couldn’t find any docs on Wikitech. This might be a micro-optimization that isn’t worth the trouble, but I’d still like to be considerate of the precious shared resources in the Wikimedia cloud.
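For context, here is roughly what “external sorting” means in this case. This is a simplified sketch in Python, not the cronjob’s actual code; the function name `external_sort` and the `chunk_size` parameter are illustrative assumptions. Each sorted run is written to a temporary file once and read back once, front to back, which is why a spinning disk would be fine. In this sketch, the location of the temp files is controlled by the TMPDIR environment variable, which Python’s `tempfile` module honors.

```python
import heapq
import itertools
import tempfile
from typing import Iterable, Iterator


def external_sort(lines: Iterable[str], chunk_size: int = 1_000_000) -> Iterator[str]:
    """Sort strings by spilling sorted runs to temp files, then merging them.

    Hypothetical helper: every input string must end with "\n" so that
    each run file can be read back line by line.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    runs = []
    it = iter(lines)
    try:
        while True:
            # Read at most chunk_size lines into memory and sort them.
            chunk = list(itertools.islice(it, chunk_size))
            if not chunk:
                break
            chunk.sort()
            # Sequential write of one sorted run; tempfile honors TMPDIR.
            run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
            runs.append(run)
            run.writelines(chunk)
            run.seek(0)
        # heapq.merge streams all runs, reading each file sequentially.
        yield from heapq.merge(*runs)
    finally:
        for run in runs:
            run.close()  # a TemporaryFile is deleted when closed
```

Only one chunk plus one buffered line per run is held in memory at a time; everything else lives in the temp files until the merge has consumed them.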

Sorry that I couldn’t find the answers online. While searching, I came across the following pointers:
* https://wikitech.wikimedia.org/wiki/Ceph: This page has a warning that its content is probably “no longer true”. If the warning is correct, perhaps the page could be deleted entirely? Or maybe it could link to the current docs?
* https://wikitech.wikimedia.org/wiki/Swift: This sounds perfect, but the page doesn’t mention how the files get populated, how ACLs are managed, or whether Wikimedia’s Swift cluster is even accessible to external developers.
* https://wikitech.wikimedia.org/wiki/Media_storage: This seems current (I guess?), but the page doesn’t mention whether or how external Toolforge/Cloud VPS users may upload objects, or whether this storage is reserved for its existing users.

Thanks for your help, and happy holidays,

— Sascha, sascha@brawer.ch