Hi
There is a topic I have wanted to talk about here for a long time, but I never managed to take the time to write something. A few recent events have been a healthy reminder that I should present one of our most recent and most useful tools: the Zimfarm.
The Zimfarm is the online tool in charge of building and publishing all our ZIM files. After years of creating ZIM files by launching scrapers more or less manually, we had to automate the process simply to be able to scale operations, i.e. to publish more ZIM files, more often.
The effort started three years ago with the support of the WMF, but we have only been using it in production since spring 2019. The tool now runs smoothly and we fully rely on it. If we can publish an update of all our wikis once a month, it is also thanks to this piece of software.
The Zimfarm is a semi-decentralized solution: a central node (called the "dispatcher") is in charge of orchestrating the work to be done, and multiple decentralized nodes (called "workers") run the scraping tasks.
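To make the dispatcher/worker split a bit more concrete, here is a minimal in-memory sketch in Python. It is not the Zimfarm code: the class, function, and recipe names (Dispatcher, work_once, "wiki_a", ...) and the status strings are all hypothetical, and the real workers talk to the dispatcher over the network rather than through direct function calls. It only illustrates the idea of one central node handing out tasks and several workers pulling them.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Task:
    recipe: str                    # hypothetical recipe name
    status: str = "requested"      # hypothetical status values
    worker: Optional[str] = None   # set once a worker picks the task up


@dataclass
class Dispatcher:
    """Central node: keeps the queue of work and records the results."""
    pending: Deque[Task] = field(default_factory=deque)
    done: List[Task] = field(default_factory=list)

    def request(self, recipe: str) -> None:
        self.pending.append(Task(recipe))

    def next_task(self, worker_name: str) -> Optional[Task]:
        # Hand the oldest pending task to the worker asking for one.
        if not self.pending:
            return None
        task = self.pending.popleft()
        task.status, task.worker = "started", worker_name
        return task

    def report(self, task: Task, succeeded: bool) -> None:
        task.status = "succeeded" if succeeded else "failed"
        self.done.append(task)


def work_once(worker_name: str, dispatcher: Dispatcher) -> bool:
    """Worker node: pull one task, 'run' it, report back."""
    task = dispatcher.next_task(worker_name)
    if task is None:
        return False
    # A real worker would launch a scraper here; this sketch just succeeds.
    dispatcher.report(task, succeeded=True)
    return True


dispatcher = Dispatcher()
for recipe in ("wiki_a", "wiki_b", "wiki_c"):
    dispatcher.request(recipe)

workers = ["worker-1", "worker-2"]
# Let every worker try to pull a task each round, until the queue is empty.
while any([work_once(name, dispatcher) for name in workers]):
    pass

for task in dispatcher.done:
    print(task.recipe, task.status, task.worker)
```

The point of the pull model in this sketch is that the dispatcher does not need to know in advance how many workers exist: adding a worker just means one more node asking for tasks.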
The dispatcher provides an API to manage the ZIM recipes and tasks; have a look at https://api.farm.openzim.org/. We have set up a Web frontend on top of this API to allow easy management through a Web browser. For better transparency, even anonymous users can have a look and monitor what is going on. Look at https://farm.openzim.org/.
One important point is that, like the rest of our infrastructure, the whole system is Dockerized. This means it is really easy to install a Zimfarm worker, and we invite anybody with a spare server to help us provide offline snapshots of the best of the Web. The procedure is documented and a few volunteers have already joined in. Look at https://farm.openzim.org/about for more details.
Development happens fully in the open at https://github.com/openzim/zimfarm. A few items on the roadmap would welcome volunteer Python developers. Look at the good first issues and make your first PR! https://github.com/openzim/zimfarm/issues?q=is%3Aissue+is%3Aopen+label%3A%22...
Regards Emmanuel