👏 👏 👏
Sounds good for things like OCR scans and book generation, the latter being pushed to external WMF cloud resources.
Thanks for your work in this space. Sounds as if this will give extensions a lot more scope for interesting things.
-- billinghurst
------ Original Message ------
From: "Kunal Mehta" <legoktm@debian.org>
To: "wikitech-l" <wikitech-l@lists.wikimedia.org>
Sent: 6/10/2021 9:46:13 AM
Subject: [Wikitech-l] Score, Kubernetes and switching to Shellbox
Hi everyone,
tl;dr: External shell outs are now run via Shellbox. Any deployed code needs to use Shellbox/BoxedCommand, and documentation is available to help migrate.
To safely re-enable Score (LilyPond) on Wikimedia wikis, we developed Shellbox, a way to run shell commands in a remote, isolated container. This is (hopefully) a stronger level of isolation than we previously had with firejail, since it relies on Linux containers and Kubernetes to do the isolation. At the same time, this helps us move towards running MediaWiki on Kubernetes, as we don't want to include all these external commands inside the MediaWiki container. For the most part, any new shelling out to external commands needs to be done via Shellbox.
A lot of the design and rationale behind Shellbox is captured in the RfC: https://phabricator.wikimedia.org/T260330.
In Wikimedia production, Score, Timeline, SyntaxHighlight and Wikidata constraint regex checking are all using Shellbox so far. Details about that and links to dashboards are available at https://wikitech.wikimedia.org/wiki/Shellbox. What remains is the media-handling code that extracts metadata (DjVu, PdfHandler and PagedTiffHandler), which is tracked at https://phabricator.wikimedia.org/T289228, and videoscaling (TimedMediaHandler).
Some work has to be done in MediaWiki to make code compatible with Shellbox, specifically switching to "BoxedCommand", which now has its own documentation page: https://www.mediawiki.org/wiki/Manual:BoxedCommand. BoxedCommand works transparently whether you have a separate Shellbox service set up or not. This is the preferred way to write new shellouts going forward, though Shell::command() isn't officially deprecated yet. So far all shellouts that are used in Wikimedia production have already been converted except for TimedMediaHandler.
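As a rough illustration of the general idea behind a "boxed" command, here is a minimal Python sketch. It is not the Shellbox or BoxedCommand API (see the manual page above for that); the function name `run_boxed` and its parameters are hypothetical. It assumes the model is: declare input files up front, run the command in a fresh working directory, and read back only the declared output files. A temporary directory provides no real isolation; a caller written against an interface like this simply does not care whether the command runs locally or in a separate container.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_boxed(argv, input_files, output_names, timeout=10):
    """Hypothetical helper: run argv in a fresh temporary directory.

    input_files: mapping of file name -> bytes, written before the run.
    output_names: file names to read back after the run.
    Returns (exit_code, stdout, outputs), where outputs maps name -> bytes.
    """
    with tempfile.TemporaryDirectory() as workdir:
        root = Path(workdir)
        # Materialise the declared inputs inside the working directory.
        for name, data in input_files.items():
            (root / name).write_bytes(data)
        # Run the command with the working directory as its cwd.
        proc = subprocess.run(argv, cwd=root, capture_output=True, timeout=timeout)
        # Collect only the outputs that were declared and actually produced.
        outputs = {}
        for name in output_names:
            path = root / name
            if path.is_file():
                outputs[name] = path.read_bytes()
        return proc.returncode, proc.stdout, outputs


if __name__ == "__main__":
    # Stand-in for an external tool: upper-case in.txt into out.txt.
    script = "open('out.txt', 'w').write(open('in.txt').read().upper())"
    code, _, files = run_boxed(
        [sys.executable, "-c", script], {"in.txt": b"hello"}, ["out.txt"]
    )
    print(code, files)
```

Because the caller only ever names inputs and outputs, the same call could in principle be served by a remote service that ships the files over the network, which is the property the paragraph above describes as working "transparently".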
Looking forward, I think this also gives us a lot of flexibility in using more external commands in the future. First, we're less tied to whatever OS version MediaWiki is running on: as long as a command can be built/shipped in a container, we can use it. Second, it's probably OK if external commands aren't super well behaved (e.g. use too much memory), since they're no longer sharing the same resources as an appserver (this shouldn't be interpreted as a free pass for super inefficient stuff, of course).
I tried to keep this summary short, and am intending to write a longer blog post that explains some more history in detail. But if you have any questions or something isn't clear, please ask!
-- Kunal