The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-25
--
A few weeks ago we opened up Wikifunctions for some community members – but
have yet to open it up to wider contribution and usage. Thanks to the
brilliant input of some community members, most notably Lockal
<https://ru.wikipedia.org/wiki/%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0…>,
we were made aware of some potential security issues before they could be
exploited. This led us to limit function calls to logged-in users while we
implemented some security mitigations.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_Top-level_architectura…>
Top-level architecture of Wikifunctions
Our original plan was to rely on a multi-layered approach to security,
where we split up the backend into two parts, one being the orchestrator,
which collects all necessary data, and the other the evaluator, which
actually runs the code written by Wikifunctions editors. The evaluator
would be running in a Docker container with very limited rights. But,
as we opened up Wikifunctions, issues arose that, although not yet
exploitable themselves, might become so in the future.
We partnered with the SRE and Security teams in response to the new
concerns, and together we brainstormed ideas and hammered out potential
solutions to add further layers of protection. The idea is to provide
additional defense in depth. One major component of our revised security
strategy required a complete rewrite of the evaluator encapsulation
service: instead of running user-written code in language runtimes directly
in Docker, we will run them on top of a WebAssembly runtime inside the
container.
What is WebAssembly <https://en.wikipedia.org/wiki/WebAssembly>?
WebAssembly, or "WASM" for short, is a low-level programming language,
meaning it is comparatively simple and doesn’t directly support higher levels
of abstraction. There are many different runtimes for WebAssembly, the
most prominent of which are basically all modern browsers (thus the “Web”
in the name). As with many other low-level programming languages, it can
also serve as a compilation target for other programming languages, meaning
that you can take, for example, code written in C or Rust and compile it to
WebAssembly. This allows programs that were written for the desktop to be
run in the browser. One example is the jump-and-run game SuperTux
<https://supertux.semphris.com/play/>, which was originally written in C++,
and can now be run in the browser.
WebAssembly does not have to be run in the browser; it can also be run on a
server. In the last few years, a flurry of activity has created dozens of
runtimes. One advantage of WebAssembly is that the runtime that runs
WebAssembly is easy to control and limit; thus, translating code to
WebAssembly adds an additional layer of security.
As of this week, we have deployed the new version of the evaluator for
JavaScript. We will be monitoring how this change affects the
performance and cost of running Wikifunctions. Note that the WebAssembly
runtime does not replace the other security measures; it is an additional
layer on top of them. If you inspect the "Details" of a
function run on JavaScript now, you'll see that it's run on QuickJS v0.5.0
inside WASM (specifically, on WasmEdge <https://wasmedge.org/>), rather
than Node v16.17.1. We are working on also switching the evaluator for
Python to one based on WebAssembly soon.
One previous decision has made things a bit more challenging, though: our
choice to start with JavaScript and Python. WebAssembly is geared towards
compiled programming languages such as C, Rust, or Go, whereas Python and
JavaScript are interpreted languages. Eventually, we found Python and
JavaScript interpreters that can be compiled to WebAssembly, and then these
compiled builds are used to run the actual Python and JavaScript code. We
live in interesting times.
In fact, the tooling around WASM for Python and JS is so novel and
bleeding-edge as to have caused some "fascinating" bugs during adoption. At
one point, we had got our Python executor running on WebAssembly, using
(among other things) a great tool called wasmtime <https://wasmtime.dev/>,
developed by the Bytecode Alliance <https://bytecodealliance.org/>. Our tests
were reliably green for a couple of weeks, even up to the day we decided to
switch our staging Python executor to use WASM. However, once our new
release reached the staging area, Python function calls mysteriously
failed. After debugging, we found that our call to the wasm command line
tool was the culprit. It turned out that the wasm runner we were using had
pushed a new major version, flagged as a breaking change, less than an hour
before we built the image for deployment. The fix for that issue was easy:
we simply re-specified that our code download and use the previous version
of the command line tool. Still, this demonstrates how fast-moving the world
of WASM can be.
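To illustrate why the pin matters, here is a minimal Python sketch of the difference between "take the newest release" and "take the version we tested". The helper names and version numbers are made up for illustration; they are not taken from our build scripts or from any real tool's release history.

```python
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as "1.10.0" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def resolve_version(available: List[str], pinned: Optional[str] = None) -> str:
    """Pick the tool version that a build should download (hypothetical helper).

    Without a pin the newest release wins, so a breaking major release
    published shortly before the build is picked up silently. With a pin
    the build always gets the same version, or fails loudly if it is gone.
    """
    if pinned is None:
        return max(available, key=parse_version)
    if pinned not in available:
        raise ValueError(f"pinned version {pinned} is not available")
    return pinned


# Made-up release lists: the second one gains a new major version.
releases_before = ["1.0.0", "1.1.0"]
releases_after = releases_before + ["2.0.0"]

unpinned_before = resolve_version(releases_before)  # what the tests ran against
unpinned_after = resolve_version(releases_after)    # what the image build got
pinned_after = resolve_version(releases_after, pinned="1.1.0")
```

In this sketch the unpinned build changes behaviour as soon as the new major version appears, while the pinned build keeps using the version the tests were green on.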
Where will we go next? We will be monitoring the load that the new
architecture puts on our servers, to see if the system is sustainable.
There will be some change in the speed of evaluating functions, but we
expect that the change will be, overall, barely noticeable. We hope
that the additional layer of protection will hold up, but if you do find a
way past it, let us know
<https://www.mediawiki.org/wiki/Reporting_security_bugs>.
We think there is quite some room for improvement in terms of runtime
speed. WebAssembly runtimes have seen a whirlwind of development in the
last few years, and it seems that, particularly for interpreted languages,
there are still plenty of opportunities. One way to improve the runtime
characteristics of Wikifunctions is to add support for languages that are
more natural fits for WebAssembly, such as Rust or C. Because the system
automatically favors the fastest implementation of a function, usage might
swiftly consolidate on the more efficient implementations. But compiled
languages would also need a
slightly different architecture, as the compilation results would need to
be stored somehow. One interesting option would be to also push the
function evaluation to the user’s browser, since it contains a WebAssembly
runtime as well. But we would need to understand the consequences of that,
particularly for slower devices.
We also used this change as an opportunity to experimentally switch on the
right for everybody to run community-approved functions on Wikifunctions,
not just for logged-in users. As you can see, this change is buried deep in
this update, and it might be rolled back again at any time. We will monitor the
system to see how stable it is. We will keep you up-to-date in this
newsletter.
Thanks to Cory <https://meta.wikimedia.org/wiki/User:CMassaro_(WMF)> for
taking the lead on this project, James
<https://meta.wikimedia.org/wiki/User:Jdforrester_(WMF)> for taking it to
production, and the Security and SRE teams, who supported us so helpfully!
It is great to see it deployed, taking us a big step closer to opening up
Wikifunctions to everyone.
The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-04
As this newsletter contains plenty of images, it can easily be misformatted
in email. Please refer to the on-wiki version for easier reading in that
case.
--
Arguments made easier
As of today, referencing arguments has become considerably easier.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
What does this mean?
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 2: expanded composition for generating the verb form for the German
second person plural.
Every function has arguments.[1]
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-04#cite_…>
When creating a composition to implement a function, you need to be able to
reference the arguments of the function. For example, let's look at the
function that generates the regular German verb form for the second person
plural <https://www.wikifunctions.org/wiki/Z11272>: it takes one argument,
the infinitive form of the verb, e.g. *“denken”* (to think), reduces it to
the stem of the word, e.g. *“denk”*, and adds the letter *“t”* to get e.g.
*“denkt”* (as in *“ihr denkt”*, “you think”). So the composition looks as
follows:
join strings <https://www.wikifunctions.org/wiki/Z10000>( stem German verb
<https://www.wikifunctions.org/wiki/Z11259>( Argument reference( infinitive
) ), “t” )
… or see Figure 2 for the expanded view.
In this composition, there is one function call embedded in another. The
inner function call, which returns the German stem, has an argument
reference to the *infinitive*. This means that the argument with which the
function is being called will be placed right in this place. Therefore, if
you call the function with the argument *“denken”*, it turns into this
function call:
join strings <https://www.wikifunctions.org/wiki/Z10000>( stem German verb
<https://www.wikifunctions.org/wiki/Z11259>( “denken” ), “t” )
Then, by evaluating the inner function call, you get:
join strings <https://www.wikifunctions.org/wiki/Z10000>( “denk”, “t” )
And this finally evaluates to the result, *“denkt”*.
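The substitution steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Wikifunctions orchestrator works: the nested-tuple representation, the `evaluate` helper, and the simplified stemming rule (dropping a final "en" or "n") are all assumptions made for this example.

```python
def stem_german_verb(infinitive):
    """Simplified stand-in for "stem German verb": drop a final "en" or "n"."""
    if infinitive.endswith("en"):
        return infinitive[:-2]
    if infinitive.endswith("n"):
        return infinitive[:-1]
    return infinitive


def join_strings(first, second):
    """Stand-in for "join strings"."""
    return first + second


FUNCTIONS = {"stem German verb": stem_german_verb, "join strings": join_strings}

# The composition from the text, written as nested tuples (hypothetical format):
# ("call", function name, arguments...) and ("argument reference", name).
COMPOSITION = (
    "call", "join strings",
    ("call", "stem German verb", ("argument reference", "infinitive")),
    "t",
)


def evaluate(node, arguments):
    """Evaluate a composition, resolving inner function calls first."""
    if isinstance(node, str):
        return node  # a literal string such as "t"
    if node[0] == "argument reference":
        return arguments[node[1]]  # replaced by the value the caller passed in
    _, name, *inner = node
    return FUNCTIONS[name](*(evaluate(part, arguments) for part in inner))


result = evaluate(COMPOSITION, {"infinitive": "denken"})
print(result)  # denkt
```

Calling `evaluate` with a different infinitive reuses the same composition, which is exactly what the argument reference is for.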
Figure 3 shows the status of the composition at the point where everything
but the argument reference has been entered.
At this point, you have to expand the infinitive argument for the function
call for stem German verb <https://www.wikifunctions.org/wiki/Z11259>. You
do this by clicking the sideways-chevron (">") next to the label
“infinitive”. This expands the fields, showing the type and mode and the
(still empty) value (see Figure 4).
Now you switch the input mode from a literal String to an Argument
reference (Figure 5). And this is where the previous workflow diverges from
the new workflow.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 3: everything but the argument reference is in place.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 4: expanded *infinitive* field on the Function "stem German verb".
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 5: selecting between the different modes for *infinitive*
How did it work?
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 6: how it was last week: Argument reference is chosen as the mode
and we now have a text field waiting for the key ID.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 7: the rather cryptic key ID filled out and ready to be published.
Figure 6 shows the previous status after selecting Argument reference. You
were presented with an empty field labeled “key id”, and the field expected
the editor to type in the key ID. What made this worse was that the page
did not show you the key ID anywhere (I usually copied it from the URL).
The key ID is the ZID of the function you are implementing, plus a suffix
for the position of the key you are looking for. In this case, since you
were implementing Z11272 <https://www.wikifunctions.org/wiki/Z11272> and
there was only one key, the key ID was Z11272K1. You had to type that in,
and were then able to publish the implementation (see Figure 7).
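The rule for building a key ID by hand is simple enough to write down as a small Python sketch. The `key_id` helper is a hypothetical name used only for this illustration; it is not part of Wikifunctions.

```python
def key_id(function_zid, position):
    """Build the key ID for the argument at a 1-based position (hypothetical helper)."""
    if position < 1:
        raise ValueError("argument positions start at 1")
    # The key ID is the function's ZID, then "K", then the position.
    return f"{function_zid}K{position}"


first_key = key_id("Z11272", 1)
print(first_key)  # Z11272K1
```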
How does it work now?
Last week we had our internal "Fix-It" week, where we focus on technical
debt and smaller, but irritating, tasks. One of the projects that our
colleague Genoveva <https://meta.wikimedia.org/wiki/User:Geno_(WMF)> tackled
was to improve this workflow: once you have chosen “Argument reference”,
instead of an empty text field for which you had to look up and write in a
key reference, it now shows a dropdown field (see Figure 8).
Clicking on that dropdown field reveals the list of relevant arguments (see
Figure 9). You select the argument (see Figure 10). Now you can even
collapse the infinitive field into a single field, making the view even
more compact (see Figure 11).
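Conceptually, the dropdown maps the human-readable argument labels to the identifiers that get stored, so the editor never has to type an identifier. A minimal Python sketch of that idea follows; the function names and data shapes are assumptions for illustration and do not reflect the actual front-end code.

```python
def argument_options(function_zid, argument_labels):
    """Return (label, key ID) pairs for a dropdown, in argument order."""
    return [
        (label, f"{function_zid}K{position}")
        for position, label in enumerate(argument_labels, start=1)
    ]


def select_argument(options, chosen_label):
    """Return the key ID that is stored when the editor picks a label."""
    for label, key in options:
        if label == chosen_label:
            return key
    raise KeyError(chosen_label)


# The function from the example has a single argument, "infinitive".
options = argument_options("Z11272", ["infinitive"])
selected = select_argument(options, "infinitive")
```

The editor only ever sees the label "infinitive"; the identifier is filled in behind the scenes.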
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 8: how it is now: Argument reference is chosen as the mode and we
have a dropdown field waiting for selection.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 9: by clicking on it, we see the available arguments.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 10: an argument was selected.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 11: collapsed view of the *infinitive* key with a selected argument
reference.
Why does it matter?
We want to make contributing functions, tests, and implementations as easy
as possible. We believe that in order to achieve the goal of sharing a
comprehensive library of natural language generation functions for hundreds
of languages, we will need a lot of people to edit functions, tests, and
implementations on Wikifunctions.
We followed a number of principles when improving this workflow: we reduced
the cognitive workload necessary to complete the task, we hid identifiers
from one more place in the user interface, and we made it considerably
easier to complete the task on mobile devices. We hope that this will help
with our goal of allowing more people to contribute to Wikifunctions
effectively.
Congratulations to Genoveva for engineering, Amin for design, and the team
for working on and landing this improvement. Thank you all!
Notes
1. ↑
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-04#cite_…>
It is possible to have functions with no arguments in Wikifunctions,
e.g. nullary true <https://www.wikifunctions.org/wiki/Z10210>, but such
functions are of limited practical use.