The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-11-16
--
What types next?
We currently support two types: strings and Booleans. Our next big step for
expanding the capabilities of Wikifunctions is to introduce more types.
We are working hard to get all the pieces in place: we have previously
talked about serialization
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-09-27> to
make types work seamlessly with programming languages, and renderers and
parsers
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-09-20> to
improve the experience when using them.
<https://meta.wikimedia.org/wiki/File:Data_types_-_en.svg>Data can take
many different types for many different domains
The first new type we expect to open for the community is a type for lists.
This will not only allow for many new functions that are not possible yet
(such as breaking a sentence into words), but will also simplify existing
implementations such as this test for mutation in Breton
<https://www.wikifunctions.org/view/en/Z11601> or this (overly simple) check
if a letter is a vowel <https://www.wikifunctions.org/view/en/Z11894>,
implementing a function we created collaboratively in this week’s Volunteer
Corner.
But, as asked by Cool314 on the Project Chat
<https://www.wikifunctions.org/wiki/Wikifunctions:Project_chat#Once_we_get_n…>,
what should be the next types that the community would like to see? They
also suggest starting with the list type, which, as said, is on the way.
They say to follow with integers, and if that means non-negative integers
or counting numbers, I would be totally on-board. But what would be next?
The list of suggested functions requiring new types
<https://www.wikifunctions.org/wiki/Wikifunctions:Suggest_a_function#Propose…>
also
strongly suggests counting numbers. After that we find suggestions talking
about more complicated numbers, such as negative numbers, floating-point
numbers, fractions; then also bytes and specific-length vectors of bytes;
colors; years, months, and dates; and others.
One thing we need to take into account is that simpler types should come first:
for example, a calendar day could be built from a counting number and a
month, or it could be built from two counting numbers – but one way or the
other, we would need counting numbers first.
Another question is whether we prefer our types to be built from
simpler elements, or whether we prefer more complex types. To give a very
simple example, there are many different ways to represent integers, two of
which would be:
   1. An integer could be represented by an object with a single key, whose
      value is a single string that starts with an optional “-” followed by a
      whole number, i.e. a list of digits with no leading zeros
   2. An integer could be represented by an object with two keys, one being
      the sign (which is one of negative, positive, or none), and the other a
      counting number, which we would have previously defined as a type
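To make the two options above more concrete, here is a minimal Python sketch. It is only an illustration: the key names "value", "sign", and "magnitude" are made up for this example and are not actual Wikifunctions keys, and the counting number is modeled as a plain non-negative Python int.

```python
import re

# Hypothetical key names ("value", "sign", "magnitude") are used only for
# this sketch; they are not Wikifunctions keys.

# An optional "-" followed by a whole number without leading zeros.
# Note: this pattern also accepts "-0"; whether that should be allowed is
# left open here.
SIGNED_DECIMAL = re.compile(r"-?(?:0|[1-9][0-9]*)")


def int_from_string_object(obj):
    """Option 1: one key holding a signed decimal string."""
    text = obj["value"]
    if SIGNED_DECIMAL.fullmatch(text) is None:
        raise ValueError(f"not a valid integer string: {text!r}")
    return int(text)


SIGN_FACTOR = {"negative": -1, "none": 0, "positive": 1}


def int_from_sign_and_magnitude(obj):
    """Option 2: a sign plus a previously defined counting number."""
    magnitude = obj["magnitude"]  # modeled as a non-negative Python int
    if magnitude < 0:
        raise ValueError("magnitude must not be negative")
    sign = obj["sign"]
    # Assumption of this sketch: the sign is "none" exactly for zero.
    if (sign == "none") != (magnitude == 0):
        raise ValueError("sign and magnitude disagree")
    return SIGN_FACTOR[sign] * magnitude
```

Both functions map their representation to the same native value, e.g. {"value": "-42"} and {"sign": "negative", "magnitude": 42} both yield -42. The second option needs an extra consistency check between sign and magnitude, which is part of the trade-off between simpler and more structured types.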
Then there is also the question of which string would represent which
value, a question that needs to be answered for each individual type. For
numbers one straightforward solution is to take the string representation
of the numbers in Hindu-Arabic numerals, but one could also consider
binary, hexadecimal, or even base64 encodings, potentially reducing storage
space. I think that the potential to more easily understand a JSON
representation would beat the small storage gain.
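As a rough illustration of the storage trade-off mentioned above, the following Python sketch encodes one number in several ways and lets you compare the string lengths. The function name and the example value are chosen only for this sketch.

```python
import base64


def encodings(value):
    """Return several string encodings of a non-negative integer."""
    # Use at least one byte so that zero still has a byte representation.
    byte_length = max(1, (value.bit_length() + 7) // 8)
    raw = value.to_bytes(byte_length, "big")
    return {
        "decimal": str(value),
        "binary": format(value, "b"),
        "hexadecimal": format(value, "x"),
        "base64": base64.b64encode(raw).decode("ascii"),
    }


# Example value picked only for illustration: 2**32 - 1.
for name, text in encodings(4294967295).items():
    print(f"{name:12} {len(text):3} characters  {text}")
```

For this value, the decimal string has 10 characters, while the hexadecimal and base64 strings have 8 each, so the saving is small compared to the loss in readability.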
Finally, what limits, if any, should we put on these values? Frequently,
programming languages limit their 'number' type to the range from 0 to
about 4.3 billion (2^32) or from 0 to about 18 quintillion
<https://en.wiktionary.org/wiki/quintillion> (2^64). Should we add a limit
like this as well, or should we expect function writers to work with any
possible input?
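The limits mentioned above can be checked with a few lines of Python. This is only a sketch to illustrate the question; the helper name fits_unsigned is made up for this example. Note that Python's own int type has no fixed upper limit, so an implementation written in Python could accept arbitrarily large values, whereas other languages may not.

```python
def fits_unsigned(value, bits):
    """Check whether value fits into an unsigned integer of the given width."""
    return 0 <= value < 2 ** bits


LIMIT_32 = 2 ** 32  # 4294967296, about 4.3 billion
LIMIT_64 = 2 ** 64  # 18446744073709551616, about 18 quintillion

print(fits_unsigned(LIMIT_32 - 1, 32))  # largest value that still fits
print(fits_unsigned(LIMIT_32, 32))      # one too many for 32 bits
print(LIMIT_64 * LIMIT_64)              # Python ints simply keep growing
```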
Note that this does not mean that we will only be able to deal with
Hindu-Arabic numerals: with parsers and renderers
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-09-20> we
will be able to display numbers appropriately for a given language,
remaining fully translated for whichever language and locale users are
using. We might not necessarily have that feature available immediately
when introducing new types, but we will be working on enabling them soon.
Let us hear and discuss what you think (or even come up with a process for
us to follow, based on the types you are suggesting). What are the types
you are looking forward to?
Recent changes to Wikifunctions software
From now on, we will try to give a quick summary with each update of what
work you can see as it rolls out to Wikifunctions.
In terms of big items, following the completion of "General Availability"
last week
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-11-07>,
we've been primarily working on the software to better support types (as
discussed above; T343469 <https://phabricator.wikimedia.org/T343469>). On
the back-end this has meant infrastructure support for applying custom code
to convert from Wikifunctions types to native types and back (T297509
<https://phabricator.wikimedia.org/T297509>), and on the front-end we've
been working on using lists as inputs and outputs of functions, coming soon
(T326301 <https://phabricator.wikimedia.org/T326301>). We've also been
working with several volunteers to understand how we can improve the
on-boarding experience (T285509 <https://phabricator.wikimedia.org/T285509>);
thank you to everyone who has taken part.
Our general improvements have included progress on using proper, specific,
translatable errors consistently throughout the
system (T321113 <https://phabricator.wikimedia.org/T321113>), and
researching if it's possible to make any further simple improvements to
picking a type or object (T345547
<https://phabricator.wikimedia.org/T345547>).
As minor fixes, we now use a different title on the creation page depending
on whether you're making a function, implementation, test case, or type,
rather than just saying 'object' (T350673
<https://phabricator.wikimedia.org/T350673> and T341847
<https://phabricator.wikimedia.org/T341847>). We also fixed the view of
aliases to be one bulleted list of several items, not several lists of just
one (T345404 <https://phabricator.wikimedia.org/T345404>).
You can browse the full list of deployed changes
<https://www.mediawiki.org/wiki/MediaWiki_1.42/wmf.5#WikiLambda> for the
MediaWiki front-end for Wikifunctions. We didn't deploy any back-end
service changes this week.
No newsletter next week
Due to holidays, we will be skipping next week’s newsletter. Expect to hear
from us again after Thanksgiving
<https://en.wikipedia.org/wiki/Thanksgiving>!
The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-11-07
--
Wikifunctions, the library of functions that anyone can use and edit
As of a few days ago, Wikifunctions can be used by anyone.
<https://meta.wikimedia.org/wiki/File:Anyone_can_edit_Wikifunctions.png>Wikifunctions
can now be edited by anyone!
That means that everyone visiting Wikifunctions is able to run functions.
Until now, this feature was limited to logged-in users only.
The last two times
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-25> we
talked about re-implementing our backend to run on WebAssembly
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-11-03>.
Since then, we have monitored the system and deployed a number of further
security features. We have improved the monitoring of the system to notice
issues sooner. We have also moved Wikifunctions.org to be the first
Wikimedia project to entirely run on Kubernetes
<https://en.wikipedia.org/wiki/Kubernetes>.
All of these steps gave us the confidence to drop the requirement to be
logged-in in order to run approved functions on Wikifunctions. We will be
monitoring the system, and in case we notice more load than we can handle,
we might be limiting function calls again. This might be a bit of a bumpy
ride, and we will see in the coming weeks and months how this will develop.
Thank you for your patience so far, and thank you for continued patience in
the future!
Furthermore, we have also considerably opened up editing rights. From now
on, all logged-in users can propose and improve draft functions, tests, and
implementations, rather than just special users with the Functioneer status.
Functioneers retain their role with the ability to connect and disconnect
tests and implementations on functions, which makes the function "live" so
that people can use it. The current set of Functioneers were all granted
their rights for a limited amount of time (for a few more months). We are
asking the community to set up a process to assign Functioneer rights
<https://www.wikifunctions.org/wiki/Wikifunctions:Project_chat#Your_input_ne…>
to
users, and keep a healthy number of Functioneers around.
For now, we will still not assign any function maintainer rights either.
Function maintainers will be able to do very wide-ranging, potentially
damaging edits, e.g. changing the definition of a type, or editing
connected implementations. There is no proper support for these workflows
yet, which is why we will not give out those rights for now.
With these changes, we are also dropping the word “soon” from the tagline
on the Wikifunctions main page: “Wikifunctions is a free library of
functions that (soon) anyone can edit.” We consider Wikifunctions now to
have reached general availability.
There is a lot more work to do. One of our next goals, which I'll post more
about next week, is to support more types beyond strings and Booleans, and
thus to allow many more functions to be created and made available.
Thanks so much to James
<https://meta.wikimedia.org/wiki/User:Jdforrester_(WMF)> for leading this
effort! And thanks to all the other team members who have worked on their
part, either within the Abstract Wikipedia team, within Security, or within
SRE (Site Reliability Engineering). We are excited to keep an eye on how
things develop from here on.
The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-11-03
--
Running Python on WebAssembly
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_object_selector_improv…>A
screenshot of the improved selector, finding types related to the input as
labels or aliases in different languages. The input, "cha", matches the
Polish "Checha" for Key, the French "Chaîne" for String, the English alias
"Unreachable" for Nothing, and the English alias "Character" for Code point.
As reported last week
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-25>, we
had switched our runtime for JavaScript to use a WebAssembly-based stack.
We still had to do the same for the Python language.
As of Wednesday, Python code is now also being executed in a
WebAssembly-based runtime. This means that now all user-written code will
be executed on a WebAssembly runtime, whether it is in JavaScript or Python.
The change turned out to be more difficult than anticipated. We tried
several different ways, and discovered entirely novel ways our testing
infrastructure could run into interesting issues. This has uncovered enough
work for weeks! Finally, we ended up using Wasmtime
<https://wasmtime.dev/> with
a locally-compiled RustPython <https://rustpython.github.io/> module. There
are a number of improvements and simplifications we want to work on in the
coming weeks and months, but for now we are happy that the system seems to
run in production. The WebAssembly-based Python runtime we have now seems
to run a bit slower than the previous one, but it is likely that you won’t
notice a difference.
With this change, we have completed adding an additional layer of security
to Wikifunctions. This way we are really close to actually opening up
editing. We will continue monitoring the systems, and if everything looks
good, we will soon open for wider editing.
Thanks to Cory and James for integrating and deploying the WebAssembly
runtime!
Improvement in the object selector
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_object_selector_improv…>Language
search
Another set of improvements was deployed to the object selector, the
widget that allows you to select a function or other object. You use this
every time you create a function definition, and often when calling one,
so it's important that we make it helpful for key parts of the workflow. It
should be much more descriptive now because it shows the right label and
type. It is also better at finding the right object because it’s taking
aliases and languages into account. When searching for a language, it now
also matches BCP 47 <https://en.wikipedia.org/wiki/IETF_language_tag>
language codes and, where they differ, MediaWiki language codes.
It used to be that you would sometimes need to type in raw ZIDs to find the
right object. This should now be needed much more rarely, and work more
fluently when you do need it. Let us know if you ever run into such a
situation, so that we have input for further improving the widget.
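To illustrate the kind of matching described above, here is a small Python sketch. It is not the actual implementation of the object selector; it assumes a simple case-insensitive substring match over labels and aliases in all languages, and the data structure is made up for this example. The sample entries are taken from the screenshot caption, plus Boolean as a non-matching entry.

```python
# Each entry lists an object and some of its labels or aliases as
# (language code, text) pairs. The structure is hypothetical.
OBJECTS = [
    ("Key", [("pl", "Checha")]),
    ("String", [("fr", "Chaîne")]),
    ("Nothing", [("en", "Unreachable")]),
    ("Code point", [("en", "Character")]),
    ("Boolean", [("en", "Boolean")]),
]


def search(query, objects=OBJECTS):
    """Return the objects with any label or alias containing the query."""
    needle = query.casefold()
    return [
        title
        for title, names in objects
        if any(needle in text.casefold() for _language, text in names)
    ]


print(search("cha"))
```

With the input "cha", this finds Key, String, Nothing, and Code point, but not Boolean.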
Thanks to Geno and Amin for improving and redesigning the object selector!
Also thanks to the community members who suggested improvements, including
GZWDer and egezort.
Volunteer’s Corner on November 13, 2023
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_object_selector_improv…>Searching
using the ZID
The next volunteer’s corner will be on November 13, 2023, at 18:30 UTC
<https://zonestamp.toolforge.org/1699900200>. We will meet in
meet.google.com/xuy-njxh-rkw. Bring your questions, and if time permits, we
will build a new function together.
Hello (Semantic) MediaWiki users, maintainers, software developers, consultants, researchers!
SMWCon Fall 2023 https://www.semantic-mediawiki.org/wiki/SMWCon_Fall_2023 will be held on location in Paderborn, Germany, from December 11 to 13.
Over three days there will be talks, tutorials and hackathons.
This conference addresses everybody interested in wikis and open knowledge, especially in Semantic MediaWiki, e.g. users, developers, consultants, business or government representatives, and researchers.
This conference aims to:
* inspire/onboard new users,
* inform on where and how MediaWiki is used,
* convey and consolidate best practices,
* initiate/foster/integrate application and development and
* strengthen the community of stakeholders and its service portfolio.
Learn how to "do" MediaWiki in order to assume your responsibilities regarding your organization's knowledge management.
Please note that early-bird ticket sale ends today!
Call for Contributions
----------------------------
We are looking for use cases and best practices that provide insight into questions like
* How does AI change the way we use MediaWiki?
* How do semantic wikis fit in, and how can they be combined with AI tools?
* How can we use Semantic MediaWiki in research and organizations?
* How do we develop and deploy MediaWiki and extensions?
Your experience is valuable for all of us! So please share and propose a talk, tutorial or other contribution.
Go to the Conference Page (https://www.semantic-mediawiki.org/wiki/SMWCon_Fall_2023)
and hit the 'Propose a talk here' button.
Please propose a contribution if you plan to have one, even if you don't have the details yet. For us it is important to know what we can expect.
We look forward to your contribution!
Sponsoring
----------------
Thank you to the sponsors of SMWCon 2023!
* http://www.archixl.nl/ - Specialists in enterprise architecture, knowledge management, and semantics
* https://bluespice.com - The company behind BlueSpice, the open-source enterprise wiki software
* https://mywikis.eu - GDPR compliant (Semantic) MediaWiki hosting from the heart of Europe.
* https://wikibase-solutions.com/ - Specialist in business solutions with MediaWiki
Organization
------------------
The organizers of SMWCon 2023 and https://mwstake.org
* Bernhard Krabina, https://km-a.net (General Chair)
* Ad Strack van Schijndel, https://www.juggel.com (Program Chair)
* Tobias Oetterer, https://www.uni-paderborn.de/en/ (Local Chair)
The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-25
--
A few weeks ago we opened up Wikifunctions for some community members – but
have yet to open it up to wider contribution and usage. Thanks to the
brilliant input of some community members, most notably Lockal
<https://ru.wikipedia.org/wiki/%D0%A3%D1%87%D0%B0%D1%81%D1%82%D0%BD%D0%B8%D0…>,
we were made aware of some potential security issues before they could be
exploited. This led us to limit function calls to logged-in users while we
implemented some security mitigations.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_Top-level_architectura…>Top-level
architecture of Wikifunctions
Our original plan was to rely on a multi-layered approach to security,
where we split up the backend into two parts, one being the orchestrator,
which collects all necessary data, and the other the evaluator, which
actually runs the code written by Wikifunctions editors. The evaluator
would be running in a Docker container with very limited rights. But,
as we opened up Wikifunctions, issues arose that, although not yet
exploitable themselves, might become so in the future.
We partnered with the SRE and Security teams in response to the new
concerns, and together we brainstormed ideas and hammered out potential
solutions to add further layers of protection. The idea is to provide
additional security in depth. One major component of our revised security
strategy required a complete rewrite of the evaluator encapsulation
service: instead of running user-written code in language runtimes directly
in Docker, we will run them on top of a WebAssembly runtime inside the
container.
What is WebAssembly <https://en.wikipedia.org/wiki/WebAssembly>?
WebAssembly, or "WASM" for short, is a low-level programming language,
meaning it is comparatively simple and doesn’t directly support higher levels
of abstraction. There are many different runtimes for WebAssembly, the
most prominent of which are basically all modern browsers (thus the “Web”
in the name). As with many other low-level programming languages, it can
also serve as a compilation target for other programming languages, meaning
that you can take, for example, code written in C or Rust and compile it to
WebAssembly. This allows programs that were written for the desktop to be
run in the browser. One example is the jump-and-run game SuperTux
<https://supertux.semphris.com/play/>, which was originally written in C++,
and can now be run in the browser.
WebAssembly does not have to be run in the browser; it can also be run on a
server. In the last few years, a flurry of activity has created dozens of
runtimes. One advantage of WebAssembly is that the runtime that runs
WebAssembly is easy to control and limit; thus, translating code to
WebAssembly adds an additional layer of security.
As of this week, we have deployed the new version of the evaluator for
JavaScript. We will be monitoring how this change will affect the
performance and cost of running Wikifunctions. Note that the WebAssembly
runtime does not replace the existing security measures; it is added on top
of them. If you inspect the "Details" of a
function run on JavaScript now, you'll see that it's run on QuickJS v0.5.0
inside WASM (specifically, on WasmEdge <https://wasmedge.org/>), rather
than Node v16.17.1. We are working on also switching the evaluator for
Python to one based on WebAssembly soon.
One previous decision has made things a bit more challenging, though: our
choice to start with JavaScript and Python. WebAssembly is geared towards
compiled programming languages such as C, Rust, or Go, whereas Python and
JavaScript are interpreted languages. Eventually, we found Python and
JavaScript interpreters that can be compiled to WebAssembly, and then these
compiled builds are used to run the actual Python and JavaScript code. We
live in interesting times.
In fact, the tooling around WASM for Python and JS is so novel and
bleeding-edge as to have caused some "fascinating" bugs during adoption. At
one point, we had got our Python executor running on WebAssembly, using
(among other things) a great tool called wasmtime <https://wasmtime.dev/>,
written by Bytecode Alliance <https://bytecodealliance.org/>. Our tests
were reliably green for a couple of weeks, even up to the day we decided to
switch our staging Python executor to use WASM. However, once our new
release reached the staging area, Python function calls mysteriously
failed. After debugging, we found that our call to the wasm command line
tool was the culprit. It turned out that the wasm runner we were using had
pushed a new major version, flagged as a breaking change, less than an hour
before we built the image for deployment. The fix for that issue was easy: we
simply specified that our build should download and use the previous version
of the command line tool. Still, this demonstrates how fast-moving the world
of WASM can be.
Where will we go next? We will be monitoring the load that the new
architecture puts on our servers, to see if the system is sustainable.
There will be some change in the speed of evaluating functions, but we
expect that the change will be, overall, barely noticeable at all. We hope
that the additional layer of protection will hold up, but if you do find a
way past it, let us know
<https://www.mediawiki.org/wiki/Reporting_security_bugs>.
We think there is quite some room for improvement in terms of runtime
speed. WebAssembly runtimes have seen a whirlwind of development in the
last few years, and it seems that particularly for interpreted languages it
is still rife with opportunities. One way to improve the runtime
characteristics of Wikifunctions is to add support for languages that are
more natural fits for WebAssembly, such as Rust or C. Given the automatic
support for the fastest implementation, this might swiftly consolidate to
more efficient implementations. But compiled languages would also need a
slightly different architecture, as the compilation results would need to
be stored somehow. One interesting option would be to also push the
function evaluation to the user’s browser, since it contains a WebAssembly
runtime as well. But we would need to understand the consequences of that,
particularly for slower devices.
We used this change also as an opportunity to experimentally switch on the
right for everybody to run community-approved functions on Wikifunctions,
not just for logged-in users. As you can see, this change is buried deep in
this update, and it might be rolled back again at any time. We will monitor the
system to see how stable it is. We will keep you up-to-date in this
newsletter.
Thanks to Cory <https://meta.wikimedia.org/wiki/User:CMassaro_(WMF)> for
taking the lead on this project, James
<https://meta.wikimedia.org/wiki/User:Jdforrester_(WMF)> for taking it to
production, and the Security and SRE teams, who supported us so helpfully!
It is great to see it deployed, taking us a big step closer to opening up
Wikifunctions to everyone.
The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-04
As this newsletter contains plenty of images, it can be easily misformatted
in Email. Please refer to the on-wiki version for easier reading in that
case.
--
Arguments made easier
As of today, referencing arguments has become considerably easier.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
What does this mean?
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 2: expanded composition for generating the verb form for the German second
person plural.
Every function has arguments.[1]
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-04#cite_…>
When
creating a composition to implement a function, you need to be able to
reference the arguments of the function. For example, let's look at the
function that generates the regular German verb form for the second person
plural <https://www.wikifunctions.org/wiki/Z11272>: it takes one argument,
the infinitive form of the verb, e.g. *“denken”* (to think), reduces it to
the stem of the word, e.g. *“denk”*, and adds the letter *“t”* to get e.g.
*“denkt”* (as in *“ihr denkt”*, “you think”). So the composition looks as
follows:
join strings <https://www.wikifunctions.org/wiki/Z10000>( stem German verb
<https://www.wikifunctions.org/wiki/Z11259>( Argument reference( infinitive
) ), “t” )
… or see Figure 2 for the expanded view.
In this composition, there is one function call embedded in another. The
inner function call, which returns the German stem, has an argument
reference to the *infinitive*. This means that the argument with which the
function is being called will be placed right in this place. Therefore, if
you call the function with the argument *“denken”*, it turns into this
function call:
join strings <https://www.wikifunctions.org/wiki/Z10000>( stem German verb
<https://www.wikifunctions.org/wiki/Z11259>( “denken” ), “t” )
Then, by evaluating the inner function call, you get:
join strings <https://www.wikifunctions.org/wiki/Z10000>( “denk”, “t” )
And this finally evaluates to the result, *“denkt”*.
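The evaluation steps above can be mirrored in a few lines of Python. This is only a sketch: the function names are informal stand-ins for the Wikifunctions functions, and the stemming rule (dropping a trailing "en") is a simplifying assumption made for this example, not the actual implementation of stem German verb.

```python
def join_strings(first, second):
    """Stand-in for the Wikifunctions function 'join strings'."""
    return first + second


def stem_german_verb(infinitive):
    """Stand-in for 'stem German verb'.

    Assumption for this sketch: the stem is the infinitive without a
    trailing "en". Real German verbs need more care than that.
    """
    if infinitive.endswith("en"):
        return infinitive[:-2]
    return infinitive


def second_person_plural(infinitive):
    # The parameter plays the role of the argument reference: whatever
    # the function is called with is placed right here.
    return join_strings(stem_german_verb(infinitive), "t")


print(second_person_plural("denken"))
```

Calling the function with "denken" first evaluates the inner call to "denk", and then joins it with "t" to produce "denkt".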
Figure 3 shows the status of the composition at the point where everything
but the argument reference has been entered.
At this point, you have to expand the infinitive argument for the function
call for stem German verb <https://www.wikifunctions.org/wiki/Z11259>. You
do this by clicking the sideways-chevron (">") next to the label
“infinitive”. This expands the fields, showing the type and mode and the
(still empty) value (see Figure 4).
Now you switch the input mode from a literal String to an Argument
reference (Figure 5). And this is where the previous workflow diverges from
the new workflow.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 3: everything but the argument reference is in place.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 4: expanded *infinitive* field on the Function "stem German verb".
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 5: selecting between the different modes for *infinitive*
How did it work?
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 6: how it was last week: Argument reference is chosen as the mode
and we now have a text field waiting for the key ID.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 7: the rather cryptic key ID filled out and ready to be published.
Figure 6 shows the previous status after selecting Argument reference. You
were presented with an empty field labeled “key id”, and the field expected
the editor to type in the key ID. What made this worse was that the page
did not show you the key ID anywhere (I usually copied it from the URL).
The key ID is the ZID of the function you are implementing, plus a suffix
for the position of the key you are looking for. In this case, since you
were implementing Z11272 <https://www.wikifunctions.org/wiki/Z11272> and
there was only one key, the key ID was Z11272K1. You had to type that in,
and were then able to publish the implementation (see Figure 7).
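The rule for building a key ID described above is simple enough to write down as a short Python sketch. The helper name key_id is made up for this example.

```python
def key_id(function_zid, position):
    """Build a key ID from a function's ZID and the 1-based key position."""
    if position < 1:
        raise ValueError("key positions start at 1")
    return f"{function_zid}K{position}"


print(key_id("Z11272", 1))
```

For the function Z11272 and its only argument, this gives Z11272K1, the identifier that previously had to be typed in by hand.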
How does it work now?
Last week we had our internal "Fix-It" week, where we focus on technical
debt and smaller, but irritating, tasks. One of the projects that our
colleague Genoveva <https://meta.wikimedia.org/wiki/User:Geno_(WMF)> tackled
was to improve this workflow: once you have chosen “Argument reference”,
instead of an empty text field for which you had to look up and write in a
key reference, it now shows a dropdown field (see Figure 8).
Clicking on that dropdown field reveals the list of relevant arguments (see
Figure 9). You select the argument (see Figure 10). Now you can even
collapse the infinitive field into a single field, making the view even
more compact (see Figure 11).
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 8: how it is now: Argument reference is chosen as the mode and we
have a dropdown field waiting for selection.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 9: by clicking on it, we see the available arguments.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 10: an argument was selected.
<https://meta.wikimedia.org/wiki/File:Wikifunctions_-_function_argument_inte…>
Figure 11: collapsed view of the *infinitive* key with a selected argument
reference.
Why does it matter?
We want to make contributing functions, tests, and implementations as easy
as possible. We believe that in order to achieve the goal of sharing a
comprehensive library of natural language generation functions for hundreds
of languages, we will need a lot of people to edit functions, tests, and
implementations on Wikifunctions.
We followed a number of principles when improving this workflow: we reduced
the cognitive workload necessary to complete the task, we hid identifiers
from one more place in the user interface, and we made it considerably
easier to complete the task on mobile devices. We hope that this will help
with our goal of allowing more people to contribute to Wikifunctions
effectively.
Congratulations to Genoveva for engineering, Amin for design, and the team
for working on and landing this improvement. Thank you all!
Notes
1. ↑
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-04#cite_…>
It is possible to have functions with no arguments in Wikifunctions,
e.g. nullary true <https://www.wikifunctions.org/wiki/Z10210>, but such
functions are of limited practical use.
The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-09-27
--
Serializers and deserializers for types
Last week, we discussed our plans to add renderers and parsers for types
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-09-20>.
This week, we will continue the theme of how to make types easier to use,
by discussing serializers and deserializers, and their role in
Wikifunctions.
If you have the appropriate type, writing a native code function can be
really easy: for example, since we already have types for Booleans
<https://www.wikifunctions.org/wiki/Z40> and Strings
<https://www.wikifunctions.org/wiki/Z6>, and we translate them in the
system to the native concepts of Booleans and Strings in Python and
JavaScript, this means that writing the code implementation for a function
such as the boolean conjunction (and)
<https://www.wikifunctions.org/view/en/Z10174> or joining strings
<https://www.wikifunctions.org/wiki/Z10000> is rather straightforward and
just a single line of code packaged in a function:
- And in Python <https://www.wikifunctions.org/view/en/Z10175>
- And in JavaScript <https://www.wikifunctions.org/view/en/Z10202>
- Join strings in Python <https://www.wikifunctions.org/view/en/Z10004>
- Join strings in JavaScript
<https://www.wikifunctions.org/view/en/Z10005> (and alternatives using
concat <https://www.wikifunctions.org/view/en/Z10621> or join
<https://www.wikifunctions.org/view/en/Z10622>)
On Wikifunctions Beta, we already have seen the creation of a few types,
such as for numbers <https://wikifunctions.beta.wmflabs.org/view/en/Z10015>
or dates <https://wikifunctions.beta.wmflabs.org/view/en/Z10438>. But the
implementations for similarly basic functions such as addition
<https://wikifunctions.beta.wmflabs.org/view/en/Z10118> or squaring a
number are nowhere near as simple, and have far more than a single line of code:
- Addition in Python
<https://wikifunctions.beta.wmflabs.org/view/en/Z10874>
- Addition in JavaScript
<https://wikifunctions.beta.wmflabs.org/view/en/Z10119>
Why is that so?
Here’s the implementation of the addition function in Python in the Beta
Cluster version:
def Z10118(Z10118K1, Z10118K2):
    def deserialize(x):
        return int(x.Z10015K1)

    def serialize(x):
        return ZObject({"Z1K1":"Z9", "Z9K1":"Z10015"}, Z10015K1=str(x))

    left = deserialize(Z10118K1)
    right = deserialize(Z10118K2)

    result = left + right
    return serialize(result)
And here’s how the implementation should look:
def Z10118(Z10118K1, Z10118K2):
    return Z10118K1 + Z10118K2
At its core, the Beta implementation does exactly that: in line 11 you can
see that the Python + operator is being called. But in addition, we also
need code that deserializes the input arguments and serializes the output.
In other words, we need to turn the ZObject that Wikifunctions works with
into values of Python’s int type (that happens in line 3) and back into a
ZObject (that happens in line 6).
If Wikifunctions knew that the positive integer type can be fully
represented by the int type of Python 3, we could have automatically made
that conversion inside the system. But we want types to be flexible, and to
eventually be fully community-controlled on-wiki. And that also means that
we shouldn’t build any magic into the Wikifunctions system that does
such conversions, or that requires the system to know types.
The way we plan to tackle this is as follows (and now is the right time for
comments):
We will introduce two new types of special objects: serializers and
deserializers. A deserializer is attached to a specific programming
language and Wikifunctions source type, and carries code that takes a
ZObject of the source type and turns it into a value of the target native
type in that programming language. A serializer is the inverse of that.
For example, you might have a Python deserializer that turns an instance of
the Wikifunctions Integer type into a native BigNum (even if it might fit
into an int), while the Python serializer understands how to convert both
native Python ints and BigNums to instances of the Wikifunctions Integer
type.
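As a rough illustration of such a pair, here is a minimal sketch. It models
a ZObject as a plain Python dict and reuses the prototype ZID Z10015 from
the Beta; the function names and the dict layout are assumptions made for
this example, not the actual Wikifunctions API.

```python
# Hypothetical stand-in: a ZObject is modelled as a plain dict here.
INTEGER_ZID = "Z10015"  # prototype type ZID from the Beta


def deserialize_integer(zobject):
    """Turn a ZObject of the integer type into a native Python int."""
    if zobject.get("Z1K1") != INTEGER_ZID:
        raise TypeError("not an integer ZObject")
    return int(zobject[INTEGER_ZID + "K1"])


def serialize_integer(value):
    """Turn a native Python int back into a ZObject of the integer type."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("expected a Python int")
    return {"Z1K1": INTEGER_ZID, INTEGER_ZID + "K1": str(value)}


original = {"Z1K1": "Z10015", "Z10015K1": "4657388"}
round_trip = serialize_integer(deserialize_integer(original))
```

In Python 3 the built-in int is already arbitrary-precision, so in this
sketch a single native type covers both small and very large values.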
Now, whenever we want to run native code, the evaluator - the piece of code
responsible for running native code - will also need to run the code
associated with the serializers and deserializers. That is, all the extra
code that makes up the difference between the two implementations above
would be handled automatically by Wikifunctions.
For each type and language, there would be exactly one deserializer and
one serializer. Then, when a native implementation for a function is being
written, we look up the types on the function, and find the right
serializer and deserializer for those types in that programming language.
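The lookup and wrapping described above could be sketched roughly as
follows. Everything here is hypothetical: the CONVERTERS registry, the
run_native helper, and the dict stand-in for ZObjects are assumptions for
illustration only, not how the evaluator is actually implemented.

```python
# Hypothetical registry: exactly one (deserializer, serializer) pair per
# (type ZID, programming language). ZObjects are modelled as plain dicts.
CONVERTERS = {
    ("Z10015", "python"): (
        lambda z: int(z["Z10015K1"]),
        lambda v: {"Z1K1": "Z10015", "Z10015K1": str(v)},
    ),
}


def run_native(implementation, argument_types, return_type, arguments,
               language="python"):
    """Deserialize the inputs, call the native code, serialize the result."""
    native_args = []
    for type_zid, zobject in zip(argument_types, arguments):
        deserialize, _ = CONVERTERS[(type_zid, language)]
        native_args.append(deserialize(zobject))
    _, serialize = CONVERTERS[(return_type, language)]
    return serialize(implementation(*native_args))


def add(left, right):
    # The contributor writes only this line of logic.
    return left + right


three = {"Z1K1": "Z10015", "Z10015K1": "3"}
four = {"Z1K1": "Z10015", "Z10015K1": "4"}
result = run_native(add, ["Z10015", "Z10015"], "Z10015", [three, four])
```

Under these assumptions, the implementation itself shrinks to the one-line
addition, and the conversion code lives with the type instead.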
Let us know if you have ideas or comments on these plans!
October Volunteer Corner
The Volunteer Corner for next month will take place on Monday next week,
October 2nd. We are experimenting with the times so that different people
can attend. Also, because of repeated issues with Jitsi, we are switching
back to Google Meet for now.
Please give us feedback on the time and on the platform, so we can continue
to improve.
We are meeting on October 2nd, 2023, at 13:30 UTC
<https://zonestamp.toolforge.org/1696253400> at Google Meet
<https://meet.google.com/xuy-njxh-rkw?authuser=0&hs=122>.
The agenda for the meeting is to take any questions that arise, followed by
working on a function together.
Recording of September Volunteer Corner
We also uploaded a recording of the September edition of the Volunteer
Corner
<https://commons.wikimedia.org/wiki/File:Abstract_Wikipedia_Volunteer_Corner…>
to Commons. We were working together on a function to check if a string is
a valid positive integer <https://www.wikifunctions.org/view/en/Z11129>. It
was great fun to build a function on Wikifunctions together, to have folks
create testers, and to discuss the function and its limits live!
Hi,
This is an interesting discussion and I share here some of my personal
experiences.
As part of a personal project on creating a multilingual programming
language (WIP) [1, 2], I explored ways to avoid assuming that numbers are
represented only as a sequence of Arabic numerals, like 4657388. As
discussed in this thread, there are numerous other representation systems.
The Roman numeral system, for example, may not represent very large
numbers, but it can be found in literature. I used Unicode and Roman
numerals to represent numbers in a way that supports mathematical
operations. Ideally, a user of a multilingual programming language can use
different numerical systems for mathematical calculations, and the result
must be displayed in the same numerical system.
For example, using Roman numerals [3]
num1 = rn.RomanNumeral("XV") # create a numeral
num2 = rn.RomanNumeral("VII") # create a numeral
num3 = num1 * num2
or using numerals in Malayalam language [4]
num1 = un.UnicodeNumeral("൧൩") # create a numeral
num2 = un.UnicodeNumeral("൨൪") # create a numeral
num3 = num1 + num2
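The idea behind the second example can also be sketched with the Python
standard library alone. This is not the code from [1]; the helper names
below are hypothetical, and the sketch only covers scripts whose decimal
digits Unicode encodes as a contiguous 0-9 block (so not Roman numerals).

```python
import unicodedata


def parse_digits(text):
    """Return (value, zero_char) for decimal digits written in one script."""
    value = 0
    zero_char = None
    for ch in text:
        digit = unicodedata.decimal(ch)  # raises ValueError for non-digits
        this_zero = chr(ord(ch) - digit)  # the script's own zero character
        if zero_char is None:
            zero_char = this_zero
        elif this_zero != zero_char:
            raise ValueError("mixed numeral systems")
        value = value * 10 + digit
    if zero_char is None:
        raise ValueError("empty numeral")
    return value, zero_char


def render_digits(value, zero_char):
    """Write a non-negative int using the digit block starting at zero_char."""
    return "".join(chr(ord(zero_char) + int(d)) for d in str(value))


def add_numerals(left, right):
    """Add two numerals and return the sum in the same numeral system."""
    left_value, left_zero = parse_digits(left)
    right_value, right_zero = parse_digits(right)
    if left_zero != right_zero:
        raise ValueError("operands use different numeral systems")
    return render_digits(left_value + right_value, left_zero)


total = add_numerals("൧൩", "൨൪")  # Malayalam digits for 13 and 24
```

Keeping track of the zero character is what lets the sum be written back
in the numeral system of the inputs.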
However, we cannot assume a general way of representing numbers in
different languages. I did not focus on handling cases where spaces or
commas are present in a number, like in currencies 4 657 388 or 4,657,388.
That would require more advanced use of existing
internationalization/localization efforts. We already have some support for
currencies for many locales.
Thus, for Wikifunctions, we may need to imagine such complex, but
interesting examples.
References:
[1] https://github.com/johnsamuelwrites/multilingual
[2] Multilingual Programming Experience: Envisioning an Inclusive and
Diverse Future
<https://medium.com/@jsamwrites/multilingual-programming-experience-envision…>
[3] https://github.com/johnsamuelwrites/multilingual/blob/main/tests/roman_nume…
[4] https://github.com/johnsamuelwrites/multilingual/blob/main/tests/unicode_nu…
On Thu, Sep 21, 2023 at 2:00 PM <
abstract-wikipedia-request(a)lists.wikimedia.org> wrote:
> Today's Topics:
>
> 1. Newsletter #127: Renderer and parsers for types (Denny Vrandečić)
> 2. Re: Newsletter #127: Renderer and parsers for types (Thad Guidry)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 20 Sep 2023 17:13:42 -0700
> From: Denny Vrandečić <dvrandecic(a)wikimedia.org>
> Subject: [Abstract-wikipedia] Newsletter #127: Renderer and parsers
> for types
> To: Abstract Wikipedia list <abstract-wikipedia(a)lists.wikimedia.org>
>
> The on-wiki version of this newsletter can be found here:
> https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-09-20
> --
> Renderers and parsers for types
>
> Wikifunctions currently supports two types: Strings and Booleans. To make
> Wikifunctions useful, we need to support many more types, such as numbers,
> dates, geocoordinates, and eventually Wikidata lexemes and items. Types
> define what kind of inputs and outputs the functions in Wikifunctions can
> have.
>
> With Wikifunctions, we don’t want to just repeat what different programming
> languages have done, but, if possible, gently update the lessons that have
> been learned from programming language research and experience and make
> sure that we are as inclusive as possible.
>
> Strings and Booleans were very carefully chosen for the first deployment of
> Wikifunctions: Strings <https://www.wikifunctions.org/wiki/Z6>, because
> they are just a specific sequence of Characters, and do not depend on the
> user’s language. Booleans <https://www.wikifunctions.org/wiki/Z40>,
> because they are a key basis of logic flow for programming. Further, they can be
> fully translated in Wikifunctions – the two values, True
> <https://www.wikifunctions.org/wiki/Z41> and False
> <https://www.wikifunctions.org/wiki/Z42>, are both represented by a
> Wikifunctions object that can have names in any of the languages we
> support. Since the initial deployment, more than a dozen translations have
> been added! If you can add more, that would be great.
>
> One example of a possible next type that would be interesting to introduce
> would be whole numbers. This raises a big question: how should we represent
> an integer?
>
> Most programming languages have two answers for that: one, they internally
> represent it, usually, as a binary string of a specific length, in order to
> efficiently store and process these numbers. But then there is also their
> representation in the human-readable source code, and here they are usually
> represented as a sequence of Arabic numerals
> <https://en.wikipedia.org/wiki/Arabic_numerals>, e.g. 4657388. Some
> programming languages are nice enough to allow for grouping of the numbers,
> e.g. in Ada <https://en.wikipedia.org/wiki/Ada_(programming_language)> you
> may write 4_657_388, or, if you prefer the Indian system
> <https://en.wikipedia.org/wiki/Indian_numbering_system>, 46_57_388, making
> these numbers a bit more readable.
>
> But programming languages where one can write ৪৬,৫৭,৩৮৮ using Bengali
> numerals <https://en.wikipedia.org/wiki/Bengali_numerals>, referring to the
> same number, are rare <https://sjishan.github.io/chascript/>. For
> Wikifunctions, we want to rectify this, to make sure that the whole system
> supports every human language fluently and consistently.
>
> Internally, we will represent numbers - like every other object - as
> ZObjects. The above number would be represented internally as follows
> (using the prototype ZID from the Beta
> <https://wikifunctions.beta.wmflabs.org/view/en/Z10015>, since we don’t yet
> have the respective type in the real Wikifunctions):
>
> { "Z1K1": "Z10015", "Z10015K1": "4657388"}
>
> Or, with labels in English:
>
> { "type": "positive integer", "value": "4657388"}
>
> Even though this solves the internal representation, we would want to avoid
> displaying this object in the system if possible. Instead, we plan to allow
> the Wikifunctions community to attach a 'renderer' and a 'parser' to each
> type. The renderer would be a function that takes an object of the given
> type (in this case, an object of the type positive integer) and a language,
> and returns a string. The parser is the opposite of that: it takes a string
> and a language, and returns an object of type positive integer.
>
> This would allow the Wikifunctions community to create functions for each
> type and language that would decide how the values of the type are going to
> be displayed in the given language. In a Bengali interface, the above
> number can then be displayed in the most natural representation for
> Bengali, which might be ৪৬,৫৭,৩৮৮.
>
> When entering a number, we will use the parsing function to turn the input
> of the user into the internal representation. It is then up to the
> community to decide how flexible they want to be: if they would only accept
> ৪৬,৫৭,৩৮৮ as the input, or whether ৪৬৫৭৩৮৮ would be just as good - or even
> also or only 4657388. The decision would be for the Wikifunctions community
> to make.
>
> Note that we made a lot of assumptions in the above text. For example,
> using the ZID from the Beta, calling the type “positive integer”, assuming
> the internal representation of positive integers being Arabic numerals
> without formatting (instead of say, hexadecimal, base 64 or a binary
> number, which also could be good solutions), and other assumptions. All of
> these decisions are up to you, but we used assumptions here to talk
> concretely about the proposal.
>
> We plan to implement this proposal incrementally, over a few weeks and
> months. It will likely be the case that we will at first only accept the
> internal representation (just as it currently works on the Beta), and that
> we will then add renderers and finally parsers.
>
> We are looking forward to hearing your feedback on this plan.
>