The on-wiki version of this newsletter can be found here:
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-08-29
======
Limits on Name and Description Lengths
We recently introduced more stringent limits on the length of object names, input names, and descriptions. This is mostly due to design constraints on several of the places where we plan to use these names and descriptions, and also due to size constraints for objects in MediaWiki.
These limits are not yet being strictly enforced, and there are ways to circumvent them. This is to support the transition from the time before these limits were in place: we want to avoid the situation that typos cannot be fixed because a name is too long.
We are asking the Wikifunctions community to help us shorten names and descriptions that are too long. In the next week or so, we will provide a list of objects and the languages that go beyond the new restrictions.
If there are concerns about the restrictions, please raise these on the project chat, or in the other usual channels.
Recent changes in the software
Our big change this week is a new way for the "About widget" to work, as previously discussed at Wikifunctions:Design/About widget improvements[https://www.wikifunctions.org/wiki/Wikifunctions:Design/About_… (T369617[https://phabricator.wikimedia.org/T369617]). Instead of a dialog, you now interact with it in-line where it appears, with each language you select appearing in an accordion view and prompting you with the fallback languages' values, if available. This should make it easier to quickly add multiple languages when editing, and simpler for readers to find labels that they can read. Our thanks to community members who helped us co-design this on-wiki and elsewhere, especially GrounderUK for their detailed feedback.
As part of our work this Quarter towards embedding Wikifunctions calls in wikitext, we re-wrote some of the code, partially preparing it for cross-wiki calls. Expect to see more in this area soon.
We added a visual message, "Running…", to the display when waiting for an Object to get rendered, rather than the page appearing blank (T366722[https://phabricator.wikimedia.org/T366722]). We also updated the alignment of the ">" icon display next to Objects, which had been broken by wider MediaWiki ecosystem updates (T366723[https://phabricator.wikimedia.org/T366723]).
We fixed a bug that meant pressing "back" at some stages would sometimes, but not always, clear the form when creating new Functions (T369713[https://phabricator.wikimedia.org/T369713]).
We prepared to drop backwards-compatibility support for using strings, rather than references, to refer to programming languages (Z61s) in code blocks, running the migration on the Beta Cluster that we had run in production in May, and making some related improvements (T287153[https://phabricator.wikimedia.org/T287153]).
We re-built the fundamental "ZObjectKeyValue" UX component, which is the fall-back display of a key and its value, to make the code simpler and more robust, and allow for more comprehensive adjustments to it in future. We dropped the local Chip and ChipContainer components, now unused in favour of the Codex ones built after ours (T334738[https://phabricator.wikimedia.org/T334738]).
We dropped our old metrics data collection system, now that we've confirmed the new metrics system is equivalent (T369946). Our thanks to Formafix, who wrote several patches improving our PHP code with better dependency injection, and Bartosz Dziewoński, who found and fixed a bug with our relative path dependencies in our UX code as part of wider MediaWiki fixes (T373065[https://phabricator.wikimedia.org/T373065]).
We, along with all Wikimedia-deployed code, are now using the latest version of the Codex UX library, v1.11.1, as of this week. It should have no user-visible changes on Wikifunctions, so please comment on the Project chat or file a Phabricator task if you spot an issue. We adjusted some of our browser tests to match this (T372415[https://phabricator.wikimedia.org/T372415]).
Function of the Week
No Function of the Week this week, sorry. We'll return with it next week!
Hello guys,
I recently came across the wonderful Wikipedia project about CirrusSearch
dumps, where the entire Wikipedia is provided in neat JSON format. I was
trying to extract the abstracts (the opening_text field). I downloaded the
English wiki's content.json file and have been running a script to extract
the abstracts for all articles.
I realised every odd entry is a data point and even entries are meta-data.
That makes sense for 6.8M Wikipedia articles or 61M Wikipedia projects (I'm
not sure what's present), but even then the ballpark would be around
122M entries at most. However, I have already processed 128M entries from
the JSON. I checked and tested my script before running it on the
entire JSON file.
That's why I wanted to know how many entries there are in the CirrusSearch
dump JSON file.
https://dumps.wikimedia.org/other/cirrussearch/20240819/
https://www.mediawiki.org/wiki/Extension:CirrusSearch/Schema
Thanks
tawsif
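For anyone attempting the same extraction, here is a minimal sketch in Python. It assumes the dump follows the Elasticsearch bulk format, in which each document line is preceded by a small action line of the form {"index": {...}}, and that the abstract is stored under an opening_text key. The function names are illustrative and not part of any Wikimedia tooling. Counting documents separately from action lines shows how many of the processed entries are actual pages.

```python
import gzip
import json


def open_dump(path):
    """Open a gzip-compressed dump as text, one JSON object per line."""
    return gzip.open(path, "rt", encoding="utf-8")


def iter_documents(lines):
    """Yield (page_id, document) pairs from bulk-format lines.

    Assumption: each document is preceded by an action line such as
    {"index": {"_id": "123"}}. A document without one gets page_id None.
    """
    pending_id = None
    have_action = False
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        obj = json.loads(raw)
        if not have_action and set(obj) == {"index"}:
            # Action line: remember the id for the document that follows.
            pending_id = obj["index"].get("_id")
            have_action = True
        else:
            yield pending_id, obj
            pending_id, have_action = None, False


def extract_opening_text(lines):
    """Yield (page_id, opening_text) for documents that have an abstract."""
    for page_id, doc in iter_documents(lines):
        text = doc.get("opening_text")
        if text:
            yield page_id, text


def count_entries(lines):
    """Return (documents, documents_with_opening_text)."""
    docs = with_text = 0
    for _, doc in iter_documents(lines):
        docs += 1
        if doc.get("opening_text"):
            with_text += 1
    return docs, with_text
```

With a real file this could be used as `with open_dump(path) as fh: print(count_entries(fh))`. The file is streamed line by line, so the whole dump never has to fit in memory.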
The on-wiki version of this newsletter can be found here:
https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-08-23
--
WasmEdge, Now 300ms Less Edgy
This quarter, the Abstract Wikipedia team committed [https://www.wikifunctions.org/wiki/Wikifunctions:Status_updates/2024-07-03] to improve system performance. We performed a bottleneck analysis of function calls, tracking how much time the system spent making HTTP requests, performing calculations, and managing resources. We found that the single slowest operation–by far!–was starting up the WasmEdge CLI.
We've talked about WasmEdge before[https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-10-2…: briefly, it's a system interface for WebAssembly (WASM). It allows WASM code to interact with the operating system. We mostly use it so that our code executors can read/write standard streams [https://en.wikipedia.org/wiki/Standard_streams] and be guaranteed to touch nothing else – no files, no network access, and no other processes – which is a key part of our work to ensure the integrity and security of the services. Unfortunately, it turns out that WasmEdge takes around 300 ms to spin up in our production environment before it's ready to handle any data.
Our solution is to keep several WasmEdge processes running at all times. That way, when a request is made, the evaluator doesn't have to wait for a new process to get ready: instead, the evaluator can simply use a ready one from the pool and run your request immediately.
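The pooling idea can be illustrated with a small Python sketch. This is not the evaluator's actual code: the WarmPool class is hypothetical, and a plain Python subprocess that upper-cases its standard input stands in for the WasmEdge CLI. The point is only that workers are started ahead of time, so a request picks up one that is already running and a replacement is started for later requests.

```python
import queue
import subprocess
import sys

# Stand-in for a slow-starting sandboxed runtime (an assumption for this
# sketch): it reads all of stdin and writes it back upper-cased.
WORKER_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


class WarmPool:
    """Keep `size` worker processes started and waiting for input."""

    def __init__(self, size=2, cmd=WORKER_CMD):
        self._cmd = cmd
        self._ready = queue.Queue()
        for _ in range(size):
            self._ready.put(self._spawn())

    def _spawn(self):
        # Start a worker now; it waits on stdin until a request arrives.
        return subprocess.Popen(
            self._cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def run(self, payload):
        worker = self._ready.get()       # take a worker that was started earlier
        self._ready.put(self._spawn())   # begin warming a replacement
        output, _ = worker.communicate(payload, timeout=30)
        return output

    def close(self):
        # Send empty input so each idle worker sees end-of-file and exits.
        while not self._ready.empty():
            self._ready.get().communicate("", timeout=30)
```

Each worker here handles a single request and then exits, which mirrors the isolation goal described above. Whether the real service reuses or replaces its workers is not stated in this update.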
We deployed these changes on Wednesday and, as many of you have already noticed, Wikifunctions has become considerably snappier. Congratulations to the team for getting this deployed!
Recent changes in the software
As mentioned above, the biggest thing we shipped this week was a large re-write of how the back-end "evaluator" service works, which we hope will significantly improve perceived performance. This is one of the big pieces of work we committed to this Quarter. Instead of loading WASM on request, we're now pre-loading it into a pool of ready threads (T371837[https://phabricator.wikimedia.org/T371837]). From some initial testing, calls seem roughly 3–12x faster for single requests (depending on the request), and 30% faster for saturated requests, but we're interested in how well it works for you in practice!
We also made some front-end improvements. When editing Function labels, we now show and enforce the length limit on the front-end (T370995[https://phabricator.wikimedia.org/T370995]) for new labels. If you’ve previously created a label that’s longer than the new limits, we encourage you to shorten it to fit within the new limit. We’ll give everyone some time and help to do this, and we’ll aim to enforce the limit once all the labels are ready. More details will follow in future updates. We also fixed a bug that meant that on Function pages the checkboxes in the tables for Implementations and Test cases were wrongly aligned.
We fixed a bug that meant we would create meta-data objects with a key and no information in some circumstances (T369625[https://phabricator.wikimedia.org/T369625]). We re-wrote our code so that we share the logic to change the page's displayed title when you alter the label between the Function editor and the "About dialog" editor, to be consistent and avoid the chance for bugs (T371350[https://phabricator.wikimedia.org/T371350]).
We worked on some API and general code improvements, to share code more consistently and reliably, which allowed us to add a feature flag for the 'repo' nature of the WikiLambda code. This means that in future we'll be able to install the extension on other wikis in 'client' mode, as part of the work to let you embed calls to Wikifunctions inside articles, using wikitext.
As part of our "Fix-It" technical debt work two weeks ago, we landed a few further improvements. The biggest one was a comprehensive re-write of our front-end styling code to use consistent naming conventions (documented on MediaWiki.org), which helped us find several pieces of now-unused code and one hard-coded i18n label (T369596[https://phabricator.wikimedia.org/T369596]). We added some testing of the Wikifunctions UX application's browser history editing, as this is a tricky and confusing area that is easy to break. We also tweaked a couple of i18n labels to demonstrate to translators where to place the `{{PLURAL:$1|$1 …|$1 …}}` syntax.
Our linguistic diversity is our strength.
Meeting so many multi-lingual users from our community at Wikimania, I came back inspired to think deeper on this topic.
Despite having spent several years learning and actively coding, I hadn’t fully wrapped my head around the subtle yet crucial role that language ability plays in being a coder who writes quality code. If you are proficient in English, the out-of-the-box functions of popular coding languages make more sense to you, you are naturally better at naming your own functions, and you write better comments; I could go on. What drives our team to improve Wikifunctions every day is the revolutionary idea of allowing users to write code in their own language. This approach moves us beyond the concept of a single dominant language, creating a level playing field that empowers users to write in their own way. Excellent functions have very little to do with the natural language of their labels, and yet, so much to do with building our diverse community of users. We encourage you to write in your native language.
Speaking of labels, if you're new to Wikifunctions and not yet comfortable with writing functions, a great way to contribute to our vibrant community is by translating function labels and descriptions into your language. You can also suggest a function to be created in your language.
Function of the Week: blood compatibility (Z14469)[https://www.wikifunctions.org/view/en/Z14469]
In the past few weeks, I’ve selected functions that are radically different, showcasing the diversity of our function library, even in its early stages. Continuing this theme, this week I have chosen a blood compatibility tool. This Function checks whether two blood groups are compatible, helping to determine if blood donation or receipt is possible.
The Function takes two String inputs for the blood types [https://en.wikipedia.org/wiki/Blood_type] we want to compare and outputs their compatibility in the form of a Boolean. For instance, if you input ‘A’ and ‘B’ as the types whose compatibility you want to check, a result of ‘true’ means they are compatible and ‘false’ means they are not. Maybe in future the Function can be changed to use a custom Type to represent blood type, but this works well enough for now.
The Function currently has one Implementation in Python [https://www.wikifunctions.org/view/en/Z14471] which accepts two blood types as input, converts them to uppercase, and checks if they are valid blood types. It then checks if the pair of blood types (either in the order provided or reversed) is present in the pre-defined plasma_table, which lists compatible plasma donation pairs. If the pair is found in the table, the function returns True, indicating compatibility; otherwise, it returns False. If either of the blood types is invalid, it returns None.
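The logic described above can be sketched as follows. This is not the code of Z14471: the function name, the set of valid types (ABO only, without the Rh factor), and the table contents are assumptions chosen for illustration, with the table filled in from the usual ABO plasma donor/recipient rules.

```python
# Assumed set of valid inputs for this sketch (ABO groups only).
VALID_TYPES = {"A", "B", "AB", "O"}

# Assumed (donor, recipient) pairs for plasma donation; the real
# plasma_table in the Wikifunctions implementation may differ.
PLASMA_TABLE = {
    ("O", "O"),
    ("A", "A"), ("A", "O"),
    ("B", "B"), ("B", "O"),
    ("AB", "AB"), ("AB", "A"), ("AB", "B"), ("AB", "O"),
}


def blood_compatible(first, second):
    """Return True/False for compatibility, or None for invalid input."""
    a, b = first.upper(), second.upper()
    if a not in VALID_TYPES or b not in VALID_TYPES:
        return None
    # Accept the pair in the order given or reversed.
    return (a, b) in PLASMA_TABLE or (b, a) in PLASMA_TABLE
```

Because the pair is checked in both orders, the result says only that a donation is possible in at least one direction, not which of the two is the donor.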
It currently has two Tests that demonstrate what to expect from the function: one for a compatible pair [https://www.wikifunctions.org/view/en/Z14472] and another for an incompatible pair [https://www.wikifunctions.org/view/en/Z14470]. We encourage you to add more implementations and edge case handling to improve this function.