The on-wiki version of this newsletter can be found here: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2023-04-19

Selecting the right implementation
Functions in Wikifunctions can have more than one implementation. For example, if we have a function that capitalizes the first letter https://wikifunctions.beta.wmflabs.org/wiki/Z10577 of a word, we can have several implementations, e.g. one https://wikifunctions.beta.wmflabs.org/wiki/Z10711 or two in Python https://wikifunctions.beta.wmflabs.org/wiki/Z10713, one in JavaScript https://wikifunctions.beta.wmflabs.org/wiki/Z10712, and one using composition https://wikifunctions.beta.wmflabs.org/wiki/Z10579. You might find some of the implementations surprising. We previously discussed why we made the design choice to allow multiple implementations https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-06-17 for a single function.
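As a rough illustration of what "several implementations of one function" can look like, here is a minimal sketch in Python. The names capitalize_slice and capitalize_loop are hypothetical and are not the code stored behind the implementations linked above; the sketch only assumes that "capitalize" means upper-casing the first character and leaving the rest untouched.

```python
def capitalize_slice(word: str) -> str:
    """Upper-case the first character and keep the rest unchanged."""
    # Slicing with [:1] keeps the empty string safe (no IndexError).
    return word[:1].upper() + word[1:]


def capitalize_loop(word: str) -> str:
    """Same behaviour, built character by character."""
    chars = []
    for index, char in enumerate(word):
        # Only the character at position 0 is upper-cased.
        chars.append(char.upper() if index == 0 else char)
    return "".join(chars)
```

Both functions return the same result for the same input, so either one could serve a call to the function; they differ only in how they get there.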
Until recently, Wikifunctions selected an implementation at random. That is, whenever someone called a function that had multiple implementations available, Wikifunctions picked one of them at random.
Implementations of the same function can have wildly different runtime behavior. Some can be very slow, and others can be very fast: sorting a list of 100,000 random numbers using bubble sort https://en.wikipedia.org/wiki/Bubble_sort can take a minute on a current processor, but with quicksort https://en.wikipedia.org/wiki/Quicksort the same list of numbers can be sorted in less than two hundredths of a second, which is much faster than the blink of an eye.
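The difference can be tried out with a small sketch in Python. The function names below are made up for this example, the quicksort is a simple list-building variant rather than the in-place algorithm, and the timings depend entirely on the machine and on the list size, so no particular numbers are claimed here.

```python
import random
import time


def bubble_sort(values):
    """Quadratic-time sort: repeatedly swap adjacent out-of-order items."""
    items = list(values)
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:
            # No swaps in a full pass means the list is already sorted.
            break
    return items


def quicksort(values):
    """Simple quicksort that builds new lists around a middle pivot."""
    items = list(values)
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


def compare(n=2000, seed=0):
    """Sort the same random list with both functions and time each call."""
    rng = random.Random(seed)
    numbers = [rng.random() for _ in range(n)]
    timings = {}
    for sort in (bubble_sort, quicksort):
        start = time.perf_counter()
        result = sort(numbers)
        timings[sort.__name__] = time.perf_counter() - start
        # Both implementations must agree with the built-in sort.
        assert result == sorted(numbers)
    return timings
```

Calling compare() returns a dictionary with the elapsed seconds per function. The default list is kept small so that the bubble sort finishes quickly; growing n makes the gap between the two widen sharply.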
In Wikifunctions, functions should be accompanied by testers. As this is being written, the capitalization function we talked about earlier has only one tester https://wikifunctions.beta.wmflabs.org/wiki/Z10578, which checks that capitalizing the word “test” returns “Test”. If all goes well, Wikifunctions will run each tester on each implementation. The results of these tests are stored: whether the implementation passes, how many resources it requires, and other meta-data. This run-time information is also shown to the user in a pop-up on request, for people interested in the back-end details.
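The idea of running each tester on each implementation and keeping the outcome can be sketched as follows. This is an assumption-laden illustration in Python, not the Wikifunctions back-end: run_testers, the dictionary layout, and the choice of wall-clock time as the only recorded resource are all invented for the example.

```python
import time


def run_testers(implementations, testers):
    """Run every tester against every implementation and record the outcome.

    implementations: dict mapping a name to a one-argument callable.
    testers: dict mapping a name to an (argument, expected_result) pair.
    Both layouts are hypothetical and only serve this sketch.
    """
    results = []
    for impl_name, impl in implementations.items():
        for tester_name, (argument, expected) in testers.items():
            start = time.perf_counter()
            try:
                passed = impl(argument) == expected
            except Exception:
                # An implementation that raises counts as a failed test.
                passed = False
            elapsed_ms = (time.perf_counter() - start) * 1000
            results.append({
                "implementation": impl_name,
                "tester": tester_name,
                "passed": passed,
                "duration_ms": elapsed_ms,
            })
    return results
```

With a single tester that expects "test" to become "Test", the returned list holds one record per implementation, each stating whether it passed and how long the call took.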
Wikifunctions now ranks the implementations based on this meta-data, and updates the internal order of the implementations. Test failures result in downgrades, and quick results lead to a better ranking. And so, for the last few weeks, instead of selecting an implementation at random, we have been selecting the first implementation in that ranking. Here is an example of that reordering https://wikifunctions.beta.wmflabs.org/w/index.php?title=Z10577&diff=prev&oldid=4357 working in practice (but alas, diffs are not implemented yet).
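A ranking of this kind could look roughly like the Python sketch below. The newsletter only states that failures downgrade an implementation and that quick results improve its rank; the concrete sort key used here (number of failed testers first, then average duration) is an assumption, and TesterRun, rank_implementations and select_implementation are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TesterRun:
    """One stored result of running a tester on an implementation."""
    implementation: str
    tester: str
    passed: bool
    duration_ms: float


def rank_implementations(runs):
    """Order implementation names from most to least preferred."""
    by_impl = {}
    for run in runs:
        by_impl.setdefault(run.implementation, []).append(run)

    def sort_key(name):
        impl_runs = by_impl[name]
        failures = sum(1 for run in impl_runs if not run.passed)
        average_ms = mean(run.duration_ms for run in impl_runs)
        # Fewer failures win first; among equals, the faster one wins.
        # The name is a final tie-breaker to keep the order deterministic.
        return (failures, average_ms, name)

    return sorted(by_impl, key=sort_key)


def select_implementation(runs):
    """Pick the top-ranked implementation, or None if nothing was tested."""
    ranking = rank_implementations(runs)
    return ranking[0] if ranking else None
```

Under this key, a fast implementation that fails a tester still ranks below a slow one that passes, which matches the stated idea that failures lead to downgrades.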
This should lead to a considerable reduction in used resources, and to more consistent behavior of Wikifunctions. Function calls should time out less often. This should also relieve the Wikifunctions community from worrying about inefficient implementations and whether we should accept them or not. Often, algorithms which are simpler are easier to read and verify, but are slower: bubble sort is a good example of this, compared with quicksort. Bubble sort is generally regarded as much easier to explain and understand than quicksort. Having both allows the results of the simpler implementation to be compared to the results of the more complex implementation, with both passing the same suite of testers, and thus increases our confidence in the overall system. At the same time, we can in practice use the more efficient implementation and thus reduce overall resource usage.
With this, the first version of a major element that will work behind the scenes of Wikifunctions has been put into place, and we have delivered another goal of the current phase.

Maria Keet’s reflection on Abstract Wikipedia so far
Maria Keet http://www.meteck.org/ has been an active and central part of the Natural Language Generation Workstream. She is a professor at the University of Cape Town, South Africa, and her collaboration with Ariel Gutman https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2022-12-19 on the template language https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Template_Language_for_Wikifunctions and her arguments have been mentioned in the fellows’ evaluation https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Google.org_Fellows_evaluation and the answer https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Google.org_Fellows_evaluation_-_Answer. Maria has now written down her own reflections and published them on her blog:
keet.wordpress.com/2023/03/14/some-reflections-on-designing-abstract-wikipedia-so-far/
The text is very accessible, gives context, explains some of the issues that low-resource languages face, and makes suggestions on how to proceed. Maria also describes some of the frustrating challenges she encountered in having her voice heard and recognized. That part makes for a painful read, and points to necessary changes.
To repeat her closing words:
The mountain we’ll keep climbing, be it with or without the Abstract Wikipedia project. If Abstract Wikipedia is to become a reality and flourish for many languages soon, it needs to allow for molehills, anthills, dykes, dunes, and hills as well, and with whatever flowers available to set it up and make it grow.
We are thankful to Maria for her ongoing contributions. We hope that we can achieve a more inclusive space, with the goal to have contributing become a more wholesome experience.

Talk about Abstract Wikipedia in Sweden
Professor Aarne Ranta https://www.cse.chalmers.se/~aarne/ will give a talk on Natural Language Generation and Abstract Wikipedia on Thursday, April 20th, 2023 at 17:30 local time, in the Maritime Museum and Aquarium https://sv.wikipedia.org/wiki/Sj%C3%B6fartsmuseet_Akvariet in Göteborg, Sweden. The in-person event is free for the public. The talk will be given in Swedish.
You can find more information about the talk in Swedish here:
https://www.vetenskapsfestivalen.se/for-alla/kunskap-utan-granser-abstract-w...
And thank you for your patience while we took a break on the newsletters.
On Wed, Apr 19, 2023 at 9:55 PM Denny Vrandečić dvrandecic@wikimedia.org wrote:
Good reading on Maria Keet's blog! She aptly states:
It may be tempting to (over)generalise for other languages once one
speaks several languages, but it’s better to be safe than sorry.
That has been the case from what I've seen: overgeneralizing hurts in the long term. Luckily, some frameworks support a bite-sized approach to rules, such as what Maria hinted at. I agree with her that GF (Grammatical Framework) and others are not for the faint of heart, and I also wish things were simpler. Still, GF does allow a focused, bite-sized approach to language rules. Case in point: the categories Given Name and Second Name are not generalized within just one function, but spread over many functions in each language as necessary! This was added across 26 languages (wish it was more!) just 4 months ago: "added GN & SN categories for constructing names", GrammaticalFramework/gf-rgl@7085aca https://github.com/GrammaticalFramework/gf-rgl/commit/7085acacc930ad1ca130e168a7086b75bc0a53ed
I found out today from my wife that there are indeed Chinese compound surnames, like Ouyang https://www.wikidata.org/wiki/Q1927285. In the Lexeme namespace we do not have such a categorization, but luckily we have the Wikidata namespace and others that hold a wealth of knowledge and categories, like Chinese compound surname https://www.wikidata.org/wiki/Q847773
My move to China has solidified my passion for "context is king", and indeed any Wikifunctions or "article generators" will need to tiptoe into the full knowledge stored across Wikidata's namespaces in order to retrieve the broad domain content that will be necessary to support Abstract Wikipedia across unique, low-resourced languages like NCB languages. I fully agree with her that it can and should be done in an agile fashion: template languages, and quick, small trial-and-error experiments. Even while the grand vision is slow-paced, we can have faster-paced experiments with incremental grammar development for faster feedback. Agile.
Where that "context" will ultimately be stored in Abstract Wikipedia is yet to be determined, I think. It might ultimately live in the language-specific Constructors. That's probably a good thing, and helps to avoid overgeneralizing. Copying and pasting from other languages' Constructors or borrowing their templates will make things easier, I guess.
Still, LOTS of volunteer work is needed to make sure that categories in linguistic systems such as GF, UD, the Lemon structure, etc. are mapped well to each other, so that outside experts can tiptoe into the ecosystem too. Those maps could live inside the Wikidata namespace, just as we did with Schema.org and other Linked Data, but they could certainly start out in a spreadsheet somewhere, started by some Outreachy participants.
Thad https://www.linkedin.com/in/thadguidry/ https://calendly.com/thadguidry/
abstract-wikipedia@lists.wikimedia.org