The on-wiki version of this newsletter is here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-08-27
--
When we started the development effort for the Wikifunctions site, we
sub-divided the work leading up to the launch into eleven
phases <https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Phases>, named
after the first eleven letters of the Greek alphabet.
- With Phase α (alpha) completed, it became possible to create instances
of the system-provided Types in the wiki.
- With Phase β (beta), it became possible to create new Types on-wiki
and to create instances of these Types.
- With Phase γ (gamma), all the main Types of the pre-generic function
model were available.
- With Phase δ (delta), it became possible to evaluate built-in
implementations.
- With Phase ε (epsilon), it became possible to evaluate
contributor-written implementations in any of our supported programming
languages.
- This week, we completed Phase ζ (zeta).
The goal of Phase ζ has been to provide the capability to evaluate
implementations composed of other functions.
What does this mean? Every Function in Wikifunctions can have several
Implementations. There are three different ways to express an
Implementation:
1. As a built-in Function, written in the code of Wikilambda: this means
that the Implementation is handled by the evaluator natively using code
written by the team.
2. As code in a programming language, created by the contributors of
Wikifunctions: the Implementation of a Function can be given in any
programming language that Wikifunctions supports. Eventually we aim to
support a large number of programming languages; for now we support
JavaScript and Python.
3. As a composition of other Functions: this means that contributors can
use existing Functions as building blocks in order to implement new
capabilities.
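To illustrate the third kind by analogy: in Wikifunctions a composition is
expressed as a ZObject, but the idea is the same as defining a new function
purely by calling existing ones. A minimal sketch in Python, with made-up
function names:

    # Analogy only - not actual Wikifunctions notation. "reverse" and
    # "string_equality" stand in for existing Functions on the wiki.
    def reverse(s: str) -> str:
        return s[::-1]

    def string_equality(a: str, b: str) -> bool:
        return a == b

    def is_palindrome(s: str) -> bool:
        # The composition: no new logic, just existing Functions combined.
        return string_equality(s, reverse(s))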
With Phase ζ we close the trilogy of Phases dealing with the different ways
to create Implementations.
Besides making composition work, we also spent some time on other areas.
We worked to reduce the technical debt that we had accumulated during the
last two phases, which we rushed in order to be ready for the
security and performance reviews. We improved how the error system works,
re-worked the data model for Testers and Errors, refactored the common
library to be more extensible, moved the content of the wiki to the main
namespace, and changed Python function definitions to align with the style
we use for JavaScript ones.
We also started work on improving the current bare-bones user experience.
This included displaying Testers' results and meta-data on their own pages,
as well as on the related Function and Implementation pages. Functions and
Implementations can now be called right from their pages. We made it
much easier to create and connect Implementations and Testers with their
functions, started on the designs for Function definition and
implementation, and implemented aliases that sit alongside labels, much
like in Wikidata. Plenty done!
We are now moving on to Phase η (eta). The three main goals of Phase η are
to finish the re-work of the error system, to revisit user-defined types
and integrate them better with validators, and to allow for generic types.
What are generic types?
We have a type for a list of elements. But instead of saying “this is a
list of elements”, we can often be more specific, and for example say “this
is a list of strings”. Why is that useful? Because now, if, for example, we
have a function to get the first element of a list, we know that this
function will return a string when given this kind of list. This allows us
to then offer a better user experience by making more specific suggestions,
because now the system knows that it can suggest functions that work with
strings. We can also check whether an implementation makes sense by
ensuring that the types fit. We won’t be able to do that in all cases, but
having generics will allow us to greatly increase the number of cases where
we can. For more background you can refer to the Wikipedia
article on generic programming
<https://en.wikipedia.org/wiki/Generic_programming>.
In this example case, instead of a special type representing a list of
strings, we will have a function that takes a type and returns a typed
list. If you then call this function with the string type as the argument,
the result of the function will be the concept of a list of strings. And
you can easily use that for any other type, including user-defined types.
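As a rough illustration of the idea, Python's typing module works the same
way: a generic type applied to a concrete type yields a typed list, and a
type checker can then follow the element type through a function. (This is
an analogy in Python, not Wikifunctions notation.)

    # Sketch of generics via Python's typing module, as an analogy only.
    from typing import List, TypeVar

    T = TypeVar("T")              # a placeholder for "any type"

    ListOfStrings = List[str]     # applying List to str yields "list of strings"

    def first(items: List[T]) -> T:
        # The return type follows the element type: for a list of
        # strings, a type checker knows the result is a string.
        return items[0]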
My thanks to the team! My thanks to the volunteers! Some of us are starting
to have fun using the prototype, playing with implementations across
different programming languages interacting with each other in non-trivial
ways, and starting to build a small basic library of functions. This will
also be the phase where we move from the pre-generic data model
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Pre-generic_function_mod…>
to
the full function model
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Function_model>. To
give due warning: this probably means that almost everything will need to
be re-written by the end of this phase, in order to take advantage of the
generic system that we are introducing.
Thank you for accompanying us on our journey!
The on-wiki version of this newsletter is here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-10
--
Among the early kinds of functions we want to start building in
Wikifunctions are functions that perform regular morphological
transformations on words. That is, functions that, given the base form of a
word, can create its regular inflected forms. Or, to give an
example: functions that can tell me that the plural of *“book”* in English
is *“books”*.
English is a comparatively simple example, but that should make it easier to
sketch out the proposal in this newsletter. In many other languages, the
morphological functions and the grammar are likely to be more complicated.
The most regular way to create a plural from an English noun’s base form is
to add the letter *“s”* to it. Let’s now see how many of Wikidata’s entries
would be covered by this simple rule.
Wikidata currently has about 28,100 <https://w.wiki/43N6> English nouns.
Whereas Wikidata allows for a lot of flexibility when entering
lexicographical entries, Wikifunctions will require the data to have a more
predictable shape in order to use it effectively. One way to express these
shapes is through lexical masks <https://github.com/google/lexical-masks/>.
English nouns have two different lexical masks
<https://github.com/google/lexical-masks/blob/master/masks/en.json>: one
with only two forms (a singular and a plural, e.g. *“book”* and *“books”*)
and one with four forms (including two genitive forms, i.e. *“book’s”* and
*“books’”*). Both of these masks have been automatically translated
<https://github.com/google/lexical-masks/blob/master/shex/en.shex> into ShEx
<https://www.wikidata.org/wiki/Wikidata:WikiProject_Schemas>, the language
that is used by Wikidata for checking data completeness. But only the
two-form version has been turned into an Entity Schema in Wikidata
<https://www.wikidata.org/wiki/EntitySchema:E155>.
Now we can take the 28,000 English nouns in Wikidata and check how many of
them fulfill the requirements described above (let me know if there is
interest in the code). It turns out that more than 25,500, that is more
than 91% of the nouns, fulfill the requirement. And all of them fulfill the
two-form schema. Four nouns (*contract
<https://www.wikidata.org/wiki/Lexeme:L5605>*, *player*
<https://www.wikidata.org/wiki/Lexeme:L5607>, *swimmer*
<https://www.wikidata.org/wiki/Lexeme:L7384>, and *sport
<https://www.wikidata.org/wiki/Lexeme:L301>*) almost fulfill the four-form
schema, but on each of them the grammatical case is missing from the
nominative forms.
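(The counting code itself was not shared here; purely to illustrate the kind
of check involved, a sketch in Python over the Wikibase JSON layout of
lexeme forms might look like the following. Q110786 and Q146786 are the
Wikidata items for singular and plural.)

    # Sketch only: does a lexeme match the two-form English noun mask,
    # i.e. exactly one singular form and one plural form?
    SINGULAR = "Q110786"
    PLURAL = "Q146786"

    def fits_two_form_mask(forms: list) -> bool:
        features = sorted(tuple(sorted(f["grammaticalFeatures"])) for f in forms)
        return features == sorted([(SINGULAR,), (PLURAL,)])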
<https://meta.wikimedia.org/wiki/File:Book_to_books_in_NotWikiLambda.png>
Evaluating "Add s" on "book" in NotWikiLambda
So let’s focus on the 25,500 nouns that pass the structural requirements.
We created a function in NotWikiLambda that adds the letter *“s”* to the
end of a word. When we count how many of the plurals are generated this
way, we see that the plurals of 21,000 English nouns, 82% of all nouns, are
formed correctly by simply adding *"s"*. Adding *“s”* is one paradigm, and,
as we can see, the most common one for English nouns.
On the right-hand side of the Function's page you can see a heading
“Evaluate Function,” and there you can enter a value, say *“book”*. If you
click on “Call Function” below, the result *“books”* should come back.
(Note that WikiLambda <https://www.mediawiki.org/wiki/Extension:WikiLambda> is
in heavy development, and the test site
<https://notwikilambda.toolforge.org/> might have hiccups at any time. A
screenshot of the evaluation working correctly is shown here.)
Another paradigm works for many English nouns that end with the letter *“y”*.
There are many cases where we replace the letter *“y”* with
*“ies”*, e.g. when turning *“baby”* into *“babies”*, or *“fairy”* into
*“fairies”*. We created a function that replaces a final *“y”* with *“ies”*
<https://notwikilambda.toolforge.org/wiki/Z10129> in NotWikiLambda. When we
run this paradigm against the nouns in Wikidata, more than 2,000 nouns
(almost 8%) get covered by this function.
<https://meta.wikimedia.org/wiki/File:Baby_to_babies_in_NotWikiLambda.png>
Evaluating "Replace y with ies at end" in NotWikiLambda
We could create further paradigms (e.g. add *“es”*, which would cover more
than 1,800 nouns), and we could even write a single function which tries to
discern which of these functions to apply (e.g. if it ends with *“s”* or
*“sh”*, add *“es”*; if it ends with a *“y”* preceded by a consonant,
replace that *“y”* with *“ies”*; else simply add an *“s”*, etc.), which
would give us a more powerful function that can deal with many more words
(a bit of experimentation got me to a function
<https://notwikilambda.toolforge.org/wiki/Z10132> that covers 98.3% of all
cases).
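For illustration, the decision logic just described might look like this in
Python. This sketch covers only the rules named above; the actual function
linked on NotWikiLambda may differ in detail.

    # A small "smart paradigm" with only the rules mentioned above.
    def plural_en(noun: str) -> str:
        if noun.endswith("s") or noun.endswith("sh"):
            return noun + "es"                      # bush -> bushes
        if len(noun) > 1 and noun.endswith("y") and noun[-2] not in "aeiou":
            return noun[:-1] + "ies"                # baby -> babies
        return noun + "s"                           # book -> books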
Grammatical Framework has introduced these functions as so-called smart
paradigms <https://aclanthology.org/E12-1066.pdf>. Their web-based
implementation of smart paradigms
<https://cloud.grammaticalframework.org/gfmorpho/> for English nouns covers
96% of the nouns in Wikidata. I would be very curious to see how either of
these numbers compares to modern, machine-learning based solutions, and I
also want to invite people to create an even smarter paradigm with better
coverage without the code becoming too complex.
Smart paradigms are useful when data in Wikidata is incomplete. For example,
for loan words, technical terms, neologisms, names, or when verbing nouns
<https://www.gocomics.com/calvinandhobbes/1993/01/25> (so-called conversion
<https://en.wikipedia.org/wiki/Conversion_(word_formation)#Verb_conversion_i…>),
we might need to create a form automatically that Wikidata doesn’t yet
explicitly know about.
As this week’s entry is already getting quite long, we will defer to next
time the discussion of some of the possibilities of how those paradigms
implemented in Wikifunctions might interplay with the lexicographic data in
Wikidata. This will also shed more light on the role that the morphological
paradigms might play for Abstract Wikipedia in the future.
----
In other news:
This week, Abstract Wikipedia was covered within the US NPR radio news
programme The World
<https://en.wikipedia.org/wiki/The_World_(radio_program)>. Host Marco
Werman interviewed Denny
<https://www.pri.org/file/2021-09-07/wikipedia-s-efforts-get-its-300-languag…>
in
a five-minute segment that was broadcast on numerous public radio
stations. The segment is now also available online.
The German public TV station 3sat <https://en.wikipedia.org/wiki/3sat>
broadcast
a documentary about Wikipedia this week: “Wikipedia - Die Schwarmoffensive”
<https://www.3sat.de/film/dokumentarfilm/wikipedia--die-schwarmoffensive-100…>.
The German-language documentary can be viewed online from Germany,
Switzerland, and Austria. It also discusses Abstract Wikipedia for a few
minutes towards the end.
It is very hard to make a large or even medium-sized corpus of sentences in
which each word is manually annotated with its sense.
Abstract Wikipedia not only allows generating text in many languages from
one source, but can also serve as a word-sense disambiguation (WSD) corpus.
Moreover: in many languages.
This allows understanding natural text, and operations like:
1) translation from any natural language into the disambiguated form
2) translation from this form into other natural languages
And after step 1, this form will be very useful not only for translation.
I was interested in the Abstract Wikipedia project a year ago. Now I'm not
up to date on the topic.
At the Arctic Knot conference, will there be a look at the project as a
database of disambiguated knowledge?
The on-wiki version of this newsletter is here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-09-03
Due to the embedded table it might be easier to read on-wiki.
--
The update this week has been written by Mahir Morshed
<https://meta.wikimedia.org/wiki/User:Mahir256>. Mahir is a long-time
contributor to Wikidata, and particularly to the lexicographical data
on Wikidata. He has developed a prototype that generates natural language in
Bengali and Swedish from an abstract content representation, with the goal
that it could be implemented within Wikifunctions. In this newsletter, Mahir
describes the prototype.
------------------------------
Discussion around Abstract Wikipedia's natural language generation
capabilities has revolved around the presence of abstract constructors and
concrete renderers per language, while also noting the use of Wikidata
items and lexemes as a basis for mapping concepts to language. In the
interest of making this connection a bit clearer to imagine, I have started
to build a text generation system. This uses items, lexemes, and wrappers
for them as building blocks, and these blocks are then assembled into
syntactic trees, based in part on the Universal Dependencies
<https://universaldependencies.org/> syntactic annotation scheme.
(If this seems like a different approach from what was discussed in a
newsletter two months prior
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-06-24>,
that's because it is. Feel free to drop me a message if you'd like to
discuss it.)
The system is composed of three parts, where the last is likely to be
something we could skip in a port to Wikifunctions:
- Ninai <https://bitbucket.org/mmorshe2/ninai/> (from the Classical
Tamil for "to think") holds all constructors, logic at a sufficiently high
level for renderers, and a resolution system from items (each wrapped in a
"Concept" object) to sense IDs for a given language. Decisions and actions
in Ninai are meant to be agnostic to the methods for text formation
underneath, which are supplied by...
- Udiron <https://bitbucket.org/mmorshe2/udiron/> (from the Bengali
pronunciation of the Sanskrit for "communicating, saying, speaking"), which
holds lower-level text manipulation functions for specific languages. These
functions operate on syntactic trees of lexemes (each lexeme wrapped in a
"Clause" object). These lexemes are imported via...
- tfsl <https://phabricator.wikimedia.org/source/tool-twofivesixlex/> (from
"twofivesixlex"), a lexeme manipulation tool, which is intended to be akin
to pywikibot but with a specific focus on the handling of Wikibase objects.
Both of the above components depend on this one, although if 'native' item
and lexeme access and manipulation becomes possible with Wikifunctions
built-ins then tfsl could possibly be omitted.
Some design choices in this system worth noting are as follows:
- Constructors, while being language-agnostic and falling within some
portion of a class hierarchy, are purely containers for their arguments,
carrying no other logic within. This means, for example, that an instance
of a constructor Existence(subject), to indicate that the subject in
question exists, only holds that subject within that instance, and does
nothing else until a renderer encounters that constructor.
- Every constructor allows, in addition to any required inputs, a list
of extra modifiers in any order (the 'scope' of the idea represented by
that constructor). This means, for example, that a constructor
Benefaction(benefactor, beneficiary) might be invoked with extra arguments
for the time, place, mode, and other specifiers after the beneficiary.
- When one 'renders' a composition of constructors, a Clause object
(representing the root of a syntactic tree) is returned; turning it into a
string of text is done with Python's str() built-in applied to that
object.
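To make these choices concrete, here is a minimal sketch of the general
pattern in Python. It is illustrative only, and not Ninai's or Udiron's
actual code.

    # Illustrative sketch of the design described above.
    class Existence:
        """A constructor is a pure container: it holds its subject and any
        extra modifiers, and does nothing until a renderer encounters it."""
        def __init__(self, subject, *modifiers):
            self.subject = subject
            self.modifiers = modifiers

    class Clause:
        """Root of a syntactic tree; str() turns it into a string of text."""
        def __init__(self, tokens):
            self.tokens = tokens
        def __str__(self):
            return " ".join(self.tokens) + "."

    def render_existence_en(ctor: Existence) -> Clause:
        # All rendering logic lives in per-language functions like this one.
        return Clause(["there", "is", ctor.subject])

    print(render_existence_en(Existence("Jupiter")))  # "there is Jupiter."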
At the moment, there are just enough constructors to represent Sentence 1.1
from the Jupiter examples
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Examples/Jupiter>, as
well as renderers in Bengali and Swedish for those constructors (thanks to
Bodhisattwa <https://meta.wikimedia.org/wiki/User:Bodhisattwa>, Jan
<https://meta.wikimedia.org/wiki/User:Ainali>, and Dennis
<https://meta.wikimedia.org/wiki/User:So9q> for feedback on those).
Building up to the Jupiter sentence should demonstrate how these work:
Building up to the Jupiter sentence step by step
Constructor:
  Identification(
    Concept(Q(319)),
    Concept(Q(634)))
Bengali output: বৃহস্পতি গ্রহ।
Swedish output: Jupiter är klot.
Gloss (not renderer output!): Jupiter is planet.
Notes: We start by simply *identifying* the two *concepts* of Jupiter (Q319)
<https://www.wikidata.org/wiki/Q319> and planet (Q634)
<https://www.wikidata.org/wiki/Q634> as being equal.

Constructor:
  Identification(
    Concept(Q(319)),
    Instance(
      Concept(Q(634))))
Bengali output: বৃহস্পতি একটা গ্রহ।
Swedish output: Jupiter är ett klot.
Gloss: Jupiter is a planet.
Notes: Instead of equating the concepts alone, we might instead equate
"Jupiter" with an *instance* of "planet".

Constructor:
  Identification(
    Concept(Q(319)),
    Instance(
      Concept(Q(634)),
      Definite()))
Bengali output: বৃহস্পতি গ্রহটি।
Swedish output: Jupiter är klotet.
Gloss: Jupiter is the planet.
Notes: We may further refine that by making clear that "Jupiter" is a
*definite* instance of "planet".

Constructor:
  Identification(
    Concept(Q(319)),
    Instance(
      Attribution(
        Concept(Q(634)),
        Concept(Q(59863338))),
      Definite()))
Bengali output: বৃহস্পতি বড় গ্রহটা।
Swedish output: Jupiter är det stora klotet.
Gloss: Jupiter is the large planet.
Notes: Now we might ascribe an *attribute* to the definite planet instance
in question, this attribute being large (Q59863338)
<https://www.wikidata.org/wiki/Q59863338>.

Constructor:
  Identification(
    Concept(Q(319)),
    Instance(
      Attribution(
        Concept(Q(634)),
        Superlative(
          Concept(Q(59863338)))),
      Definite()))
Bengali output: বৃহস্পতি সবচেয়ে বড় গ্রহটি।
Swedish output: Jupiter är det största klotet.
Gloss: Jupiter is the largest planet.
Notes: This attribute being *superlative* for Jupiter can be marked by
modifying the attribute.

Constructor:
  Identification(
    Concept(Q(319)),
    Instance(
      Attribution(
        Concept(Q(634)),
        Superlative(
          Concept(Q(59863338)),
          Locative(
            Concept(Q(544))))),
      Definite()))
Bengali output: বৃহস্পতি সৌরমণ্ডলে সবচেয়ে বড় গ্রহ।
Swedish output: Jupiter är den största planeten i solsystemet.
Gloss: Jupiter is the largest planet in the solar system.
Notes: Once we specify the *location* where Jupiter being the largest
applies (that is, in the Solar System (Q544)
<https://www.wikidata.org/wiki/Q544>), we're done!
Note that the sense resolution system does not have enough information to
choose which of '-টা' or '-টি' (for Bengali) or of 'klot' or 'planet' (for
Swedish) to use in some of these examples, so currently in the prototype
one is chosen at random. This means that re-rendering any example
which pulls those in might use something different.
Besides this, there is clearly a lot more functionality to be added, and
because Bengali and Swedish are both Indo-European languages (however
distant), there are likely linguistic phenomena that won't be considered
simply by developing renderers for those two languages alone. If there's
something particular in your language that isn't present in those two
languages, this may then raise the question: what can you do for your
language?
I can think of at least four things, not in any particular order:
- Create lexemes and add senses to them! What matters most to the system
is that words have meanings (possibly in some context, and possibly with
equivalents in other languages or to Wikidata items) so that those words
can be properly retrieved based on those equivalences; that these words
might have a second-person plural negative past conditional form is largely
secondary!
- Think about how you might perform some basic grammatical tasks in your
language: how do you inflect adjectives? add objects to verbs? indicate in
a sentence where something happened?
- Think about how you might perform higher-level tasks involving
meaning: what do you do to indicate that something exists? to indicate that
something happened in the past but is no longer the case? to change a
simple declarative sentence into a question?
- If you have some ideas on how to render the Jupiter sentence in your
language, and the lexemes you would need to build that sentence exist on
Wikidata, and those lexemes have senses for the meanings those lexemes take
in that sentence, let me know!
We'd love to hear your thoughts on this prototype, and what it might mean
for realizing Abstract Wikipedia through Wikidata's lexicographic data and
the Wikifunctions platform.
------------------------------
Thank you Mahir for the great update! If you too want to contribute to the
weekly, get in touch. This is a project we all build together.
In addition, this week Slate published a great explanatory article on the
goals of Abstract Wikipedia and Wikifunctions: Wikipedia Is Trying to
Transcend the Limits of Human Language
<https://slate.com/technology/2021/09/wikipedia-human-language-wikifunctions…>
The on-wiki version of this newsletter is here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-08-20
--
Last weekend was Wikimania 2021 <https://wikimania.wikimedia.org/>, and it
was a blast! There were more than 4000 registrations for the virtual event,
making it the largest Wikimania by far. If you missed it, don't worry!
Sessions have been recorded and can all be watched online at your leisure.
The work of the team on building Wikifunctions and Abstract Wikipedia was
represented at a session
<https://wikimania.wikimedia.org/wiki/2021:Submissions/Wikifunctions_and_Abs…>
where
we first had a short presentation and then over an hour of questions and
comments with most of the team and associated volunteers. It was brilliant
to hear from so many interested people, and there were a lot of insightful
questions, comments, and concerns.
I think we got people excited and had good answers for their questions, but
that's of course for you to decide. I would encourage everyone who missed
it to watch the overview introduction
<https://wikimania.wikimedia.org/wiki/File:Wikimania_2021_Abstract_Wikipedia…>
and the longer full session <https://www.youtube.com/watch?v=LecYqXHvHfg> if
they have time. We'd be delighted to follow up on any thoughts you might
have.
Beyond our session, there were many brilliant sessions exploring our
communities, our priorities, and our futures. Here is a small selection of
sessions which I think were particularly relevant to people interested in
Wikifunctions and Abstract Wikipedia:
- Radhika Mamidi led a great session
<https://wikimania.wikimedia.org/wiki/2021:Submissions/To_translate_or_not_t…>
(video <https://www.youtube.com/watch?v=Jb9XCVMiSZ8>) on the use of
machine translation to create articles, and particularly its use on the
Indic language wikis.
- Deryck Chan presented
<https://wikimania.wikimedia.org/wiki/2021:Submissions/Cross-wiki_ideologica…>
(video <https://www.youtube.com/watch?v=s6zRuU6DqXY>) about the
influence that different wikis' and language communities' backgrounds may
have on what we consider "reliable sources", how this leads to cross-wiki
conflict, and how this may need to change.
- A panel talked about integrating Wikidata into the Wikimedia projects
<https://wikimania.wikimedia.org/wiki/2021:Submissions/Integrating_Wikidata_…>
(video <https://www.youtube.com/watch?v=AveonN5pHwY>), which discussed
the use of Wikidata as a central knowledge base within different Wikipedia
language projects. This is a path that Abstract Wikipedia will explore even
more, which is why these experiences are crucial to what we are building.
- In a similar vein is the presentation on Domain Specific Content
Generation using Human Bot Collaboration
<https://wikimania.wikimedia.org/wiki/2021:Submissions/Domain_Specific_Conte…>
(video <https://www.youtube.com/watch?v=M6T_UygofSw>) by Praveen
Garimella and Vasudeva Varma. They were guiding students to work with bots
in order to scale up the creation of good content in underrepresented Indic
languages. We certainly hope that they will become early adopters of the
Abstract Wikipedia framework.
- Wikidata: What happened? Where are we going?
<https://wikimania.wikimedia.org/wiki/2021:Submissions/Wikidata:_What_happen…>
(video <https://www.youtube.com/watch?v=ymMxPsNGI64>) was led by Lydia
Pintscher. Particularly relevant to Abstract Wikipedia fans, it covered the
state of Wikidata's lexicographic data and the plans for improving that.
- There was a great panel on English as a lingua franca of the Wikimedia
movement
<https://wikimania.wikimedia.org/wiki/2021:Submissions/English_as_a_lingua_f…>
(video <https://www.youtube.com/watch?v=2X6UJ25TiN8>) which raised some
important points around how we as a movement can better involve and engage
with all people regardless of the languages they speak. Ensuring that we
make decisions for all languages and wikis is key to making Wikifunctions
and Abstract Wikipedia a success.
- The round-table on Wikimedia's Universal Code of Conduct
<https://wikimania.wikimedia.org/wiki/2021:Submissions/Universal_Code_of_Con…>
(video <https://www.youtube.com/watch?v=qAiYUdXRV6E>) was a great
opportunity to discuss the introduction of the Code of Conduct to the
projects. This is directly relevant for Wikifunctions as it will also be
covered by the Code.
Right now videos of all sessions are on YouTube; they will be copied to
Wikimedia Commons by the organising team in the coming weeks. My thanks
again to the organisers of Wikimania 2021, and to all the speakers and
participants that made it so great.
------------------------------
The Grammatical Framework Summer School was held from 26th July to 6th
August. The Wikimedia community was invited to join the Summer School, and
ten Wikimedians took the time and the opportunity to join and learn about
Grammatical Framework, Natural Language Generation, and related topics.
There were a total of 51 registered students. We are very excited to see
this kind of knowledge transfer.
You can catch up with all recorded sessions through the GF Summer School
2021 playlist
<https://www.youtube.com/playlist?list=PL7VkoRLnYYP6EZngakW7lNNCTjfC93uh0>.
Whether you are one of those who participated and want to revisit
something, or you want to browse the videos and learn about the
technologies and topics, feel free to take a look and enjoy the talks. Many
hours of material are available.
Thanks to the Grammatical Framework community, the people at CCLAW in
Singapore <https://cclaw.smu.edu.sg/>, and Dr Inari Listenmaa for the
opportunity for Wikimedians to join and for organizing the summer school.
------------------------------
We have started a documentation page for external outreach
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/External_outreach>
coordination
and advice. There are notes on the talk page, and any feedback is
appreciated. The list is not exhaustive yet. Please contact us in case we
missed you.
Kudos to Denny and Quiddity for making the Updates page:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates
Love the navigation as well.
Also, really well done to bring things together via the side info panel.
But I noticed there's no linking between the Meta wiki page and Wikidata
itself?
It's on Meta, so how would we link the Meta wiki page of the same
concept/project: https://meta.wikimedia.org/wiki/Abstract_Wikipedia
to the Wikidata concept page: https://www.wikidata.org/entity/Q96807071
Generally, how do Projects/Concepts within Meta get linked with concepts in
Wikidata?
Thoughts?
Doh! Never mind, it is linked in the graph, but the display didn't help
much to see that important aspect. The multi-lingual sites box is all the
way at the bottom.
Perhaps if Multi-lingual sites has at least 1 entry it might be best to
move that closer to the top? Phab ticket?
Thad
https://www.linkedin.com/in/thadguidry/
https://calendly.com/thadguidry/
The on-wiki version with the embedded video can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-08-13
--
The team has been busy developing features and designing interfaces for
Wikifunctions, and we are moving towards closing the current phase of the
development
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Phases#Phase_%CE%B6_(zet…>.
Aliases are now available in the data model, the data model for testers has
been updated, error objects have been considerably reworked, the evaluation
model can now deal with recursive calls and lazy evaluation, new built-in
functions for string and boolean equality have landed, and more. It is
exciting to see the pieces coming together.
In today’s weekly, we want to take a look at Testers and their current
implementation. Lindsay has created a screencast that you can watch (it is
without sound), and here we will describe what is happening in the video.
<https://meta.wikimedia.org/wiki/File:Wikilambda_Zeta_Tester.webm>
We start by creating a new function definition, *“reverse string”*, which
takes a single string as an input and returns a string. On saving (0:19),
the function is created and assigned Z10000. Now we edit the newly created
function, and we create a first tester inline. We give it the name *“test
-> tset”*, and set the argument to “test” and then use the *“String
equality”* function to compare it to the expected result, “tset” (0:45).
*“String equality”* is a built-in function (Z866 on a fresh Wikilambda
installation) that takes two strings as the arguments and returns True if
they are the same, and False otherwise.
Note that even though we have created the Tester inline, a new page that
holds the test was created in the background (entirely behind the scenes,
it was assigned the ZID Z10001).
Next, we create a test for the input “racecar”, which is a palindrome,
using the same built-in function (1:00), and a test reversing “banana” and
getting the output “wrong” (which is an example of a bad test) (1:19).
Next we create an implementation for *“reverse string”* in JavaScript. At
the bottom of the page we already see our three testers working, showing
that they all fail initially (1:30). Now we start implementing the
function, and we enter “return Z10000K1” - and without even saving, the
testers are run against our implementation and we can see that the “racecar”
test passes! (It passes because it is a palindrome, and returning the input
unchanged happens to be a correct implementation for palindromes). The
other two tests keep failing, though (1:41).
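To see why only the palindrome test passes at this point, here is the
situation sketched in plain Python (the actual testers are ZObjects on the
wiki; this is just an illustration):

    # The three testers as (input, expected) pairs, checked with string
    # equality, against the unfinished implementation.
    TESTS = [("test", "tset"), ("racecar", "racecar"), ("banana", "wrong")]

    def implementation(s: str) -> str:
        return s  # returns its input unchanged

    for given, expected in TESTS:
        print(given, "passes" if implementation(given) == expected else "fails")
    # test fails, racecar passes (a palindrome!), banana fails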
We complete the implementation by taking the input, splitting it into an
array of strings, reversing that array, and then joining the strings of the
array again into a single string. Now the first test, *“test -> tset”* also
passes, but the “banana” test (because it is actually a faulty test)
continues to fail (1:54).
We save the implementation, go to the function page, and add the
implementation to the function. On the function page, just like on the
implementation page, we see the status of all the testers for the
implementation.
Next we create a second implementation, this time in Python. Again, we
start with an implementation that simply returns the input, and again it
passes for “racecar”. We go back to the function page, and connect the new
Python implementation with the function. On the bottom of the page we now
see, in a table, the implementations against all the testers, and whether
the individual testers pass or fail for each implementation (2:28).
We create another two tests, *“another -> rehtona”* and *“final test ->
tset lanif”*, again inline. The tests become immediately visible upon
creation. We still need to save the whole page in order to store the
association with the function page. We can see how both tests pass for the
JavaScript implementation and fail for the Python implementation (3:43).
Let’s go fix the Python implementation. We go to the implementation page
and edit it by adding “[::-1]” to the string. That’s some Python magic -
feel free to skip this paragraph explaining this syntax: Python has a few
very convenient short-hand syntaxes for specific operations which, in many
other languages, require functions or more complex constructs. What is
happening here is that by appending the square brackets to a string
variable, we treat the string implicitly as a list. Inside the square
brackets we have three arguments, separated by colons (:). The first
argument says at which element to start, the second argument at which
element to stop, and the third argument gives the step size (say, you only
want every second element of the list, you would state the step size as 2).
Here, the step is -1, which means you want to walk backwards through the
list. And since the first and second argument are omitted, default values
are used - and the default for a negative step size is from the end to the
beginning. In short, you can read this as *“go through the string,
backwards one by one, from the beginning to the end, and return the new
resulting string”*. You can find a more detailed explanation of Python’s
slice notation on StackOverflow
<https://stackoverflow.com/questions/509211/understanding-slice-notation>.
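A few concrete slices make this easier to follow:

    word = "test"
    word[0:2]    # 'te'   - start at 0, stop before index 2
    word[::2]    # 'ts'   - every second character
    word[::-1]   # 'tset' - step -1: walk backwards through the whole string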
Once we fix our Python code (4:07), all but one of the tests satisfyingly
switch to green. We confidently store the new, improved version. When we
go to the function page of *“reverse string”*, we can see that now both the
JavaScript and the Python implementation behave consistently. Time to fix
the banana tester!
We go to the page for the banana tester and change the expected value from “
wrong” to “ananab”. Again, before even saving, the testers are re-run
against both implementations and switch from messaging failure to letting
you know they passed (4:26). Going back to the function page, we can now
see that all testers pass all implementations.
Finally, we see a feature added (and recorded, which explains the slightly
different format) a bit later, where a new test is being created inline
(4:39). While we are creating the new tester inline, the results of the test
runs for all implementations are already shown - before the tester is even
stored. Once we can see both implementations pass, the new tester is
saved (and thus created, 5:28), and then we save the function page itself,
associating the function with the new tester (5:35).
I hope you enjoyed this whirlwind tour through our new tester features, and
that it gives you a small glimpse of how Wikifunctions will work. Feedback
and ideas are welcome, as always.
------------------------------
We are all excited about the weekend: Wikimania 2021 has started! Wikifunctions
and Abstract Wikipedia will host a session
<https://wikimania.wikimedia.org/wiki/2021:Submissions/Wikifunctions_and_Abs…>
on
Saturday, 14 August, at 17:00 UTC
<https://iw.toolforge.org/zonestamp/1628960400>, where we will have a panel
to present our work and talk with you and the Wikimedia communities. Please
join us and bring your questions; we are very much looking forward to a
lively discussion!
The on-wiki version of this newsletter is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-07-29
--
Our goal with Abstract Wikipedia is to enable everyone to write content in
any language that can be read in any language. Ultimately, the main form of
content we aim for is Wikipedia articles, in order to allow everyone to
equitably have and contribute to unbiased, up-to-date, comprehensive
encyclopedic knowledge.
In the coming months, we will reach major milestones towards that goal.
Today, I want to sketch one possible milestone on our way: abstract
descriptions for Wikidata.
Every Item <https://www.wikidata.org/wiki/Help:Items> in Wikidata has a
label <https://www.wikidata.org/wiki/Help:Label>, a short description
<https://www.wikidata.org/wiki/Help:Description>, and aliases
<https://www.wikidata.org/wiki/Help:Aliases> in each language. Let’s say
you take a look at Item Q836805 <https://www.wikidata.org/wiki/Q836805>. In
English, that Item has the label *“Chalmers University of Technology”* and
the description *“university in Gothenburg, Sweden”*. In Swedish it is
*“Chalmers
tekniska högskola”* and *“universitet i Göteborg, Sverige”*. The goal of
the label is to be a common name for the Item, and together with the
description it should uniquely identify the Item in the world. That’s why,
although multiple Items can have the same label, as things in the world can
be called the same but be different, no two Items should have both the same
label and the same description in a given language. The aliases are used to
help with improving the search experience.
The meaning of the descriptions across languages is often the same, and
when it is not, the difference is usually accidental, though sometimes it is
intentional. Given that there are more than 94 million Items in Wikidata, and
Wikidata supports more than 430 languages, that would mean that if we had
perfect coverage, we would have more than 40 billion labels and as many
descriptions. And not only would the creation of all these labels and
descriptions be a huge amount of work, they would also need to be
maintained. If there are not enough contributors checking on their quality,
it would unfortunately be easy for vandalism to sneak in.
The Wikidata community has known about this issue for a long time, and made
great efforts to correct it. Tools such as AutoDesc
<https://autodesc.toolforge.org/> by Magnus Manske
<https://meta.wikimedia.org/wiki/User:Magnus_Manske> and bots such as
Edoderoobot <https://www.wikidata.org/wiki/User:Edoderoobot>, Mr.Ibrahembot
<https://www.wikidata.org/wiki/User:Mr.Ibrahembot>, MatSuBot
<https://www.wikidata.org/wiki/User:MatSuBot> (these were selected by
clicking “Random Item” and looking at the history) and many others have
worked on increasing the coverage. And it shows: these bots often target
descriptions, and so, even though only six languages have *labels* for more
than 10% of Wikidata Items, a whopping 64 languages have a coverage over
10% for *descriptions*! Today, we have well over two billion descriptions
in Wikidata.
These bots create descriptions, usually based on the existing statements of
the Item. And that is great. But there is no easy way to fix an error
across languages, nor is there an easy way to ensure that no vandalism has
snuck in. Also, bots give an oversized responsibility to a comparatively
small group of bot operators. Our goal is to democratize that responsibility
again and allow more people to contribute.
Descriptions in Wikidata are usually noun phrases, which is something that
we will need to be able to generate for Abstract Wikipedia anyway. We want to
start thinking about how to implement this feature, and then derive from
there what will need to happen in Wikifunctions and in Wikidata. This work
will need to happen in close cooperation with the Wikidata team, and the
communities of both Wikidata and Wikifunctions. It will represent a way to
ramp up our capabilities towards the wider vision of Abstract Wikipedia.
Timewise, we hope to achieve that in 2022.
We don’t know yet how exactly this will work. Here are a few thoughts, but
this is really an invitation for all of us to work together on the design
for abstract descriptions:
- It must be possible to overwrite a description for a given language
- It must be possible to retract a local overwrite for a given language
- The pair of label and description still must remain unique
- It would be great if implementing this would not be a large effort
- The goal is not to create automatic descriptions
<https://www.wikidata.org/wiki/Wikidata:Automating_descriptions>, but
abstract descriptions
The last point is subtle: an automatic description is a description
generated automatically from the given statements of an Item. That’s a
valuable and very difficult task. The above-mentioned AutoDesc, for example,
starts the English description for Douglas Adams
<https://autodesc.toolforge.org/?q=Q42&lang=en&mode=short&links=text&redlink…>
as
follows: *“British playwright, screenwriter, novelist, children's writer,
science fiction writer, comedian, and writer (1952–2001) ♂; member of
Footlights and Groucho Club; child of Christopher Douglas Adams and Janet
Adams; spouse of Jane Belson”*. The Item <https://www.wikidata.org/wiki/Q42>'s
current manual English description is the much more succinct *“English
writer and humorist”*. There can be many subtle decisions and editorial
judgements to be made in order to create the description for a given Item,
and I think we should be working on this — but later.
Instead, we want to support abstract descriptions: a description, manually
created, but instead of being written in a specific natural language, it is
encoded in the abstract notation of Wikifunctions, and we then use the
renderers to generate the natural-language text. This allows the community
to retain direct control over the content of a description.
Here are a few ideas to kick off the conversation:
- We introduce a new language code, qqz. That code is in the range
reserved for local use, and is similar to the other dummy language codes
<https://www.mediawiki.org/wiki/Manual:$wgDummyLanguageCodes> in
MediaWiki, qqq and qqx. Wikidata is to support the qqz language code for
descriptions.
- The content of the qqz description is abstract content. Technically
we could store it in some string notation such as “Z12367(Q3918
<https://www.wikidata.org/wiki/Q3918>, Q25287
<https://www.wikidata.org/wiki/Q25287>, Q34
<https://www.wikidata.org/wiki/Q34>)”. Or we could store the JSON
ZObject.
- The abstract description would be edited using the same Vue components
we develop for Wikifunctions for editing abstract content.
- The abstract description is a fallback for languages without a
description. It can be overwritten by providing a description in that
language.
- Every time the renderer function or the underlying lexicographic data
changes, we also need to retrigger the relevant generations.
- One question is whether we should store the generated description in
the Item, and if so, how to change the data model in order to mark the
description as generated from the abstract description.
- We also need to figure out how to report changes to everyone who is
interested in tracking them. If we store the generated description as
proposed above, we can piggyback on the current system.
All of these are just ideas for discussion. Some of the major questions are
whether to store all the generated descriptions in the Item or not, how to
represent that in the edit history of the Item, how to design the caching
and retriggering of the generated descriptions, etc.
What would that look like?
Let’s take a look at an oversimplified example. The description for
Chalmers is *“university in Gothenburg, Sweden”*. That seems like a
reasonably simple case that could easily be templated into abstract content
say of the form “Z12367(Q3918 <https://www.wikidata.org/wiki/Q3918>, Q25287
<https://www.wikidata.org/wiki/Q25287>, Q34
<https://www.wikidata.org/wiki/Q34>)”, where Z12367 (that ZID is made-up)
represents the abstract content saying in English *“(institution) in
(city), (country)”*, Q3918 <https://www.wikidata.org/wiki/Q3918> the QID
for university, Q25287 <https://www.wikidata.org/wiki/Q25287> the QID for
Gothenburg, and Q34 <https://www.wikidata.org/wiki/Q34> the QID for Sweden.
(In reality, this template is nowhere near as simple as it looks - we will
discuss this more in an upcoming weekly newsletter. For now, let’s assume it
is this simple.)
Renderers would then take this abstract content and for each language
generate the description, in this case *“university in Gothenburg, Sweden”* for
English, or *“sveučilište u Göteborgu u Švedskoj”* in Croatian. Since there
is already an English description, we wouldn’t actually generate or store
the text in English, but in Croatian we would generate it, store it, and
mark it as a generated description.
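As a toy illustration of the rendering step, for English only (Z12367 is
made up, as said above, and a real renderer would draw its labels from
Wikidata and handle grammar, as the Croatian case endings show):

    # Toy sketch only: rendering the made-up abstract description
    # Z12367(Q3918, Q25287, Q34) into English.
    LABELS_EN = {"Q3918": "university", "Q25287": "Gothenburg", "Q34": "Sweden"}

    def render_z12367_en(institution, city, country):
        # The "(institution) in (city), (country)" template, English only.
        return f"{LABELS_EN[institution]} in {LABELS_EN[city]}, {LABELS_EN[country]}"

    render_z12367_en("Q3918", "Q25287", "Q34")
    # -> 'university in Gothenburg, Sweden'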
We think of this as a good milestone on our path to Abstract Wikipedia,
with a directly useful outcome. What are your thoughts? Join us in
discussing this idea on the following talk page:
https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Updates/2021-07-29
------------------------------
In other news, Lindsay has created a video of a new feature: how Testers
and Implementations work together to show whether the tests pass. The video
is available here:
https://commons.wikimedia.org/wiki/File:Wikilambda_Testers_on_Code_based_Im…
The video shows how she changes the implementation and re-runs the
testers several times. Testers will be a main component in ensuring the
quality of Wikifunctions.
The next opportunity to meet us and ask us questions will be at Wikimania.
On 14 August, at 17:00 UTC, we will host a 1.5 hour session on
Wikifunctions and Abstract Wikipedia. This year, Wikimania will be an
entirely virtual event and registration is free. Bring your questions and
discussions to Wikimania 2021.
Next week, we are skipping the weekly update.
The on-wiki version of this update is here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-07-22
----
In the last few weeks, the Wikifunctions prototype has passed a few
critical milestones. We have massively improved the testability of our
codebase and increased the robustness of our tests. There’s still plenty to
do, but, considering the development ahead, it is reassuring to see the
code becoming more robust.
Another step is that the first parts of evaluating function composition are
now working. We can neatly compose any combination of built-ins,
code-based implementations, and other compositions.
I found myself having quite a bit of fun working with the prototype. Last
week, in order to capture some of the possibilities, I made a video where I
set up a new Wikilambda instance and defined a few functions for Boolean
algebra. Booleans are one of the types that come pre-loaded with a
Wikilambda instance. The main reason why they come as a pre-loaded type is
that they are necessary for the built-in If function, and the If function
is extremely useful.
In the demonstration video, I defined the Negate function, which takes one
of the two Boolean values (i.e. either True or False) and returns the
other. Then I implemented the Negate function using the If function: If
true then false else true. I followed this by implementing a few other
Boolean functions with two parameters, such as the And function
(conjunction), the Or function (disjunction), the Nand function, and the
Exclusive or function. Some of the functions are implemented using solely
the built-in If function; others combine previously composed functions
together (such as Nand, implemented as Not And).
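Sketched in Python rather than as ZObjects, the compositions shown in the
video look roughly like this (names simplified; the details in the video may
differ):

    # A rough Python analogy of the compositions from the video.
    def if_(condition, then_value, else_value):
        return then_value if condition else else_value  # the built-in If

    def negate(b):
        return if_(b, False, True)   # Negate: if b then False else True

    def and_(a, b):
        return if_(a, b, False)      # And: if a then b else False

    def nand(a, b):
        return negate(and_(a, b))    # Nand: "Not And", a composition of
                                     # previously composed functions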
The video also shows how to call these newly-created functions and see that
they work. You will notice a number of bugs in the video. Most of them are
already filed and being worked on; some of them have even been solved
already. A number of the workflows that you see have already been improved,
such as creating an implementation directly from a newly defined function,
etc. Also, please remember that the UX is still intentionally rough, and we
will give it a complete overhaul before we launch.
The video runs for 24 minutes and is available on Commons:
https://commons.wikimedia.org/wiki/File:Boolean_Algebra_with_very_early_Wik…
Thanks so much to the team for getting the prototype so far! I am very
proud, and looking forward to what comes next.
----
We are hiring! We are looking for an Engineering Manager:
https://boards.greenhouse.io/wikimedia/jobs/3270135 Our hires can be based
remotely.
The next opportunity to meet us and ask us questions will be at Wikimania.
On 14 August, at 17:00 UTC, we will host a 1.5 hour session on
Wikifunctions and Abstract Wikipedia. This year, Wikimania will be an
entirely virtual event and registration is free. Bring your questions and
discussions to Wikimania 2021:
https://www.eventbrite.com/e/wikimania-2021-tickets-161884957265
And a reminder that all Wikimedians are invited to attend the Grammatical
Framework Summer School from 26 July to 6 August 2021 for free. The link
explains how to register and gives more background:
https://meta.wikimedia.org/wiki/Special:MyLanguage/Abstract_Wikipedia/Updat…
The on-wiki version of this newsletter is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-07-16
--
It is our pleasure to announce that Max Binder
<https://meta.wikimedia.org/wiki/User:MBinder_(WMF)> will join the Abstract
Wikipedia team part-time for a while in order to help and support us
with our processes and tools. Max is a Senior Team Effectiveness
Coach and joined the Wikimedia Foundation in 2015. I want to let Max
introduce himself in his own words:
Hello! :)
My name is Max (it’s not short for anything, which I have found is
uncommon). I use he/him pronouns. I am excited to join this team from
the Technical
Program Management
<https://www.mediawiki.org/wiki/Wikimedia_Product/Technical_Program_Manageme…>
team,
and support healthy team practices. Here are some links to things I’ve
written previously about who I am and how I approach my work:
Meta page: User:MBinder (WMF)
<https://meta.wikimedia.org/wiki/User:MBinder_(WMF)>
Approach and style: Team Effectiveness Coach Approach and Style
<https://www.mediawiki.org/wiki/Wikimedia_Product/Technical_Program_Manageme…>
I will be with this team as long as it takes to codify needs and norms
thereof, and eventually onboard a Technical Program Manager for ongoing
support thereafter.
Picking a favorite Wikipedia page is like picking a favorite child, but
I’ve always enjoyed: List of helicopter prison escapes
<https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes> and Toilet
paper orientation <https://en.wikipedia.org/wiki/Toilet_paper_orientation>
One goal for Max is to help us get our processes ready to scale for more
new members.
Speaking of which: we are hiring! We are currently hiring for two Software
Engineers
<https://boards.greenhouse.io/wikimedia/jobs/3298646?gh_src=03df28cb1us>
and an Engineering Manager
<https://boards.greenhouse.io/wikimedia/jobs/3270135> for Abstract
Wikipedia. The positions can be remote and can be outside the United
States. If you are interested, know someone who might be, or know a good
community in which to spread the word about the positions, please share the link.
We also want to say thanks to Carolyn Li-Madeo and Simone Cuomo, who have
been working with us for the last few months. Carolyn helped to
kick off the design work within Abstract Wikipedia with Aishwarya, and is
now stepping away from the day to day work on Abstract Wikipedia in order
to focus more on her primary work within the Foundation. Simone has worked
on a number of tasks on the front end, for example making the front-end
more testable and modular, and is now ramping up to provide support for the
Structured Data team. Our deepest gratitude to both of them for their
contributions to the project and the support they provide to the
organization and user experiences more generally.
We also got word that we were selected for a session at Wikimania 2021
<https://meta.wikimedia.org/wiki/Wikimania_2021>. Wikimania will be held
from August 13 to 17, 2021. On August 14, at 15:00 UTC, we will host a
two-hour session on Wikifunctions and Abstract Wikipedia. This year, Wikimania
will be an entirely virtual event and registration is free
<https://www.eventbrite.com/e/wikimania-2021-tickets-161884957265>. Bring
your questions and discussions to Wikimania 2021!
Another reminder that all Wikimedians are invited to attend the Grammatical
Framework Summer School from 26 July to 6 August 2021 for free
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-06-24>.