The on-wiki version of this newsletter is here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-12-09
--
One of the great things in the Wikimedia movement is our opportunity to
partner with open source and open access communities across the Internet.
Today we'd like to introduce you to two new members of our effort who join
us via such a partnership.
The Wikimedia Foundation regularly partners with the Outreachy program
<https://www.outreachy.org/>, which offers three-month internships to work
remotely in Free and Open Source Software ("FOSS") with experienced
mentors. The program is open to both students and non-students, and
encourages applications from people who face under-representation, systemic
bias, or discrimination in the technology industry of their country.
We have the good fortune in Abstract Wikipedia to welcome two Outreachy
interns this round! Please join me in welcoming Aisha and Jade, who will be
focusing on analysing and understanding how templates and Lua modules are
used across Wikimedia wikis. This knowledge will be crucial in helping us
work out how the central wiki of functions can best support existing
community use cases, alongside the novel features that Abstract Wikipedia
will also provide. There's more detail on the outline project definition on
Phabricator <https://phabricator.wikimedia.org/T263678>.
In their own words:
*Aisha*
I completed my Computer Science undergraduate degree in February 2020, then
worked as a Machine Learning Engineer for 7 months before I joined
Wikimedia as an Intern. I am passionate about data and natural language
processing, to which I was introduced through my undergraduate thesis
research. I cook as a hobby and someday I wish to travel to far lands and
experience lots of cultures.
Communities have inspired me and have been my source of motivation for
quite some time. I owe my skills and knowledge to them and I am
enthusiastic to keep the process of dissemination going. And what is a
better place than an open-source community where free and open information
is not only celebrated but people work towards making it possible for
everyone else? This makes software and data so much more organic and gives
it the kind of atmosphere I love hanging around in. I am so glad to have
joined Wikimedia! Hoping to learn a lot and make valuable contributions!
*Jade*
I am a CS student, who will most likely finish the Bachelor's degree in the
summer of 2021.
While I was studying in school, for a long time, I was thinking that
linguistics would be my future - but learning to code through high school
made me choose another path. I felt that creating new things and easily
solving problems with programs created by yourself is really exciting, so I
stick to that decision.
In the University I was lucky enough to find the lecturer whose main
interest was computer linguistics, so I was able to combine both of my
passions, creating projects about predicting the style of the text (which
has far less flexible classification in Russian than in English) and about
translating poems to Interslavic.
Reading Abstract Wikipedia papers made me feel really enthusiastic, as it
showed me the really different point of view to the translation problems.
You don't have to strive for variety - but for simplicity, unambiguity and
richness of the language at the same time. And what can be better if this
initiative will help many more people find the knowledge they are looking
for?
We're so pleased to have the opportunity to work with Aisha and Jade, and
look forward to their contributions. You'll see them on IRC and on their
project repository
<https://github.com/wikimedia/abstract-wikipedia-data-science> (just set
up) in the coming months. Please do say hi!
Aisha and Jade will be blogging about their experience. We encourage you to
have a read of their first posts about their applications for the Outreachy
internship:
- Aisha's blog
<https://tanny411.github.io/2020/12/01/Getting-Started-With-Outreachy/>
- Jade's blog
<https://dev.to/lost_enchanter/scaredy-cat-s-guide-to-outreachy-admission-20…>
While you're at it, you may want to add the reports page
<https://www.mediawiki.org/wiki/Outreachy/Round_21/Bi-weekly_Reports> to
your watchlist, too, and see the fine work from all of our Outreachy
interns.
Thank you, Aisha and Jade, for joining us!
--The Abstract Wikipedia Team
The on-wiki version can be found here:
https://meta.wikimedia.org/wiki/Special:MyLanguage/Abstract_Wikipedia/Updat…
* * *
Today we will talk a little bit about how the catalog of functions will
actually work. A more formal and comprehensive description of this is the
function model
<https://meta.wikimedia.org/wiki/Special:MyLanguage/Abstract_Wikipedia/Wiki_…>,
but here we want to create a more intuitive explanation.
Every page in the new wiki will contain one *Object*. Every Object is of
exactly one *Type*. Types decide what the Object means. Types also decide
what makes an Object valid or not. An Object that is of a certain Type is
called an *instance* of that Type. Every Object that is stored as a wiki
page is called a *Persistent Object*, and such an Object has a Z-ID by
which it can be referenced. The Z-ID is the name of the page.
What does it mean that the Type decides what the Object means? Let’s look
at an example. An Object of Type *Positive Integer* with the value "2020"
represents the number 2020. That’s the number you get when you multiply 4,
5 and 101. And because it has the Type of Positive Integer, we know it is a
number. If the Type of "2020" was String, then that wouldn’t represent the
number 2020, but rather a sequence of four characters, "2", "0", "2", and
"0". If the Type of it was *Gregorian Calendar Year* then this would
represent the year 2020 in that calendar, a leap year starting with a
Wednesday and ending with a Thursday.
The Type decides what you can do with an Object -- or, more formally
speaking, which Functions you can apply to the given Object. If you have a
String, you can count the length of the String, *i.e.* how many characters
are in the sequence. The result for "2020" above would be four. If you have
a Gregorian Calendar Year, you might also be able to ask for the 'length',
but the result should probably be 366 days (as it is a leap year). If you
have a Positive Integer and ask for its length, you might be asking for the
logarithm in base 10, or you might be asking for the multiplicative
persistence <https://en.wikipedia.org/wiki/Persistence_of_a_number>. But
all of these would be different Functions (that may well have the same
label, “length”, in English; their descriptions would explain in more
detail what they would do). The Function states which Types it can take,
and the Type decides what an Object means.
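To make this a bit more concrete, here is a small Python sketch of three different "length" Functions, one per Type. The class and function names are made up for illustration and are not part of the function model; the sketch only shows that the same value "2020" gives different answers depending on its Type.

```python
import calendar
from dataclasses import dataclass


# Hypothetical stand-ins for three Types that can all carry the value "2020".
@dataclass(frozen=True)
class String:
    value: str


@dataclass(frozen=True)
class PositiveInteger:
    value: int


@dataclass(frozen=True)
class GregorianCalendarYear:
    value: int


def string_length(s: String) -> int:
    """'length' of a String: how many characters are in the sequence."""
    return len(s.value)


def year_length(y: GregorianCalendarYear) -> int:
    """'length' of a Gregorian Calendar Year: its number of days."""
    return 366 if calendar.isleap(y.value) else 365


def multiplicative_persistence(n: PositiveInteger) -> int:
    """One possible 'length' of a Positive Integer: how often its digits
    must be multiplied together until a single digit remains."""
    steps, value = 0, n.value
    while value >= 10:
        product = 1
        for digit in str(value):
            product *= int(digit)
        value = product
        steps += 1
    return steps
```

Each Function states which Type it accepts, so a system that knows the Type of an Object can offer only the Functions that make sense for it.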
Types can be very diverse. Some Types might have a small, explicitly
defined number of instances. These are called *Enumerations*, because all
the instances of the Type are known in advance and enumerated, and each
instance must be one of these few possible values. The classic example of
such a Type is *Boolean*, named after George Boole
<https://en.wikipedia.org/wiki/George_Boole>, which has exactly two
instances: *True* and *False*. Another Enumeration would be the *Days of
the Week*. Other Types can be simple but have an infinite number of
possible instances. Examples of such Types are *String* for sequences of
characters, and *Positive Integer* for the counting numbers. Other Types
can be composed of simpler Types. For example, one way to represent an
*Integer* would be to compose it from a Positive Integer and a Boolean,
where the Boolean decides whether the Integer is positive or negative, and
the Positive Integer represents the absolute value.
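As a rough illustration, the three kinds of Types could be sketched in Python as follows. The names and the validation rule are assumptions made for this example. In particular, the sketch treats counting numbers as starting at 1, so it does not say how zero would be represented.

```python
from dataclasses import dataclass
from enum import Enum


class Boolean(Enum):
    """An Enumeration: all instances are known in advance."""
    TRUE = "True"
    FALSE = "False"


@dataclass(frozen=True)
class PositiveInteger:
    """A simple Type with infinitely many possible instances."""
    value: int

    def __post_init__(self):
        # The Type decides what makes an Object valid (assumed rule: >= 1).
        if self.value < 1:
            raise ValueError("a Positive Integer must be at least 1")


@dataclass(frozen=True)
class Integer:
    """A Type composed of two simpler Types."""
    is_negative: Boolean
    absolute_value: PositiveInteger

    def to_int(self) -> int:
        sign = -1 if self.is_negative is Boolean.TRUE else 1
        return sign * self.absolute_value.value
```

For example, `Integer(Boolean.TRUE, PositiveInteger(2020))` stands for the number -2020.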
Not every Object has to be stored as a wiki page. For example, there is no
need to store the number 2020 as an Object in the wiki, as it can be easily
just created on the fly.
This is how Objects are built and represented. Objects of almost all Types
are called *Literals*. A Literal is an Object that, when evaluated, results
in itself. For example, when you evaluate the number 2020, the result is
the number 2020. But there are two very special Types whose instances are
not Literals, and these two types are *References* and *Function Calls*.
References are a special Type of Object that refers to a Persistent Object
by its Z-ID. It basically just says “here, this Reference, should really be
just this Object that is referenced by this Z-ID”. Evaluating a
Reference results in the referenced Object. A Reference does not need to be
evaluated immediately, but can be evaluated whenever needed. In fact, it is
sometimes impossible to evaluate all References fully, as it can easily
lead to recursion and infinitely large objects.
The other special Type of Object is the Function Call. A Function Call
consists of a *Function* and a *List of Arguments*. Evaluating a Function
Call is the core 'magic' of the whole system: it is basically replacing a
Function Call with the result of that Function Call, *e.g.* evaluation
would be replacing the Function Call to the Function with the label
"length" and the argument "2020" of Type String with the value "4" of Type
Positive Integer.
Function Calls are usually created on the fly. We wouldn’t, in general,
create a new wiki page with a Persistent Object in which we store the
Function Call described above, but rather we would send that Function Call
to an evaluation engine which, in turn, evaluates the Function Call and
returns the result. There is really not much more to the system than that.
A Function Call is in many ways analogous to using a MediaWiki template
<https://www.mediawiki.org/wiki/Help:Templates> with parameters.
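The three evaluation rules can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the Z-ID "Z10001" is invented, Python's built-in `len` stands in for the Function labelled "length" on Strings, and References are resolved eagerly, which the real system does not have to do.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Reference:
    zid: str  # Z-ID of the Persistent Object being referred to


@dataclass(frozen=True)
class FunctionCall:
    function: object              # usually a Reference to a Function
    arguments: Tuple[object, ...]


# Hypothetical store of Persistent Objects, keyed by Z-ID.
PERSISTENT_OBJECTS: Dict[str, object] = {
    "Z10001": len,  # made-up Z-ID for "length" of a String
}


def evaluate(obj):
    """Evaluate an Object according to its kind."""
    if isinstance(obj, Reference):
        # A Reference evaluates to the Object it refers to.
        return evaluate(PERSISTENT_OBJECTS[obj.zid])
    if isinstance(obj, FunctionCall):
        # A Function Call is replaced by the result of the call.
        function = evaluate(obj.function)
        arguments = [evaluate(argument) for argument in obj.arguments]
        return evaluate(function(*arguments))
    # Everything else is a Literal and evaluates to itself.
    return obj
```

Here `evaluate(FunctionCall(Reference("Z10001"), ("2020",)))` replaces the call with the value 4, while `evaluate(2020)` simply returns 2020.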
I hope that this explanation made some sense and helped with understanding
the planned system. We are currently developing the part of the system that
allows contributors to create Types inside the wiki itself, so that the
community can create and control which Types exist. Please feel free to ask
questions.
------------------------------
The voting for the final round of the naming contest
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Wiki_of_functions_naming…>
is ongoing. Voting will be open until Monday, November 16. If you haven’t
already, please vote here and decide on a name for the new project where
all of these Objects we described above will be available. The proposals we
are voting on are, in reverse alphabetical order: *Wikimedia Functions*,
*Wikilambda*, *Wikifusion*, *Wikifunctions*, *Wikicodex*, and *Wikicode*.
Let us know your preferences and decide on the name of the new project by
Monday!
The newsletter is also available on-wiki here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-12-02
--
We are very happy to welcome Genoveva Galarza Heredero, who is joining the
Abstract Wikipedia team as a software engineer.
Genoveva finished her Software Engineering degree in Madrid in 2009, and
since then has worked in various places. She worked for some time in
Argentina and then moved to India, where she stayed for five years, first
working on the architecture of expert systems that leveraged NLP and Linked
Open Data. Then she started her own social venture, an effort to move
Indian IT hubs to more eco-friendly ways of commuting. She now lives in
Madrid again, and after working for some years as a full-stack developer in
projects on agriculture, green energy, and data privacy, has joined the
Wikimedia Foundation.
She is passionate about art and free culture online, and has used her hobby
as a comic artist to work alongside the Spanish and Latin American webcomic
community to create faneo.es, a non-profit project led by its members that
attempts to provide an open and free space to publish, read, and even
create comics collaboratively.
Let’s share a few of her words:
“I got into IRC and Usenet when I was a kid: I discovered communities,
became part of them and ultimately learned how to code with the help of
others who shared their knowledge online. I can honestly say that
everything that I am today - a programmer, a comic creator, even a nature
lover or a linguistics geek - I owe it to the availability of free and open
knowledge and culture online. I have seen in my own flesh how incredibly
self-defining it is to have access to local and global knowledge, so I am
really excited to join the Wikimedia movement and, as we say in Spanish,
put my little grain of sand to make this possible for others. Hopefully
I’ll be learning a lot more about how language works, about bigger
communities, and about defying the for-profit approach to progress and
succeeding! I now have a lot of catching up to do, but I’m looking forward
to working with all of you and hopefully making meaningful contributions
soon.”
Genoveva will also expand the project’s timezones, so we can have
discussions with volunteers from more global regions in real-time. Given
that she is joining James Forrester as the second software engineer on the
team, she will be working on all the parts of the system for now: storage,
display, and editing on the wiki, the back-end evaluation engine service
and related APIs, the general UX, etc.
As the team grows, we will also start to implement more structure and
process in the development. We set up a new IRC channel at
#wikipedia-abstract-tech (connect
<https://webchat.freenode.net/?channels=#wikipedia-abstract-tech>) that will
(but does not yet) track changes to the source code and to the task boards.
Other technical streams will also be reflected on that. You are all welcome
to join this new IRC channel.
Please join me in welcoming Geno to the team as we are continuing our work
on the new Wikimedia project.
The on-wiki version of this newsletter can be found here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-11-25
----
Every Object in Wikifunctions will be identified by its Z-ID, similar to
the Q-IDs and L-IDs of Wikidata for Items and Lexemes, respectively.
Whereas most of the Z-IDs will be simply assigned consecutively, we plan to
have a few intentionally chosen Z-IDs, just as with Q-IDs and L-IDs. Unlike
with Q-IDs and L-IDs, the goal is not so much for them to be "Easter eggs"
as to be mnemonic and easy to remember.
No, no, don’t worry - you shouldn’t be remembering all the Z-IDs that you
need to use. The interface should hide most of the Z-IDs. But sometimes,
either in early versions of the system, when the interface isn’t
sufficiently complete yet, or later in some debugging tasks or when
inspecting or creating some messages on the wire, it might be helpful to
have core Z-IDs be ever so slightly easier to recall than entirely
arbitrary assignments.
I invite you all to join us in finding good Z-IDs for the core Objects of
Wikifunctions. We will be converging on a solution on this page: Reserved
ZIDs <https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Reserved_ZIDs>.
The page discusses and lists:
- Should there be a contiguous block of Z-IDs reserved, e.g. all Z-IDs
with four digits or fewer (or maybe three or fewer? Just two or one?)
- Which Z-IDs should be reserved?
- What should these Z-IDs be standing for?
- What are the Objects that should get pre-assigned Z-IDs?
Here are a few thoughts that went with the current status of assigned Z-IDs
in the Function Model (but you will see that the page above does not always
agree with these thoughts).
We tried to minimize the number of reserved Z-IDs, and just reserve the
Z-IDs Z1-Z99. That’s tight. By extending to Z999 or Z9999 we would have
more space.
The Z-IDs Z1-Z25 were mostly for the basic types of the Function Model.
These include Object, Type, Function, Implementation, Error, String, List,
etc. After that we thought about assigning a block of Z-IDs for the fifteen
or so initial Functions, and, twenty above them, their respective initial
Builtin Implementations, e.g. having “if” be Z31 and the builtin
implementation of “if” be Z51. Besides that, we need a number of reserved
Z-IDs for certain error codes, for languages, and for the Boolean values
True and False.
Here are a few thoughts on how some of the basic Types were assigned Z-IDs
- mostly based on the length of the English label:
- Z1 : *Object* (Type) : because everything starts here.
- Z2 : *Persistent object* (Type) : everything on the wiki is a Z2,
that's why this has such a low ZID.
- Z3 : *Key* (Type) : because the word key has three letters in English.
- Z4 : *Type* (Type) : because the word type has four letters in English.
- Z5 : *Error* (Type) : because the word error has five letters in
English.
- Z6: *String* (Type) : because the word string has six letters in
English.
- Z7: *Function call* (Type) : because function calls are the main
'magic' of the system, and 7 is a magic number. It is also close to Z8.
- Z8: *Function* (generic type, thus technically a function) : because
function has eight letters in English.
- Z9 : *Reference* (Type) : because the word reference has nine letters
in English.
- Z10 : *List* (generic type, thus technically a function) : because
it's the first number that has two digits.
- Z11 : *Monolingual text* (Type) : because it is just one language, and
there's a one in the name.
- Z12 : *Multilingual text* (Type) : because it's an extension of Z11.
- Z14 : *Implementation* (Type) : because the word implementation has
fourteen letters in English.
- Z20 : *Tester* (Type) : because 20/20 is perfect vision, and tests
make errors visible.
- Z99 : *Quote* (Type)
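The two numbering rules mentioned above are simple enough to check mechanically. The following Python sketch does that; the function names are invented for this example, and the offset of twenty reflects the current thinking described above rather than a final decision.

```python
# Assignments described above as following the
# "number of letters in the English word" rule.
LETTER_COUNT_ZIDS = {
    "key": 3,
    "type": 4,
    "error": 5,
    "string": 6,
    "function": 8,
    "reference": 9,
    "implementation": 14,
}


def follows_letter_rule(word: str, zid_number: int) -> bool:
    """True if the Z-ID number equals the length of the English word."""
    return len(word) == zid_number


def builtin_zid(function_zid: str, offset: int = 20) -> str:
    """Z-ID of a builtin implementation, assumed to sit `offset` above
    the Z-ID of its Function."""
    return f"Z{int(function_zid[1:]) + offset}"
```

With these definitions, every entry in `LETTER_COUNT_ZIDS` follows the letter rule, and `builtin_zid("Z31")` gives "Z51", matching the "if" example.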
Another basic Type that still needs an assignment is Boolean, and possibly
others.
A list of the currently suggested initial fifteen Functions:
- *if*: Boolean, T, T ➝ T
- *value by key*: Key reference, T ➝ U
- *reify*: Any ➝ List(Pair(Key reference, Any))
- *abstract*: List(Pair(Key reference, Any)) ➝ Any
- *head*: List(T) ➝ T
- *tail*: List(T) ➝ List(T)
- *empty*: List(T) ➝ Boolean
- *cons*: T, List(T) ➝ List(T)
- *unquote*: Quote ➝ Any
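For readers who prefer code to signatures, here is a hedged Python sketch of the list-related Functions and of "if". It uses Python tuples as a stand-in for List(T) and `bool` for Boolean, which are choices made for this example only. Note that Python evaluates both branches of `if_` before the call, which a real implementation may want to avoid.

```python
from typing import Tuple, TypeVar

T = TypeVar("T")


def if_(condition: bool, consequent: T, alternative: T) -> T:
    """if: Boolean, T, T -> T"""
    return consequent if condition else alternative


def head(items: Tuple[T, ...]) -> T:
    """head: the first element of a non-empty list."""
    return items[0]


def tail(items: Tuple[T, ...]) -> Tuple[T, ...]:
    """tail: the list without its first element."""
    return items[1:]


def empty(items: Tuple[T, ...]) -> bool:
    """empty: whether the list has no elements."""
    return len(items) == 0


def cons(item: T, items: Tuple[T, ...]) -> Tuple[T, ...]:
    """cons: a new list with `item` placed in front of `items`."""
    return (item,) + items
```

For any item `x` and list `xs`, `head(cons(x, xs))` is `x` and `tail(cons(x, xs))` is `xs`, which is the usual way these Functions fit together.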
In addition, we also need a validator function for most of the
basic Types. There is a task about the basic Functions and Types on
Phabricator: T261474 <https://phabricator.wikimedia.org/T261474>
These are all just suggestions, and we are happy to see them discussed on
wiki <https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Reserved_ZIDs>.
Please make further suggestions for reserved Z-IDs there, either for
mnemonic purposes or also for genuine Easter eggs. I am sure that you will
come up with much better and more interesting suggestions than we have. So,
let us know your ideas and see you next week with some news.
The on-wiki version of this post is here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-11-17
--
The vote for the name of the new Wikimedia project that will contain a
contributor-curated catalog of functions is over! Thanks to all the
participants.
There have been votes from 320 accounts, a considerable increase over the
193 voters in our first round. Using the vote tallying process described in
the rules, the top candidate resulting from the vote is *Wikifunctions*.
Here is how the vote shook out. After removing the ineligible votes, these
were the top preferences of the voters:
1. Wikifunctions 84
2. Wikilambda 82
3. Wikimedia Functions 70
4. Wikicode 45
5. Wikicodex 16
6. Wikifusion 16
We eliminate the two options with the fewest votes, *Wikifusion* and
*Wikicodex*, and count the top preferences after that elimination:
1. Wikifunctions 92
2. Wikilambda 89
3. Wikimedia Functions 79
4. Wikicode 53
That round eliminated *Wikicode*, resulting in this third round:
1. Wikifunctions 126
2. Wikilambda 97
3. Wikimedia Functions 90
Which led to the elimination of *Wikimedia Functions*:
1. Wikifunctions 190
2. Wikilambda 123
Which finally led to *Wikifunctions* winning. The fact that we had to
eliminate all options but two shows how tight the vote was.
In order to honor the name *Wikilambda*, which as you can see also caught
quite some attention, and which had a few fiery supporters, we have decided
to name the software extension to MediaWiki that we are developing
*WikiLambda* (note the CamelCase, which is normally present in our software
tools in honour of our roots in UseMod
<https://en.wikipedia.org/wiki/UseModWiki>).
The next step is now to have the Legal department do the in-depth vetting
of the name. We aim to have that finalized by mid-December.
Interestingly, should *Wikifunctions* not be accepted, per the rules of the
contest, the runner-up would not be *Wikilambda* but *Wikimedia Functions*.
If *Wikifunctions* is to be eliminated due to legal considerations, this is
how the first round would have looked:
1. Wikimedia Functions 113
2. Wikilambda 94
3. Wikicode 63
4. Wikicodex 22
5. Wikifusion 21
With *Wikifusion* eliminated, the next round looks like this:
1. Wikimedia Functions 117
2. Wikilambda 99
3. Wikicode 75
4. Wikicodex 22
Which eliminates *Wikicodex* and leads to the following results:
1. Wikimedia Functions 125
2. Wikilambda 108
3. Wikicode 80
Finally, eliminating *Wikicode*, we see quite a strong lead:
1. Wikimedia Functions 176
2. Wikilambda 137
The winning proposal of that would be *Wikimedia Functions*, again with
*Wikilambda* being the apparent follow-up. We know already that *Wikimedia
Functions* would pass the vetting process (due to the strong trademark that
*Wikimedia* itself is), so we would be able to stop the process there. So
the comments about *Wikilambda* above would stay.
Thanks again to everyone who participated in the vote! We are looking forward
to finalizing the result, and will come back to you within a few weeks. Once
that is closed, we will be setting up the timeline and the process for the
logo contest. We already have community members who are eager to start. Feel
free to start creating proposals and discussing them! That page is
preliminary. Over the next few weeks we plan to suggest rules for the logo
contest, and also to publish some suggestions regarding the potential logo.
We are also very excited about these parts coming together!
Thank you all!
The on-wiki version of this newsletter is here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-11-05
----
The second round of voting has started for naming the new Wikimedia
project, the wiki of functions!
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Wiki_of_functions_naming…>
How often do you have the opportunity to name a Wikimedia project? This will be
the first new Wikimedia project to launch since 2012.
Today I will talk about the details of the second round of the naming
contest. Feel free to skip this update if you are not interested in this.
The first round saw more than 170 proposals submitted for what to call the
central wiki of functions. After more than 500 votes, we have a list of the
six top proposals. We have published considerations from legal and
communications
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Wiki_of_functions_naming…>
on the proposals.
The six top proposals, in alphabetical order, are:
- Wikicode
- Wikicodex
- Wikifunctions
- Wikifusion
- Wikilambda
- Wikimedia Functions
[Image: Winner of the MediaWiki logo contest
<https://meta.wikimedia.org/wiki/File:Mediawiki_logo_proposal_(gradient_tran…>]
For the vote, we are using a gadget written by Amir Sarabadani
<https://meta.wikimedia.org/wiki/User:Ladsgroup> in his volunteer capacity,
and diligently adapted by James and Nick for our usage. The gadget was
written and has been used previously for voting on the new logo for the
MediaWiki software
<https://www.mediawiki.org/wiki/Project:Proposal_for_changing_logo_of_MediaW…>.
That vote, by the way, closed just a few days ago, and the winner of the vote was
just announced. The preliminary winner - pending the final Legal check-up -
can be seen to the right. Congratulations to Amir for running this process,
and congratulations to the MediaWiki community on a fresh and beautiful
proposed new logo!
Unfortunately, gadgets do not work on the mobile website. If you're using a
mobile device, you can go directly to the vote recording page
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Wiki_of_functions_naming…>
and enter your preference manually.
For the second round, you are asked to rank the six proposals by
preference. We will then use a modified instant-runoff voting system to
decide on the winner of the vote. The vote tallying will work as follows:
First, we count how many valid votes have been made. This provides the
denominator. Then we count, for each proposal, how often it has been listed
as the first preference. This provides the numerator. If any proposal
reaches more than half at this point, it immediately becomes the winning
proposal. If, which is more likely, none of the proposals manage to exceed
half at this point, we count which proposal is listed least frequently as
the first preference. This proposal gets eliminated (in case of a tie, all
of the least popular proposals get eliminated together). Then all votes
which had an eliminated proposal as their top preference are re-assigned to
the next-ranked preference, and we count the votes again.
We repeat this process until one proposal becomes the top preferred choice
of more than half of all voters. This proposal becomes the winner of the
second round of voting.
The winner of this second voting round will then undergo a more thorough
(and expensive) vetting process. Assuming the vetting process finds no
issues with the candidate, the candidate will become the official name of
the new project.
In the case that there are findings that disqualify the proposal, we will
remove that proposal from all rankings, and repeat the tallying of the
votes with the now modified rankings. This will produce a new top proposal,
which in turn gets validated. This proceeds until a top candidate is deemed
acceptable by the legal vetting process.
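For those who would like to see the tallying spelled out, here is a minimal Python sketch of the procedure as described above. It is not the gadget's code: the function name and the ballot format (a list of proposal names, most preferred first) are assumptions for this example, and a proposal wins only with more than half of the valid votes.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def tally(ballots: Sequence[Sequence[str]],
          disqualified: Iterable[str] = ()) -> Optional[str]:
    """Return the winning proposal, or None if no proposal ever
    exceeds half of the valid votes."""
    removed = set(disqualified)
    # Proposals that failed vetting are removed from all rankings.
    rankings = [[p for p in ballot if p not in removed] for ballot in ballots]
    rankings = [r for r in rankings if r]
    total = len(rankings)  # the denominator: number of valid votes
    remaining = {p for r in rankings for p in r}

    while remaining:
        # Count each ballot for its highest-ranked remaining proposal.
        counts = Counter({p: 0 for p in remaining})
        for ranking in rankings:
            top = next((p for p in ranking if p in remaining), None)
            if top is not None:
                counts[top] += 1

        leader, votes = counts.most_common(1)[0]
        if 2 * votes > total:
            return leader

        # Eliminate the least popular proposal(s); ties go out together.
        fewest = min(counts.values())
        remaining -= {p for p, n in counts.items() if n == fewest}
    return None
```

If the winner is later disqualified by the vetting, calling `tally(ballots, disqualified={winner})` repeats the count on the modified rankings.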
Voting opened on Monday, and will be open for two weeks, *until 16
November*. We have kept the same rules regarding voting eligibility as
applied to the first round:
- You must have a registered account which is not blocked on more than
one project
- You must not be a bot
- You must have made at least 25 edits as of September 1 on any public
Wikimedia production wiki
Additionally, current and former members of the Board of Trustees and of
the Advisory Board of the Wikimedia Foundation are also qualified to vote.
We hope that this process will be finished by 8 December, when we plan to
announce the name of the project - and then kick off the contest to find a
logo to go with the new name.
So, if you are eligible, please go ahead and cast your vote! Let us know
your preferences and help to name a brand new Wikimedia project. Spread the
word in your communities!
*Go vote
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Wiki_of_functions_naming…>!*
A version of this newsletter with links and formatting is available on-wiki
here: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-10-22
In this edition of the weekly posts, I want to discuss the different places
where we will use unstructured text in the objects of the wiki of
functions. The following is a plan and a request for comments. Besides the
labels, nothing is implemented yet, so we would really appreciate your
feedback.
Labels. Every object in the wiki of functions will be identified by a Z-ID,
similar to Q-IDs identifying items in Wikidata. But, as with Q-IDs, we
don't expect Z-IDs to be widely visible or used. Instead, every object
will have labels, one per language.
But unlike Wikidata items, every object will be an instance of a specific
type. For example, there might be an object representing the addition of
two integer numbers. So the English label for this object could be “add”.
Other good labels for this object could be “addition”, “sum”, or “plus”. An
inappropriate name for the function would be “multiplication”, as that
would be terribly confusing.
Uniqueness. There would likely also be other objects representing functions
that do addition, for example the addition of two floating point numbers,
or of two complex numbers, or of two matrices. In the wiki of functions
labels will not need to be unique overall - but they will need to be unique
for each type. So there can only be one function with the label “add” that
takes two integers and returns one integer. Or only one type with the label
“integer”. Per type, each label must be unique.
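Since nothing but the labels is implemented yet, the following Python sketch is only one way the per-type uniqueness rule could be enforced. The class, the Z-IDs and the way a type is written as a string are all made up for this example, and the sketch ignores what happens when a label is changed or removed.

```python
from typing import Dict, Tuple


class LabelIndex:
    """Tracks which object owns a label, per language and per type."""

    def __init__(self) -> None:
        # (language, type, label) -> Z-ID of the object using that label
        self._owners: Dict[Tuple[str, str, str], str] = {}

    def set_label(self, zid: str, language: str,
                  type_signature: str, label: str) -> None:
        key = (language, type_signature, label)
        owner = self._owners.get(key)
        if owner is not None and owner != zid:
            raise ValueError(
                f"label {label!r} is already used by {owner} "
                f"for type {type_signature!r} in {language!r}"
            )
        self._owners[key] = zid


index = LabelIndex()
# Hypothetical Z-IDs: the same label is fine for different types.
index.set_label("Z20001", "en", "integer, integer -> integer", "add")
index.set_label("Z20002", "en", "float, float -> float", "add")
```

A second function with the label "add" and the type "integer, integer -> integer" would be rejected, while the floating point version above is accepted.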
Now does every object need a label? No. There will be many objects where a
label won’t be strictly necessary. Not every test for a function will need
a label, nor will every implementation of a function. They may have labels,
but they won’t be necessary - this is another difference to Wikidata, where
items without labels are almost always problematic.
One more note on labels - labels do not have to be direct translations
across languages. So in one language, two functions might have the same
label if they have a different type, but in another language the two
functions might have different labels as well. For example, in English
“length” might be an appropriate label for a function that returns the
number of elements in a list, but also for a function that returns the
length of a river, or a function that returns the duration of a movie. In
Croatian, on the other hand, all three of these might have different names
(“broj elemenata”, “duljina”, “trajanje”). Each language can decide what
pattern works best for them. Whether verbs in the imperative mood (“add”)
or a description of the result (“sum”) or the name of the operation
(“addition”) work best can be decided from language to language, and
independent style guides for each language may evolve.
Aliases. Besides labels, every object can have additional aliases per
language. Aliases are helpful when searching for an object. The above
function, labeled “add” in English, could have all the other alternative
names given above - “sum”, “plus”, “addition” - as aliases, so that when
someone searches for one function by a different name they still find the
right one as a result.
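A search that takes aliases into account could look roughly like the following Python sketch. The data layout, the Z-ID and the exact-match, case-insensitive comparison are assumptions for this example, as nothing about search has been decided yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Names:
    """The label and aliases of one object in one language."""
    label: str
    aliases: List[str] = field(default_factory=list)


def search(objects: Dict[str, Dict[str, Names]],
           language: str, query: str) -> List[str]:
    """Return the Z-IDs whose label or aliases match the query."""
    wanted = query.casefold()
    hits = []
    for zid, names_by_language in objects.items():
        names = names_by_language.get(language)
        if names is None:
            continue
        candidates = [names.label, *names.aliases]
        if wanted in (name.casefold() for name in candidates):
            hits.append(zid)
    return hits


# Hypothetical object: the addition function from the example above.
OBJECTS = {
    "Z20001": {"en": Names("add", ["sum", "plus", "addition"])},
}
```

Searching for "plus" or "Addition" in English then finds the function labelled "add".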
Documentation. Every object may also have documentation in each language.
Documentation is some wikitext that further describes the given object.
Many objects will not have any documentation at all, but many will have
some. If you ever had the opportunity to read The Art of Computer
Programming, you will know that there is often a lot to say about a
function! But besides this kind of background story, we can also have some
documentation describing a given implementation, or some explanation for
why a given test is useful. It could also link to a Phabricator task
describing an error that used to be there, which this test checks for in
order to catch it before it resurfaces, or to other resources such as a
textbook on algorithms.
Keys and arguments. At least types and functions will also have labels for
each key of the type and for each argument of the function. These will be
used in creating the user interface to display and edit values and function
calls.
Short descriptions. One specific question we have is - should we have
optional short descriptions for each object? In many IDEs, and sometimes
even directly in the programming language (think docstrings in Python)
there is support for short one-liners that give a bit of extra information
for a function, beyond just the name of the function and the arguments and
the types.
We are going to have a strong type system, and we will have plenty of space
for the documentation. So do we really also need a place for short
descriptions? What would their use case be? How would they be used in the
UX? On the website, something akin to the pop-up previews in Wikipedia seems
to be even more useful than a one-liner, previewing the whole documentation.
Originally I assumed per default that we would have short descriptions,
given their importance and usefulness in Wikidata, and so they also feature
in the AbstractText prototype. But they were never useful there. Also, the
type took over the role of the disambiguator, as described above, so there
was really no technical need for a short description. I currently lean
towards not having them, but I would like to hear more input.
The good thing is that no decision will be carved in stone. The data model
of the wiki of functions will be much more flexible than the data model of
Wikidata, and if we figure out that we do need short descriptions, we can
just introduce them later. It is much harder to remove things, though,
because almost everything that is there gets used one way or another, so
I am more wary of introducing features without a good reason.
Dogfooding. One obvious question is: hey, we are developing this
architecture to create multilingual content, why even have all this
documentation and labels and all that in actual languages, why not use our
own functions to build all of this up?
And yes, agreed, that would be best. It’s just, we’re not there yet, and
until then, we will still need labels and documentation and aliases. So
eventually I would very much look forward to using abstract content to
describe the objects of the wiki of functions themselves. It will also be
interesting to see whether we can then roll back some of the local content,
and how open we will be to doing so. This will give us a lot of insight into
how to approach similar goals in the other Wikimedia projects: from
descriptions (and maybe even labels? or sense glosses?) to stubby Wikipedia
articles, there will be plenty of potential to make our content across
projects easier to maintain and to provide a more uniform baseline coverage
across languages.
New video introducing Abstract Wikipedia. A new presentation, given for
Wikidata’s Eighth Birthday, organized by the community of
WikiProject:India, is available: https://www.youtube.com/watch?v=GAb1HylGemA
Naming contest. Next week, Tuesday 27 October, the second round of voting
for the name of the new wiki of functions will begin. The proposals are
currently being vetted by legal, and we should know which names will be in
the final round soon.
on wiki:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-10-29
I had a weekly update written, but then, at the last moment, I had to
entirely drop it and write a new one - you’ll probably get the other update
next week!
We originally had planned to start the second round of the naming contest
this week, but a number of issues were found with the proposals. Instead of
removing proposals, though, we decided to explicitly and transparently
write up the issues that were raised, and then let you, the community,
decide on the proposals. We are currently writing up a ‘voters’ guide’ with
the results from our internal review processes, which we plan to publish by
the end of the week, and then start the voting on Monday. We will also push
back the end of the voting period by the same number of days.
This week also sees the Eighth Birthday of Wikidata! Congratulations to the
Wikidata project and the Wikidata community! Without Wikidata, Abstract
Wikipedia would be unthinkable, for many different reasons, including:
Wikidata provides a large catalog of entities of interest, which can be
referred to by stable identifiers. This will be extremely helpful when
creating the content of the Abstract Wikipedia: it will allow us to make
some simple statements, such as “Marie Curie and Pierre Curie were
introduced to each other by Józef Wierusz-Kowalski.” This could be
expressed by a constructor that takes three Q-IDs as the parameters, Q7186
<https://www.wikidata.org/wiki/Q7186>, Q37463
<https://www.wikidata.org/wiki/Q37463>, and Q11730603
<https://www.wikidata.org/wiki/Q11730603>, as in this case all three
mentioned people are represented by an item in Wikidata. But still,
Wikidata will never be sufficient: not everything we want to talk about
will have a Wikidata item, and Abstract Wikipedia will need a mechanism to
create a reference through description: for example, the mother of Marie
Curie does not have an item in Wikidata. We could either create an item, or
we could refer to her by description in Abstract Wikipedia, i.e. a
constructor with the meaning “the mother of Marie Curie”. But for these
descriptions, having the large catalog of entities that Wikidata provides
will be immensely valuable.
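To make the idea of such a constructor more concrete, here is a small Python sketch. The class names (Item, MotherOf, Introduction) and their fields are invented for this illustration; the actual notation for abstract content has not been decided. The Q-IDs are assigned to the three people in the order in which the text above lists them.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Item:
    """Reference to an entity by its Wikidata Q-ID."""
    qid: str


@dataclass(frozen=True)
class MotherOf:
    """Reference by description: 'the mother of <child>' (hypothetical)."""
    child: "Entity"


# An entity is either a direct Wikidata reference or a description.
Entity = Union[Item, MotherOf]


@dataclass(frozen=True)
class Introduction:
    """Hypothetical constructor: `introducer` introduced `first` and `second`."""
    first: Entity
    second: Entity
    introducer: Entity


MARIE_CURIE = Item("Q7186")
PIERRE_CURIE = Item("Q37463")
KOWALSKI = Item("Q11730603")

# "Marie Curie and Pierre Curie were introduced to each other by
# Józef Wierusz-Kowalski."
statement = Introduction(MARIE_CURIE, PIERRE_CURIE, KOWALSKI)

# No Wikidata item needed: the person is identified through a description
# that itself builds on an existing item.
mother_of_marie = MotherOf(MARIE_CURIE)
```

Note that the description still relies on the item for Marie Curie, which is why the catalog of entities remains valuable even where no item exists.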
Wikidata also provides a lot of data that can be used directly in Abstract
Wikipedia. We would not need to repeat the date of birth for Marie Curie
<https://www.wikidata.org/wiki/Q7186#P569> in Abstract Wikipedia, but can
simply query Wikidata for the statement about her date of birth and display
the respective value. This will help with reducing the number of places we
have to maintain data.
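The following Python sketch shows what reading such a value could look like. It works on a hand-written, heavily trimmed sample that imitates the JSON shape Wikidata uses for statements (claims, then the property, then mainsnak and datavalue). The helper first_time_value is invented for this example, and real data contains more fields and possibly several statements per property.

```python
# Hand-written, trimmed sample in the shape of Wikidata's entity JSON.
SAMPLE_ENTITY = {
    "id": "Q7186",
    "claims": {
        "P569": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P569",
                    "datavalue": {
                        "type": "time",
                        "value": {"time": "+1867-11-07T00:00:00Z", "precision": 11},
                    },
                },
                "rank": "normal",
            }
        ]
    },
}


def first_time_value(entity, property_id):
    """Return the date part (YYYY-MM-DD) of the first time value, or None."""
    for statement in entity.get("claims", {}).get(property_id, []):
        snak = statement.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # skip "unknown value" / "no value" statements
        value = snak["datavalue"]["value"]
        # Drop the leading sign and the time-of-day part.
        return value["time"].lstrip("+").split("T")[0]
    return None


print(first_time_value(SAMPLE_ENTITY, "P569"))  # 1867-11-07
```

The point of the sketch is that Abstract Wikipedia would only store the reference (Q7186, P569) and look the value up, instead of keeping its own copy of the date.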
Wikidata also contains a large catalogue of references, and there are two
aspects to that. On the one hand, the statement about Marie Curie’s date of
birth has in fact 17 references <https://www.wikidata.org/wiki/Q7186#P569>.
We can then select some of those references to be displayed in the rendered
Wikipedia article created from Abstract Wikipedia. This is already
happening today, so if you see the article for Marie Curie in Greek
<https://el.wikipedia.org/wiki/%CE%9C%CE%B1%CF%81%CE%AF%CE%B1_%CE%9A%CE%B9%C…>,
you will see 17 references for the date of birth in the infobox.
On the other hand, Wikidata also contains a lot of possible sources as
items: books, scientific articles, websites. The Wikicite conference, which
had its 2020 edition this week
<https://meta.wikimedia.org/wiki/WikiCite/2020_Virtual_conference>, was all
about growing and maintaining the large corpus of referenceable sources
that Wikidata has become. Having the sources described as items will make
it easy to use them in references in Abstract Wikipedia, as it already
makes it simpler to cite them in the Wikipedias.
Wikidata provides the lexicographic database that will be needed for
Abstract Wikipedia. So when we want to talk about the mother of Marie Curie
in, say, Russian, we need to know what the singular nominative form of the
word ‘mother’ is in Russian. And that information will be coming from the
Form <https://www.wikidata.org/wiki/Lexeme:L57777#F1> stored on the
respective Lexeme in Wikidata.
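As a sketch of how a renderer might pick such a form, consider the following Python example. The data structure is a simplified, hand-written stand-in for a Lexeme (real Lexemes identify grammatical features by Q-IDs rather than English words, and list many more forms), and select_form is a hypothetical helper.

```python
# Simplified stand-in for a Russian Lexeme meaning "mother".
MOTHER_RU = {
    "lemma": "мать",
    "language": "ru",
    "forms": [
        {"representation": "мать", "features": {"nominative", "singular"}},
        {"representation": "матери", "features": {"genitive", "singular"}},
        {"representation": "матери", "features": {"nominative", "plural"}},
    ],
}


def select_form(lexeme, wanted_features):
    """Return the first form carrying all wanted grammatical features, or None."""
    wanted = set(wanted_features)
    for form in lexeme["forms"]:
        if wanted <= form["features"]:  # subset test: all wanted features present
            return form["representation"]
    return None


print(select_form(MOTHER_RU, ["nominative", "singular"]))  # мать
```

Returning None for a missing form matters here: gaps in the lexicographic data are exactly what a coverage analysis would need to surface.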
We are currently planning to better understand and visualize how the
coverage of the lexicographic data in Wikidata is progressing for
encyclopedic content. To undertake this analysis, we have encouraged the
Wikidata team to provide regular JSON dumps of the Lexeme corpora
<https://phabricator.wikimedia.org/T220883>; this work is currently in
progress, and the dumps should be available soon. Our thanks to the Wikidata team!
As we see, there are many ways that Wikidata will be used to power Abstract
Wikipedia. In 2021, we also plan to have a discussion amongst the Wikimedia
communities as to whether Wikidata will be the place to actually store and
maintain the content for Abstract Wikipedia, or if it should live in some
other place. There are many pros and cons regarding this decision, and we
are looking forward to the discussion.
In the wiki of functions, Wikidata will provide a large set of interesting
items that can be used as input for functions. Some of the entries in the list
of function examples
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Early_function_examples>
illustrate functions that could use Wikidata items as input: there are
functions such as distance, which calculates the distance between two cities,
or the head of state at birth function, which takes a person and returns the
head of state in the place of birth during the time of birth of that
person, etc. We can have plenty of interesting functions that are answering
questions around Wikidata items. We will use Wikidata as a large repository
of items that can be used in Wikidata functions.
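A function like distance could look roughly like the Python sketch below. It uses the haversine formula on a sphere as an approximation; the COORDINATES table is a hard-coded stand-in for a Wikidata lookup, with rounded coordinates for Berlin (Q64) and Paris (Q90) chosen only as an example, and the name distance_km is an assumption.

```python
import math

# Stand-in for a lookup of coordinates in Wikidata: Q-ID -> (latitude, longitude)
# in degrees. The values are rounded and only meant for illustration.
COORDINATES = {
    "Q64": (52.52, 13.405),  # Berlin
    "Q90": (48.857, 2.352),  # Paris
}

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a sphere


def distance_km(qid_a, qid_b, coordinates=COORDINATES):
    """Approximate great-circle distance between two items, in kilometres."""
    lat1, lon1 = map(math.radians, coordinates[qid_a])
    lat2, lon2 = map(math.radians, coordinates[qid_b])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula.
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


print(round(distance_km("Q64", "Q90")))  # roughly 880 km
```

The caller only passes item identifiers; where the coordinates come from is hidden behind the lookup, which is the role Wikidata would play for such functions.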
Happy Birthday, "big sibling" project Wikidata! We are looking forward to
joining you as a Wikimedia project as the wiki of functions next year!
New media: We gave a talk about Abstract Wikipedia at the Wikidata Eighth
Birthday event organized by WikiProject:India; it is now available on
YouTube <https://www.youtube.com/watch?v=GAb1HylGemA>.
Hello all,
Just a quick note: we will have to delay the start of the vote for the
second round of the naming contest by a few days.
We will move the start of the vote to Monday, November 2nd. We will also
provide a kind of voter guide to explain some of the potential issues we
encountered with some of the names in the preliminary step, but we will
leave the final decision with the community (unless major issues show up in
the second step of the legal assessment).
Thank you all for your understanding,
Denny