A version of this newsletter with links and formatting is available on-wiki
here: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-10-22
In this edition of the weekly posts, I want to discuss the different places
where we will use unstructured text in the objects of the wiki of
functions. The following is a plan and a request for comments. Besides the
labels, nothing is implemented yet, so we would really appreciate your
feedback.
Labels. Every object in the wiki of functions will be identified by a Z-ID,
similar to Q-IDs identifying items in Wikidata. But just like Q-IDs, we
don't expect Z-IDs to be widely visible and used. Instead, every object
will have labels, one per language.
But unlike Wikidata items, every object will be an instance of a specific
type. For example, there might be an object representing the addition of
two integer numbers. So the English label for this object could be “add”.
Other good labels for this object could be “addition”, “sum”, or “plus”. An
inappropriate name for the function would be “multiplication”, as that
would be terribly confusing.
Uniqueness. There would likely also be other objects representing functions
that do addition, for example the addition of two floating point numbers,
or of two complex numbers, or of two matrices. In the wiki of functions
labels will not need to be unique overall - but they will need to be unique
for each type. So there can only be one function with the label “add” that
takes two integers and returns one integer. Or only one type with the label
“integer”. Per type, each label must be unique.
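To make the uniqueness rule a bit more concrete, here is a minimal sketch in Python. Nothing like this is implemented yet: the class LabelRegistry, the Z-IDs, and the way types are written down are all made up for illustration. The sketch also assumes that uniqueness is checked per language, and it does not deal with replacing an object's old label.

```python
class LabelRegistry:
    """Toy registry enforcing unique labels per (type, language).

    Hypothetical illustration only, not the WikiLambda implementation.
    """

    def __init__(self):
        # Maps (type_id, language, label) -> Z-ID of the object holding it.
        self._taken = {}

    def set_label(self, zid, type_id, language, label):
        key = (type_id, language, label)
        holder = self._taken.get(key)
        if holder is not None and holder != zid:
            raise ValueError(
                f"label {label!r} ({language}) is already used by {holder} "
                f"for type {type_id}"
            )
        self._taken[key] = zid


registry = LabelRegistry()
# Two "add" functions with different (made-up) types can share a label.
registry.set_label("Z10001", "integer, integer -> integer", "en", "add")
registry.set_label("Z10002", "float, float -> float", "en", "add")

try:
    # A second integer addition function may not also be labeled "add".
    registry.set_label("Z10003", "integer, integer -> integer", "en", "add")
    clash = False
except ValueError:
    clash = True
```

The key point is that the type is part of the lookup key, so the same label can appear many times across the wiki, but only once per type.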
Now does every object need a label? No. There will be many objects where a
label won’t be strictly necessary. Not every test for a function will need
a label, nor will every implementation of a function. They may have labels,
but they won’t be necessary - this is another difference to Wikidata, where
items without labels are almost always problematic.
One more note on labels - labels do not have to be direct translations
across languages. So in one language, two functions might have the same
label if they have a different type, but in another language the two
functions might have different labels as well. For example, in English
“length” might be an appropriate label for a function that returns the
number of elements in a list, but also for a function that returns the
length of a river, or for a function that returns the duration of a movie. In
Croatian, on the other hand, all three of these might have different names
(“broj elemenata”, “duljina”, “trajanje”). Each language can decide what
pattern works best for them. Whether verbs in the imperative mood (“add”)
or a description of the result (“sum”) or the name of the operation
(“addition”) work best can be decided from language to language, and
independent style guides for each language may evolve.
Aliases. Besides labels, every object can have additional aliases per
language. Aliases are helpful when searching for an object. The above
function, labeled “add” in English, could have all the other alternative
names given above - “sum”, “plus”, “addition” - as aliases, so that when
someone searches for one function by a different name they still find the
right one as a result.
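As a rough sketch of how aliases could help search, the following Python snippet matches a query against both the label and the aliases of an object. The data layout, the function name find_objects, and the Z-ID are assumptions for illustration; the actual search will most likely be more forgiving than exact matching.

```python
def find_objects(query, objects, language):
    """Return the Z-IDs whose label or aliases match the query exactly."""
    wanted = query.casefold()
    hits = []
    for zid, names in objects.items():
        label = names["labels"].get(language)
        aliases = names["aliases"].get(language, [])
        candidates = ([label] if label else []) + list(aliases)
        if any(wanted == name.casefold() for name in candidates):
            hits.append(zid)
    return hits


# Made-up object: the Z-ID is a placeholder, the names come from the text above.
objects = {
    "Z10001": {
        "labels": {"en": "add"},
        "aliases": {"en": ["sum", "plus", "addition"]},
    },
}

print(find_objects("plus", objects, "en"))  # ['Z10001']
```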
Documentation. Every object may also have documentation in each language.
Documentation is some wikitext that further describes the given object.
Many objects will not have any documentation at all, but many will have
some. If you ever had the opportunity to read The Art of Computer
Programming, you will know that there is often a lot to say about a
function! But besides this kind of background story, we can also have some
documentation describing a given implementation, or some explanation for
why a given test is useful. It could also link to a Phabricator task
describing an error that used to be there, which this test checks for in
order to catch it before it resurfaces, or link to other resources such as
a textbook on algorithms.
Keys and arguments. At least types and functions will also have labels for
each key of the type and for each argument of the function. These will be
used in creating the user interface to display and edit values and function
calls.
Short descriptions. One specific question we have is - should we have
optional short descriptions for each object? In many IDEs, and sometimes
even directly in the programming language (think docstrings in Python)
there is support for short one-liners that give a bit of extra information
for a function, beyond just the name of the function and the arguments and
the types.
We are going to have a strong type system, and we will have plenty of space
for the documentation. So do we really also need a place for short
descriptions? What would their use case be? How would they be used in the
UX? On the website, something akin to the pop-up previews in Wikipedia seems
to be even more useful than a one-liner, previewing the whole documentation.
Originally I assumed per default that we would have short descriptions,
given their importance and usefulness in Wikidata, and so they also feature
in the AbstractText prototype. But they were never useful there. Also, the
type took over the role of the disambiguator, as described above, so there
was really no technical need for a short description. I currently lean
towards not having them, but I would like to hear more input.
The good thing is that no decision will be carved in stone. The data model
of the wiki of functions will be much more flexible than the data model of
Wikidata, and if we figure out that we do need short descriptions, we can
just introduce them later. It is much harder to remove things though,
because almost everything that is there gets used some way or the other, so
I am more wary of introducing features without a good reason.
Dogfooding. One obvious question is: hey, we are developing this
architecture to create multilingual content, why even have all this
documentation and labels and all that in actual languages, why not use our
own functions to build all of this up?
And yes, agreed, that would be best. It’s just, we’re not there yet, and
until then, we will still need labels and documentation and aliases. So
eventually I would very much look forward to using abstract content to
describe the objects of the wiki of functions themselves. It will also be
interesting to see if we can then roll back some of the local content, and
how open we will be to doing so. This will give us a lot of interesting
insight into how to approach similar goals in the other Wikimedia projects:
from descriptions (and maybe even labels? or sense glosses?) to stubby
Wikipedia articles, there will be plenty of potential to make our content
across projects easier to maintain and to provide a more uniform
baseline coverage across languages.
New video introducing Abstract Wikipedia. A new presentation, given for
Wikidata’s Eighth Birthday, organized by the community of
WikiProject:India, is available: https://www.youtube.com/watch?v=GAb1HylGemA
Naming contest. Next week, Tuesday 27 October, the second round of voting
for the name of the new wiki of functions will begin. The proposals are
currently being vetted by legal, and we should know which names will be in
the final round soon.
on wiki:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-10-29
I had a weekly update written, but then, at the last moment, I had to
entirely drop it and write a new one - you’ll probably get the other update
next week!
We originally had planned to start the second round of the naming contest
this week, but a number of issues were found with the proposals. Instead of
removing proposals, though, we decided to explicitly and transparently
write up the issues that were raised, and then let you, the community,
decide on the proposals. We are currently writing up a ‘voters’ guide’ with
the results from our internal review processes, which we plan to publish by
the end of the week, and then start the voting on Monday. We will also push
back the end of the voting period by the same number of days.
This week also sees the Eighth Birthday of Wikidata! Congratulations to the
Wikidata project and the Wikidata community! Without Wikidata, Abstract
Wikipedia would be unthinkable, for many different reasons, including:
Wikidata provides a large catalog of entities of interest, which can be
referred to by stable identifiers. This will be extremely helpful when
creating the content of the Abstract Wikipedia: it will allow us to make
some simple statements, such as “Marie Curie and Pierre Curie were
introduced to each other by Józef Wierusz-Kowalski.” This could be
expressed by a constructor that takes three Q-IDs as the parameters, Q7186
<https://www.wikidata.org/wiki/Q7186>, Q37463
<https://www.wikidata.org/wiki/Q37463>, and Q11730603
<https://www.wikidata.org/wiki/Q11730603>, as in this case all three
mentioned people are represented by an item in Wikidata. But still,
Wikidata will never be sufficient: not everything we want to talk about
will have a Wikidata item, and Abstract Wikipedia will need a mechanism to
create a reference through description: for example, the mother of Marie
Curie does not have an item in Wikidata. We could either create an item, or
we could refer to her by description in Abstract Wikipedia, i.e. a
constructor with the meaning “the mother of Marie Curie”. But for these
descriptions, having the large catalog of entities that Wikidata provides
will be immensely valuable.
Wikidata also provides a lot of data that can be used directly in Abstract
Wikipedia. We would not need to repeat the date of birth for Marie Curie
<https://www.wikidata.org/wiki/Q7186#P569> in Abstract Wikipedia, but can
simply query Wikidata for the statement about her date of birth and display
the respective value. This will help with reducing the number of places we
have to maintain data.
Wikidata also contains a large catalogue of references, and there are two
aspects to that. On the one hand, the statement about Marie Curie’s date of
birth has in fact 17 references <https://www.wikidata.org/wiki/Q7186#P569>.
We can then select some of those references to be displayed in the rendered
Wikipedia article created from Abstract Wikipedia. This is already
happening today, so if you see the article for Marie Curie in Greek
<https://el.wikipedia.org/wiki/%CE%9C%CE%B1%CF%81%CE%AF%CE%B1_%CE%9A%CE%B9%C…>,
you will see 17 references for the date of birth in the infobox.
On the other hand, Wikidata also contains a lot of possible sources as
items: books, scientific articles, websites. The Wikicite conference, which
had its 2020 edition this week
<https://meta.wikimedia.org/wiki/WikiCite/2020_Virtual_conference>, was all
about growing and maintaining the large corpus of referenceable sources
that Wikidata has become. Having the sources described as items will make
it easy to use them in references in Abstract Wikipedia, as it already
makes it simpler to cite them in the Wikipedias.
Wikidata provides the lexicographic database that will be needed for
Abstract Wikipedia. So when we want to talk about the mother of Marie Curie
in, say, Russian, we need to know what the singular nominative form of the
word ‘mother’ is in Russian. And that information will be coming from the
Form <https://www.wikidata.org/wiki/Lexeme:L57777#F1> stored on the
respective Lexeme in Wikidata.
We are currently planning to better understand and visualize how the
coverage of the lexicographic data in Wikidata is progressing for
encyclopedic content. To undertake this analysis, we have encouraged the
Wikidata team to provide regular JSON dumps of the Lexeme corpora
<https://phabricator.wikimedia.org/T220883>, which are currently being
processed and will be available soon. Our thanks to the Wikidata team!
As we see, there are many ways that Wikidata will be used to power Abstract
Wikipedia. In 2021, we also plan to have a discussion amongst the Wikimedia
communities as to whether Wikidata will be the place to actually store and
maintain the content for Abstract Wikipedia, or if it should live in some
other place. There are many pros and cons regarding this decision, and we
are looking forward to the discussion.
In the wiki of functions, Wikidata will provide a large set of interesting
items that can be used as input for functions. Some of the entries in the list
of function examples
<https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Early_function_examples>
illustrate functions that could use Wikidata items as input: there are
functions such as distance, which calculates the distance between two cities,
or the head of state at birth function, which takes a person and returns the
head of state in the place of birth during the time of birth of that
person, etc. We can have plenty of interesting functions that are answering
questions around Wikidata items. We will use Wikidata as a large repository
of items that can be used in Wikidata functions.
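To give an idea of what such a function might look like, here is a small sketch of a distance function in Python. It uses the haversine formula, which is one common way to do this and not necessarily what the wiki of functions will use. It assumes that the coordinates of the two cities have already been looked up (presumably from their Wikidata items) and that the Earth is a sphere with a mean radius of 6371 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius; the Earth is treated as a sphere


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing the square root slightly above 1.
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))
```

Note that the function itself needs no access to context: it only depends on the four numbers it is given, which is the kind of purely functional behavior we are aiming for.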
Happy Birthday, "big sibling" project Wikidata! We are looking forward to
joining you as a Wikimedia project as the wiki of functions next year!
New media: We gave a talk about Abstract Wikipedia
<https://www.youtube.com/watch?v=GAb1HylGemA> at the Wikidata Eighth
Birthday event organized by WikiProject:India. The recording is now
available on YouTube.
Hello all,
Just a quick note: we will have to delay the start of the vote for the
second round of the naming contest by a few days.
We will move the start of the vote to Monday, November 2nd. We will also
provide a kind of voter guide to explain some of the potential issues we
encountered with some of the names in the preliminary step, but we will
leave the final decision with the community (unless major issues show up in
the second step of the legal assessment).
Thank you all for your understanding,
Denny
This is a rather technical question. If you are not interested in the inner
workings of the function model, feel free to safely skip this one.
Currently, a function call is represented as follows (assume Z144 is the
concatenation function):
Z1K1: Z7
Z7K1: Z144
Z144K1: "Wiki"
Z144K2: "data"
If we use global keys, it would look like this:
Z1K1: Z7
Z7K1: Z144
K1: "Wiki"
K2: "data"
The local keys in this case get expanded against the Z7K1 value, whereas
all other local keys are expanded against the Z1K1 value. This makes
function calls very different from all other objects, and requires special
handling.
The suggestion is to change the representation of function calls and make
them more unified compared to the other entries, i.e. like this:
Z1K1: Z7
Z7K1:
  Z1K1: Z144
  Z144K1: "Wiki"
  Z144K2: "data"
So, instead of pulling the values into the Z7 object, we basically
instantiate a function just like any other type, and wrap it into a Z7 to
say that this is a function call. This needs one extra object, but it leads
to much more uniform handling of objects.
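For those who prefer to read code, the proposed change can be sketched as a small Python function that rewrites the current flat representation into the suggested nested one. ZObjects are modeled here as plain dictionaries, which is an assumption for illustration, and the name to_nested_call is made up. Z144 stands for the function used in the examples above.

```python
def to_nested_call(flat_call):
    """Convert the current flat function call into the proposed nested form."""
    if flat_call.get("Z1K1") != "Z7":
        raise ValueError("not a function call (Z1K1 must be Z7)")
    function_id = flat_call["Z7K1"]
    # Everything except the two Z7 bookkeeping keys is an argument.
    arguments = {
        key: value
        for key, value in flat_call.items()
        if key not in ("Z1K1", "Z7K1")
    }
    return {"Z1K1": "Z7", "Z7K1": {"Z1K1": function_id, **arguments}}


flat = {"Z1K1": "Z7", "Z7K1": "Z144", "Z144K1": "Wiki", "Z144K2": "data"}
nested = to_nested_call(flat)
# nested == {"Z1K1": "Z7",
#            "Z7K1": {"Z1K1": "Z144", "Z144K1": "Wiki", "Z144K2": "data"}}
```

In the nested form, the keys Z144K1 and Z144K2 belong to an object whose Z1K1 is Z144, so they can be expanded in the same way as the keys of any other object.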
Any thoughts?
Cheers,
Denny
[The on-wiki version with images, links and markup is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-10-14]
Naming contest. Today we closed the first round of the naming contest. 176
proposals were submitted, and more than 500 votes tallied over these. Two
proposals are clearly leading the tally, Wikilambda and Wikifunctions. Four
more proposals made it past the first round: Wikimedia Functions, Wikicode,
Wikifusion, and Wikicodex. These six proposals will now undergo a first,
preliminary check by the legal team of the Wikimedia Foundation, and the
ones that pass the review will enter as candidates into the second and
final round of the naming contest. The second round will begin October 27
and go for two weeks, until we close the voting on November 10.
I am very excited to see the creativity of the community in making so many
proposals, and to see so many people joining the vote.
Outreachy. Abstract Wikipedia also submitted a task to the Outreachy
program. Outreachy is a program that was started in 2006 by the GNOME
Foundation, and has grown in scope and in the number of participating
organizations over the years. The goal of the program is to increase the
diversity of contributors to free and open source projects by providing
internship opportunities. Wikimedia joined the program in 2013 for the
first time, and there have been many interns that have been mentored by the
Wikimedia Foundation within the Outreachy program.
For Abstract Wikipedia we have created a task that aims at analyzing the
current code base in the Wikimedia projects. The wiki of functions aims to
make a diverse range of functions available to the Wikimedia projects: unit
conversions, calculations, formatting, and much more. Currently, the
Wikimedia projects already have a mechanism to solve some of these use
cases, by using modules written in Lua.
The task is to take stock of the current situation. What kind of problems
are being solved with the Lua modules? Which of these are available across
many of our projects? Which of these are suitable for the approach that we
plan for the wiki of functions, i.e. can they be resolved with purely
functional solutions without access to context? Do modules have small, but
important differences between their implementations in different projects?
We are curious to see the results of this work, and it will directly feed
into our roll-out plans to support the Wikimedia projects we plan for next
year. It will hopefully help us identify which functions would be most
valuable for the projects, and how to allow the projects which are
interested in doing so to use them and rely on the new repository of
functions.
Development. Our own code base has also been developing. A new view mode is
now available that considerably improves the display and readability of
ZObjects in the wiki (see screenshots of before and after). The development
of the programmatic evaluation of the content based on its types is
ongoing, which is the last big part of our work on Phase β.
External publications. Smashing Magazine published an article on
“Developing for the Semantic Web”, which uses Abstract Wikipedia as its
vehicle, but goes well beyond the scope of Abstract Wikipedia and discusses
the meandering history of Semantic Web technologies and its current
incarnations and hopes.
What can you do?
* Work on lexicographical data in Wikidata
* If you are an Outreachy applicant, check out our task
The on-wiki version is here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-10-07
Functions are a form of knowledge. Functions answer questions. A growing
number of technology companies realize that and give access to functions
through an increasing number of interfaces. Virtual assistants such as
Apple’s Siri, Google Assistant, Amazon’s Alexa, Microsoft’s Cortana, or the
open source project Mycroft all give us access to functions. You can ask
Siri how many teaspoons are in two tablespoons, and it will calculate the
answer for you. Web search interfaces also provide access to functions for
various queries. You can ask Google about the volume of a pyramid, and you
will get a beautiful user experience where you can enter the necessary
values and see what the function to calculate the volume looks like. You can
ask Bing about the age of Sigourney Weaver, and it won’t show you search
results or find the answer on the web, but it will run a function based off
of knowledge of her date of birth stored within Microsoft’s knowledge base
and today’s date and show you the result. In fact, it will show you a
different result if you ask this on the day this text is published and at
the end of the week, as she is going to have her birthday this week.
Congratulations!
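The age example can be written down as a very small function. The following Python sketch is only my guess at the general shape of such a function, not a description of what any search engine actually runs, and the dates used are illustrative.

```python
from datetime import date


def age_in_years(date_of_birth, today):
    """Full years elapsed between date_of_birth and today."""
    years = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Illustrative dates only, not taken from any knowledge base.
born = date(1950, 6, 15)
print(age_in_years(born, date(2020, 6, 14)))  # 69
print(age_in_years(born, date(2020, 6, 15)))  # 70
```

The two calls show why the answer changes from one day to the next: the stored date of birth is the same, only the second input differs.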
But the experiences these companies provide, as beautiful and useful as
they are, are curated and selected by these companies. If you look for a
function that goes beyond what these companies offer, you are out of luck.
Let’s take one example: did you hear about the QWERTY effect
<https://www.wikidata.org/wiki/Q28730063>? There was some research that
claimed that since for right-handed people it is easier to type the keys on
the right-hand side of their keyboard, they have a more positive sentiment
towards words that have more letters typed by the keys on the right-hand
side than those on the left-hand side of the keyboard. Since the majority
of people are right-handed, this translates into a measurable effect where
a sentiment towards a text might be influenced by the ratio of right- vs
left-handed letters in the text.
Now, without commenting on the merit of this research, let’s assume you are
a high school graduate, writing up your application essays for the college
you dream of going to. Or someone looking for a job writing a cover letter.
And you hear about this effect, and, you know, it can’t hurt, why not check
what the ratio of right-hand and left-hand letters in your text is before
sending it off?
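For illustration, such a function could be as short as the following Python sketch. The split of letters between the hands assumes standard touch typing on an English QWERTY layout; the function names are made up, and anything that is not one of the 26 basic letters is simply ignored.

```python
# Assumed assignment of letters to hands (touch typing, English QWERTY).
LEFT_HAND = set("qwertasdfgzxcvb")
RIGHT_HAND = set("yuiophjklnm")


def hand_counts(text):
    """Count letters typed by the left and the right hand, ignoring the rest."""
    left = right = 0
    for char in text.lower():
        if char in LEFT_HAND:
            left += 1
        elif char in RIGHT_HAND:
            right += 1
    return left, right


def right_hand_share(text):
    """Share of counted letters typed by the right hand (0.0 if none)."""
    left, right = hand_counts(text)
    total = left + right
    return right / total if total else 0.0


print(hand_counts("Wikidata"))       # (5, 3)
print(right_hand_share("Wikidata"))  # 0.375
```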
If you are a programmer, you have the ability to write a function and run
it on your text. But you know what? That’s a superpower! And if you don’t
have that superpower, you’re out of luck. If the tech companies haven’t yet
created that function for you to use, you have to go hunting for a
website or an app that offers you that function, and maybe you’ll be lucky,
maybe you won’t.
Most of us hold enough computing power in our hand, or around our wrist,
sometimes in our fridges and light bulbs, to easily compute the answer to a
question like this, and to millions of questions more. But we can’t easily
run the relevant functions on the powerful computing devices we own.
We want to change that. With the wiki of functions we want to democratize
this superpower. We want to make functions available to many, many more
people. We want to show people what they can do with the amazing computing
power that they have available.
And it is not just about using these functions -- this is a Wikimedia
project: it is also about writing functions. It is about contributing and
working together on making a large, comprehensive catalog of functions
available to everyone. Maybe you know about a rare unit of measurement, or
a seldom-used calendar, and want to provide a function to convert to the
calendar or unit. Maybe you want to collaborate with others on functions to
calculate the areas spanned by geoshapes on Commons or OpenStreetMap. Or
implement a function from a scientific paper so that others can
reuse it more easily. We will create a platform in which a new community will
collaboratively create, curate, and maintain a catalog of functions that
are widely useful, where we explore a new way for everyone to share in a
new form of knowledge.
This is why, even if the goal of a truly multilingual Wikipedia should turn
out to be even more challenging than we expect, the wiki of functions will
provide a useful stepping stone and will be an interesting project in its
own right. And we are working towards making it happen.
Since last week, we have implemented a number of improvements to the first
UX for object creation - particularly, the editing of multilingual labels
is now MUCH nicer! Thanks to Arthur P. Smith for implementing that! We are
currently aiming to tie in the type definitions into the object system, and
to use the types for validating values and for creating generic viewers and
editors.
Also, the voting for the name is ongoing. You have made more than 170
proposals so far, and cast hundreds of votes. The first round of voting
continues until October 13, next week Tuesday. Then the six proposals with
the most votes will have a first round of legal review, and all proposals
that pass that round will be taken to a second round of voting using
instant-runoff.
What can you do?
- Vote on the name
(A version with links is available on-wiki:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2020-09-30 )
Hello all,
Abstract Wikipedia will be a long journey, and I expect the project to
shift and pivot over time. Our ultimate goal is clear: a system that allows
more people to share more knowledge with more people, across languages. And
we have an idea which path we are going to take.
Runa Bhattacharjee, the lead of the WMF’s language team, made the suggestion
to keep a journal or blog of our journey, and I will try to follow the
suggestion. The posts will be personal, reflective, aspirational, trying to
capture unfinished thoughts. Maybe we’ll have guest posts. These might,
over time, morph into more structured weekly updates. We will see how it goes.
I also will try to not have the posts be too long and meandering, which is
particularly hard in the beginning, when there is so much still
undiscussed. I constantly feel like I want to say more about everything in
here, but I will try to keep these posts succinct, and will come back to
many of the topics later. If there is anything you are particularly
interested in, let me know!
So, how have our first three months been? We are a small team, so the code
base development is going on slowly. The advantage of things moving slowly
is that we have the time to think through different options, and to
socialize the idea of what we are doing more widely with the communities,
with the many related teams, and others. I have been busy with writing
documents, getting settled into a new job, and giving presentations to a
wide range of stakeholders.
At the same time, I was trying to read up on the discussions on the mailing
list and the wiki, and follow up on what the enthusiastic community is
doing. We had a volunteer, ZI Jony, create Facebook pages and Telegram
chats and IRC channels and more pages on-wiki, and it was all great to
watch. It looks like we as a community are figuring out our communication
channels, and it’s looking like a great start: in July, our mailing list
was the most active public mailing list across the whole movement.
Things have cooled down since then, but community members are busily
working on gathering the state of the art in natural language generation
and other tasks - thanks to GrounderUK, Chris Cooley, and Adam Sobieski
for their volunteer contributions. Nick has caught up on
the ongoing conversations and existing documents, and will be the crucial
link between the development team and the communities. He has now started
and is leading the vote for the name of the wiki of functions - the first
voting round started yesterday, spread the word and join the voting! So
far, more than 150 proposals have been collected, displaying the amazing
creativity of our community.
Adam has been invaluable in helping me navigate the Foundation, and setting
up meetings with people inside the Foundation. There are so many teams
whose work will be paramount to the success of Abstract Wikipedia, and we
are starting to have some of the meetings. Be it legal, be it comms, be it
the community relation experts, be it partnerships, be it the language
team, be it SRE, design, security, performance, analytics, or one of the
many other teams we are working with - a project like Abstract Wikipedia
obviously doesn’t happen in a vacuum, but in the context of a larger
organization, which in turn is embedded in an even larger movement. And in
fact, this is the only way such a project even has a shot at succeeding.
Adam is also leading the search for our first new hire and has posted the job
description for a full-stack software engineer. We hope for more positions
opening soon.
Regarding our code base, James has been turning requirements and vague
wishes into architecture and code. Thanks to James’s in-depth knowledge of
the MediaWiki code base, he was able to set up the WikiLambda extension and
build a solid foundation for further development quickly. We have defined
eleven phases to take us to the launch of the wiki of functions, and so far
we have finished the first one, phase α, which allows us to store objects
in MediaWiki. Phase β is currently in full swing, and allows for the
objects to be typed and types to be created. We are also very thankful to
Arthur P. Smith, a volunteer, who has created frontend components in Vue
that provide a first version of an editing interface. His first patch
landed last week. Arthur is also joining our daily stand-up meetings once a
week. If you want to help with development, please reach out. A screenshot
of Arthur’s results is available - and also a demo system, set up by Lucas
Werkmeister (in his volunteer capacity), is up and running. Lucas and
Arthur, you're awesome!
There are currently so many ideas floating and being discussed and so many
wishes and hopes being projected onto Abstract Wikipedia, that it is almost
a given that we will disappoint some of those. Apologies for that. I, too,
have great expectations as to where the project will lead, and when
I allow myself to dream, I imagine that this will be groundbreaking and
change fundamentally not only the way Wikipedia works, but have impact well
beyond that. But then I remind myself that projects like these have a
terrible history of failing, and I plan to get to some of these failures in
the following posts. And what we can learn from them.
I am extremely grateful for the very warm welcome to the project by the
communities. Now, one of the steps for us is to manage expectations. First,
we are a very small, exploratory project - so don’t expect us to launch a
wiki next week. Or next month. Next year? That’s our current plan. Also,
even when we launch, we will launch with a very minimal project. There
won’t be an integration into local Wikipedias yet. There won’t be any
support for natural language generation (that is planned for 2022, if all
goes well). And in fact, I am afraid, it may even be confusing, because
we’re starting with a wiki for the collaborative curation of a catalog of
functions, and there might well be the question, why are we doing this and
not focusing on the generation of text for the local Wikipedias? And what
is a function anyway and why do we need a catalog of these? I will come
back to this in future posts. You can see our current best attempt at a
clear and concise explanation of functions at the main project page.
I want to end each post with a list of things that can be done right now,
things as actionable as possible. Feel free to send suggestions my way.
Things to do:
* Vote in the first round of the wiki of functions naming contest (new
proposals are also still allowed). The first vote ends October 13.
* Work on lexicographic knowledge in Wikidata. I will come to how central
this is in later posts, but this will be needed for Abstract Wikipedia to
succeed and can be worked on right now.
Cheers,
Denny