[Abstract-wikipedia] Re: Newsletter #39: Abstract descriptions

1 Aug 2021

      Hoi,
On Twitter I applauded the notion of Abstract descriptions replacing
Automated descriptions. Indeed we have suffered inadequate descriptions
long enough. Inadequate because descriptions were concocted to mitigate
restrictions in the software (uniqueness), and consequently "researcher,
ORCID 0000.0000.0000.0000" was used as an uninformative description on
thousands of people. That is why the automated descriptions have been so
valuable: given additional statements and qualifiers, the descriptions
reflect what is known.
My hope for abstract descriptions is that they will replace automated
descriptions and update their text when changes are made to the item. Having
a text string representing the Abstract representation is fine as it
reduces the time to prepare the string that is to be presented in any
language. It is however crucial for an abstract description that it
represents the data as available.
One key reason for having automated descriptions is that they help
editors and do not require knowledge about the construction of
Abstract Wikipedia texts. As a result the text will *always* reflect the
current knowledge about each item.
Thanks,
        GerardM
On Fri, 30 Jul 2021 at 00:14, Denny Vrandečić dvrandecic@wikimedia.org
wrote:
...
The on-wiki version of this newsletter is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-07-29
--
Our goal with Abstract Wikipedia is to enable everyone to write content in
any language that can be read in any language. Ultimately, the main form of
content we aim for are Wikipedia articles, in order to allow everyone to
equitably have and contribute to unbiased, up-to-date, comprehensive
encyclopedic knowledge.
In the coming months, we will reach major milestones towards that goal.
Today, I want to sketch one possible milestone on our way: abstract
descriptions for Wikidata.
Every Item https://www.wikidata.org/wiki/Help:Items in Wikidata has a
label https://www.wikidata.org/wiki/Help:Label, a short description
https://www.wikidata.org/wiki/Help:Description, and aliases
https://www.wikidata.org/wiki/Help:Aliases in each language. Let’s say
you take a look at Item Q836805 https://www.wikidata.org/wiki/Q836805.
In English, that Item has the label *“Chalmers University of Technology”* and
the description *“university in Gothenburg, Sweden”*. In Swedish it is *“Chalmers
tekniska högskola”* and *“universitet i Göteborg, Sverige”*. The goal of
the label is to be a common name for the Item, and together with the
description it should uniquely identify the Item in the world. That’s why,
although multiple Items can have the same label, as things in the world can
be called the same but be different, no two Items should have both the same
label and the same description in a given language. The aliases are used to
help with improving the search experience.
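The uniqueness rule above can be sketched as a small check. This is an illustration only, assuming a simplified in-memory list of (QID, label, description) tuples for a single language. The function name find_collisions and the placeholder Items Q_A, Q_B and Q_C are made up for this example and are not part of Wikidata's software or data.

```python
from collections import defaultdict

def find_collisions(items):
    """Return {(label, description): [qids]} for pairs shared by several Items.

    `items` is an iterable of (qid, label, description) tuples for ONE language.
    """
    seen = defaultdict(list)
    for qid, label, description in items:
        seen[(label, description)].append(qid)
    # Only pairs used by more than one Item violate the rule.
    return {pair: qids for pair, qids in seen.items() if len(qids) > 1}

english = [
    ("Q836805", "Chalmers University of Technology",
     "university in Gothenburg, Sweden"),
    # Hypothetical placeholder Items, invented for illustration:
    ("Q_A", "Springfield", "city in the United States"),
    ("Q_B", "Springfield", "city in the United States"),  # collides with Q_A
    ("Q_C", "Springfield", "fictional town"),  # same label, other description: fine
]

collisions = find_collisions(english)
```

Sharing a label alone is allowed; only an identical label and description within one language is reported.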
The meaning of the descriptions across languages is often the same, and
when it is not, although sometimes intentional, it usually differs by
accident. Given there are more than 94 million Items in Wikidata, and
Wikidata supports more than 430 languages, that would mean that if we had
perfect coverage, we would have more than 40 billion labels and as many
descriptions. And not only would the creation of all these labels and
descriptions be a huge amount of work, they would also need to be
maintained. If there are not enough contributors checking on the quality of
these, it would be unfortunately easy to sneak in vandalism.
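As a quick check of the figure above, the arithmetic can be reproduced from the two lower bounds given in the text. The variable names are only for this illustration.

```python
items = 94_000_000   # "more than 94 million Items" (lower bound from the text)
languages = 430      # "more than 430 languages" (lower bound from the text)

# One label (and one description) per Item per language at perfect coverage.
labels_at_full_coverage = items * languages

print(f"{labels_at_full_coverage:,}")  # 40,420,000,000
```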
The Wikidata community has known about this issue for a long time, and
made great efforts to correct it. Tools such as AutoDesc
https://autodesc.toolforge.org/ by Magnus Manske
https://meta.wikimedia.org/wiki/User:Magnus_Manske and bots such as
Edoderoobot https://www.wikidata.org/wiki/User:Edoderoobot,
Mr.Ibrahembot https://www.wikidata.org/wiki/User:Mr.Ibrahembot, MatSuBot
https://www.wikidata.org/wiki/User:MatSuBot (these were selected by
clicking “Random Item” and looking at the history) and many others have
worked on increasing the coverage. And it shows: these bots often target
descriptions, and so, even though only six languages have *labels* for
more than 10% of Wikidata Items, a whopping 64 languages have a coverage
over 10% for *descriptions*! Today, we have well over two billion
descriptions in Wikidata.
These bots create descriptions, usually based on the existing statements
of the Item. And that is great. But there is no easy way to fix an error
across languages, nor is there an easy way to ensure that no vandalism has
snuck in. Also, bots give an oversized responsibility to a comparably small
group of bot operators. Our goal is to democratize that responsibility
again and allow more people to contribute.
Descriptions in Wikidata are usually noun phrases, which are something
that we will need to be able to generate for Abstract Wikipedia anyway. We want
to start thinking about how to implement this feature, and then derive from
there what will need to happen in Wikifunctions and in Wikidata. This work
will need to happen in close coöperation with the Wikidata team, and the
communities of both Wikidata and Wikifunctions. It will represent a way to
ramp-up our capabilities towards the wider vision of Abstract Wikipedia.
Timewise, we hope to achieve that in 2022.
We don’t know yet how exactly this will work. Here are a few thoughts, but
really I invite you to work together with us on the design for abstract
descriptions:

It must be possible to overwrite a description for a given language
It must be possible to retract a local overwrite for a given language
The pair of label and description still must remain unique
It would be great if implementing this would not be a large effort
The goal is not to create automatic descriptions
   https://www.wikidata.org/wiki/Wikidata:Automating_descriptions, but
   abstract descriptions
The last point is subtle: an automatic description is a description
generated automatically from the given statements of an Item. That’s a
valuable and very difficult task. The above-mentioned AutoDesc, for example,
starts the English description for Douglas Adams
https://autodesc.toolforge.org/?q=Q42&lang=en&mode=short&links=text&redlinks=&format=html&get_infobox=yes&infobox_template= as
follows: *“British playwright, screenwriter, novelist, children's writer,
science fiction writer, comedian, and writer (1952–2001) ♂; member of
Footlights and Groucho Club; child of Christopher Douglas Adams and Janet
Adams; spouse of Jane Belson”*. The Item
https://www.wikidata.org/wiki/Q42's current manual English description
is the much more succinct *“English writer and humorist”*. There can be
many subtle decisions and editorial judgements to be made in order to
create the description for a given Item, and I think we should be working
on this — but later.
Instead, we want to support abstract descriptions: a description, manually
created, but instead of being written in a specific natural language, it is
encoded in the abstract notation of Wikifunctions and then we use the
renderers to generate the natural language text. This allows the community
to retain direct control over the content of a description.
Here are a few ideas to kick off the conversation:

We introduce a new language code, qqz. That code is in the range
   reserved for local use, and is similar to the other dummy language
   codes https://www.mediawiki.org/wiki/Manual:$wgDummyLanguageCodes in
   MediaWiki, qqq and qqx. Wikidata is to support the qqz language code
   for descriptions.

The content of the qqz description is an abstract content.

Technically we could store it in some string notation such as “Z12367(
   Q3918 https://www.wikidata.org/wiki/Q3918, Q25287
   https://www.wikidata.org/wiki/Q25287, Q34
   https://www.wikidata.org/wiki/Q34)”. Or we could store the JSON
   ZObject.

The abstract description would be edited using the same Vue
   components we develop for Wikifunctions for editing abstract content.

The abstract description is a fallback for languages without a
   description. It can be overwritten by providing a description in that
   language.

Every time the renderer function or the underlying lexicographic
   data changes, we also need to retrigger the relevant generations.

One question is whether we should store the generated description in
   the Item, and if so, how to change the data model in order to mark the
   description as generated from the abstract description.

We also need to figure out how to report changes to everyone who is
   interested in tracking them. If we store the generated description as
   proposed above, we can piggyback on the current system.
All of these are just ideas for discussion. Some of the major questions
are whether to store all the generated descriptions in the Item or not, how
to represent that in the edit history of the Item, how to design the
caching and retriggering of the generated descriptions, etc.
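To make the fallback, overwrite and retrigger ideas above more concrete, here is a minimal sketch. It assumes the variant in which generated descriptions are stored and marked as generated, which the newsletter leaves open. The class Descriptions, its method names and the placeholder renderer fake_render are all hypothetical and do not describe an existing Wikidata or Wikifunctions API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Tuple

@dataclass
class Descriptions:
    abstract: Optional[str] = None                           # the qqz content
    manual: Dict[str, str] = field(default_factory=dict)     # local overwrites
    generated: Dict[str, str] = field(default_factory=dict)  # marked as generated

    def overwrite(self, lang: str, text: str) -> None:
        """Provide a description in one language; it wins over the fallback."""
        self.manual[lang] = text
        self.generated.pop(lang, None)

    def retract(self, lang: str) -> None:
        """Remove a local overwrite so the abstract description applies again."""
        self.manual.pop(lang, None)

    def regenerate(self, render: Callable[[str, str], str],
                   languages: Iterable[str]) -> None:
        """Re-run after the renderer or the lexicographic data has changed."""
        self.generated.clear()
        if self.abstract is None:
            return
        for lang in languages:
            if lang not in self.manual:  # never generate over a manual text
                self.generated[lang] = render(self.abstract, lang)

    def lookup(self, lang: str) -> Optional[Tuple[str, str]]:
        """Return (text, source), where source is 'manual' or 'generated'."""
        if lang in self.manual:
            return self.manual[lang], "manual"
        if lang in self.generated:
            return self.generated[lang], "generated"
        return None

def fake_render(abstract: str, lang: str) -> str:
    # Placeholder only: a real renderer would produce natural language text.
    return f"[{lang}] {abstract}"

d = Descriptions(abstract="Z12367(Q3918, Q25287, Q34)")
d.overwrite("en", "university in Gothenburg, Sweden")
d.regenerate(fake_render, ["en", "hr"])
en_before = d.lookup("en")   # manual text wins
hr = d.lookup("hr")          # falls back to the generated text

d.retract("en")
d.regenerate(fake_render, ["en", "hr"])
en_after = d.lookup("en")    # now generated as well
```

Keeping manual and generated texts in separate maps is one simple way to mark a description as generated; a flag on each stored description would work as well.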
What would that look like?
Let’s take a look at an oversimplified example. The description for
Chalmers is *“university in Gothenburg, Sweden”*. That seems like a
reasonably simple case that could easily be templated into abstract content,
say of the form “Z12367(Q3918 https://www.wikidata.org/wiki/Q3918,
Q25287 https://www.wikidata.org/wiki/Q25287, Q34
https://www.wikidata.org/wiki/Q34)”, where Z12367 (that ZID is made-up)
represents the abstract content saying in English *“(institution) in
(city), (country)”*, Q3918 https://www.wikidata.org/wiki/Q3918 the QID
for university, Q25287 https://www.wikidata.org/wiki/Q25287 the QID for
Gothenburg, and Q34 https://www.wikidata.org/wiki/Q34 the QID for
Sweden. (In reality, this template is actually nowhere near as simple as it
looks; we will discuss this more in an upcoming weekly newsletter.
For now, let’s assume it is that simple.)
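The string notation in this example could be read back into a structured form roughly as follows. This is a sketch under the assumption that the notation is exactly "ZID(QID, QID, ...)" with the URLs left out; the notation itself is not final, and the names AbstractCall and parse_call are invented for this illustration.

```python
import re
from typing import List, NamedTuple

class AbstractCall(NamedTuple):
    zid: str         # the function, e.g. the made-up Z12367
    args: List[str]  # the QIDs passed to it, in order

_CALL = re.compile(r"^\s*(Z\d+)\s*\(\s*(.*?)\s*\)\s*$")
_QID = re.compile(r"^Q\d+$")

def parse_call(text: str) -> AbstractCall:
    """Parse 'Z12367(Q3918, Q25287, Q34)' into its ZID and list of QIDs."""
    match = _CALL.match(text)
    if match is None:
        raise ValueError(f"not of the form ZID(QID, ...): {text!r}")
    zid, arg_text = match.groups()
    args = [a.strip() for a in arg_text.split(",")] if arg_text else []
    bad = [a for a in args if not _QID.match(a)]
    if bad:
        raise ValueError(f"arguments are not QIDs: {bad}")
    return AbstractCall(zid, args)

call = parse_call("Z12367(Q3918, Q25287, Q34)")
# call.zid is "Z12367"; call.args is ["Q3918", "Q25287", "Q34"]
```

Storing the JSON ZObject instead would avoid a custom parser like this one, at the cost of a longer stored value.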
Renderers would then take this abstract content and for each language
generate the description, in this case *“university in Gothenburg,
Sweden”* for English, or *“sveučilište u Göteborgu u Švedskoj”* in
Croatian. Since there is already an English description, we would neither
store nor actually generate the text, but in Croatian we would generate it,
store it, and mark it as a generated description.
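A toy version of this rendering step might look like the sketch below. It is deliberately oversimplified: the word forms are hard-coded in a small table copied from the two example descriptions above, whereas real renderers would draw on lexicographic data. The names LEXICON, render_en, render_hr and generate_missing are hypothetical.

```python
# Word forms copied from the example descriptions; a stand-in for real
# lexicographic data. "base" is the plain form, "after_u" is the form
# that appears after the Croatian preposition "u" in the example.
LEXICON = {
    "en": {"Q3918": {"base": "university"},
           "Q25287": {"base": "Gothenburg"},
           "Q34": {"base": "Sweden"}},
    "hr": {"Q3918": {"base": "sveučilište"},
           "Q25287": {"after_u": "Göteborgu"},
           "Q34": {"after_u": "Švedskoj"}},
}

def render_en(institution, city, country):
    lex = LEXICON["en"]
    return (f"{lex[institution]['base']} in "
            f"{lex[city]['base']}, {lex[country]['base']}")

def render_hr(institution, city, country):
    lex = LEXICON["hr"]
    return (f"{lex[institution]['base']} u "
            f"{lex[city]['after_u']} u {lex[country]['after_u']}")

RENDERERS = {"en": render_en, "hr": render_hr}

def generate_missing(args, existing):
    """Render only languages without a manual description; mark the result."""
    out = {}
    for lang, render in RENDERERS.items():
        if lang in existing:
            continue  # a manual description exists: do not generate or store
        out[lang] = {"text": render(*args), "generated": True}
    return out

existing = {"en": "university in Gothenburg, Sweden"}  # manual English text
result = generate_missing(["Q3918", "Q25287", "Q34"], existing)
# result holds only the Croatian description, marked as generated.
```

The same three arguments produce a different sentence structure in each language, which is why each language needs its own renderer rather than a shared string template.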
We think of this as a good milestone on our path to Abstract Wikipedia,
with a directly useful outcome. What are your thoughts? Join us in
discussing this idea on the following talk page:
https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Updates/2021-07-29

In other news, Lindsay has created a video of a new feature: how Testers
and Implementations work together to show whether the tests pass. The video
is available here:
https://commons.wikimedia.org/wiki/File:Wikilambda_Testers_on_Code_based_Imp...
The video shows how she is changing the implementation and re-running the
testers several times. Testers will be a main component in ensuring the
quality of Wikifunctions.
The next opportunity to meet us and ask us questions will be at Wikimania.
On 14 August, at 17:00 UTC, we will host a 1.5 hour session on
Wikifunctions and Abstract Wikipedia. This year, Wikimania will be an
entirely virtual event and registration is free. Bring your questions and
discussions to Wikimania 2021.
Next week, we are skipping the weekly update.
_______________________________________________
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org
List information:
https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
