The on-wiki version of this newsletter is available here: https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-07-29
--
Our goal with Abstract Wikipedia is to enable everyone to write content in any language that can be read in any language. Ultimately, the main form of content we aim for is Wikipedia articles, in order to allow everyone to equitably have and contribute to unbiased, up-to-date, comprehensive encyclopedic knowledge.
In the coming months, we will reach major milestones towards that goal. Today, I want to sketch one possible milestone on our way: abstract descriptions for Wikidata.
Every Item https://www.wikidata.org/wiki/Help:Items in Wikidata has a label https://www.wikidata.org/wiki/Help:Label, a short description https://www.wikidata.org/wiki/Help:Description, and aliases https://www.wikidata.org/wiki/Help:Aliases in each language. Let’s say you take a look at Item Q836805 https://www.wikidata.org/wiki/Q836805. In English, that Item has the label *“Chalmers University of Technology”* and the description *“university in Gothenburg, Sweden”*. In Swedish it is *“Chalmers tekniska högskola”* and *“universitet i Göteborg, Sverige”*. The goal of the label is to be a common name for the Item, and together with the description it should uniquely identify the Item in the world. That’s why, although multiple Items can have the same label, as things in the world can be called the same but be different, no two Items should have both the same label and the same description in a given language. The aliases are used to help with improving the search experience.
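As a rough illustration of the uniqueness rule above, here is a minimal Python sketch. The function name `find_conflicts`, the in-memory data layout, and the second Item (ID and description) are made up for this example; this is not how Wikidata actually stores or validates its terms.

```python
from collections import defaultdict


def find_conflicts(items):
    """Return the (language, label, description) triples shared by several Items.

    `items` maps an Item ID to {language: (label, description)}.
    This layout is hypothetical and only serves the illustration.
    """
    seen = defaultdict(list)
    for qid, terms in items.items():
        for lang, (label, description) in terms.items():
            seen[(lang, label, description)].append(qid)
    # Only triples used by more than one Item violate the rule.
    return {key: qids for key, qids in seen.items() if len(qids) > 1}


items = {
    "Q836805": {
        "en": ("Chalmers University of Technology", "university in Gothenburg, Sweden"),
    },
    # Made-up Item: same label, different description, so this is allowed.
    "Q_EXAMPLE": {
        "en": ("Chalmers University of Technology", "tram stop in Gothenburg, Sweden"),
    },
}

print(find_conflicts(items))  # no conflicts: the descriptions differ
```

The label alone may repeat; only a repeated pair of label and description in the same language is reported.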
The meaning of the descriptions across languages is often the same, and when it is not, although sometimes intentional, it usually differs by accident. Given there are more than 94 million Items in Wikidata, and Wikidata supports more than 430 languages, that would mean that if we had perfect coverage, we would have more than 40 billion labels and as many descriptions. And not only would the creation of all these labels and descriptions be a huge amount of work, they would also need to be maintained. If there are not enough contributors checking on the quality of these, it would be unfortunately easy to sneak in vandalism.
The Wikidata community has known about this issue for a long time, and made great efforts to correct it. Tools such as AutoDesc https://autodesc.toolforge.org/ by Magnus Manske https://meta.wikimedia.org/wiki/User:Magnus_Manske and bots such as Edoderoobot https://www.wikidata.org/wiki/User:Edoderoobot, Mr.Ibrahembot https://www.wikidata.org/wiki/User:Mr.Ibrahembot, MatSuBot https://www.wikidata.org/wiki/User:MatSuBot (these were selected by clicking “Random Item” and looking at the history) and many others have worked on increasing the coverage. And it shows: these bots often target descriptions, and so, even though only six languages have *labels* for more than 10% of Wikidata Items, a whopping 64 languages have a coverage over 10% for *descriptions*! Today, we have well over two billion descriptions in Wikidata.
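To make the scale concrete, the numbers quoted above can be multiplied out. The figures are the lower bounds given in the text ("more than", "well over"), so the results are only rough estimates.

```python
ITEMS = 94_000_000                     # "more than 94 million Items"
LANGUAGES = 430                        # "more than 430 languages"
EXISTING_DESCRIPTIONS = 2_000_000_000  # "well over two billion descriptions"

# One description per Item per language at perfect coverage.
full_coverage = ITEMS * LANGUAGES
share = EXISTING_DESCRIPTIONS / full_coverage

print(f"{full_coverage:,} descriptions at perfect coverage")
print(f"roughly {share:.1%} of that exists today (lower-bound estimate)")
```

So even after all the bot work, only around a twentieth of the possible descriptions exist.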
These bots create descriptions, usually based on the existing statements of the Item. And that is great. But there is no easy way to fix an error across languages, nor is there an easy way to ensure that no vandalism has snuck in. Also, bots give an oversized responsibility to a comparatively small group of bot operators. Our goal is to democratize that responsibility again and allow more people to contribute.
Descriptions in Wikidata are usually noun phrases, which we will need to be able to generate for Abstract Wikipedia anyway. We want to start thinking about how to implement this feature, and then derive from there what will need to happen in Wikifunctions and in Wikidata. This work will need to happen in close coöperation with the Wikidata team, and the communities of both Wikidata and Wikifunctions. It will represent a way to ramp up our capabilities towards the wider vision of Abstract Wikipedia. Timewise, we hope to achieve that in 2022.
We don’t know yet how exactly this will work. Here are a few thoughts, but really I invite you to work with us on the design for abstract descriptions:
- It must be possible to overwrite a description for a given language
- It must be possible to retract a local overwrite for a given language
- The pair of label and description still must remain unique
- It would be great if implementing this would not be a large effort
- The goal is not to create automatic descriptions https://www.wikidata.org/wiki/Wikidata:Automating_descriptions, but abstract descriptions
The last point is subtle: an automatic description is a description generated automatically from the given statements of an Item. That’s a valuable and very difficult task. The above-mentioned AutoDesc, for example, starts the English description for Douglas Adams https://autodesc.toolforge.org/?q=Q42&lang=en&mode=short&links=text&redlinks=&format=html&get_infobox=yes&infobox_template= as follows: *“British playwright, screenwriter, novelist, children's writer, science fiction writer, comedian, and writer (1952–2001) ♂; member of Footlights and Groucho Club; child of Christopher Douglas Adams and Janet Adams; spouse of Jane Belson”*. The current manual English description of the Item https://www.wikidata.org/wiki/Q42 is the much more succinct *“English writer and humorist”*. Many subtle decisions and editorial judgements go into creating the description for a given Item, and I think we should be working on this, but later.
Instead, we want to support abstract descriptions: descriptions that are created manually but, instead of being written in a specific natural language, are encoded in the abstract notation of Wikifunctions. We then use the renderers to generate the natural-language text. This allows the community to retain direct control over the content of a description.
Here are a few ideas to kick off the conversation:
- We introduce a new language code, qqz. That code is in the range reserved for local use, and is similar to the other dummy language codes https://www.mediawiki.org/wiki/Manual:$wgDummyLanguageCodes in MediaWiki, qqq and qqx. Wikidata is to support the qqz language code for descriptions.
- The content of the qqz description is an abstract content. Technically we could store it in some string notation such as “Z12367(Q3918 https://www.wikidata.org/wiki/Q3918, Q25287 https://www.wikidata.org/wiki/Q25287, Q34 https://www.wikidata.org/wiki/Q34)”. Or we could store the JSON ZObject.
- The abstract description would be edited using the same Vue components we develop for Wikifunctions for editing abstract content.
- The abstract description is a fallback for languages without a description. It can be overwritten by providing a description in that language.
- Every time the renderer function or the underlying lexicographic data changes, we also need to retrigger the relevant generations.
- One question is whether we should store the generated description in the Item, and if so, how to change the data model in order to mark the description as generated from the abstract description.
- We also need to figure out how to report changes to everyone who is interested in tracking them. If we store the generated description as proposed above, we can piggyback on the current system.
All of these are just ideas for discussion. Some of the major questions are whether to store all the generated descriptions in the Item or not, how to represent that in the edit history of the Item, how to design the caching and retriggering of the generated descriptions, etc.
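The fallback, overwrite, and retract behaviour described in these ideas could be sketched as follows. This is only one possible reading of the proposal: the class `ItemDescriptions`, its method names, and the stand-in renderer `fake_render` are all hypothetical, and nothing here reflects an existing Wikidata or Wikifunctions API.

```python
from typing import Callable, Dict, Optional

ABSTRACT_CODE = "qqz"  # language code proposed above for the abstract description


class ItemDescriptions:
    """Hypothetical container for the descriptions of a single Item."""

    def __init__(self) -> None:
        self.stored: Dict[str, str] = {}  # language code -> description text

    def overwrite(self, lang: str, text: str) -> None:
        """Provide a description for one language (or the abstract one under qqz)."""
        self.stored[lang] = text

    def retract(self, lang: str) -> None:
        """Remove a local overwrite so the language falls back to the abstract description."""
        self.stored.pop(lang, None)

    def resolve(self, lang: str, render: Callable[[str, str], str]) -> Optional[str]:
        """Prefer the manual description; otherwise render the abstract one, if any."""
        if lang in self.stored:
            return self.stored[lang]
        abstract = self.stored.get(ABSTRACT_CODE)
        if abstract is None:
            return None
        return render(abstract, lang)


def fake_render(abstract: str, lang: str) -> str:
    """Stand-in for a Wikifunctions renderer; it only labels its input."""
    return f"[{lang} rendering of {abstract}]"


chalmers = ItemDescriptions()
chalmers.overwrite(ABSTRACT_CODE, "Z12367(Q3918, Q25287, Q34)")
chalmers.overwrite("en", "university in Gothenburg, Sweden")

print(chalmers.resolve("en", fake_render))  # manual English description wins
print(chalmers.resolve("hr", fake_render))  # no Croatian description: fallback
chalmers.retract("en")
print(chalmers.resolve("en", fake_render))  # English now falls back as well
```

The sketch deliberately leaves out the open questions: whether generated text is stored in the Item, how it is marked as generated, and how caching and retriggering would work.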
What would that look like?
Let’s take a look at an oversimplified example. The description for Chalmers is *“university in Gothenburg, Sweden”*. That seems like a reasonably simple case that could easily be templated into abstract content, say, of the form “Z12367(Q3918 https://www.wikidata.org/wiki/Q3918, Q25287 https://www.wikidata.org/wiki/Q25287, Q34 https://www.wikidata.org/wiki/Q34)”, where Z12367 (that ZID is made up) represents the abstract content saying in English *“(institution) in (city), (country)”*, Q3918 https://www.wikidata.org/wiki/Q3918 is the QID for university, Q25287 https://www.wikidata.org/wiki/Q25287 is the QID for Gothenburg, and Q34 https://www.wikidata.org/wiki/Q34 is the QID for Sweden. (In reality, this template is nowhere near as simple as it looks; we will discuss this more in an upcoming weekly newsletter. For now, let’s assume it is this simple.)
Renderers would then take this abstract content and for each language generate the description, in this case *“university in Gothenburg, Sweden”* for English, or *“sveučilište u Göteborgu u Švedskoj”* in Croatian. Since there is already an English description, we wouldn’t store nor actually generate the text, but in Croatian we would generate it, store it, and mark it as a generated description.
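The oversimplified example can be turned into a toy Python sketch. Everything in it is an assumption made for illustration: the string notation is just one of the storage options mentioned above, `LEXICON` is a hand-written stand-in for lexicographic data (the Croatian forms are copied from the example, with the city and country in the locative case), and the renderer functions are hypothetical rather than real Wikifunctions functions.

```python
import re

# Hand-written stand-in for lexicographic data, keyed by QID, language, and case.
LEXICON = {
    "Q3918": {"en": {"nom": "university"}, "hr": {"nom": "sveučilište"}},
    "Q25287": {"en": {"nom": "Gothenburg"}, "hr": {"loc": "Göteborgu"}},
    "Q34": {"en": {"nom": "Sweden"}, "hr": {"loc": "Švedskoj"}},
}


def parse(notation):
    """Split a string such as 'Z12367(Q3918, Q25287, Q34)' into a ZID and its arguments."""
    match = re.fullmatch(r"(Z\d+)\((.*)\)", notation.strip())
    if match is None:
        raise ValueError(f"not in the assumed notation: {notation!r}")
    zid, args = match.groups()
    return zid, [arg.strip() for arg in args.split(",")]


def render_en(institution, city, country):
    """English pattern: '(institution) in (city), (country)'."""
    word = lambda qid: LEXICON[qid]["en"]["nom"]
    return f"{word(institution)} in {word(city)}, {word(country)}"


def render_hr(institution, city, country):
    """Croatian pattern: nominative head noun, then 'u' plus locative, twice."""
    head = LEXICON[institution]["hr"]["nom"]
    return f"{head} u {LEXICON[city]['hr']['loc']} u {LEXICON[country]['hr']['loc']}"


# Z12367 is the made-up ZID from the example above.
RENDERERS = {"Z12367": {"en": render_en, "hr": render_hr}}


def render(notation, lang):
    zid, args = parse(notation)
    return RENDERERS[zid][lang](*args)


abstract = "Z12367(Q3918, Q25287, Q34)"
print(render(abstract, "en"))
print(render(abstract, "hr"))
```

Even this toy version hints at why the template is not as simple as it looks: the Croatian renderer needs different grammatical cases and a different connective than the English one.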
We think of this as a good milestone on our path to Abstract Wikipedia, with a directly useful outcome. What are your thoughts? Join us in discussing this idea on the following talk page: https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Updates/2021-07-29
------------------------------
In other news, Lindsay has created a video of a new feature: how Testers and Implementations work together to show whether the tests pass. The video is available here: https://commons.wikimedia.org/wiki/File:Wikilambda_Testers_on_Code_based_Implementation_-_Early_prototype.webm
The video shows her changing the implementation and re-running the testers several times. Testers will be a main component in ensuring the quality of Wikifunctions.
The next opportunity to meet us and ask us questions will be at Wikimania. On 14 August, at 17:00 UTC, we will host a 1.5-hour session on Wikifunctions and Abstract Wikipedia. This year, Wikimania will be an entirely virtual event and registration is free. Bring your questions and discussions to Wikimania 2021.
Next week, we are skipping the weekly update.
Since there is already an English description, we wouldn’t store nor actually generate the text
If there is already an English description and an abstract description is created, should we delete the English description to replace it with the description generated (and maybe cached) from abstract?
On 30 Jul 2021, at 00:14, Denny Vrandečić dvrandecic@wikimedia.org wrote:
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
That would possibly depend on where the English description is stored. On English Wikipedia the answer is a simple No, not until you have broad consensus from the English Wikipedia editing community. Similar conditions are likely to exist on other projects. You will have to ask them each separately, on-wiki, at whatever venue they use for such discussions. Cheers, Peter
From: Julio974 [mailto:jules.bour.1@gmail.com]
Sent: 31 July 2021 23:30
To: General public mailing list for the discussion of Abstract Wikipedia and Wikifunctions
Subject: [Abstract-wikipedia] Re: Newsletter #39: Abstract descriptions
Since there is already an English description, we wouldn’t store nor actually generate the text
If there is already an English description and an abstract description is created, should we delete the English description to replace it with the description generated (and maybe cached) from abstract?
Hoi, What does the English Wikipedia community have to do with this? Is it not quality that should make the difference? Thanks, Gerard
On Mon, 2 Aug 2021 at 09:58, Peter Southwood peter.southwood@telkomsa.net wrote:
That would possibly depend on where the English description is stored. On English Wikipedia the answer is a simple No, not until you have broad consensus from the English Wikipedia editing community. Similar conditions are likely to exist on other projects. You will have to ask them each separately, on-wiki, at whatever venue they use for such discussions. Cheers, Peter
*From:* Julio974 [mailto:jules.bour.1@gmail.com] *Sent:* 31 July 2021 23:30 *To:* General public mailing list for the discussion of Abstract Wikipedia and Wikifunctions *Subject:* [Abstract-wikipedia] Re: Newsletter #39: Abstract descriptions
Since there is already an English description, we wouldn’t store nor
actually generate the text
If there is already an English description and an abstract description is created, should we delete the English description to replace it with the description generated (and maybe cached) from abstract?
Did you not understand my comment? In any case it has been adequately addressed by someone who did.
Cheers,
Peter
From: Gerard Meijssen [mailto:gerard.meijssen@gmail.com] Sent: 02 August 2021 10:05 To: General public mailing list for the discussion of Abstract Wikipedia and Wikifunctions Subject: [Abstract-wikipedia] Re: Newsletter #39: Abstract descriptions
Hoi,
What does the English Wikipedia community have to do with this? Is it not quality that should make the difference?
Thanks,
Gerard
Hoi, I do understand perfectly what has been said. However, the quality of Wikidata descriptions has been deficient when you compare it with the existing automated descriptions.
As I have raised the issue of quality before, I know it has never been an issue for the community. It has not scaled, and the bulk of the existing descriptions are by and large uninformative.
I am sure you have been aware of this and no, it has not been addressed. Thanks, GerardM
On Wed, 4 Aug 2021 at 11:16, Peter Southwood peter.southwood@telkomsa.net wrote:
Did you not understand my comment? In any case it has been adequately addressed by someone who did.
Cheers,
Peter
*From:* Gerard Meijssen [mailto:gerard.meijssen@gmail.com] *Sent:* 02 August 2021 10:05 *To:* General public mailing list for the discussion of Abstract Wikipedia and Wikifunctions *Subject:* [Abstract-wikipedia] Re: Newsletter #39: Abstract descriptions
Hoi,
What does the English Wikipedia community have to do with this. Is it not quality what should make the difference?
Thanks,
Gerard
On Mon, 2 Aug 2021 at 09:58, Peter Southwood peter.southwood@telkomsa.net wrote:
That would possibly depend on where the English description is stored. On English Wikipedia the answer is a simple No, not until you have broad consensus from the English Wikipedia editing community. Similar conditions are likely to exist on other projects. You will have to ask them each separately, on-wiki, at whatever venue they use for such discussions. Cheers, Peter
*From:* Julio974 [mailto:jules.bour.1@gmail.com] *Sent:* 31 July 2021 23:30 *To:* General public mailing list for the discussion of Abstract Wikipedia and Wikifunctions *Subject:* [Abstract-wikipedia] Re: Newsletter #39: Abstract descriptions
Since there is already an English description, we wouldn’t store nor
actually generate the text
If there is already an English description and an abstract description is created, should we delete the English description to replace it with the description generated (and maybe cached) from abstract?
Envoyé de mon iPhone
Le 30 juil. 2021 à 00:14, Denny Vrandečić dvrandecic@wikimedia.org a écrit :
The on-wiki version of this newsletter is available here:
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Updates/2021-07-29
--
Our goal with Abstract Wikipedia is to enable everyone to write content in any language that can be read in any language. Ultimately, the main form of content we aim for are Wikipedia articles, in order to allow everyone to equitably have and contribute to unbiased, up-to-date, comprehensive encyclopedic knowledge.
In the coming months, we will take major milestones towards that goal. Today, I want to sketch one possible milestone on our way: abstract descriptions for Wikidata.
Every Item https://www.wikidata.org/wiki/Help:Items in Wikidata has a label https://www.wikidata.org/wiki/Help:Label, a short description https://www.wikidata.org/wiki/Help:Description, and aliases https://www.wikidata.org/wiki/Help:Aliases in each language. Let’s say you take a look at Item Q836805 https://www.wikidata.org/wiki/Q836805. In English, that Item has the label *“Chalmers University of Technology”* and the description *“university in Gothenburg, Sweden”*. In Swedish it is *“Chalmers tekniska högskola”* and *“universitet i Göteborg, Sverige”*. The goal of the label is to be a common name for the Item, and together with the description it should uniquely identify the Item in the world. That’s why, although multiple Items can have the same label, as things in the world can be called the same but be different, no two Items should have both the same label and the same description in a given language. The aliases are used to help with improving the search experience.
The meaning of the descriptions across languages is often the same, and when it is not, although sometimes intentional, it usually differs by accident. Given there are more than 94 million Items in Wikidata, and Wikidata supports more than 430 languages, that would mean that if we had perfect coverage, we would have more than 40 billion labels and as many descriptions. And not only would the creation of all these labels and descriptions be a huge amount of work, they would also need to be maintained. If there are not enough contributors checking on the quality of these, it would be unfortunately easy to sneak in vandalism.
The Wikidata community has known about this issue for a long time, and made great efforts to correct it. Tools such as AutoDesc https://autodesc.toolforge.org/ by Magnus Manske https://meta.wikimedia.org/wiki/User:Magnus_Manske and bots such as Edoderoobot https://www.wikidata.org/wiki/User:Edoderoobot, Mr.Ibrahembot https://www.wikidata.org/wiki/User:Mr.Ibrahembot, MatSuBot https://www.wikidata.org/wiki/User:MatSuBot (these were selected by clicking “Random Item” and looking at the history) and many others have worked on increasing the coverage. And it shows: these bots often target descriptions, and so, even though only six languages have *labels* for more than 10% of Wikidata Items, a whopping 64 languages have a coverage over 10% for *descriptions*! Today, we have well over two billion descriptions in Wikidata.
These bots create descriptions, usually based on the existing statements of the Item. And that is great. But there is no easy way to fix an error across languages, nor is there an easy way to ensure that no vandalism has snuck in. Also, bots give an oversized responsibility to a comparably small group of bot operators. Our goal is to democratize that responsibility again and allow more people to contribute.
Descriptions in Wikidata are usually noun phrases, which are something that we will need to be able to do for Abstract Wikipedia anyway. We want to start thinking about how to implement this feature, and then derive from there what will need to happen in Wikifunctions and in Wikidata. This work will need to happen in close coöperation with the Wikidata team, and the communities of both Wikidata and Wikifunctions. It will represent a way to ramp-up our capabilities towards the wider vision of Abstract Wikipedia. Timewise, we hope to achieve that in 2022.
We don’t know yet how exactly this will work. Here are a few thoughts, but really I invite you so that we all work together on the design for abstract descriptions:
· It must be possible to overwrite a description for a given language
· It must be possible to retract a local overwrite for a given language
· The pair of label and description still must remain unique
· It would be great if implementing this would not be a large effort
· The goal is not to create automatic descriptions https://www.wikidata.org/wiki/Wikidata:Automating_descriptions, but abstract descriptions
The last point is subtle: an automatic description is a description generated automatically from the given statements of an Item. That’s a valuable and very difficult task. The above mentioned AutoDesc for example, starts the English description for Douglas Adams https://autodesc.toolforge.org/?q=Q42&lang=en&mode=short&links=text&redlinks=&format=html&get_infobox=yes&infobox_template= as follows: *“British playwright, screenwriter, novelist, children's writer, science fiction writer, comedian, and writer (1952–2001) ♂; member of Footlights and Groucho Club; child of Christopher Douglas Adams and Janet Adams; spouse of Jane Belson”*. The Item https://www.wikidata.org/wiki/Q42's current manual English description is the much more succinct *“English writer and humorist”*. There can be many subtle decisions and editorial judgements to be made in order to create the description for a given Item, and I think we should be working on this — but later.
Instead, we want to support abstract descriptions: a description, manually created, but instead of being written in a specific natural language, it is encoded in the abstract notation of Wikifunctions and then we use the renderers to generate the natural languages text. This allows the community to retain direct control over the content of a description.
Here are a few ideas to kick off the conversation:
· We introduce a new language code, qqz. That code is in the range reserved for local use, and is similar to the other dummy language codes https://www.mediawiki.org/wiki/Manual:$wgDummyLanguageCodes in MediaWiki, qqq and qqx. Wikidata is to support the qqz language code for descriptions.
· The content of the qqz description is an abstract content. Technically we could store it in some string notation such as “Z12367( Q3918 https://www.wikidata.org/wiki/Q3918, Q25287 https://www.wikidata.org/wiki/Q25287, Q34 https://www.wikidata.org/wiki/Q34)”. Or we could store the JSON ZObject.
· The abstract description would be edited using the same Vue components we develop for Wikifunctions for editing abstract content.
· The abstract description is a fallback for languages without a description. It can be overwritten by providing a description in that language.
· Every time the renderer function or the underlying lexicographic data changes, we also need to retrigger the relevant generations.
· One question is whether we should store the generated description in the Item, and if so, how to change the data model in order to mark the description as generated from the abstract description.
· We also need to figure out how to report changes to everyone who is interested in tracking them. If we store the generated description as proposed above, we can piggyback on the current system.
All of these are just ideas for discussion. Some of the major questions are whether to store all the generated descriptions in the Item or not, how to represent that in the edit history of the Item, how to design the caching and retriggering of the generated descriptions, etc.
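To make the string-notation idea from the list above a little more concrete, here is a minimal sketch in Python of how such a value could be parsed into a structured form. Everything in it is an assumption for illustration: the class name `AbstractDescription`, the function `parse_abstract_description`, and the exact grammar (one ZID followed by a comma-separated list of QIDs in parentheses) are hypothetical, and the result is not the actual JSON ZObject format.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AbstractDescription:
    """Hypothetical structured form of a qqz description."""
    constructor: str           # ZID of the (made-up) constructor, e.g. "Z12367"
    arguments: Tuple[str, ...]  # QIDs passed to the constructor


# Assumed grammar: ZID "(" QID { "," QID } ")", with optional whitespace.
_NOTATION = re.compile(r"^\s*(Z\d+)\s*\(\s*(.*?)\s*\)\s*$")


def parse_abstract_description(text: str) -> AbstractDescription:
    """Parse the string notation; raise ValueError if it does not fit."""
    match = _NOTATION.match(text)
    if match is None:
        raise ValueError(f"not a valid abstract description: {text!r}")
    zid, raw_args = match.groups()
    args = tuple(a.strip() for a in raw_args.split(",")) if raw_args else ()
    for arg in args:
        if not re.fullmatch(r"Q\d+", arg):
            raise ValueError(f"argument is not a QID: {arg!r}")
    return AbstractDescription(zid, args)


parsed = parse_abstract_description("Z12367( Q3918, Q25287, Q34)")
print(parsed.constructor, parsed.arguments)
```

A compact string like this is easy to read in a diff or an edit summary, whereas a JSON ZObject would presumably be easier to validate and to edit with shared components; the sketch takes no position on which is preferable.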
What would that look like?
Let’s take a look at an oversimplified example. The description for Chalmers is *“university in Gothenburg, Sweden”*. That seems like a reasonably simple case that could easily be templated into abstract content, say of the form “Z12367(Q3918 https://www.wikidata.org/wiki/Q3918, Q25287 https://www.wikidata.org/wiki/Q25287, Q34 https://www.wikidata.org/wiki/Q34)”, where Z12367 (that ZID is made-up) represents the abstract content saying in English *“(institution) in (city), (country)”*, Q3918 https://www.wikidata.org/wiki/Q3918 is the QID for university, Q25287 https://www.wikidata.org/wiki/Q25287 the QID for Gothenburg, and Q34 https://www.wikidata.org/wiki/Q34 the QID for Sweden. (In reality, this template is nowhere near as simple as it looks; we will discuss this more in an upcoming weekly newsletter. For now, let’s assume it really is this simple.)
Renderers would then take this abstract content and for each language generate the description, in this case *“university in Gothenburg, Sweden”* for English, or *“sveučilište u Göteborgu u Švedskoj”* in Croatian. Since there is already an English description, we wouldn’t store nor actually generate the text, but in Croatian we would generate it, store it, and mark it as a generated description.
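The fallback behaviour described here can be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the names `resolve_description`, `RENDERERS`, `render_en` and `render_hr` are hypothetical, the hard-coded word forms stand in for the lexicographic data a real renderer would need, and Z12367 is the made-up ZID from the example above.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical abstract content: (constructor ZID, QID arguments).
Abstract = Tuple[str, Tuple[str, ...]]
Renderer = Callable[[Tuple[str, ...]], str]


def render_en(args: Tuple[str, ...]) -> str:
    # Stand-in for a lookup of English labels.
    labels = {"Q3918": "university", "Q25287": "Gothenburg", "Q34": "Sweden"}
    institution, city, country = (labels[q] for q in args)
    return f"{institution} in {city}, {country}"


def render_hr(args: Tuple[str, ...]) -> str:
    # Croatian needs the place names in the locative case after "u",
    # which is one reason the template is less simple than it looks.
    nominative = {"Q3918": "sveučilište"}
    locative = {"Q25287": "Göteborgu", "Q34": "Švedskoj"}
    institution, city, country = args
    return f"{nominative[institution]} u {locative[city]} u {locative[country]}"


# Renderers keyed by (constructor ZID, language code); assumed layout.
RENDERERS: Dict[Tuple[str, str], Renderer] = {
    ("Z12367", "en"): render_en,
    ("Z12367", "hr"): render_hr,
}


def resolve_description(
    manual: Dict[str, str], abstract: Optional[Abstract], lang: str
) -> Optional[Tuple[str, bool]]:
    """Return (text, is_generated), or None if nothing is available."""
    if lang in manual:
        # A manual description always wins; nothing is generated.
        return manual[lang], False
    if abstract is None:
        return None
    zid, args = abstract
    renderer = RENDERERS.get((zid, lang))
    if renderer is None:
        return None
    return renderer(args), True


manual = {"en": "university in Gothenburg, Sweden"}
abstract = ("Z12367", ("Q3918", "Q25287", "Q34"))
print(resolve_description(manual, abstract, "en"))
print(resolve_description(manual, abstract, "hr"))
```

The boolean in the result corresponds to marking a description as generated; whether that flag would live in the Item's data model or only in a cache is exactly one of the open questions raised above.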
We think of this as a good milestone on our path to Abstract Wikipedia, with a directly useful outcome. What are your thoughts? Join us in discussing this idea on the following talk page: https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Updates/2021-07-29
In other news, Lindsay has created a video of a new feature: how Testers and Implementations work together to show whether the tests pass. The video is available here: https://commons.wikimedia.org/wiki/File:Wikilambda_Testers_on_Code_based_Imp...
The video shows how she is changing the implementation and re-running the testers several times. Testers will be a main component in ensuring the quality of Wikifunctions.
The next opportunity to meet us and ask us questions will be at Wikimania. On 14 August, at 17:00 UTC, we will host a 1.5 hour session on Wikifunctions and Abstract Wikipedia. This year, Wikimania will be an entirely virtual event and registration is free. Bring your questions and discussions to Wikimania 2021.
Next week, we are skipping the weekly update.
Abstract-Wikipedia mailing list -- abstract-wikipedia@lists.wikimedia.org List information: https://lists.wikimedia.org/postorius/lists/abstract-wikipedia.lists.wikimed...
On 04 August 2021 at 11:24 Gerard Meijssen gerard.meijssen@gmail.com wrote:
I do understand perfectly what has been said. However, the quality of Wikidata descriptions has been deficient when you compare it with the existing automated descriptions.
I guess there are at least two different views about Wikidata descriptions, and this debate can't really happen without some clarification.
What you can call the "historic view" dates from the time when Wikidata was defined by the interwiki graph. It is simple, and says that descriptions are just there to disambiguate, so that for example "disease" could be enough. When the item always has to have a sitelink, this could often be enough.
As we know, the scope of Wikidata was expanded - massively it turns out - by requiring only an external link. I don't know whether this step, around 2015?, was accompanied by a discussion of the new function of descriptions.
I think it is at least clear that descriptions then became more important.
Charles
Hoi, Sorry Charles, you lost me. When disambiguation is the name of the game, you probably only consider English, whereas Wikidata items should have descriptions in all its languages. The notion of sitelinks was of interest only in the first few months, because the wikidatification of sitelinks replaced a process full of issues with something that was less complicated and more robust.
With "automated descriptions" we have functionality that changes the descriptions as items evolve. It is functionality that works in multiple languages, and it has served us better than the existing descriptions. When "abstract descriptions" are fixed and serve us in supported languages, that is a big advantage over English only. When the "abstract descriptions" are fixed and do not reflect the evolving content inherent in an item, "abstract descriptions" will not fully replace "automated descriptions". When there is an update function, things will be peachy. Thanks, Gerard
On Sat, Jul 31, 2021 at 2:30 PM Julio974 jules.bour.1@gmail.com wrote:
"Since there is already an English description, we wouldn’t store nor actually generate the text"
If there is already an English description and an abstract description is created, should we delete the English description to replace it with the description generated (and maybe cached) from abstract?
@Julio974: This is ultimately up to the Wikidata community to decide, as it is about the content stored on that wiki. I don't know if we'd ever want to *automatically* replace an existing manual description (probably not?), but I imagine it would be good to create a workflow for local language experts to check whether the new Abstract Description was an improvement on any existing manual or bot-generated Description. As the original email said, one of the proposed goals is:
"It must be possible to overwrite a[n abstract] description for a given language"
Human control is key.
@Peter: This idea is just about the Descriptions stored in the Wikidata project itself. What happens beyond that is outside the scope of this idea.
Agree with Nick. This could be simply a flexible presentation problem to solve. Automatic descriptions are generated and stored somewhere, and an opt-in display of the auto-description is available for those users that are interested in visualizing them or helping out with them.
Thad https://www.linkedin.com/in/thadguidry/ https://calendly.com/thadguidry/
Hoi, On Twitter I applauded the notion of Abstract descriptions replacing Automated descriptions. Indeed, we have suffered inadequate descriptions long enough. They are inadequate because descriptions were concocted to work around restrictions in the software (uniqueness), and consequently "researcher, ORCID 0000.0000.0000.0000" is used as an uninformative description on thousands of people. That is why the automated descriptions have been so valuable: given additional statements and qualifiers, the descriptions reflect what is known.
My hope for abstract descriptions is that they will replace automated descriptions and update their text when changes are made to the item. Having a text string representing the Abstract representation is fine, as it reduces the time needed to prepare the string that is to be presented in any language. It is however crucial for an abstract description that it represents the data as available.
One key reason for having automated descriptions is that they make things easier for editors and do not require knowledge about the construction of abstract wikipedia texts. As a result, the text will *always* reflect the current knowledge about each item. Thanks, GerardM
On Fri, 30 Jul 2021 at 00:14, Denny Vrandečić dvrandecic@wikimedia.org wrote: