Wikidata Statement Provenance, Automated Reasoning, and Natural Language Generation

Adam Sobieski

14 Jul 2020

10:11 a.m.

I would like to indicate, for discussion, some theoretical topics pertaining to natural language generation from Wikidata.

An interesting feature of Wikidata is that it has sourced statements [1]. We can envision these sources and materials, the provenance of statements, propagating from the statements, through potential intermediate representations, to output natural language articles. In automatically-generated articles, these sources and materials could appear – as one might expect – as citations referring to referenced sources and materials which appear in “References” sections.

Also interesting is that, should Wikidata come to support automated reasoning [2], it could be implemented in such a way that the enhanced provenance data (e.g. supporting reasoning, argumentation, proofs) for statements could similarly propagate from the statements, through intermediate representations, to output natural language articles.

That is, automatically-generated articles could provide reasoning supporting, arguments for, and/or proofs of the contents of natural language sentences in a manner similar to how they can provide referenced sources and materials.

Any thoughts on these topics?

Best regards, Adam

[1] https://www.wikidata.org/wiki/Help:Sources [2] https://www.wikidata.org/wiki/Wikidata:WikiProject_Reasoning

Charles Matthews

14 Jul

6:49 p.m.

On 14 July 2020 at 03:11 Adam Sobieski adamsobieski@hotmail.com wrote:

That is, automatically-generated articles could provide reasoning supporting, arguments for, and/or proofs of the contents of natural language sentences in a manner similar to how they can provide referenced sources and materials.



Any thoughts on these topics?

The idea is in tension with standard Wikipedia policies on original research and synthesis. Admittedly, it would be interesting to see some edge cases that were/were not acceptable under those policies. But on the whole, I think work in the sort of symbolic AI tradition suggested would be better received as research built on top of the drive to create articles, rather than integrated with it.

Charles

Adam Sobieski

15 Jul 15 Jul

1:22 a.m.

Charles,

For an example, we can refer to the Douglas Adams article (https://www.wikidata.org/wiki/Q42). We can see that the statement that Douglas Adams was a science fiction writer is attributed to the Bibliothèque nationale de France.

Let us imagine that that statement was not asserted and sourced to the Bibliothèque nationale de France, but was instead derived from the combination of facts that he authored works which were science fiction. Douglas Adams authored the Hitchhiker's Guide to the Galaxy pentalogy and those works were science fiction. I am not saying that that is a valid rule; it is merely an example for this discussion: authors of science fiction works are science fiction writers.

In the hypothetical, there would be a statement in the knowledgebase with, instead of a reference, a derivation (statements could, however, have both sources and derivations). Perhaps, one day, readers will be able to click on a derivation’s hyperlink on Wikidata to view an automatically-generated page explaining it.

An automatically-generated natural language article for Q42 might contain a sentence “Douglas Adams was a science fiction writer” which would have a numbered citation, but, in the hypothetical, instead of that citation referring to a referenced material from the Bibliothèque nationale de France, it would refer to an automatically-generated document which explained the origin of the statement from automated reasoning upon component statements, those recursively either asserted and sourced or the result of automated reasoning.
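To make the hypothetical concrete, here is a minimal sketch of a statement that carries either a source or a derivation, together with a function that produces the text of an explanation page by walking the derivation recursively. The `Statement` class, its field names, the `explain` function, and the placeholder source labels are all assumptions made for illustration; none of them reflect an existing Wikidata data model or API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    source: Optional[str] = None            # set when asserted and sourced
    rule: Optional[str] = None              # set when produced by a rule
    premises: Tuple["Statement", ...] = ()  # statements the rule consumed


def explain(stmt, depth=0):
    """Return the lines of a plain-text explanation for a statement."""
    pad = "  " * depth
    head = f"{pad}{stmt.subject} {stmt.predicate} {stmt.obj}"
    if stmt.rule is None:
        # Asserted statement: the explanation ends at its source.
        return [f"{head} [source: {stmt.source}]"]
    lines = [f"{head} [derived by rule: {stmt.rule}]"]
    for premise in stmt.premises:
        # Premises are themselves either sourced or derived.
        lines.extend(explain(premise, depth + 1))
    return lines


# Placeholder sources; the rule is the illustrative one from this message.
authored = Statement("Douglas Adams", "authored",
                     "The Hitchhiker's Guide to the Galaxy",
                     source="hypothetical source A")
genre = Statement("The Hitchhiker's Guide to the Galaxy", "has genre",
                  "science fiction", source="hypothetical source B")
derived = Statement("Douglas Adams", "has occupation", "science fiction writer",
                    rule="authors of science fiction works are science fiction writers",
                    premises=(authored, genre))

print("\n".join(explain(derived)))
```

The citation in a generated article would then point at the page built from `explain(derived)` rather than at an external reference.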

It is interesting to consider the propagation of sources [1] and/or derivations [2] from Wikidata knowledgebase statements to citations for referenced materials or derivations in automatically-generated articles.

I hope that automated reasoning would not be in tension with standard Wikipedia policies on original research and synthesis [3]. Perhaps there would be new policies for reviewing each logical rule desired to be entered into the knowledgebase. In my opinion, a privileged user role (e.g. administrator) would be needed for activating and deactivating proposed logical rules used to produce knowledgebase statements.

Best regards, Adam

[1] https://www.wikidata.org/wiki/Help:Sources [2] https://www.wikidata.org/wiki/Wikidata:WikiProject_Reasoning [3] https://en.wikipedia.org/wiki/Wikipedia:No_original_research

Paula Kate Marmor

3:08 a.m.

Perhaps these generated statements can be supported using the “based on heuristic” property, which is well-supported for derived references. We don’t have a “based on heuristic” for “deduced from genre of works published” but it would be easy to create such a value. Today these values are deduced and added to Wikidata by humans; I can see a future state where they are deduced by bots and then they would be perfectly normal to include in AW.

I suppose I am suggesting that this use case should be/could be solved in Wikidata rather than in AW, with the same result in the end.

Paula (PKM)

Charles Matthews

3:38 a.m.

On 14 July 2020 at 20:08 Paula Kate Marmor pkm@pobox.com wrote:

Perhaps these generated statements can be supported using the “based on heuristic” property, which is well-supported for derived references.  We don’t have a “based on heuristic” for “deduced from genre of works published” but it would be easy to create such a value. Today these values are deduced and added to Wikidata by humans; I can see a future state where they are deduced by bots and then they would be perfectly normal to include in AW.

In the case with which I'm familiar, applied to P921, they typically mean the result of text-mining applied to the title of a scientific paper. So, to give a real example, you may be told that the paper is about "hygiene", when it is about the "hygiene hypothesis" on the causes of asthma.
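The failure mode can be sketched in a few lines. This is not the heuristic any actual bot uses; the vocabulary, the sample title, and both function names are assumptions chosen to reproduce the "hygiene" versus "hygiene hypothesis" problem.

```python
def naive_topics(title, vocabulary):
    """Return every vocabulary term that occurs in the title as a substring."""
    lowered = title.lower()
    return [term for term in vocabulary if term in lowered]


def longest_match_topics(title, vocabulary):
    """Drop any match that is contained in a longer matched term."""
    matches = naive_topics(title, vocabulary)
    return [term for term in matches
            if not any(term != other and term in other for other in matches)]


VOCABULARY = ["hygiene", "hygiene hypothesis", "asthma"]   # hypothetical
TITLE = "The hygiene hypothesis and the causes of asthma"  # hypothetical title

print(naive_topics(TITLE, VOCABULARY))
print(longest_match_topics(TITLE, VOCABULARY))
```

The naive matcher reports "hygiene" as a subject of the paper; preferring the longest match removes that particular error, but it remains a heuristic over titles and can still be wrong.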

I suppose I am suggesting that this use case should be/could be solved in Wikidata rather than in AW, with the same result in the end.

Wikidata critics, who are still quite numerous, would have quite a lot of fun with that.

Charles

Mike Bennett

3:39 a.m.

What about using the W3C Provenance ontology Prov-O? Or has that already been addressed?

Mike

Adam Sobieski

11:04 a.m.

Mike,

Thank you for mentioning PROV-O [1][2]; it hadn’t been addressed.

One of the PROV-O editors was Deborah L. McGuinness. She was involved in both the Inference Web [3] and PML [4].

Best regards, Adam

[1] https://www.w3.org/TR/prov-o/ [2] https://en.wikipedia.org/wiki/PROV_(Provenance) [3] https://web.archive.org/web/20170610061815/http://inference-web.org/wiki/Main_Page [4] https://en.wikipedia.org/wiki/Provenance_Markup_Language

Denny Vrandečić

16 Jul

11:49 a.m.

Yes, the Provenance ontology is being used by Wikidata and has been so since pretty much the beginning. The original data model was built on it and on discussions with members of the Provenance WG.
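As a rough illustration, Wikidata's RDF export links a statement node to its reference nodes with `prov:wasDerivedFrom`. The sketch below stores a few triples as plain Python tuples and looks up provenance for a node. The `ex:` identifiers are made up, and the use of `prov:wasGeneratedBy` and `prov:used` to record an inference step is an assumption about how a derivation might be modelled, not something Wikidata does today.

```python
PROV = "http://www.w3.org/ns/prov#"

# Hypothetical node identifiers; real Wikidata statement and reference
# IRIs look different.
triples = [
    # A sourced statement points at its reference node.
    ("ex:statement-1", PROV + "wasDerivedFrom", "ex:reference-1"),
    # Assumed modelling of a derived statement: it was generated by an
    # inference activity that used statement-1 as input.
    ("ex:statement-2", PROV + "wasGeneratedBy", "ex:inference-run-1"),
    ("ex:inference-run-1", "rdf:type", PROV + "Activity"),
    ("ex:inference-run-1", PROV + "used", "ex:statement-1"),
]


def provenance_of(node, triples):
    """Collect direct references and the inputs of any generating activity."""
    sources = [o for s, p, o in triples
               if s == node and p == PROV + "wasDerivedFrom"]
    activities = [o for s, p, o in triples
                  if s == node and p == PROV + "wasGeneratedBy"]
    used = [o for s, p, o in triples
            if s in activities and p == PROV + "used"]
    return {"sources": sources, "used": used}


print(provenance_of("ex:statement-1", triples))
print(provenance_of("ex:statement-2", triples))
```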

Charles Matthews

15 Jul

3:52 a.m.

On 14 July 2020 at 18:22 Adam Sobieski adamsobieski@hotmail.com wrote:

Charles,



For an example, we can refer to the Douglas Adams article (https://www.wikidata.org/wiki/Q42). We can see that the statement that Douglas Adams was a science fiction writer is attributed to the Bibliothèque nationale de France.



Let us imagine that that statement was not asserted and sourced to the Bibliothèque nationale de France, but was instead derived from the combination of facts that he authored works which were science fiction. Douglas Adams authored the Hitchhiker's Guide to the Galaxy pentalogy and those works were science fiction. I am not saying that that is a valid rule; it is merely an example for this discussion: authors of science fiction works are science fiction writers.

Problem with the whole direction of this. I can know "electrolysis is a method of producing sodium" and "sodium is an alkali metal", and then write "Electrolysis is a method of producing sodium, an alkali metal." Fair enough, a fine sentence. Now start with "Prince Andrew was a friend of Jeffrey Epstein." I think people see where this goes.

Trial lawyers, and propagandists, know the value of simple sentence patterns that in practice serve to assert more than they do logically. Connotation by apposition is just one trope in this whole business.

Wikipedia has a guideline WP:SYNTH, and to be a decent writer of the Wikipedia neutral house style you have to internalise its gist. It is not just that you must not put 2 and 2 together and get 5. You must not put any things together in a way that "constructs", that pushes a thesis.

I hope that automated reasoning would not be in tension with standard Wikipedia policies on original research and synthesis [3]. Perhaps there would be new policies for reviewing each logical rule desired to be entered into the knowledgebase. In my opinion, a privileged user role (e.g. administrator) would be needed for activating and deactivating proposed logical rules used to produce knowledgebase statements.

"Do not combine material from multiple sources to reach or imply a conclusion not explicitly stated by any of the sources."

Why does WP:SYNTH say this?

Choose from:

(1) Hard cases make bad law.

(2) Bitter experience.

I'm with (2).

Charles

Adam Sobieski

11:33 a.m.

We are approaching some interesting and project-specific topics. The natural language generation output is desired to be specifically encyclopedic, with that involving a number of established style guidelines. For instance, automatically-generated encyclopedia articles should be neutral [1] and devoid of sentiment.

My opinion is that WP:SYNTH [2] should not be an obstacle to projects exploring Wikidata reasoning [3]. My opinion is that guidelines intended for human contributors need not be interpreted as applicable to sound, administrator-approved, mechanical reasoning processes. My opinion is that natural language generation can generate encyclopedic articles from both sourced and soundly derived statements.

There is also a difference between: (a) reasoning processes occurring on Wikidata data to obtain soundly derived data, (b) processes of reasoning which appear in the rhetoric of articles.

All that said, I don’t know the current status of projects exploring automated reasoning with Wikidata [3].

Whether or not WP:SYNTH [2] applies to administrator-approved logical rules with which to generate Wikidata content from other Wikidata content… perhaps an interested group of participants could, at some point, create a wiki page for others to review and to make use of towards a determination.

Also, another use of automated reasoning for Wikidata is that a system could use automated reasoning to provide editors with candidate or potential statements for them to find sources which support or oppose the statements… man-machine collaboration.

Best regards, Adam

[1] https://en.wikipedia.org/wiki/Wikipedia:Neutral_point_of_view [2] https://en.wikipedia.org/wiki/Wikipedia:No_original_research#Synthesis_of_pu... [3] https://www.wikidata.org/wiki/Wikidata:WikiProject_Reasoning

Charles Matthews

3:55 p.m.

On 15 July 2020 at 04:33 Adam Sobieski adamsobieski@hotmail.com wrote:

We are approaching some interesting and project-specific topics. The natural language generation output is desired to be specifically encyclopedic, with that involving a number of established style guidelines. For instance, automatically-generated encyclopedia articles should be neutral [1] and devoid of sentiment.



My opinion is that WP:SYNTH [2] should not be an obstacle to projects exploring Wikidata reasoning [3]. My opinion is that guidelines intended for human contributors need not be interpreted as applicable to sound, administrator-approved, mechanical reasoning processes. My opinion is that natural language generation can generate encyclopedic articles from both sourced and soundly derived statements.

To amplify and then sum up what I was saying, more "abstractly".

Abstract Wikipedia will have downstream users, since that is the announced plan. These will be Wikipedia communities: which are individual online communities having their own content policies. Using enWP's content policies is for definiteness in discussion: I'm not assuming AW is trying to produce content primarily for enWP.

The situation is analogous to, but seriously more complex than, that with Wikidata-powered infoboxes. Those infoboxes have been rolled out quite unevenly over Wikipedias, with the smaller Wikipedias generally receptive, and some of the largest most resistant.

Wikidata is a community of its own. It is certainly not bound by Wikipedia guidelines. What it does about verifiability, for example, is under its own control. This is the typical wiki situation of autonomy, subject to some framework set up by the WMF.

AW can expect to enjoy the same type of autonomy. It will presumably by design be downstream of Wikidata, and therefore will have an interest in how Wikidata handles its own content. It will be upstream of all Wikipedias that ingest any AW content.

Those who have seen the enWP debates on infoboxes will understand it when I say that "pushback" against imported content is to be expected, under some circumstances, and is not particularly easy to handle.

So, given that Wikimedia is a community of communities, the complexities are social as well as technical, and "philosophical" (relating for example to foundational debates in analytical philosophy of around a century ago). Guidelines are indicators of certain fault lines, which may become divisive. In the best of all possible Leibnizian worlds, rhetorical problems with generated content that consists of assertoric propositions just fall away when the foundations are correctly laid.

My point is that AW is intended to deal with a hike of expressive power, compared with the infobox situation; and my concern is not that "Wikidata reasoning" is in itself a problem (which is a Wikidata internal issue), but that appeal to machine syllogisms on Wikidata adds a "black box" upstream of AW.

The Douglas Adams example given comes down to saying the predicate "is a science fiction author" can be taken to be the relational composite of "is author of a work" and "work has genre science fiction". In other words it can be witnessed by a single work, which has the genre. This is debatable in various ways: verification of the two parts separately, rather than requiring a witnessing statement stating explicitly that Adams is a science fiction author, will give different results in practice; and the SF genre has been contested since the 1940s at least. If this was supposed to be a trite example, I think it is not that.
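Written out, the relational composite reads as follows. The predicate names are illustrative only, and the formula restates the rule under discussion rather than endorsing it.

```latex
\mathrm{SFAuthor}(x) \;\Longleftarrow\; \exists w \,\bigl(\mathrm{author}(x, w) \land \mathrm{genre}(w, \mathrm{SF})\bigr)
```

The existential quantifier is what makes a single work $w$ sufficient as a witness, and each conjunct can be verified against a different source.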

Wikimedia communities are far from mechanical. I expressed no opposition to "projects exploring Wikidata reasoning". I think there is good reason to avoid conflating those with AW.

Charles

Adam Sobieski

5:45 p.m.

The example about inferring that Douglas Adams [1] was a science fiction writer from some of his works being science fiction may have muddied the waters.

The example from the Wikidata reasoning project [2] may be more useful for discussion. “The spouse (P26) of Douglas Adams (Q42) was Jane Belson (Q14623681). Clearly, this means that, conversely, the spouse (P26) of Jane Belson (Q14623681) was Douglas Adams (Q42). This is a simple example of a case where one statement (about Jane Belson (Q14623681)) can be inferred from another statement (about Douglas Adams (Q42)).”

The sources and derivations of Wikidata statements could propagate through natural language generation to resultant articles. For sourced statements, one can consider resultant articles having citations which refer to statements’ sources. For derived statements, one could consider resultant articles having citations which refer to hyperlinked-to, automatically-generated documents which explain statements’ derivations.

That is, both sources and derivations of Wikidata statements could propagate through natural language generation so that citations (“[a]” and “[b]”) appear in the resultant article on appropriate sentences.

The spouse of Douglas Adams was Jane Belson [a]. The spouse of Jane Belson was Douglas Adams [b].

References

[a] here would be a referenced source material (https://www.nndb.com/people/731/000023662/).

[b] here one could click on a hyperlink to navigate to an automatically-generated page which explains a derivation.
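A minimal sketch of this propagation, assuming statements are plain dictionaries and that "spouse" is the only property treated as symmetric. The function names, the dictionary keys, and the wording of the derivation note are all hypothetical.

```python
SYMMETRIC = {"spouse"}  # assumption: properties treated as symmetric


def infer_symmetric(statements):
    """Add the converse of each symmetric statement, recording its premise."""
    known = {(s["subject"], s["property"], s["value"]) for s in statements}
    derived = []
    for s in statements:
        converse = (s["value"], s["property"], s["subject"])
        if s["property"] in SYMMETRIC and converse not in known:
            known.add(converse)
            derived.append({"subject": s["value"], "property": s["property"],
                            "value": s["subject"], "derived_from": s})
    return statements + derived


def render(statements):
    """Render one sentence per statement, each with a lettered citation."""
    sentences, references = [], []
    for index, s in enumerate(statements):
        label = chr(ord("a") + index)  # enough labels for a short example
        sentences.append(
            f"The {s['property']} of {s['subject']} was {s['value']} [{label}].")
        if "source" in s:
            references.append(f"[{label}] {s['source']}")
        else:
            p = s["derived_from"]
            references.append(f"[{label}] Derived by symmetry from: "
                              f"{p['subject']} {p['property']} {p['value']}")
    return " ".join(sentences), references


asserted = [{"subject": "Douglas Adams", "property": "spouse",
             "value": "Jane Belson",
             "source": "https://www.nndb.com/people/731/000023662/"}]
text, refs = render(infer_symmetric(asserted))
print(text)
print("\n".join(refs))
```

In a fuller system the derivation note for [b] would be a hyperlink to the generated explanation page rather than inline text.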

Best regards, Adam

[1] https://www.wikidata.org/wiki/Q42 [2] https://www.wikidata.org/wiki/Wikidata:WikiProject_Reasoning

Charles Matthews

6:02 p.m.

...

On 15 July 2020 at 10:45 Adam Sobieski adamsobieski@hotmail.com wrote:

The example about inferring that Douglas Adams [1] was a science fiction writer from some of his works being science fiction may have muddied the waters.



The example from the Wikidata reasoning project [2] may be more useful for discussion. “The spouse (P26) of Douglas Adams (Q42) was Jane Belson (Q14623681). Clearly, this means that, conversely, the spouse (P26) of Jane Belson (Q14623681) was Douglas Adams (Q42). This is a simple example of a case where one statement (about Jane Belson (Q14623681)) can be inferred from another statement (about Douglas Adams (Q42)).”

Yes, inference from family relationships to others is not, in my view, a concern under WP:SYNTH. Inferences in temporal logic likewise.

I have commented elsewhere (in tweets) on the ability of Wikidata to formalise the quite complex way historians actually qualify dates. Correct propagation of the temporal logic of scholarly sources into AW format might be a good test case of some of these ideas.

Wikidata has not yet tackled the plethora of calendars currently in use; so I would be interested to see if AW could contribute in this direction, code-wise.

These are examples of areas that seem to me fruitful.

Charles

Denny Vrandečić

16 Jul 16 Jul

12:45 p.m.

Oh, yes, I certainly want us to tackle calendars (and quantities, for that matter) in Wikidata. This is something I left in a state I wasn't perfectly happy with, and I would like to help improve that. Wikilambda will offer an option for that (specifically, by providing a library of functions to deal with calendars and quantities).

Regarding the reasoning and inferences: I certainly expect and hope that we will be implementing such rules as envisioned by Adam in Wikilambda, and I equally expect that we won't use them much in Abstract Wikipedia for the reasons Charles mentioned.

There's an additional reason I see that would make it less likely to use this kind of inference in Abstract Wikipedia, which is that I pretty much expect that we as contributors will have rather fine-grained control of what is stated in the article. I.e. I don't think that there is ever a need for the system to try to infer that Douglas Adams is a Science Fiction author. The abstract article for Douglas Adams will likely start with a sentence such as "Douglas Adams was an English author, screenwriter, essayist, humorist, satirist and dramatist." Now, this is not a full list of all the occupations of Douglas Adams - Wikidata also lists children's writer and playwright, he also was an environmental conservationist, a computer game author, and probably much more. Now, to choose the right selection of occupations for this first sentence, and whether to additionally infer science fiction author, and then also to order this list - I think that might be rather a challenge.

Instead I would think that the author creating this first sentence can explicitly set and choose which occupations to list in which order, and which to drop. The list may or may not vibe with what is stated in Wikidata, and that's OK.

We will have the possibility to query Wikidata for certain things, and we will have the ability to do inferences, but I expect these superpowers to be used in a very measured way, and mostly for the longer tail of content. But all of these are editorial decisions.

On Wed, Jul 15, 2020 at 3:02 AM Charles Matthews via Abstract-Wikipedia < abstract-wikipedia@lists.wikimedia.org> wrote:

...

On 15 July 2020 at 10:45 Adam Sobieski adamsobieski@hotmail.com wrote:

The example about inferring that Douglas Adams [1] was a science fiction writer from some of his works being science fiction may have muddied the waters.

The example from the Wikidata reasoning project [2] may be more useful for discussion. “The spouse (P26) of Douglas Adams (Q42) was Jane Belson (Q14623681). Clearly, this means that, conversely, the spouse (P26) of Jane Belson (Q14623681) was Douglas Adams (Q42). This is a simple example of a case where one statement (about Jane Belson (Q14623681)) can be inferred from another statement (about Douglas Adams (Q42)).”

Yes, inference from family relationships to others is not, in my view, a concern under WP:SYNTH. Inferences in temporal logic likewise.

I have commented elsewhere (in tweets) on the ability of Wikidata to formalise the quite complex way historians actually qualify dates. Correct propagation of the temporal logic of scholarly sources into AW format might be a good test case of some of these ideas.

Wikidata has not yet tackled the plethora of calendars currently in use; so I would be interested to see if AW could contribute in this direction, code-wise.

These are examples of areas that seem to me fruitful.

Charles

_______________________________________________
Abstract-Wikipedia mailing list
Abstract-Wikipedia@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia

Denny Vrandečić

12:48 p.m.

Extend "I don't think that there is ever a need for the system to try to infer that Douglas Adams is a Science Fiction author." with "I don't think that there is ever a need for the system to try to infer that Douglas Adams is a Science Fiction author when creating text for the Wikipedia article for Douglas Adams." There might obviously be other use cases where such a capability is needed.


Samuel Klein

10:57 p.m.

I appreciate this extension.

I hope any sort of inference will be on the table for someone to propose via an appropriate function, while others could debate the range of contexts in which a) that function would have the desired result with appropriate false-positive and false-negative rates, and b) that function would be a socially appropriate source.

🌍🌏🌎🌑

On Thu., Jul. 16, 2020, 12:48 a.m. Denny Vrandečić, < dvrandecic@wikimedia.org> wrote:

...

Extend "I don't think that there is ever a need for the system to try to infer that Douglas Adams is a Science Fiction author." with "I don't think that there is ever a need for the system to try to infer that Douglas Adams is a Science Fiction author when creating text for the Wikipedia article for Douglas Adams." There might obviously be other use cases where such a capability is needed.


Adam Sobieski

17 Jul 17 Jul

3:51 p.m.

It is exciting that we will have the ability to do inferences; I think that inference engines for Wikidata knowledgebases are a good idea.

Individual rules should be considered in contexts. In my opinion, a good policy is for privileged users (e.g. admins) to be able to activate and deactivate individual rules, e.g. in accordance with community deliberation.

I find it interesting that natural language generation could be used to explain reasoning processes and derivations to readers, and that these explanations could be hyperlinked-to as referenced materials in automatically-generated articles.

From: Samuel Klein <meta.sj@gmail.com>
Sent: Thursday, July 16, 2020 10:57 AM
To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda) <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] Wikidata Statement Provenance, Automated Reasoning, and Natural Language Generation

I appreciate this extension.

On Thu., Jul. 16, 2020, 12:48 a.m. Denny Vrandečić <dvrandecic@wikimedia.org> wrote:

Extend "I don't think that there is ever a need for the system to try to infer that Douglas Adams is a Science Fiction author." with "I don't think that there is ever a need for the system to try to infer that Douglas Adams is a Science Fiction author when creating text for the Wikipedia article for Douglas Adams." There might obviously be other use cases where such a capability is needed.

On Wed, Jul 15, 2020 at 9:45 PM Denny Vrandečić <dvrandecic@wikimedia.org> wrote:

Oh, yes, I certainly want us to tackle calendars (and quantities, for that matter) in Wikidata. This is something I left in a state I wasn't perfectly happy with, and I would like to help improve that. Wikilambda will offer an option for that (specifically, by providing a library of functions to deal with calendars and quantities).

On Wed, Jul 15, 2020 at 3:02 AM Charles Matthews via Abstract-Wikipedia <abstract-wikipedia@lists.wikimedia.org> wrote:

On 15 July 2020 at 10:45 Adam Sobieski <adamsobieski@hotmail.com> wrote:

The example about inferring that Douglas Adams [1] was a science fiction writer from some of his works being science fiction may have muddied the waters.

Yes, inference from family relationships to others is not, in my view, a concern under WP:SYNTH. Inferences in temporal logic likewise.

Wikidata has not yet tackled the plethora of calendars currently in use; so I would be interested to see if AW could contribute in this direction, code-wise.

These are examples of areas that seem to me fruitful.

Charles

Charles Matthews

5:20 p.m.

...

On 17 July 2020 at 08:51 Adam Sobieski adamsobieski@hotmail.com wrote:

It is exciting that we will have the ability to do inferences; I think that inference engines for Wikidata knowledgebases are a good idea.



Individual rules should be considered in contexts. In my opinion, a good policy is for privileged users (e.g. admins) to be able to activate and deactivate individual rules, e.g. in accordance with community deliberation.

As someone who has been involved, over the past year, with a couple of heavy-duty disputes with bots on Wikidata, I beg to differ.

Some reasons: Wikidata is so vast (pushing 100M items) that patrolling is very difficult in practical terms. The required tools to figure out easily what has gone on are not yet there. The site is still in the growth spurt recognisable in early (English) Wikipedia history as "quantity over quality". The technophile tendency has yet to be balanced by a curation ethic of the same clout.

Tl;dr is that the site is not mature. I don't think community deliberation is yet any sort of warranty.

There is already a degree of inference, on missing information, within the system for flagging up data constraint violations. That can be built on, clearly. The gradual setting up of more stringent data modelling likewise tends towards identifying gaps in the statements held on an item of a particular kind (for example, a book edition item, publication date after about 1970, published in a country such as the USA, should probably have a potential ISBN statement, if it is not yet there).
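The gap-identification idea in the preceding paragraph can be sketched as a simple check. This is only an illustration under stated assumptions: the dictionary keys, the item identifiers, and the `missing_isbn` function are hypothetical, and the 1970 cut-off stands in for the "after about 1970" mentioned above rather than any actual Wikidata constraint.

```python
from typing import Dict, List

ISBN_ERA_START = 1970  # approximate cut-off taken from the paragraph above


def missing_isbn(item: Dict) -> bool:
    """Return True when an item looks like it should have an ISBN but has none."""
    return (
        item.get("kind") == "book edition"
        and item.get("publication_year", 0) > ISBN_ERA_START
        and item.get("country") == "USA"
        and not item.get("isbn")
    )


# Hypothetical sample items; the identifiers are placeholders, not real QIDs.
items: List[Dict] = [
    {"id": "edition-1", "kind": "book edition", "publication_year": 1985,
     "country": "USA", "isbn": None},
    {"id": "edition-2", "kind": "book edition", "publication_year": 1985,
     "country": "USA", "isbn": "978-0-00-000000-2"},
    {"id": "edition-3", "kind": "book edition", "publication_year": 1950,
     "country": "USA", "isbn": None},
    {"id": "person-1", "kind": "human"},
]

flagged = [item["id"] for item in items if missing_isbn(item)]
print(flagged)
```

Note that the check only flags a probable gap for a human to review; it does not add a statement.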

What I wrote on 14 July about P887, "based on heuristic", may have been misleading. Here anyway is a sample query that finds items where it is in use:

https://w.wiki/X9R

That is for P921, on which I work, but this type of query can be used to explore the space in which P887 is used. There is a great deal of tacit use, for example of the heuristic that given name can be used to deduce gender, that is not flagged up in that way: maybe we'll get to that.
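For readers who want to explore this space themselves, here is a minimal sketch of how such a query might be built in Python. It is not necessarily the query behind the short link above; the function name `heuristic_usage_query` and the result variable names are assumptions, and the code only constructs the request URL without sending it.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def heuristic_usage_query(prop: str = "P921", limit: int = 20) -> str:
    """Build a SPARQL query for statements on `prop` whose references
    carry P887 ("based on heuristic")."""
    return f"""
SELECT ?item ?value ?heuristic WHERE {{
  ?item p:{prop} ?statement .
  ?statement ps:{prop} ?value ;
             prov:wasDerivedFrom ?reference .
  ?reference pr:P887 ?heuristic .
}}
LIMIT {limit}
""".strip()


query = heuristic_usage_query()
# The URL can be opened in a browser or fetched with any HTTP client.
url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
print(query)
```

Passing a different property ID to `heuristic_usage_query` explores where P887 is used on other properties.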

The heuristic for P143 "imported from Wikimedia project" was deprecated long since. I looked through the references for Q254, the item for Mozart, and you can see there the extent of referencing using it.

I think the way to go is to build up the "manual", by which I mean the constraint violation apparatus, the "shape expression" data modelling and its later iterations, and generally the existing community-developed tools. That is where there is a need for consolidation and implementation of maintenance routines, to put it in a downbeat way.

Charles

Adam Sobieski

18 Jul 18 Jul

6:01 a.m.

Ontological reasoning, curation and maintenance are pragmatic concerns for crowdsourced knowledgebase resources. The matter need not be phrased as being one of choosing one or the other. I think that we would tend to want ontological reasoning and inference.

Regarding bots, should Wikidata come to support the expressiveness for statements having both references and derivations, bots could then provide derivations for added statements. Wikidata could verify and/or validate these derivations, ensuring that each step of reasoning was approved by the system admins. This could be a new paradigm for bots and bot-generated content.

Narrative is one of the four modes of rhetoric, and generating narratives from events is an important natural language generation scenario. By means of inference rules, events can be produced from knowledgebase statements [1][2]. Events can also be inferred from other events or combinations of events. A layer of modeled events and event-based reasoning could be considered atop Wikidata statements. Events could be treated as first-class objects, reasoned upon, and utilized during natural language generation.

Pertinent to this discussion thread, in addition to statements having provenance, modeled events could as well.

The spouse of Douglas Adams was Jane Belson [a]. The spouse of Jane Belson was Douglas Adams [b]. Douglas Adams and Jane Belson were married on November 25, 1991 [c].

References

[a] here could be a referenced source material (https://www.nndb.com/people/731/000023662/).

[b] here one could click on a hyperlink to navigate to an automatically-generated page which explains a derivation of a statement.

[c] here one could click on a hyperlink to navigate to an automatically-generated page which explains a derivation of an event.

In the above example automatically-generated article, the first two sentences could be generated from statements (one in the knowledgebase and the other derived) and the third sentence could be generated from a modeled event (derived from statements).
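The third sentence could be produced along the following lines. This is a sketch, assuming that the spouse statement carries a start-time qualifier (P580 on Wikidata) holding the marriage date; the `Statement` and `Event` classes and the `marriage_event` and `render_event` functions are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Statement:
    subject: str
    prop: str
    value: str
    start: Optional[str] = None  # assumed ISO date from a start-time qualifier


@dataclass(frozen=True)
class Event:
    kind: str
    participants: Tuple[str, str]
    when: date
    derived_from: Tuple[Statement, ...]  # provenance of the modeled event


def marriage_event(stmt: Statement) -> Optional[Event]:
    """Derive a marriage event from a dated spouse (P26) statement."""
    if stmt.prop != "P26" or stmt.start is None:
        return None
    return Event("marriage", (stmt.subject, stmt.value),
                 date.fromisoformat(stmt.start), (stmt,))


def render_event(event: Event, labels: Dict[str, str], marker: str) -> str:
    """Generate a sentence for the event with a citation marker."""
    first, second = (labels[p] for p in event.participants)
    d = event.when
    return (f"{first} and {second} were married on "
            f"{d:%B} {d.day}, {d.year} [{marker}].")


labels = {"Q42": "Douglas Adams", "Q14623681": "Jane Belson"}
stmt = Statement("Q42", "P26", "Q14623681", start="1991-11-25")
event = marriage_event(stmt)
sentence = render_event(event, labels, "c")
print(sentence)
```

Because the event keeps the statements it was derived from, the page behind [c] could list them as the steps of the derivation.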

Best regards, Adam

[1] Metilli, Daniele, Maria Simi, Carlo Meghini, and Valentina Bartalesi Lenzi. "A Wikidata-based Tool for the Creation of Narratives." Master’s thesis, University of Pisa, 2016. (PDF: https://core.ac.uk/download/pdf/79622653.pdf)

[2] Metilli, Daniele, Valentina Bartalesi, Carlo Meghini, and Nicola Aloia. "Populating Narratives Using Wikidata Events: An Initial Experiment." In Italian Research Conference on Digital Libraries, pp. 159-166. Springer, Cham, 2019. (PDF: https://openportal.isti.cnr.it/data/2019/403191/2019_403191.postprint.pdf)

From: Charles Matthews via Abstract-Wikipedia <abstract-wikipedia@lists.wikimedia.org>
Sent: Friday, July 17, 2020 5:21 AM
To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda) <abstract-wikipedia@lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] Wikidata Statement Provenance, Automated Reasoning, and Natural Language Generation

On 17 July 2020 at 08:51 Adam Sobieski adamsobieski@hotmail.com wrote:

It is exciting that we will have the ability to do inferences; I think that inference engines for Wikidata knowledgebases are a good idea.

As someone who has been involved, over the past year, with a couple of heavy-duty disputes with bots on Wikidata, I beg to differ.

Tl;dr is that the site is not mature. I don't think community deliberation is yet any sort of warranty.

What I wrote on 14 July about P887, "based on heuristic", may have been misleading. Here anyway is a sample query that finds items where it is in use:

https://w.wiki/X9R

Charles

Charles Matthews

4:11 p.m.

...

On 17 July 2020 at 23:01 Adam Sobieski adamsobieski@hotmail.com wrote:

Regarding bots, should Wikidata come to support the expressiveness for statements having both references and derivations, bots could then provide derivations for added statements. Wikidata could verify and/or validate these derivations, ensuring that each step of reasoning was approved by the system admins. This could be a new paradigm for bots and bot-generated content.

Indeed, who knows what the future of Wikidata holds.

My points about this vision:

(a) Giving admins a role excluding others in decisions about content lies rather outside the Wikimedia tradition.

(b) The synthesis issue lies not with the atomic steps, but with concatenations of those.

(c) Bot roles include creation of items from imported data, and editing existing items, the former being much easier. Within what I was calling the "manual", a reference to the Wikipedia Manual of Style, the use case for some such derivation-based additions is clear enough.

Charles

abstract-wikipedia@lists.wikimedia.org

19 comments

6 participants

participants (6)

Adam Sobieski
Charles Matthews
Denny Vrandečić
Mike Bennett
Paula Kate Marmor
Samuel Klein