On 15 July 2020 at 04:33 Adam Sobieski <adamsobieski@hotmail.com> wrote:

We are approaching some interesting and project-specific topics. The natural language generation output is desired to be specifically encyclopedic, with that involving a number of established style guidelines. For instance, automatically-generated encyclopedia articles should be neutral [1] and devoid of sentiment.

 

My opinion is that WP:SYNTH [2] should not be an obstacle to projects exploring Wikidata reasoning [3]. My opinion is that guidelines intended for human contributors need not be interpreted as applicable to sound, administrator-approved, mechanical reasoning processes. My opinion is that natural language generation can generate encyclopedic articles from both sourced and soundly derived statements.

To amplify and then sum up what I was saying, more "abstractly":

Abstract Wikipedia (AW) will have downstream users, since that is the announced plan. These will be Wikipedia communities, which are individual online communities with their own content policies. I use enWP's content policies for definiteness in discussion; I'm not assuming AW is trying to produce content primarily for enWP.

The situation is analogous to, but seriously more complex than, that with Wikidata-powered infoboxes. Those infoboxes have been rolled out quite unevenly over Wikipedias, with the smaller Wikipedias generally receptive, and some of the largest most resistant.

Wikidata is a community of its own. It is certainly not bound by Wikipedia guidelines. What it does about verifiability, for example, is under its own control. This is the typical wiki situation of autonomy, subject to some framework set up by the WMF.

AW can expect to enjoy the same type of autonomy. It will presumably, by design, be downstream of Wikidata, and will therefore have an interest in how Wikidata handles its own content. It will be upstream of all Wikipedias that ingest any AW content.

Those who have seen the enWP debates on infoboxes will understand it when I say that "pushback" against imported content is to be expected, under some circumstances, and is not particularly easy to handle.

So, given that Wikimedia is a community of communities, the complexities are social as well as technical, and "philosophical" (relating for example to foundational debates in analytical philosophy of around a century ago). Guidelines are indicators of certain fault lines, which may become divisive. In the best of all possible Leibnizian worlds, rhetorical problems with generated content that consists of assertoric propositions just fall away when the foundations are correctly laid.

My point is that AW is intended to deal with a jump in expressive power compared with the infobox situation. My concern is not that "Wikidata reasoning" is in itself a problem (that is an internal Wikidata issue), but that an appeal to machine syllogisms on Wikidata adds a "black box" upstream of AW.

The Douglas Adams example given comes down to saying the predicate "is a science fiction author" can be taken to be the relational composite of "is author of a work" and "work has genre science fiction". In other words it can be witnessed by a single work, which has the genre. This is debatable in various ways: verification of the two parts separately, rather than requiring a witnessing statement stating explicitly that Adams is a science fiction author, will give different results in practice; and the SF genre has been contested since the 1940s at least. If this was supposed to be a trite example, I think it is not that.
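To make the distinction concrete, here is a rough sketch in Python of the two ways of verifying the predicate. Everything in it is an assumption for illustration: the set names, the function names and the toy data are hypothetical, and none of it is Wikidata's actual data model or API. The explicit-statement set is left empty on purpose, to show a case where the two approaches disagree.

```python
# Toy data only: illustrative assumptions, not a Wikidata dump or API.
AUTHOR_OF = {("Douglas Adams", "The Hitchhiker's Guide to the Galaxy")}
HAS_GENRE = {("The Hitchhiker's Guide to the Galaxy", "science fiction")}

# Explicit, separately sourced statements of the form (person, label).
# Deliberately empty here, so that the two checks give different answers.
EXPLICIT = set()


def compose(r, s):
    """Relational composite: (a, c) whenever some b has (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def derived_sf_authors():
    """People witnessed by at least one authored work that has the genre."""
    return {a for (a, g) in compose(AUTHOR_OF, HAS_GENRE) if g == "science fiction"}


def explicit_sf_authors():
    """People for whom an explicit 'science fiction author' statement exists."""
    return {p for (p, label) in EXPLICIT if label == "science fiction author"}
```

With this toy data, derived_sf_authors() contains Douglas Adams while explicit_sf_authors() is empty. The derived predicate holds on the strength of a single witnessing work, with no statement anywhere saying that Adams is a science fiction author, and that is the gap I mean.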

Wikimedia communities are far from mechanical. I expressed no opposition to "projects exploring Wikidata reasoning". I think there is good reason to avoid conflating those with AW.

Charles