On 15 July 2020 at 04:33 Adam Sobieski
<adamsobieski(a)hotmail.com> wrote:
We are approaching some interesting and project-specific topics. The natural language
generation output is desired to be specifically encyclopedic, with that involving a number
of established style guidelines. For instance, automatically-generated encyclopedia
articles should be neutral [1] and devoid of sentiment.
My opinion is that WP:SYNTH [2] should not be an obstacle to projects exploring
Wikidata reasoning [3]. My opinion is that guidelines intended for human contributors need
not be interpreted as applicable to sound, administrator-approved, mechanical reasoning
processes. My opinion is that natural language generation can generate encyclopedic
articles from both sourced and soundly derived statements.
To amplify and then sum up what I was saying, more "abstractly":
Abstract Wikipedia will have downstream users, since that is the announced plan. These
will be Wikipedia communities, which are individual online communities with their own
content policies. I refer to enWP's content policies only for definiteness in discussion:
I'm not assuming AW is trying to produce content primarily for enWP.
The situation is analogous to, but seriously more complex than, that with Wikidata-powered
infoboxes. Those infoboxes have been rolled out quite unevenly over Wikipedias, with the
smaller Wikipedias generally receptive, and some of the largest most resistant.
Wikidata is a community of its own. It is certainly not bound by Wikipedia guidelines.
What it does about verifiability, for example, is under its own control. This is the
typical wiki situation of autonomy, subject to some framework set up by the WMF.
AW can expect to enjoy the same type of autonomy. It will presumably by design be
downstream of Wikidata, and therefore will have an interest in how Wikidata handles its
own content. It will be upstream of all Wikipedias that ingest any AW content.
Those who have seen the enWP debates on infoboxes will understand it when I say that
"pushback" against imported content is to be expected, under some circumstances,
and is not particularly easy to handle.
So, given that Wikimedia is a community of communities, the complexities are social as
well as technical, and "philosophical" (relating for example to foundational
debates in analytical philosophy of around a century ago). Guidelines are indicators of
certain fault lines, which may become divisive. In the best of all possible Leibnizian
worlds, rhetorical problems with generated content that consists of assertoric
propositions just fall away when the foundations are correctly laid.
My point is that AW is intended to deal with a hike in expressive power, compared with the
infobox situation; and my concern is not that "Wikidata reasoning" is in itself
a problem (which is a Wikidata internal issue), but that appeal to machine syllogisms on
Wikidata adds a "black box" upstream of AW.
The Douglas Adams example given comes down to saying the predicate "is a science
fiction author" can be taken to be the relational composite of "is author of a
work" and "work has genre science fiction". In other words it can be
witnessed by a single work, which has the genre. This is debatable in various ways:
verification of the two parts separately, rather than requiring a witnessing statement
stating explicitly that Adams is a science fiction author, will give different results in
practice; and the SF genre has been contested since the 1940s at least. If this was
supposed to be a trivial example, I think it is not.
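
To make the composite concrete, here is a minimal sketch in Python. It is only an illustration: the relation names, the second (placeholder) work and its genre, and the empty set of explicit statements are assumptions, not Wikidata content or any actual Wikidata reasoning process. It shows the derived predicate holding on the strength of a single witnessing work, while a check for an explicit witnessing statement gives a different answer.

```python
# Toy relations as sets of pairs. All data here is illustrative, not taken from Wikidata.
author_of = {
    ("Douglas Adams", "The Hitchhiker's Guide to the Galaxy"),
    ("Douglas Adams", "Placeholder Nonfiction Work"),  # hypothetical second work
}
has_genre = {
    ("The Hitchhiker's Guide to the Galaxy", "science fiction"),
    ("Placeholder Nonfiction Work", "nonfiction"),  # hypothetical genre label
}

# Hypothetical store of explicitly sourced "X is a <genre> author" statements.
# Left empty to model the case where no source says so directly.
explicitly_sourced = set()


def compose(r, s):
    """Relational composite: (a, c) holds if some b has (a, b) in r and (b, c) in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def witnesses(person, genre):
    """Works that witness the derived statement 'person is a <genre> author'."""
    return sorted(
        work
        for (p, work) in author_of
        if p == person and (work, genre) in has_genre
    )


derived = compose(author_of, has_genre)

# Derived check: one work with the genre is enough.
print(("Douglas Adams", "science fiction") in derived)
# Explicit check: requires a statement saying so directly.
print(("Douglas Adams", "science fiction") in explicitly_sourced)
# The single witness behind the derived statement.
print(witnesses("Douglas Adams", "science fiction"))
```

Note that the same rule also makes Adams a "nonfiction author" on the strength of the one placeholder work, which is the sort of result a Wikipedia community might or might not accept.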
Wikimedia communities are far from mechanical. I expressed no opposition to "projects
exploring Wikidata reasoning". I think there is good reason to avoid conflating those
with AW.
Charles