Ontological reasoning, curation, and maintenance are all pragmatic concerns for crowdsourced knowledgebase resources; the matter need not be framed as choosing one over the other. I think that we would want ontological reasoning and inference alongside curation and maintenance.

 

Regarding bots: should Wikidata come to support the expressiveness for statements to have both references and derivations, bots could then provide derivations for the statements that they add. Wikidata could verify and/or validate these derivations, ensuring that each step of reasoning used only rules approved by the system admins. This could be a new paradigm for bots and bot-generated content.
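
To illustrate, here is a minimal sketch of what such a machine-checkable derivation could look like. Everything below is hypothetical and invented for the example (there is no such Wikidata API today): a derivation is a sequence of steps, each citing an activated rule, which the site could check mechanically before accepting the bot's edit.

# Hypothetical sketch (invented names, not an existing Wikidata API): a bot
# attaches a derivation to the statement it adds, and the site verifies that
# every step cites an admin-approved rule and rests on known statements.

from dataclasses import dataclass

@dataclass
class Step:
    rule_id: str     # identifier of the inference rule applied
    premises: list   # statements the rule was applied to
    conclusion: str  # statement produced by this step

# Rules that privileged users (e.g., admins) have activated.
APPROVED_RULES = {"spouse-is-symmetric"}

def verify(derivation, known_statements):
    """Accept a derivation only if each step uses an approved rule and each
    premise is either a known statement or an earlier step's conclusion."""
    derived = set(known_statements)
    for step in derivation:
        if step.rule_id not in APPROVED_RULES:
            return False
        if any(p not in derived for p in step.premises):
            return False
        derived.add(step.conclusion)
    return True

# A bot adding "spouse(Jane Belson, Douglas Adams)" could supply:
derivation = [Step("spouse-is-symmetric",
                   ["spouse(Douglas Adams, Jane Belson)"],
                   "spouse(Jane Belson, Douglas Adams)")]
assert verify(derivation, {"spouse(Douglas Adams, Jane Belson)"})

Under a design along these lines, a bot's edit would be accepted only when its accompanying derivation verifies against the currently activated rule set.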

 

Narration is one of the four rhetorical modes, and generating narratives from events is an important natural language generation scenario. By means of inference rules, events can be produced from knowledgebase statements [1][2]. Events can also be inferred from other events or from combinations of events. A layer of modeled events and event-based reasoning could be considered atop Wikidata statements. Events could be treated as first-class objects, reasoned upon, and utilized during natural language generation.
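
As a sketch of how such a rule might look (purely illustrative, in the spirit of [1][2] rather than their actual implementations; the statement and event dictionaries are invented for the example), a marriage event could be derived from a spouse statement carrying a start-time qualifier:

# Hypothetical sketch: deriving a modeled event from knowledgebase statements.

def derive_marriage_events(statements):
    """Rule: a spouse statement with a start-time qualifier yields a
    Marriage event with both participants and that date."""
    events = []
    for s in statements:
        if s["property"] == "spouse" and "start_time" in s["qualifiers"]:
            events.append({
                "type": "Marriage",
                "participants": {s["subject"], s["object"]},
                "date": s["qualifiers"]["start_time"],
                # Provenance: the statement(s) this event was derived from.
                "derived_from": [s["id"]],
            })
    return events

statements = [{
    "id": "Q42$spouse-1",  # hypothetical statement identifier
    "subject": "Douglas Adams",
    "property": "spouse",
    "object": "Jane Belson",
    "qualifiers": {"start_time": "1991-11-25"},
}]
print(derive_marriage_events(statements))

Note that the derived event in this sketch records which statements it came from, which bears on the point about provenance below.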

 

Pertinent to this discussion thread: in addition to statements having provenance, modeled events could have provenance as well. Consider the following sentences from a hypothetical automatically generated article:

 

The spouse of Douglas Adams was Jane Belson [a].

The spouse of Jane Belson was Douglas Adams [b].

Douglas Adams and Jane Belson were married on November 25, 1991 [c].

 

References

[a] here could be a reference to source material (https://www.nndb.com/people/731/000023662/).

[b] here one could click on a hyperlink to navigate to an automatically generated page which explains the derivation of the statement.

[c] here one could click on a hyperlink to navigate to an automatically generated page which explains the derivation of the event.

 

In the above example of an automatically generated article, the first two sentences could be generated from statements (one stored in the knowledgebase, the other derived) and the third sentence could be generated from a modeled event (derived from statements).
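
A minimal, purely illustrative sketch of that generation step, using simple string templates rather than any real natural language generation system, with the event structure from the earlier sketch:

# Hypothetical sketch: template-based realization of the three sentences
# above, each citing its own kind of provenance marker.

def realize_spouse(subject, obj, ref):
    return f"The spouse of {subject} was {obj} [{ref}]."

def realize_marriage(event, ref):
    a, b = sorted(event["participants"])
    return f"{a} and {b} were married on {event['date']} [{ref}]."

event = {"participants": {"Douglas Adams", "Jane Belson"},
         "date": "November 25, 1991"}

print(realize_spouse("Douglas Adams", "Jane Belson", "a"))  # stored statement
print(realize_spouse("Jane Belson", "Douglas Adams", "b"))  # derived statement
print(realize_marriage(event, "c"))                         # derived event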

 

 

Best regards,

Adam

 

[1] Metilli, Daniele, Maria Simi, Carlo Meghini, and Valentina Bartalesi Lenzi. "A Wikidata-based Tool for the Creation of Narratives." Master's thesis, University of Pisa, 2016.

 

[2] Metilli, Daniele, Valentina Bartalesi, Carlo Meghini, and Nicola Aloia. "Populating Narratives Using Wikidata Events: An Initial Experiment." In Italian Research Conference on Digital Libraries, pp. 159-166. Springer, Cham, 2019.

 

From: Charles Matthews via Abstract-Wikipedia
Sent: Friday, July 17, 2020 5:21 AM
To: General public mailing list for the discussion of Abstract Wikipedia (aka Wikilambda)
Subject: Re: [Abstract-wikipedia] Wikidata Statement Provenance, Automated Reasoning, and Natural Language Generation

 

 

On 17 July 2020 at 08:51 Adam Sobieski <adamsobieski@hotmail.com> wrote:

It is exciting that we will have the ability to do inferences; I think that inference engines for Wikidata knowledgebases are a good idea.

 

Individual rules should be considered in context. In my opinion, a good policy is for privileged users (e.g., admins) to be able to activate and deactivate individual rules, e.g., in accordance with community deliberation.

 

As someone who has been involved, over the past year, with a couple of heavy-duty disputes with bots on Wikidata, I beg to differ.

Some reasons: Wikidata is so vast (pushing 100M items) that patrolling is very difficult in practical terms. The required tools to figure out easily what has gone on are not yet there. The site is still in the growth spurt recognisable in early (English) Wikipedia history as "quantity over quality". The technophile tendency has yet to be balanced by a curation ethic of the same clout.

Tl;dr is that the site is not mature. I don't think community deliberation is yet any sort of warranty.

There is already a degree of inference, on missing information, within the system for flagging up data constraint violations. That can be built on, clearly. The gradual setting up of more stringent data modelling likewise tends towards identifying gaps in the statements held on an item of a particular kind (for example, a book edition item, publication date after about 1970, published in a country such as the USA, should probably have a potential ISBN statement, if it is not yet there).

What I wrote on 14 July about P887, "based on heuristic", may have been misleading. Here anyway is a sample query that finds items where it is in use:

https://w.wiki/X9R

That is for P921, on which I work, but this type of query can be used to explore the space in which P887 is used. There is a great deal of tacit use, for example of the heuristic that given name can be used to deduce gender, that is not flagged up in that way: maybe we'll get to that.

The heuristic for P143 "imported from Wikimedia project" was deprecated long since. I looked through the references for Q254, the item for Mozart, and you can see there the extent of referencing using it. 

I think the way to go is to build up the "manual", by which I mean the constraint violation apparatus, the "shape expression" data modelling and its later iterations, and generally the existing community-developed tools. That is where there is a need for consolidation and implementation of maintenance routines, to put it in a downbeat way.

Charles