On 17 July 2020 at 08:51 Adam Sobieski adamsobieski@hotmail.com wrote:
It is exciting that we will have the ability to do inferences; I think that inference engines for Wikidata knowledgebases are a good idea. Individual rules should be considered in contexts. In my opinion, a good policy is for privileged users (e.g. admins) to be able to activate and deactivate individual rules, e.g. in accordance with community deliberation.
As someone who has been involved, over the past year, with a couple of heavy-duty disputes with bots on Wikidata, I beg to differ.
Some reasons: Wikidata is so vast (pushing 100M items) that patrolling is very difficult in practical terms. The required tools to figure out easily what has gone on are not yet there. The site is still in the growth spurt recognisable in early (English) Wikipedia history as "quantity over quality". The technophile tendency has yet to be balanced by a curation ethic of the same clout.
The tl;dr is that the site is not mature. I don't think community deliberation is yet any sort of guarantee.
There is already a degree of inference, on missing information, within the system for flagging up data constraint violations. That can be built on, clearly. The gradual setting up of more stringent data modelling likewise tends towards identifying gaps in the statements held on an item of a particular kind. For example, a book edition item with a publication date after about 1970, published in a country such as the USA, should probably have an ISBN statement, and can be flagged if it does not yet have one.
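To make the book edition example concrete, here is a minimal sketch in Python of that kind of gap inference. It is an illustration only, not how the constraint system is implemented: the `EditionRecord` class, its field names, the 1970 cut-off and the country set are all assumptions made for the example, and no real Wikidata data is used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EditionRecord:
    """Hypothetical, simplified view of a book edition item."""
    label: str
    publication_year: Optional[int]
    country: Optional[str]
    isbn: Optional[str] = None


# Assumed thresholds, taken loosely from the example in the text.
ISBN_ERA_START = 1970
ISBN_COUNTRIES = {"USA"}


def likely_missing_isbn(record: EditionRecord) -> bool:
    """Return True when the record looks as if it should have an ISBN but has none."""
    if record.isbn:
        return False  # nothing is missing
    if record.publication_year is None or record.country is None:
        return False  # not enough information to infer anything
    return (record.publication_year > ISBN_ERA_START
            and record.country in ISBN_COUNTRIES)


# Made-up records, for illustration only.
records = [
    EditionRecord("Edition A", 1985, "USA"),
    EditionRecord("Edition B", 1985, "USA", isbn="placeholder-isbn"),
    EditionRecord("Edition C", 1950, "USA"),
]
flagged = [r.label for r in records if likely_missing_isbn(r)]
print(flagged)
```

The point of the sketch is that the inference only flags a probable gap for a human to look at; it does not add a statement.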
What I wrote on 14 July about P887, "based on heuristic", may have been misleading. Here anyway is a sample query that finds items where it is in use:
That is for P921, on which I work, but this type of query can be used to explore the space in which P887 is used. There is a great deal of tacit use, for example of the heuristic that given name can be used to deduce gender, that is not flagged up in that way: maybe we'll get to that.
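For anyone wanting to explore that space, a query of this general kind might look like the sketch below. It is not the query referred to above, only an illustration: the Python helper `build_heuristic_usage_query` is a hypothetical name, and the query assumes the standard prefixes of the Wikidata Query Service (`p:`, `ps:`, `pr:`, `prov:`). It looks for statements of a given property whose references carry P887.

```python
import re


def build_heuristic_usage_query(property_id: str = "P921", limit: int = 100) -> str:
    """Build a SPARQL query string (hypothetical helper, for illustration).

    The query selects items with a statement for `property_id` whose
    reference includes P887 ("based on heuristic").
    """
    if not re.fullmatch(r"P[1-9][0-9]*", property_id):
        raise ValueError(f"not a property ID: {property_id!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return f"""SELECT ?item ?value ?heuristic WHERE {{
  ?item p:{property_id} ?statement .
  ?statement ps:{property_id} ?value ;
             prov:wasDerivedFrom ?reference .
  ?reference pr:P887 ?heuristic .
}}
LIMIT {limit}"""


query = build_heuristic_usage_query()
print(query)
```

Changing `property_id` gives the same query for another property; the resulting text can be pasted into the query service by hand.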
The heuristic for P143, "imported from Wikimedia project", was deprecated long since. I looked through the references for Q254, the item for Mozart, and you can see there how extensively it is used in referencing.
I think the way to go is to build up the "manual", by which I mean the constraint violation apparatus, the "shape expression" data modelling and its later iterations, and generally the existing community-developed tools. That is where there is a need for consolidation and implementation of maintenance routines, to put it in a downbeat way.
Charles