Wikidata considered unable to support hierarchical search in Structured Data for Commons

James Heald

27 Sep 2018

11:34 p.m.

This recent announcement by the Structured Data team perhaps ought to be quite a heads-up for us:

https://commons.wikimedia.org/wiki/Commons_talk:Structured_data#Searching_Co...

Essentially the team has given up on the hope of using Wikidata hierarchies to suggest generalised "depicts" values to store for images on Commons, to match against terms in incoming search requests.

i.e. if an image is of a German Shepherd dog, and identified as such, the team has given up on trying to infer in general from Wikidata that 'dog' is also a search term that such an image should score positively with.

Apparently the Wikidata hierarchies were simply too complicated, too unpredictable, and too arbitrary and inconsistent in their design across different subject areas to be readily assimilated (before one even starts on the density of bugs and glitches that then undermine them).

Instead, if that image ought to be considered in a search for 'dog', it looks as though an explicit 'depicts:dog' statement may need to be present, in addition to 'depicts:German Shepherd'.

Some of the background behind this assessment can be read in https://phabricator.wikimedia.org/T199119, in particular the first substantive comment on that ticket, by Cparle on 10 July, giving his quick initial read of some of the issues that using Wikidata would face.

SDC was considered a flagship end-application for Wikidata. If the data in Wikidata is not usable enough to supply the dogfood that project was expected to rely on, that should be a serious wake-up call, a red flag we should not ignore.

If the way data is organised across different subjects is currently too inconsistent and confusing to be usable by our own SDC project, are there actions we can take to address that? Are there design principles to be chosen that then need to be applied consistently? Is this something the community can do, or is some more active direction going to need to be applied?

Wikidata's 'ontology' has grown haphazardly, with little oversight, like an untended bank of weeds. Is some more active gardening now required?

-- James.

Stas Malyshev

28 Sep 2018

12:23 a.m.

Hi!

...

> Apparently the Wikidata hierarchies were simply too complicated, too unpredictable, and too arbitrary and inconsistent in their design across different subject areas to be readily assimilated (before one even starts on the density of bugs and glitches that then undermine them).

The main problem is that there is no standard way (or even a defined small number of ways) to get the hierarchy that is relevant for "depicts" from current Wikidata data. It may even be that for a specific type or class the hierarchy is well defined, but the sheer number of different ways it is done in different areas is overwhelming and ill-suited for automatic processing. Of course, questions like "is 'cat' a common name of an animal or a taxon, and which one of these will be used in depicts" add complexity too.

One way of solving it is to create a special hierarchy for "depicts" purposes that would serve this particular use case. Another way is to amend existing hierarchies and meta-hierarchies so that there would be an algorithmic way of navigating them in a common case. This is something that would be nice to hear about from people that are experienced in ontology creation and maintenance.

...

> to be chosen that then need to be applied consistently? Is this something the community can do, or is some more active direction going to need to be applied?

I think this is very much something that the community can do.

-- Stas Malyshev smalyshev@wikimedia.org

Thad Guidry

4:41 a.m.

James,

It looks like a lot of that phabricator issue was around taxa? For the Poodle to show a class of Mammal...

Seems like many of these could be answered if someone responded to https://www.wikidata.org/wiki/User:Danyaljj on their last question, about whether an "OR" could be used with linkType in gas:service. No one gave an answer to their final comment here: https://www.wikidata.org/wiki/Wikidata:Request_a_query/Archive/2017/01#Timeo...

I tried myself to answer that question and find either Parent Taxon OR Subclass of a Poodle, but couldn't seem to pull it off using gas:service and 1 hour of trial and error in many forms, even duplicating the program twice ...

http://tinyurl.com/yb7wfpwh

#defaultView:Graph
PREFIX gas: <http://www.bigdata.com/rdf/gas#>

SELECT ?item ?itemLabel
WHERE {
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" ;
                gas:in wd:Q38904 ;
                gas:traversalDirection "Forward" ;
                gas:out ?item ;
                gas:out1 ?depth ;
                gas:maxIterations 10 ;
                gas:linkType wdt:P279 .
  }
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" ;
                gas:in wd:Q38904 ;
                gas:traversalDirection "Forward" ;
                gas:out ?item ;
                gas:out1 ?depth ;
                gas:maxIterations 10 ;
                gas:linkType wdt:P171 .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
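
[Editorial sketch, not part of the original message.] One likely reason the query above falls short: each SERVICE block follows a single link type, and because both blocks bind the same ?item and ?depth variables, SPARQL joins their results instead of merging them. A chain that mixes the two properties (a P279 hop followed by a P171 hop) is never walked by either block. The Python below illustrates the difference on a tiny made-up graph. The edge data and the function name `reachable` are assumptions for illustration only, not Wikidata content and not the Blazegraph API.

```python
from collections import deque

# Toy edges, made up for illustration: (child, link_type, parent).
EDGES = [
    ("poodle", "P279", "dog"),      # subclass of
    ("dog", "P171", "Canis"),       # parent taxon
    ("Canis", "P171", "Canidae"),   # parent taxon
]


def reachable(start, link_types, max_depth=10):
    """Breadth-first walk from start along the given link types.

    Returns {node: depth} for every node reached, excluding start.
    """
    depths = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for child, link, parent in EDGES:
            if (child == node and link in link_types
                    and parent != start and parent not in depths):
                depths[parent] = depth + 1
                queue.append((parent, depth + 1))
    return depths


# One traversal per link type, as in the two SERVICE blocks.
only_subclass = reachable("poodle", {"P279"})
only_taxon = reachable("poodle", {"P171"})

# A single traversal that may follow either link type at every hop.
either = reachable("poodle", {"P279", "P171"})
```

Here only the combined walk gets from the toy "poodle" node to "Canidae". In SPARQL itself, a property path such as (wdt:P279|wdt:P171)* expresses the same "either link at each hop" walk, although it does not report the depth.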

Ettore RIZZA

29 Sep 2018

9:48 a.m.

Hi,

Wikidata's ontology is a mess, and I do not see how it could be otherwise. While the creation of new properties is controlled, any fool can decide that a woman https://www.wikidata.org/wiki/Q467 is no longer a human or is part of family. Maybe I'm a fool too? I wanted to remove the claim that a ship https://www.wikidata.org/wiki/Q11446 is an instance of "ship type" because it produces weird circular inferences in my application; but maybe that makes sense to someone else.

There will never be a universal ontology on which everyone agrees. I wonder (sorry to think aloud) if Wikidata should not rather facilitate the use of external classifications. Many external ids are knowledge organization systems (ontologies, thesauri, classifications ...). I dream of a simple query that could search, in Wikidata, "all elements of the same class as 'poodle' according to the classification of ImageNet" http://imagenet.stanford.edu/synset?wnid=n02113335.

Thad Guidry

4:38 p.m.

Ettore,

Wikidata has the ability of crowdsourcing...unfortunately, it is not effectively utilized.

It's because Wikidata does not yet provide a voting feature on statements, where the higher the vote gets, the more resistance there is to changing the statement. But that breaks the notion of a "wiki" for some folks. And there we circle back to Gerard's age-old question: should Wikidata really be considered a wiki at all for the benefit of society? Or should it apply voting/resistance to keep it tidy, factual and less messy?

We have the technology to implement voting/resistance on statements. I personally would utilize that feature and many others probably would as well. Crowdsourcing the low voted facts back to applications like OpenRefine, or the recently sent out Survey vote mechanism for spam analysis on the low voted statements could highlight where things are untidy and implement vote casting to clean them up.

"...the burden of proof has to be placed on authority, and it should be dismantled if that burden cannot be met..."

-Thad +ThadGuidry https://plus.google.com/+ThadGuidry

Ettore RIZZA

6:58 p.m.

Hi Thad,

I understand that an open Wiki has its advantages and disadvantages (I sometimes prefer a system like StackOverflow, where you need a certain reputation to do some things). I am afraid that a voting system simply favors the opinions shared by the majority of Wikidata editors, namely a Western worldview. And even within this subgroup opinions may legitimately differ.

But there may be ways to avoid messing up the ontology while respecting the wiki spirit. For example, a warning pop-up every time you edit an ontological property (P31, P279, P361...). Something like: "OK, you added the statement "a poodle is an instance of toy". Do you agree with the fact that poodle is now a goods, a work, an artificial physical object? "

But that would only work for manual edits...
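
[Editorial sketch, not part of the original message.] The warning pop-up described above amounts to comparing the classes an item belongs to before and after a proposed edit. The Python below is a minimal sketch of that comparison, assuming a single "is a kind of" edge map; in Wikidata the first hop would be P31 and the later hops P279, which this toy version does not distinguish. The data and the names `ancestors` and `newly_implied` are hypothetical.

```python
# Toy "is a kind of" edges, made up for illustration.
SUBCLASS_OF = {
    "poodle": {"dog"},
    "dog": {"animal"},
    "toy": {"goods", "artificial physical object"},
    "goods": {"work"},
}


def ancestors(item, edges):
    """All classes reachable from item by following edges upwards."""
    seen = set()
    stack = [item]
    while stack:
        for parent in edges.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def newly_implied(item, new_parent, edges):
    """Classes that item would gain if 'item -> new_parent' were saved."""
    before = ancestors(item, edges)
    # Copy the edge map so the proposed edit is only simulated.
    proposed = {key: set(value) for key, value in edges.items()}
    proposed.setdefault(item, set()).add(new_parent)
    return ancestors(item, proposed) - before


# What the pop-up could list before the edit "poodle -> toy" is saved.
warning = sorted(newly_implied("poodle", "toy", SUBCLASS_OF))
```

An edit that adds nothing new, such as repeating "poodle -> dog", yields an empty set, so no warning would be needed in that case.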

Gerard Meijssen

7:25 p.m.

Hoi, There is also the age-old conundrum where some want to enforce their rules for the good of all because (argument of the day follows).

First of all, Wikidata is very much a child of Wikipedia. It has its own structures and people have endeavoured to build those same structures in Wikidata, never mind that it is a very different medium and never mind that there are 280+ Wikipedias that might consider things to be different. The start of Wikidata was also an auspicious occasion where it was thought to be OK to adopt an external German authority. That proved to be a disaster and there are still residues of this awful decision. It did not take long to show the shortcomings of this schedule and it was replaced by something more sensible.

However, we got something really Wiki and it was all too wild. It did not take long for me to ask for someone to explain the current structures, and nobody volunteered. So I did what I do best: I largely ignored the results of the classes and subclasses. It does not work for me. It works against me, so my current strategy is to ignore this nonsense and concentrate on including data. The reason is simple: once data is included, it is easy to slice it, dice it, and structure it as we see fit at a later date.

So when our priority becomes to make our data reusable and more open, we should agree on it. So far we have not, because we choose to fight each other. Some have ideas, some have invested too much in what we have at this time. When we are to make our data reusable, we should agree on what it is exactly we aim to achieve. Is it to support Commons, or is it to support some external standard that is academically sound? I would always favour what is practical and easily measured.

I would support Commons first. It has the benefit that it will bring our communities together in a clear objective. It has the benefit that changes in the operations of Wikidata support the whole of the Wikimedia universe, and consequently financial, technical and operational needs and investments are easily understood. It also means that all the bureaucracy that has materialised will show itself to be in the way when it is.

So my question is not if we are a Wiki, my question is are we a Wiki enough and willing to change our way for our own good. Thanks, GerardM

Daniel Kinzler

17 Oct 2018

4:04 p.m.

My (very belated) thoughts on this issue:

Wiki content grows in a messy way, and it stays messy until the messiness causes problems. Once it causes problems, people are motivated to clean it up.

I propose to implement hierarchical search based on very simple, predictable rules, e.g. by having a configurable list of transitive relationships that get evaluated to a certain depth. I'd go for subclasses, geographical inclusion, and subspecies at first.

Doing this will NOT produce good results. You would have to implement a lot of special cases and heuristics to work around dirty data. I say: let it produce bad results, tell people why the results are bad, and what they can do about it!

The Wikimedia community is AMAZING at making good use of whatever capabilities the software offers, and at adapting content to make the software produce the results they want. By providing limited but clearly defined software support for hierarchical search, we allow the community to optimize the content to work with that search. Keeping the rules simple means that other consumers can then follow the same rules, and the content will work for them as well.
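
[Editorial sketch, not part of the original message.] A minimal Python reading of the proposal above: a configurable list of relations treated as transitive, a fixed depth limit, and a result that carries the chain of statements behind it so that people can see why a match happened and what to fix. The relation names, the depth value, the toy statements and the function name `explain_match` are all assumptions for illustration, not an existing search implementation.

```python
from collections import deque

# Hypothetical configuration: relations treated as transitive, and how
# many hops a search is allowed to climb.
TRANSITIVE_RELATIONS = ("subclass of", "located in", "subspecies of")
MAX_DEPTH = 3

# Toy statements, made up for illustration: item -> [(relation, target)].
STATEMENTS = {
    "German Shepherd": [("subclass of", "dog")],
    "dog": [("subclass of", "pet"), ("named after", "nobody")],
    "pet": [("subclass of", "animal")],
    "animal": [("subclass of", "organism")],
}


def explain_match(depicted, term, statements=STATEMENTS,
                  relations=TRANSITIVE_RELATIONS, max_depth=MAX_DEPTH):
    """Return the chain of statements linking depicted to term, or None.

    The chain doubles as an explanation users can inspect and correct.
    """
    queue = deque([(depicted, [])])
    seen = {depicted}
    while queue:
        item, path = queue.popleft()
        if item == term:
            return path
        if len(path) == max_depth:
            continue  # depth limit reached, do not climb further
        for relation, target in statements.get(item, []):
            if relation in relations and target not in seen:
                seen.add(target)
                queue.append((target, path + [(item, relation, target)]))
    return None
```

With these toy statements, an image tagged "German Shepherd" matches a search for "animal" through three hops, while "organism" lies beyond the depth limit and "nobody" is only reachable through a relation that is not in the configured list.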

-- daniel

-- Daniel Kinzler Principal Software Engineer, Core Platform Wikimedia Foundation

Luca Martinelli

18 Oct 2018

3:11 p.m.

On Wed, 17 Oct 2018, 16:04 Daniel Kinzler dkinzler@wikimedia.org wrote:

...

> I say: let it produce bad results, tell people why the results are bad, and what they can do about it!

TL;DR: let's produce bad results, and let's analyse those results to find the best practical solution we can come up with.

I totally agree with Daniel here. It is definitely a red flag that we should tackle head-on, but we need data first. We need to know *where* the ontology fails, *why* it fails, and *how* we can fix it.

Now is probably the best time to talk about this, not just because we have a potentially big application such as Structured Data, but also because we have focused on other not-so-easy problems, such as dealing with isolated sitelinks/projects and trying to establish relations between items, and between items and other databases.

What we need to do IMHO is to find whatever best practical solution we have at hand, in order to primarily use it on Wikimedia projects. My only fear is that such discussions may end up in a swamp because of "that one user" who doesn't want to apply that particular solution (not accusing anyone in particular, I've been that user too in some discussions). Anyway, if we start from data, we can come up with some solution.

Peter F. Patel-Schneider

7:05 p.m.

On 10/17/18 7:04 AM, Daniel Kinzler wrote:

> My (very belated) thoughts on this issue:
>
> [...]
>
> I say: let it produce bad results, tell people why the results are bad, and what they can do about it! [...]
>
> -- daniel

My view is that there is a big problem with this for industrial use of Wikidata.

I would very much like to use Wikidata more in my company. However, I view it as my duty in my company to point out problems with the use of any technology. So whenever I talk about Wikidata I also have to talk about the problems I see in the Wikidata ontology and how they will affect use of Wikidata in my company.

If Wikidata is going to have significant use in my company there needs to be at least some indication that the problems in Wikidata are being addressed. I don't see that happening at the moment.

What is the biggest problem I see in Wikidata? It is the poor organization of the Wikidata ontology. To fix the ontology, beyond doing point fixes, is going to require some commitment from the Wikidata community.

Peter F. Patel-Schneider Nuance Communications

Daniel Kinzler

8:13 p.m.

On 18.10.2018 at 19:05, Peter F. Patel-Schneider wrote:

> On 10/17/18 7:04 AM, Daniel Kinzler wrote:
>
>> My (very belated) thoughts on this issue:
>>
>> [...]
>>
>> I say: let it produce bad results, tell people why the results are bad, and what they can do about it! [...]
>>
>> -- daniel
>
> My view is that there is a big problem with this for industrial use of Wikidata.
>
> [...]
>
> What is the biggest problem I see in Wikidata? It is the poor organization of the Wikidata ontology. To fix the ontology, beyond doing point fixes, is going to require some commitment from the Wikidata community.

I agree. And I think the best way to achieve this is to start using the ontology as an ontology on wikimedia projects, and thus expose the fact that the ontology is broken. This gives incentive to fix it, and examples as to what things should be possible using that ontology (namely, some level of basic inference).

-- Daniel Kinzler Principal Software Engineer, MediaWiki Platform Wikimedia Foundation

Markus Kroetzsch

11:33 p.m.

+1 to Daniel

And, on another note, there is also a huge misunderstanding exposed in the discussion on the search-related tracker item [1]: Cparle there speaks about "traversing the subclass hierarchy" but is actually looking at *super*classes of, e.g., "Clarinet", which he mostly finds irrelevant to users who care about clarinets. But surely that's the wrong direction! You have to look for *sub*classes to find special cases of what you are looking for. Looking downwards will often lead to much saner ontologies than turning your head towards the dizzy heights of upper ontology. Yes, the few of us looking for instances of "logical consequence" will still get clarinets, but those who look for instances of clarinet will merely see instances of alto clarinet, piccolo clarinet, basset horn, Saxonette, and so on [2]. So instead of trying to suggest meaningful "upper concepts" to Commons editors, one could simply enable the use of lower concepts in search. It does not work in all cases yet, but it does in many.

There are still problems (such as the biological taxonomy being modelled as a hierarchy of names rather than animal classes, placing dog far away from mammal), but it is still always much easier to come up with a sane organisation for the *sub*classes of a concrete class.
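
[Editorial sketch, not part of the original message.] The "look downwards" idea can be sketched as follows: expand the search term into itself plus all of its subclasses, then match files whose depicts value falls in that set. The subclass edges, file names and function names below are toy assumptions for illustration, not Commons or Wikidata data.

```python
# Toy data, made up for illustration.
SUBCLASS_OF = {
    "alto clarinet": ["clarinet"],
    "basset horn": ["clarinet"],
    "clarinet": ["woodwind instrument"],
    "flute": ["woodwind instrument"],
}
DEPICTS = {
    "File:A.jpg": ["alto clarinet"],
    "File:B.jpg": ["clarinet"],
    "File:C.jpg": ["flute"],
}


def subclasses_of(term, subclass_of):
    """term plus every class that is directly or indirectly a subclass."""
    # Invert the child -> parents map so the hierarchy can be walked down.
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, stack = {term}, [term]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(term, depicts=DEPICTS, subclass_of=SUBCLASS_OF):
    """Files whose depicts value is the term or one of its subclasses."""
    wanted = subclasses_of(term, subclass_of)
    return sorted(name for name, values in depicts.items()
                  if wanted & set(values))
```

A search for "clarinet" then finds the alto clarinet image without any upper concept having been stored on the file, and a search for "flute" is unaffected by the clarinet branch.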

FYI, I recently gave a talk about ontological modelling in Wikidata that discussed some of the current issues: https://iccl.inf.tu-dresden.de/web/Misc3058/en (audience were ontology design pattern researchers there).

Cheers,

Markus

[1] https://phabricator.wikimedia.org/T199119
[2] http://tinyurl.com/y7tvkuzk

On 17/10/2018 16:04, Daniel Kinzler wrote:

...

My (very belated) thoughts on this issue:

Wiki content grows in a messy way, and it stays messy until the messiness causes problems. Once it causes problems, people are motivated to clean it up.

I propose to implement hierarchical search based on very simple, predictable rules, e.g. by having a configurable list of transitive relationships that get evaluated to a certain depth. I'd go for subclasses, geographical inclusion, and subspecies at first.

Doing this will NOT produce good results. You would have to implement a lot of special cases and heuristics to work around dirty data. I say: let it produce bad results, tell people why the results are bad, and what they can do about it!

The Wikimedia community is AMAZING at making good use of whatever capabilities the software has, and at adapting content to make the software produce the results they want. By providing limited but clearly defined software support for hierarchical search, we allow the community to optimize the content to work with that search. Keeping the rules simple means that other consumers can then follow the same rules, and the content will work for them as well.
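
The rule described above (a configurable list of transitive relationships, each followed only to a fixed depth) might be sketched roughly as follows. This is only an illustration under stated assumptions: the relation names, the depth limits and the toy statements are all made up for the example and are not taken from Wikidata or from any existing search code.

```python
from collections import deque

# Hypothetical configuration: relation name -> maximum number of hops.
# Both the names and the limits are assumptions for illustration.
TRANSITIVE_RELATIONS = {
    "subclass_of": 3,
    "located_in": 2,
}

# Toy statements, (subject, relation) -> objects. Not real Wikidata data.
STATEMENTS = {
    ("basset horn", "subclass_of"): ["clarinet"],
    ("clarinet", "subclass_of"): ["woodwind instrument"],
    ("woodwind instrument", "subclass_of"): ["musical instrument"],
    ("musical instrument", "subclass_of"): ["tool"],
    ("Dresden", "located_in"): ["Saxony"],
    ("Saxony", "located_in"): ["Germany"],
    ("Germany", "located_in"): ["Europe"],
}


def expand(term, statements=STATEMENTS, relations=TRANSITIVE_RELATIONS):
    """Return the term plus everything reachable from it by following a
    single configured relation for at most that relation's depth limit."""
    found = {term}
    for relation, max_depth in relations.items():
        queue = deque([(term, 0)])
        seen = {term}
        while queue:
            current, depth = queue.popleft()
            if depth == max_depth:
                continue  # depth limit reached; do not look further up
            for parent in statements.get((current, relation), []):
                if parent not in seen:
                    seen.add(parent)
                    found.add(parent)
                    queue.append((parent, depth + 1))
    return found


print(sorted(expand("basset horn")))
print(sorted(expand("Dresden")))
```

With these toy limits, "basset horn" picks up "clarinet", "woodwind instrument" and "musical instrument" but stops before "tool", and "Dresden" stops before "Europe". The point of keeping the rule this simple is that a bad result can be traced back to one specific statement or one specific limit.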

-- daniel

Am 29.09.2018 um 19:25 schrieb Gerard Meijssen:

...
Hoi, There is also the age-old conundrum where some want to enforce their rules for the good of all because (argument of the day follows).

First of all, Wikidata is very much a child of Wikipedia. Wikipedia has its own structures, and people have endeavoured to build those same structures in Wikidata, never mind that it is a very different medium and never mind that there are 280+ Wikipedias that might consider things to be different. The start of Wikidata was also an auspicious occasion where it was thought to be OK to adopt an external German authority. That proved to be a disaster and there are still residues of this awful decision. It did not take long to show the shortcomings of this schedule, and it was replaced by something more sensible.

However, we got something really Wiki and it was all too wild. It did not take long for me to ask for someone to explain the current structures, and nobody volunteered. So I did what I do best: I largely ignored the results of the classes and subclasses. It does not work for me. It works against me, so my current strategy is to ignore this nonsense and concentrate on including data. The reason is simple: once data is included, it is easy to slice it, dice it, and structure it as we see fit at a later date.

So when our priority becomes to make our data reusable and more open, we should agree on it. So far we have not, because we choose to fight each other. Some have ideas, some have invested too much in what we have at this time. When we are to make our data reusable, we should agree on what exactly it is we aim to achieve. Is it to support Commons, or is it to support some external standard that is academically sound? I would always favour what is practical and easily measured.

I would support Commons first. It has the benefit that it will bring our communities together in a clear objective. It has the benefit that changes in the operations of Wikidata support the whole of the Wikimedia universe, and consequently financial, technical and operational needs and investments are easily understood. It also means that all the bureaucracy that has materialised will be shown to be in the way when it is.

So my question is not if we are a Wiki; my question is whether we are Wiki enough and willing to change our ways for our own good. Thanks, GerardM

On Sat, 29 Sep 2018 at 16:38, Thad Guidry <thadguidry@gmail.com> wrote:
 Ettore,

 Wikidata has the ability of crowdsourcing...unfortunately, it is not
 effectively utilized.

 It's because Wikidata does not yet provide a voting feature on
 statements, where, as the vote count gets higher, more resistance to changing
 the statement is required.
 But that breaks the notion of a "wiki" for some folks.
 And there we circle back to Gerard's age-old question: should Wikidata
 really be considered a wiki at all for the benefit of society? Or should
 it apply voting/resistance to keep it tidy, factual and less messy?

 We have the technology to implement voting/resistance on statements.  I
 personally would utilize that feature and many others probably would as
 well.  Crowdsourcing the low voted facts back to applications like
 OpenRefine, or the recently sent out Survey vote mechanism for spam analysis
 on the low voted statements could highlight where things are untidy and
 implement vote casting to clean them up.

 "...the burden of proof has to be placed on authority, and it should be
 dismantled if that burden cannot be met..."

 -Thad
 +ThadGuidry <https://plus.google.com/+ThadGuidry>


 On Sat, Sep 29, 2018 at 2:49 AM Ettore RIZZA <ettorerizza@gmail.com> wrote:

     Hi,

     Wikidata's ontology is a mess, and I do not see how it could be
     otherwise. While the creation of new properties is controlled, any fool
     can decide that a woman <https://www.wikidata.org/wiki/Q467> is no longer
     a human or is part of family. Maybe I'm a fool too? I wanted to remove
     the claim that a ship <https://www.wikidata.org/wiki/Q11446> is an
     instance of "ship type" because it produces weird circular inferences in
     my application; but maybe that makes sense to someone else.

     There will never be a universal ontology on which everyone agrees. I
     wonder (sorry to think aloud) if Wikidata should not rather facilitate
     the use of external classifications. Many external ids are knowledge
     organization systems (ontologies, thesauri, classifications ...). I dream
     of a simple query that could search, in Wikidata, "all elements of the
     same class as 'poodle' according to the classification of imagenet"
     <http://imagenet.stanford.edu/synset?wnid=n02113335>.

 _______________________________________________
 Wikidata mailing list
 Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
 https://lists.wikimedia.org/mailman/listinfo/wikidata

James Heald

19 Oct 2018

1:09 a.m.

On 18/10/2018 22:33, Markus Kroetzsch wrote:

...

And, on another note, there is also a huge misunderstanding exposed in the discussion on the search-related tracker item [1]: Cparle there speaks about "traversing the subclass hierarchy" but is actually looking at *super*classes of, e.g., "Clarinet", which he mostly finds irrelevant to users who care about clarinets. But surely that's the wrong direction! You have to look for *sub*classes to find special cases of what you are looking for. Looking downwards will often lead to much saner ontologies than when turning your head towards the dizzy heights of upper ontology. Yes, the few of us looking for instances of "logical consequence" will still get clarinets, but those who look for instances of clarinet will merely see instances of alto clarinet, piccolo clarinet, basset horn, Saxonette, and so on [2]. So instead of trying to suggest to Commons editors meaningful "upper concepts", one could simply enable the use of lower concepts in search. It does not work in all cases yet, but it does in many.

Not really.

Cparle wants to make sure that people searching for "clarinet" also get shown images of "piccolo clarinet" etc.

To make this possible, where an image has been tagged "basset horn" he is therefore looking to add "clarinet" as an additional keyword, so that if somebody types "clarinet" into the search box, one of the images retrieved by ElasticSearch will be the basset horn one.

I imagine there are pluses and minuses both ways, whether you try to make sure one search returns more hits, or try to run multiple searches each returning fewer hits.

Your suggestion of the latter approach may not involve so much pre-investigation of the top of the tree, which may be terms that people are less likely to search for; but on the other hand, the actual searching may be less efficient than a single indexed search.

...

There are still problems (such as the biological taxonomy being modelled as a hierarchy of names rather than animal classes, placing dog far away from mammal), but it is still always much easier to come up with a sane organisation for the *sub*classes of a concrete class.

For what it's worth, there's currently quite a lively discussion on Project Chat about issues with the current modelling of biological taxonomies, https://www.wikidata.org/wiki/Wikidata:Project_chat#Taxonomy:_concept_centri...

People on this thread might like to comment on some of the less fortunate elements of current practice, and the appropriateness of some of the thoughts that have been suggested.

But the taxo project has become such a walled garden, answerable only to itself, that people with comments may need to be quite forceful to get their message through, if we are to deal eg with some of the difficulties Cparle describes in the ticket at https://phabricator.wikimedia.org/T199119

-- James.

--- This email has been checked for viruses by AVG. https://www.avg.com

Markus Kroetzsch

11:40 a.m.

Hi James,

On 19/10/2018 01:09, James Heald wrote:

...

On 18/10/2018 22:33, Markus Kroetzsch wrote:

...
And, on another note, there is also a huge misunderstanding exposed in the discussion on the search-related tracker item [1]: Cparle there speaks about "traversing the subclass hierarchy" but is actually looking at *super*classes of, e.g., "Clarinet", which he mostly finds irrelevant to users who care about clarinets. But surely that's the wrong direction! You have to look for *sub*classes to find special cases of what you are looking for. Looking downwards will often lead to much saner ontologies than when turning your head towards the dizzy heights of upper ontology. Yes, the few of us looking for instances of "logical consequence" will still get clarinets, but those who look for instances of clarinet will merely see instances of alto clarinet, piccolo clarinet, basset horn, Saxonette, and so on [2]. So instead of trying to suggest to Commons editors meaningful "upper concepts", one could simply enable the use of lower concepts in search. It does not work in all cases yet, but it does in many.

Not really.

Cparle wants to make sure that people searching for "clarinet" also get shown images of "piccolo clarinet" etc.

To make this possible, where an image has been tagged "basset horn" he is therefore looking to add "clarinet" as an additional keyword, so that if somebody types "clarinet" into the search box, one of the images retrieved by ElasticSearch will be the basset horn one.

I imagine there are pluses and minuses both ways, whether you try to make sure one search returns more hits, or try to run multiple searches each returning fewer hits.

Your suggestion of the latter approach may not involve so much pre-investigation of the top of the tree, which may be terms that people are less likely to search for; but on the other hand, the actual searching may be less efficient than a single indexed search.

True, but with the Wikidata Query Service we already have infrastructure that completes millions of search requests of this kind (involving path queries), so that seems doable for Commons as well. WDQS already has Wikimedia API bindings that allow it to use Lucene-based results in addition, if needed (though this would only make sense if the search should use some content that for some reason cannot be imported into a query service as graph data, mostly free-text search over longer texts).

I think the approach of completing tags towards the upper classes is not a good idea in general, since it creates extra work for editors that requires a million times the resources needed in the other approach: if the subclass hierarchy is wrong, you only need to fix it once to improve search for all existing Commons content; if you rely on manual extra tags, you'd have to add them to every file on Commons and keep them up-to-date with changes in the concepts -- an enormous, redundant effort that will invariably lead to a very non-uniform search experience across otherwise similar media. This seems like a huge waste of editors' time even if it worked (i.e., if we lived in a world where the superclasses of a class were easy to understand and closely related to the topic that an editor is working on -- which will never happen for Wikidata or Commons, since both cover such a breadth of topics that their upper ontology necessarily has to be very general even if modelled in a clean and fully correct way).

Cheers,

Markus

...

...
There are still problems (such as the biological taxonomy being modelled as a hierarchy of names rather than animal classes, placing dog far away from mammal), but it is still always much easier to come up with a sane organisation for the *sub*classes of a concrete class.

For what it's worth, there's currently quite a lively discussion on Project Chat about issues with the current modelling of biological taxonomies, https://www.wikidata.org/wiki/Wikidata:Project_chat#Taxonomy:_concept_centri...

People on this thread might like to comment on some of the less fortunate elements of current practice, and the appropriateness of some of the thoughts that have been suggested.

But the taxo project has become such a walled garden, answerable only to itself, that people with comments may need to be quite forceful to get their message through, if we are to deal eg with some of the difficulties Cparle describes in the ticket at https://phabricator.wikimedia.org/T199119

-- James.


Luca Martinelli

12:32 p.m.

Il giorno ven 19 ott 2018 alle ore 01:09 James Heald jpm.heald@gmail.com ha scritto:

...

But the taxo project has become such a walled garden, answerable only to itself, that people with comments may need to be quite forceful to get their message through, if we are to deal eg with some of the difficulties Cparle describes in the ticket [...]

Other admins and I are unfortunately aware of this, and this is exactly what I was referring to in my previous e-mail. I do agree with you that the situation there is frankly unbearable, and IMHO it will likely be ended also through "removals" of some users who think they should be the only ones in charge of deciding what's good and what's not. You might easily understand why this situation deteriorated like this, but I acknowledge this is no excuse for it to continue.

Markus Kroetzsch

12:55 p.m.

On 19/10/2018 12:32, Luca Martinelli wrote:

...

Il giorno ven 19 ott 2018 alle ore 01:09 James Heald jpm.heald@gmail.com ha scritto:

...
But the taxo project has become such a walled garden, answerable only to itself, that people with comments may need to be quite forceful to get their message through, if we are to deal eg with some of the difficulties Cparle describes in the ticket [...]

Other admins and I are unfortunately aware of this, and this is exactly what I was referring to in my previous e-mail. I do agree with you that the situation there is frankly unbearable, and IMHO it will likely be ended also through "removals" of some users who think they should be the only ones in charge of deciding what's good and what's not. You might easily understand why this situation deteriorated like this, but I acknowledge this is no excuse for it to continue.

Re this tricky situation, it might be good that the taxonomy part of Wikidata avoid the use of "subclass of" altogether. Doesn't this open up a path for compromise? Wikidata could intentionally "overload" taxons to also refer to sets of organisms (in some cases). The taxonomic model would not be affected by this in any way, since it ignores "subclass of". Some (historic or debated) taxons could be ignored for this "colloquial" subclass hierarchy, while other merely colloquially defined classes of animals could be put in relation to proper species. I think such overloading is acceptable as long as there cannot be confusion between which statement refers to which facet of the concept. Then no use of either facet will be impaired by the presence of the "irrelevant" extra data.

The only alternative seems to be to build a "mirror taxonomy" that consists not of taxon names but of animal classes (and that would include "dog" somewhere in its hierarchy [1]). But then we will need a community-wide decision on which of the two (class of organisms vs. scientific name) is the subject of actual Wikipedia articles, which might be a difficult topic to discuss.

Alternatively, if the taxons are mostly considered as "names" (syntax) rather than classes of individual organisms, then it seems we are actually building a kind of scientific dictionary here that might rather belong in the lexeme space.

Whatever happens, this problem needs some solution.

Cheers,

Markus

[1] It seems that the strange position of "dog" is mostly due to the fact that two taxons are associated with it. In general, this seems an important issue (many common names are not clearly specifying a taxon), but in the case of dog it seems that the two taxons are synonyms of one another, i.e., the taxon for dog simply changed names over time.

Stas Malyshev

20 Oct 2018

12:41 a.m.

Hi!

...

Cparle wants to make sure that people searching for "clarinet" also get shown images of "piccolo clarinet" etc.

To make this possible, where an image has been tagged "basset horn" he is therefore looking to add "clarinet" as an additional keyword, so that if somebody types "clarinet" into the search box, one of the images retrieved by ElasticSearch will be the basset horn one.

Generally if the image is tagged with "basset horn" and the user query is "clarinet", we can do one of the following:

1. Index all upstream-hierarchy for "basset horn" (presumably we would have to cut off when it gets too deep or too abstract) and then match directly when searching.

2. Expand hierarchy down-stream from "clarinet" and then match against search index.

3. Have some manual or automatic process that ensures that both "clarinet" and "basset horn" are indexed (not necessarily at once) and rely on it to discover the matches.

The problem with (1) is that if the hierarchy changes, we will have to do a huge number of updates which might overwhelm the system, and most of these updates would not even be for things people search for, but we have no way to know that.

The problem with (2) is that downstream hierarchies explode very fast, and if you search for "clarinet" and there are 10000 descendants in these hierarchies, we can't search for all of them, so you may never get a chance to find the basset horn. Also, of course, querying big downstream hierarchies takes time too, which means a performance hit.
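
To make the trade-off between options (1) and (2) concrete, here is a minimal sketch on toy data. Everything in it is an assumption for illustration: the file names, the tiny subclass table, the depth limit and the cap on expanded terms are invented, and nothing here reflects how ElasticSearch or the actual Commons search is implemented.

```python
# Toy subclass edges (child -> parents); not real Wikidata data.
PARENTS = {
    "basset horn": ["clarinet"],
    "piccolo clarinet": ["clarinet"],
    "clarinet": ["woodwind instrument"],
}

# Hypothetical files and their "depicts" tags.
FILE_TAGS = {
    "File:A.jpg": ["basset horn"],
    "File:B.jpg": ["piccolo clarinet"],
    "File:C.jpg": ["woodwind instrument"],
}


def ancestors(term, max_depth):
    """The term plus its ancestors, following at most max_depth hops upward."""
    result, frontier = {term}, {term}
    for _ in range(max_depth):
        frontier = {p for t in frontier for p in PARENTS.get(t, [])} - result
        result |= frontier
    return result


def descendants(term, max_terms):
    """The term plus its descendants, breadth-first, capped at max_terms."""
    children = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    result, queue = [term], [term]
    while queue and len(result) < max_terms:
        current = queue.pop(0)
        for child in children.get(current, []):
            if child not in result and len(result) < max_terms:
                result.append(child)
                queue.append(child)
    return result


def build_index_upward(max_depth=2):
    """Option (1): expand every tag upward once, at indexing time."""
    index = {}
    for name, tags in FILE_TAGS.items():
        for tag in tags:
            for term in ancestors(tag, max_depth):
                index.setdefault(term, set()).add(name)
    return index


def search_downward(query, max_terms=10):
    """Option (2): expand the query downward at search time."""
    terms = set(descendants(query, max_terms))
    return {name for name, tags in FILE_TAGS.items() if terms & set(tags)}


print(sorted(build_index_upward()["clarinet"]))
print(sorted(search_downward("clarinet")))
print(sorted(search_downward("clarinet", max_terms=2)))
```

On this data both approaches return the same two files for "clarinet". Lowering `max_terms` to 2 drops the piccolo clarinet image, which is the truncation problem described for (2), while any change to `PARENTS` would require rebuilding the whole index in (1).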

-- Stas Malyshev smalyshev@wikimedia.org

Markus Kroetzsch

2:21 a.m.

On 20/10/2018 00:41, Stas Malyshev wrote:

...

Hi!

...
Cparle wants to make sure that people searching for "clarinet" also get shown images of "piccolo clarinet" etc.

To make this possible, where an image has been tagged "basset horn" he is therefore looking to add "clarinet" as an additional keyword, so that if somebody types "clarinet" into the search box, one of the images retrieved by ElasticSearch will be the basset horn one.

Generally if the image is tagged with "basset horn" and the user query is "clarinet", we can do one of the following:

1. Index all upstream-hierarchy for "basset horn" (presumably we would have to cut off when it gets too deep or too abstract) and then match directly when searching.

2. Expand hierarchy down-stream from "clarinet" and then match against search index.

3. Have some manual or automatic process that ensures that both "clarinet" and "basset horn" are indexed (not necessarily at once) and rely on it to discover the matches.

The problem with (1) is that if the hierarchy changes, we will have to do a huge number of updates which might overwhelm the system, and most of these updates would not even be for things people search for, but we have no way to know that.

The problem with (2) is that downstream hierarchies explode very fast, and if you search for "clarinet" and there are 10000 descendants in these hierarchies, we can't search for all of them, so you may never get a chance to find the basset horn. Also, of course, querying big downstream hierarchies takes time too, which means a performance hit.

Is this such a problem? It is what people now commonly do with P31/P279* queries. For example, finding 10K instances of (some subclass of) building takes 9 secs: http://tinyurl.com/y7e5j5sd (I think this is one of the more complex hierarchies; maybe you know larger downstream hierarchies one could try?) If you omit the labels, it takes 650ms. That's maybe not quite autocompletion speed yet, but seems acceptable for a media search.
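
For readers unfamiliar with the P31/P279* pattern mentioned above, the following sketch builds such a query as a string. It is not the query behind the shortened link, whose exact text is not shown in this thread; the function name and parameters are hypothetical. It relies only on the standard Wikidata conventions that P31 is "instance of", P279 is "subclass of", and that the query service predefines the `wd:` and `wdt:` prefixes and the label service.

```python
import re


def instances_of_subclasses_query(class_qid, limit=10000, with_labels=False):
    """Build a SPARQL query for items that are instances of class_qid
    or of any (transitive) subclass of it."""
    if not re.fullmatch(r"Q[1-9][0-9]*", class_qid):
        raise ValueError(f"not a Wikidata item id: {class_qid!r}")
    select = "?item ?itemLabel" if with_labels else "?item"
    # Fetching labels is optional because it adds noticeable cost.
    label_service = (
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        if with_labels
        else ""
    )
    return (
        f"SELECT {select} WHERE {{\n"
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .\n"
        f"{label_service}"
        f"}}\nLIMIT {limit}"
    )


# Q1390 is the "insect" item cited elsewhere in this thread.
print(instances_of_subclasses_query("Q1390", limit=100))
```

The `with_labels` switch mirrors the observation that omitting labels makes the query much faster; the string would still have to be sent to a query service endpoint to get results.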

Cheers,

Markus

...

Ettore RIZZA

3:29 p.m.

Hello,

It is interesting to note that what Cparle wants are "is a" relationships based on common sense. For most people, ants are insects, not instances of taxon. A clarinet is a woodwind instrument, and woodwind instruments are musical instruments, not an instance of "first order metaclass".

One of the best sources of "common sense" hypernymy is probably the first sentence of a Wikipedia page. Whether in English, French, Italian, a woman is always "a female *human* being."

For "poodle", this would look like (following the links in the English version of Wikipedia):

- The poodle is a group of formal *dog breeds*

- Dog breeds are *dogs* that...

- The domestic dog (...) is a member of the genus *Canis* (canines)

- Canis is a genus of the *Canidae*

- The biological family Canidae (...) is a lineage of *carnivorans*

- Carnivora (...) is a diverse *scrotiferan* order

- Scrotifera is a clade of *placental mammals*

- Placentalia ("Placentals") is one of the three extant subdivisions of the class of animals *Mammalia*...

- Mammals are the *vertebrates* within the class Mammalia...

...

From my point of view, this classification looks much better than the current relationships in Wikidata's ontology.

The automatic extraction of hypernymic relationships from English texts (especially Wikipedia) has been studied for a long time and gives good results, even with simple methods based on hand-crafted rules. In the case of Wikipedia, the hypernym often has a page itself (and therefore a link to Wikidata), which could simplify the NLP extraction and the mapping with Wikidata items.
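
As a rough illustration of what "simple methods based on hand-crafted rules" can look like, here is a deliberately naive sketch with a single copula pattern ("X is a Y", "Xs are Ys"). The pattern, the stop-word list and the function name are assumptions made for this example; real extraction systems use many more patterns and proper parsing, and this sketch will mislabel plenty of sentences.

```python
import re

# One hand-crafted rule: take the phrase after "is"/"are" (and an optional
# article), up to the first punctuation mark.
COPULA = re.compile(
    r"\b(?:is|are)\s+(?:(?:a|an|the)\s+)?(?P<phrase>[^,.;(]+)",
    re.IGNORECASE,
)

# Words that usually start a trailing clause rather than the hypernym itself.
CLAUSE_START = re.compile(r"\b(?:that|which|who|within|founded)\b", re.IGNORECASE)


def first_sentence_hypernym(sentence):
    """Return a candidate hypernym phrase from a definition-style sentence,
    or None if the copula pattern does not match."""
    match = COPULA.search(sentence)
    if not match:
        return None
    phrase = CLAUSE_START.split(match.group("phrase"), maxsplit=1)[0]
    return phrase.strip() or None


for text in (
    "Ants are eusocial insects",
    "Dog breeds are dogs that...",
    "Mammals are the vertebrates within the class Mammalia...",
):
    print(text, "->", first_sentence_hypernym(text))
```

The extracted phrase would still need to be matched against a Wikipedia link or a Wikidata label, and phrases such as "a group of formal dog breeds" show why a single rule is not enough: the useful hypernym is not always the first noun after the copula.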

Of course, the extracted relationships will not always be "subclass of" or "instance of". But if someone proposed a new property called "Wikipedia Hypernyms" (and its symmetric property "Wikipedia Hyponyms"), I would use it more willingly and with more confidence than the current system. This would also better respect the logic of Wikidata's descriptions.

I mean, if the description of Zoroastrianism (Q9601) says this is an "Ancient Iranian *religion* founded by Zoroaster", one would expect the class "religion" to appear much earlier in the hierarchy of superclasses of this item. If there was this property "Wikipedia Hypernyms", we could mention it in the same page - since Wikipedia describes Zoroastrianism as "one of the world's oldest *religions* that remains active." And a SPARQL query looking for 'all items that have "religion" as "Wikipedia hypernyms" property' would be much much faster.

Note: sorry if this reflection is naive or if it has already been discussed/tested.

Cheers,

Ettore

On Thu, 27 Sep 2018 at 23:35, James Heald jpm.heald@gmail.com wrote:

...

This recent announcement by the Structured Data team perhaps ought to be quite a heads-up for us:

https://commons.wikimedia.org/wiki/Commons_talk:Structured_data#Searching_Co...

Essentially the team has given up on the hope of using Wikidata hierarchies to suggest generalised "depicts" values to store for images on Commons, to match against terms in incoming search requests.

i.e. if an image is of a German Shepherd dog, and identified as such, the team has given up on trying to infer in general from Wikidata that 'dog' is also a search term that such an image should score positively with.

Apparently the Wikidata hierarchies were simply too complicated, too unpredictable, and too arbitrary and inconsistent in their design across different subject areas to be readily assimilated (before one even starts on the density of bugs and glitches that then undermine them).

Instead, if that image ought to be considered in a search for 'dog', it looks as though an explicit 'depicts:dog' statement may be going to be needed to be specifically present, in addition to 'depicts:German Shepherd'.

Some of the background behind this assessment can be read in https://phabricator.wikimedia.org/T199119 in particular the first substantive comment on that ticket, by Cparle on 10 July, giving his quick initial read of some of the issues using Wikidata would face.

SDC was considered a flagship end-application for Wikidata. If the data in Wikidata is not usable enough to supply the dogfood that project was expected to be going to be relying on, that should be a serious wake-up call, a red flag we should not ignore.

If the way data is organised across different subjects is currently too inconsistent and confusing to be usable by our own SDC project, are there actions we can take to address that? Are there design principles to be chosen that then need to be applied consistently? Is this something the community can do, or is some more active direction going to need to be applied?

Wikidata's 'ontology' has grown haphazardly, with little oversight, like an untended bank of weeds. Is some more active gardening now required?

-- James.


Peter F. Patel-Schneider

7:20 p.m.

On 10/20/18 6:29 AM, Ettore RIZZA wrote:

...

For most people, ants are insects, not instances of taxon.

Sure, but Wikidata doesn't have ants being instances of taxon. Instead, Formicidae (aka ant) is an instance of taxon, which seems right to me.

Here are some extracts from Wikidata as of a few minutes ago, also showing the English Wikipedia page for the Wikidata item.

https://www.wikidata.org/wiki/Q7386 (Formicidae, "ant")
English Wikipedia page: https://en.wikipedia.org/wiki/Ant
instance of: taxon
no "subclass of" statement

https://www.wikidata.org/wiki/Q1390 (insect)
English Wikipedia page: https://en.wikipedia.org/wiki/Insect
subclass of: animal
instance of: taxon

What is missing is that Q7386 is a subclass of Q1390, which is sanctioned by the "Ants are eusocial insects" phrase at the start of https://en.wikipedia.org/wiki/Ant. I added that statement and put as source English Wikipedia. (By the way, how can I source a statement to a particular Wikipedia page?)

I see no reason that this should not be done for other groups of living organisms where subclass relationships are missing.
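
The effect of the one missing statement can be shown with a small reachability check. This is a toy sketch: the two item ids are the ones cited above, "animal" is left as a plain label because its id is not given in this message, and the edge table contains only the statements mentioned here.

```python
# "subclass of" edges as described in the message above.
SUBCLASS_OF = {
    "Q1390": {"animal"},  # insect is a subclass of animal
}


def is_subclass(child, ancestor, edges):
    """True if ancestor can be reached from child via subclass-of edges."""
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        if node == ancestor:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return False


# Before the fix: Formicidae (Q7386) has no "subclass of" statement at all.
print(is_subclass("Q7386", "animal", SUBCLASS_OF))

# After adding "Q7386 subclass of Q1390", the chain ant -> insect -> animal exists.
fixed = {**SUBCLASS_OF, "Q7386": {"Q1390"}}
print(is_subclass("Q7386", "animal", fixed))
```

A single added statement is enough to connect the item to everything above "insect", which is why filling in such gaps once in the hierarchy is cheap compared with tagging individual files.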

peter

Ettore RIZZA

8:57 p.m.

Hi,

I see no reason that this should not be done for other groups of living organisms where subclass relationships are missing.

It seems very simple to me. Maybe too simple. Perhaps I am intimidated by the kilometers of discussions I'm reading about the taxon-centric aspect of Wikidata, when I'm not a biologist. So, there is no problem if we add that Cetacea https://www.wikidata.org/wiki/Q160 is a subclass of aquatic mammals https://www.wikidata.org/wiki/Q3039055, as indicated by its Wikipedia page https://en.wikipedia.org/wiki/Cetacea?

Cheers,

Ettore

On Sat, 20 Oct 2018 at 19:20, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:

...

On 10/20/18 6:29 AM, Ettore RIZZA wrote:

...
For most people, ants are insects, not instances of taxon.

Sure, but Wikidata doesn't have ants being instances of taxon. Instead, Formicidae (aka ant) is an instance of taxon, which seems right to me.

Here are some extracts from Wikidata as of a few minutes ago, also showing the English Wikipedia page for the Wikidata item.

https://www.wikidata.org/wiki/Q7386 (Formicidae, "ant")
English Wikipedia page: https://en.wikipedia.org/wiki/Ant
instance of: taxon
no "subclass of" statement

https://www.wikidata.org/wiki/Q1390 (insect)
English Wikipedia page: https://en.wikipedia.org/wiki/Insect
subclass of: animal
instance of: taxon

What is missing is that Q7386 is a subclass of Q1390, which is sanctioned by the "Ants are eusocial insects" phrase at the start of https://en.wikipedia.org/wiki/Ant. I added that statement and put as source English Wikipedia. (By the way, how can I source a statement to a particular Wikipedia page?)

I see no reason that this should not be done for other groups of living organisms where subclass relationships are missing.

peter

Peter F. Patel-Schneider

10:02 p.m.

On 10/20/18 11:57 AM, Ettore RIZZA wrote:

...

Hi,

From Peter F. Patel-Schneider:
I see no reason that this [adding subclass relationships sanctioned by corresponding Wikipedia pages] should not be done for other groups of living organisms where subclass relationships are missing.

It seems very simple to me. Maybe too simple. Perhaps I am intimidated by the kilometers of discussions I'm reading about the taxon-centric aspect of Wikidata, when I'm not a biologist. So, there is no problem if we add that Cetacea https://www.wikidata.org/wiki/Q160 is a subclass of aquatic mammals https://www.wikidata.org/wiki/Q3039055, as indicated by its Wikipedia page https://en.wikipedia.org/wiki/Cetacea?

Cheers,

Ettore

How can there be any effective counter to adding these relationships? Many Wikidata items correspond to Wikipedia pages. If the true information about the Wikidata item in the corresponding pages cannot be added to the Wikidata items, then the correspondence is not correct and should be removed.

peter

PS: Of course, determining truth may be contentious in some cases, but these will be a small minority.


participants (9)

Daniel Kinzler
Ettore RIZZA
Gerard Meijssen
James Heald
Luca Martinelli
Markus Kroetzsch
Peter F. Patel-Schneider
Stas Malyshev
Thad Guidry