Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

18 Oct 2018

+1 to Daniel

And, on another note, there is also a huge misunderstanding exposed in 
the discussion on the search-related tracker item [1]: Cparle there 
speaks about "traversing the subclass hierarchy" but is actually looking 
at *super*classes of, e.g., "Clarinet", which he mostly finds irrelevant 
to users who care about clarinets. But surely that's the wrong 
direction! You have to look for *sub*classes to find special cases of 
what you are looking for. Looking downwards will often lead to much 
saner ontologies than when turning your head towards the dizzy heights 
of upper ontology. Yes, the few of us looking for instances of "logical 
consequence" will still get clarinets, but those who look for instances 
of clarinet will merely see instances of alto clarinet, piccolo 
clarinet, basset horn, Saxonette, and so on [2]. So instead of trying to 
suggest to Commons editors meaningful "upper concepts", one could simply 
enable the use of lower concepts in search. It does not work in all 
cases yet, but it does in many.
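
To make the direction of traversal concrete, here is a minimal Python sketch on a toy in-memory graph. The statements, the helper names (`build_index`, `closure`), and the superclass labels are assumptions chosen for illustration; this is not the search implementation discussed in [1], nor a copy of the actual Wikidata statements.

```python
from collections import defaultdict

# Toy "subclass of" statements as (child, parent) pairs.
# Illustrative data only, not a dump of real Wikidata statements.
SUBCLASS_OF = [
    ("alto clarinet", "clarinet"),
    ("piccolo clarinet", "clarinet"),
    ("basset horn", "clarinet"),
    ("Saxonette", "clarinet"),
    ("clarinet", "woodwind instrument"),
    ("woodwind instrument", "musical instrument"),
]


def build_index(pairs):
    """Index the statements in both directions."""
    children = defaultdict(set)  # parent -> direct subclasses
    parents = defaultdict(set)   # child  -> direct superclasses
    for child, parent in pairs:
        children[parent].add(child)
        parents[child].add(parent)
    return children, parents


def closure(start, edges):
    """Return everything reachable from `start` via `edges`, excluding `start`."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt != start and nxt not in seen:  # guards against cycles
                seen.add(nxt)
                stack.append(nxt)
    return seen


children, parents = build_index(SUBCLASS_OF)

# Looking downwards: special cases of what the user searched for.
subclasses = closure("clarinet", children)

# Looking upwards: ever more abstract concepts, rarely what the user wants.
superclasses = closure("clarinet", parents)
```

On this toy data, `subclasses` holds the four special kinds of clarinet, while `superclasses` only climbs towards more general instrument classes.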

There are still problems (such as the biological taxonomy being modelled 
as a hierarchy of names rather than animal classes, placing dog far away 
from mammal), but it is still always much easier to come up with a sane 
organisation for the *sub*classes of a concrete class.

FYI, I recently gave a talk about ontological modelling in Wikidata that 
discussed some of the current issues: 
https://iccl.inf.tu-dresden.de/web/Misc3058/en (the audience there were 
ontology design pattern researchers).

Cheers,

Markus

[1] https://phabricator.wikimedia.org/T199119
[2] http://tinyurl.com/y7tvkuzk

On 17/10/2018 16:04, Daniel Kinzler wrote:
...
  My (very belated) thoughts on this issue:

 Wiki content grows in a messy way, and it stays messy until the messiness causes
 problems. Once it causes problems, people are motivated to clean it up.

 I propose to implement hierarchical search based on very simple, predictable
 rules, e.g. by having a configurable list of transitive relationships that get
 evaluated to a certain depth. I'd go for subclasses, geographical inclusion, and
 subspecies at first.
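
A minimal Python sketch of what such a rule set could look like, assuming a toy list of statements. The relation names, the depth values, and the function `expand_query` are hypothetical and only illustrate "a configurable list of transitive relationships evaluated to a certain depth"; they are not an existing search API.

```python
from collections import deque

# Hypothetical configuration: relation name -> maximum depth to follow.
TRANSITIVE_RELATIONS = {
    "subclass of": 3,
    "located in": 2,
    "parent taxon": 1,
}

# Toy statements as (subject, relation, object). Illustrative data only.
STATEMENTS = [
    ("alto clarinet", "subclass of", "clarinet"),
    ("clarinet", "subclass of", "woodwind instrument"),
    ("woodwind instrument", "subclass of", "musical instrument"),
    ("Dresden", "located in", "Saxony"),
    ("Saxony", "located in", "Germany"),
    ("Germany", "located in", "Europe"),
]


def expand_query(term, statements, relations):
    """Return {item: (relation, depth)} for narrower items a search for `term` also matches."""
    # Invert the statements: (relation, object) -> list of subjects.
    inverse = {}
    for subj, rel, obj in statements:
        inverse.setdefault((rel, obj), []).append(subj)

    found = {}
    for rel, max_depth in relations.items():
        queue = deque([(term, 0)])
        visited = {term}  # protects against circular statements
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue  # configured depth reached, do not descend further
            for narrower in inverse.get((rel, node), []):
                if narrower not in visited:
                    visited.add(narrower)
                    found[narrower] = (rel, depth + 1)
                    queue.append((narrower, depth + 1))
    return found


instrument_hits = expand_query("musical instrument", STATEMENTS, TRANSITIVE_RELATIONS)
europe_hits = expand_query("Europe", STATEMENTS, TRANSITIVE_RELATIONS)
```

Because each hit records the relation and depth that produced it, a result page could tell people why an item matched, which fits the idea of explaining bad results instead of hiding them.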

 Doing this will NOT produce good results. You would have to implement a lot of
 special cases and heuristics to work around dirty data. I say: let it produce
 bad results, tell people why the results are bad, and what they can do about it!

 The Wikimedia community is AMAZING at making good use of whatever capabilities
 the software offers, and at adapting content to make the software produce the
 results they want. By providing limited but clearly defined software support for hierarchical
 search, we allow the community to optimize the content to work with that search.
 Keeping the rules simple means that other consumers can then follow the same
 rules, and the content will work for them as well.

 -- daniel

 Am 29.09.2018 um 19:25 schrieb Gerard Meijssen:
  Hoi,
 There is also the age old conundrum where some want to enforce their rules for
 the good of all because (argument of the day follows).

 First of all, Wikidata is very much a child of Wikipedia. It has its own
 structures and people have endeavoured to build those same structures in
 Wikidata never mind that it is a very different medium and never mind that there
 are 280+ Wikipedias that might consider things to be different.  The start of
 Wikidata was also an auspicious occasion where it was thought to be OK to adopt
 an external German authority. That proved to be a disaster and there are still
 residues of this awful decision. It did not take long to show the shortcomings of
 this schedule and it was replaced by something more sensible.

 However, we got something really Wiki and it was all too wild. It did not take long
 for me to ask for someone to explain the current structures and nobody
 volunteered. So I did what I do best, I largely ignored the results of the
 classes and subclasses. It does not work for me. It works against me so my
 current strategy is to ignore this nonsense and concentrate on including data.
 The reason is simple; once data is included, it is easy to slice it, dice
 it, and structure it as we see fit at a later date.

 So when our priority becomes to make our data reusable and more open, we should
 agree on it. So far we have not because we choose to fight each other. Some have
 ideas, some have invested too much in what we have at this time. When we are to
 make our data reusable, we should agree on what it is exactly we aim to achieve.
 Is it to support Commons, or is it to support some external standard that is
 academically sound? I would always favour what is practical and easily measured.

 I would support Commons first. It has the benefit that it will bring our
 communities together in a clear objective. It has the benefit that changes in
 the operations of Wikidata support the whole of the Wikimedia universe and
 consequently financial, technical and operational needs and investments are
 easily understood. It also means that all the bureaucracy that has materialised
 will be shown to be in the way where it is.

 So my question is not whether we are a Wiki; my question is whether we are Wiki
 enough and willing to change our ways for our own good.
 Thanks,
        GerardM

 On Sat, 29 Sep 2018 at 16:38, Thad Guidry <thadguidry(a)gmail.com> wrote:

      Ettore,

      Wikidata has the ability of crowdsourcing...unfortunately, it is not
      effectively utilized.

      It's because Wikidata does not yet provide a voting feature on
      statements...where, as the vote count gets higher, more resistance is
      required to change the statement.
      But that breaks the notion of a "wiki" for some folks.
      And there we circle back to Gerard's age old question of ... should Wikidata
      really be considered a wiki at all for the benefit of society? Or should
      it apply voting/resistance to keep it tidy, factual and less messy?

      We have the technology to implement voting/resistance on statements.  I
      personally would utilize that feature and many others probably would as
      well.  Crowdsourcing the low voted facts back to applications like
      OpenRefine, or the recently sent out Survey vote mechanism for spam analysis
      on the low voted statements could highlight where things are untidy and
      implement vote casting to clean them up.

      "...the burden of proof has to be placed on authority, and it should be
      dismantled if that burden cannot be met..."

      -Thad
      +ThadGuidry <https://plus.google.com/+ThadGuidry>

      On Sat, Sep 29, 2018 at 2:49 AM Ettore RIZZA <ettorerizza(a)gmail.com> wrote:

          Hi,

          Wikidata's ontology is a mess, and I do not see how it could be
          otherwise. While the creation of new properties is controlled, any fool
          can decide that a woman <https://www.wikidata.org/wiki/Q467> is no longer
          a human or is part of a family. Maybe I'm a fool too? I wanted to remove
          the claim that a ship <https://www.wikidata.org/wiki/Q11446> is an
          instance of "ship type" because it produces weird circular inferences
          in my application; but maybe that makes sense to someone else.

          There will never be a universal ontology on which everyone agrees. I
          wonder (sorry to think aloud) if Wikidata should not rather facilitate
          the use of external classifications. Many external ids are knowledge
          organization systems (ontologies, thesauri, classifications ...) I dream
          of a simple query that could search, in Wikidata, "all elements of the
          same class as 'poodle' according to the classification of imagenet"
          <http://imagenet.stanford.edu/synset?wnid=n02113335>.

      _______________________________________________
      Wikidata mailing list
      Wikidata(a)lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
      https://lists.wikimedia.org/mailman/listinfo/wikidata

