Re: [Wikidata] Wikidata considered unable to support hierarchical search in Structured Data for Commons

19 Oct 2018

Hi James,

On 19/10/2018 01:09, James Heald wrote:
...
  On 18/10/2018 22:33, Markus Kroetzsch wrote:

 And, on another note, there is also a huge misunderstanding exposed in 
 the discussion on the search-related tracker item [1]: Cparle there 
 speaks about "traversing the subclass hierarchy" but is actually 
 looking at *super*classes of, e.g., "Clarinet", which he mostly finds 
 irrelevant to users who care about clarinets. But surely that's the 
 wrong direction! You have to look for *sub*classes to find special 
 cases of what you are looking for. Looking downwards will often lead 
 to much saner ontologies than turning your head towards the dizzy 
 heights of upper ontology. Yes, the few of us looking for instances of 
 "logical consequence" will still get clarinets, but those who look for 
 instances of clarinet will merely see instances of alto clarinet, 
 piccolo clarinet, basset horn, Saxonette, and so on [2]. So instead of 
 trying to suggest meaningful "upper concepts" to Commons editors, one 
 could simply enable the use of lower concepts in search. It does not 
 work in all cases yet, but it does in many.

 Not really.

 Cparle wants to make sure that people searching for "clarinet" also get 
 shown images of "piccolo clarinet" etc.

 To make this possible, where an image has been tagged "basset horn" he 
 is therefore looking to add "clarinet" as an additional keyword, so that 
 if somebody types "clarinet" into the search box, one of the images 
 retrieved by ElasticSearch will be the basset horn one.
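
A minimal sketch, in Python, of the index-time expansion described above: each file's tags are stored together with all of their superclasses, so a plain keyword match on "clarinet" also retrieves the basset horn image. The hierarchy, labels and function names are hypothetical illustrations, not Wikidata data and not the actual Commons or ElasticSearch code.

```python
from collections import deque

# Hypothetical toy hierarchy: each class maps to its direct superclasses
# ("subclass of" edges). Labels stand in for Wikidata items.
SUPERCLASSES = {
    "basset horn": ["clarinet"],
    "alto clarinet": ["clarinet"],
    "piccolo clarinet": ["clarinet"],
    "clarinet": ["woodwind instrument"],
    "woodwind instrument": ["musical instrument"],
    "musical instrument": [],
}


def ancestors(cls, parents):
    """Return every class reachable upwards from cls, excluding cls itself."""
    seen = set()
    queue = deque(parents.get(cls, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


def index_keywords(tags, parents):
    """Index-time expansion: store each tag plus all of its superclasses."""
    keywords = set(tags)
    for tag in tags:
        keywords |= ancestors(tag, parents)
    return keywords


# The basset horn file is indexed under "clarinet" and everything above it,
# so a plain keyword match on "clarinet" finds it.
print(sorted(index_keywords({"basset horn"}, SUPERCLASSES)))
# ['basset horn', 'clarinet', 'musical instrument', 'woodwind instrument']
```

Note that every stored keyword set carries the whole chain up to the top of the hierarchy; on Wikidata that chain can climb into very general classes, which is the "upper concepts" problem raised in the quoted message.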

 I imagine there are pluses and minuses both ways, whether you try to 
 make sure one search returns more hits, or try to run multiple searches 
 each returning fewer hits.

 Your suggestion of the latter approach may not require so much 
 pre-investigation of the top of the tree, which may contain terms that 
 people are less likely to search for; on the other hand, the actual 
 searching may be less efficient than a single indexed search.
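
For comparison, a minimal sketch of the query-time alternative: files keep only the tags editors gave them, and the search expands the query concept downwards to all of its subclasses before matching. As before, the hierarchy, file names and functions are hypothetical; a real system would walk Wikidata's P279 ("subclass of") graph rather than an in-memory dict.

```python
from collections import defaultdict, deque

# Hypothetical toy hierarchy: class -> direct superclasses ("subclass of").
SUPERCLASSES = {
    "basset horn": ["clarinet"],
    "piccolo clarinet": ["clarinet"],
    "clarinet": ["woodwind instrument"],
    "flute": ["woodwind instrument"],
}

# Hypothetical files and the (unexpanded) tags editors gave them.
FILES = {
    "Basset_horn.jpg": {"basset horn"},
    "Clarinet_player.jpg": {"clarinet"},
    "Flute.jpg": {"flute"},
}


def invert(parents):
    """Turn class -> superclasses into class -> direct subclasses."""
    children = defaultdict(list)
    for child, supers in parents.items():
        for parent in supers:
            children[parent].append(child)
    return dict(children)


def descendants(cls, children):
    """Return cls together with every class reachable downwards from it."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for sub in children.get(current, []):
            if sub not in seen:
                seen.add(sub)
                queue.append(sub)
    return seen


def search(query, files, parents):
    """Query-time expansion: match files tagged with query or any subclass."""
    wanted = descendants(query, invert(parents))
    return sorted(name for name, tags in files.items() if tags & wanted)


print(search("clarinet", FILES, SUPERCLASSES))
# ['Basset_horn.jpg', 'Clarinet_player.jpg']
```

Because the hierarchy is consulted on every query, correcting a single edge in SUPERCLASSES changes the results for every file at once and no file needs retagging; the cost is that each search must expand the hierarchy first.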

True, but with the Wikidata Query Service we already have infrastructure 
that answers millions of requests of this kind (involving path queries), 
so that seems doable for Commons as well. WDQS also has Wikimedia API 
bindings that let it draw on Lucene-based search results where needed 
(though this would only make sense if the search had to use content 
that cannot be imported into a query service as graph data, which 
mostly means free-text search over longer texts).
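
One way to express the downward traversal as a single WDQS request is a SPARQL property path. The Python helper below only builds the query text; it is a sketch that has not been run against WDQS, the function name is hypothetical, and "Q123" is a placeholder rather than the item for any real concept. It relies on the standard Wikidata properties P31 ("instance of") and P279 ("subclass of"); applying it to Commons media would need a different starting property and endpoint, which is not shown here.

```python
def subclass_instances_query(class_qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for instances of class_qid or of any subclass.

    wdt:P31 is "instance of" and wdt:P279 is "subclass of". The path
    wdt:P31/wdt:P279* follows one instance-of edge and then zero or more
    subclass-of edges, so it matches items whose class is class_qid itself
    or lies anywhere below it in the subclass hierarchy.
    """
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        "SELECT ?item WHERE {\n"
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .\n"
        "}\n"
        f"LIMIT {limit}"
    )


# "Q123" is a placeholder item ID used only for illustration.
print(subclass_instances_query("Q123"))
```

The `wd:` and `wdt:` prefixes are predefined on WDQS, so the generated text can be pasted into the query editor as is.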

I think the approach of completing tags towards the upper classes is not 
a good idea in general, since it creates extra work for editors that 
requires a million times the resources needed by the other approach. If 
the subclass hierarchy is wrong, you only need to fix it once to improve 
search for all existing Commons content; if you rely on manual extra 
tags, you have to add them to every file on Commons and keep them 
up to date with changes in the concepts -- an enormous, redundant effort 
that will invariably lead to a very non-uniform search experience across 
otherwise similar media. This seems like a huge waste of editors' time 
even if it worked, i.e., if we lived in a world where the superclasses 
of a class were easy to understand and closely related to the topic an 
editor is working on. That will never happen for Wikidata or Commons: 
both cover such a breadth of topics that their upper ontology 
necessarily has to be very general, even if modelled in a clean and 
fully correct way.

Cheers,

Markus

...

 There are still problems (such as the biological taxonomy being 
 modelled as a hierarchy of names rather than animal classes, placing 
 dog far away from mammal), but it is still always much easier to come 
 up with a sane organisation for the *sub*classes of a concrete class.

 For what it's worth, there's currently quite a lively discussion on 
 Project Chat about issues with the current modelling of biological 
 taxonomies:

https://www.wikidata.org/wiki/Wikidata:Project_chat#Taxonomy:_concept_centr…

 People on this thread might like to comment on some of the less 
 fortunate elements of current practice, and the appropriateness of some 
 of the thoughts that have been suggested.

 But the taxo project has become such a walled garden, answerable only to 
 itself, that people with comments may need to be quite forceful to get 
 their message through, if we are to deal, e.g., with some of the 
 difficulties Cparle describes in the ticket at
   https://phabricator.wikimedia.org/T199119

    -- James.

 _______________________________________________
 Wikidata mailing list
 Wikidata(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata 
