Re: [Wikidata-l] [Multimedia] Commons file-topic searching and storage (was Re: Commons Categories again)

14 Sep 2014

You should look at the dev team's plans for the query engine. Queries will
be associated with a query item, and the results of the query will be cached
and maintained by the Wikibase software as the data is modified, if I
understand correctly.

So this discussion will only make sense once we know how powerful the query
engine will be.

Otherwise we are talking into the void. Which "norm" are we using, and what
should we denormalize? According to which rules? To optimize exactly what?

If it's just the parent classes, Reasonator already does that, as do
templates like {{Item documentation}} or {{classification}} on
Wikidata, without any "denormalization". For an example, see
https://tools.wmflabs.org/reasonator/?&q=1638134, or the heading of
https://www.wikidata.org/wiki/Talk:Q5 for an example of item documentation.

2014-09-14 0:14 GMT+02:00 James Heald <j.heald(a)ucl.ac.uk>:

...
  Hi Thomas,

 I'm not really talking about the specific query *engine* that will work on
 the file topic data.  (Well, maybe a little, in general terms about some of
 the functionality we might want in such a search).

 What I'm talking about more is the kind of data that will likely need to
 be stored on the CommonsData wikibase to make any such query engine
 *possible* with reasonable speed -- in particular not just the most
 specific Q-numbers that apply to a file, but (IMO) *any* Q-number for which
 the file should be returned if the topic corresponding to that Q-number
 were searched for.

 I'm saying that such a Q-number needs to be included on the item on
 CommonsData for the file -- it's not enough that if one used Wikidata to
 look up the more specific Q-number, then the less specific Q-number would
 be returned: I'm saying that lookup already needs to have been done (and
 maintained), so the less specific Q-number is already sitting on
 CommonsData when someone comes to search for it.

 This doesn't need to be a manual process (though the presence of a
 Q-number on a CommonsData item perhaps needs to be subject to manual
 overrule, in case the inference chain has gone wrong and it really isn't
 relevant);
 but what I'm saying is that you can't wait to do the inference when the
 search request comes in -- instead the relevant Q-numbers for each file
 need to be pre-computed, and stored on the CommonsData item, so that when
 the search request comes in, they are already there to be searched on.
 That denormalisation of information really needs to be in place whatever
 the fine coding of the engine -- it's data design, rather than engine
 coding.
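
 As a minimal sketch of the pre-computation described above: the item
 identifiers, the "subclass of" map and the function names are all
 hypothetical placeholders, not part of Wikibase or of any agreed
 CommonsData design. It only illustrates expanding a file's most specific
 topics into every broader topic, with room for a manual overrule.

```python
from collections import deque

# Hypothetical "subclass of" edges: child topic -> parent topics.
# The identifiers are made-up placeholders, not real Wikidata Q-numbers.
PARENTS = {
    "Q_buckingham_palace": ["Q_palace"],
    "Q_palace": ["Q_building"],
    "Q_building": [],
}


def ancestors(qid, parents):
    """Return every topic reachable from qid by following parent links."""
    seen = set()
    queue = deque(parents.get(qid, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        queue.extend(parents.get(current, []))
    return seen


def denormalised_topics(specific, parents, overruled=frozenset()):
    """Topics to store on the file item.

    The specific topics plus all inferred broader topics, minus any
    topic a human has manually overruled for this file.
    """
    topics = set(specific)
    for qid in specific:
        topics |= ancestors(qid, parents)
    return topics - set(overruled)


stored = denormalised_topics(["Q_buckingham_palace"], PARENTS)
```

 The result would be written to the file's item whenever its topics or the
 class hierarchy change, so nothing has to be inferred at search time.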

   -- James.

 On 13/09/2014 20:56, Thomas Douillard wrote:

  Hi James, I don't understand (I must admit I did not read the whole
 topic).
 Are we talking about a specific query engine? The one the development team
 will implement in Wikibase, or are we talking about something else?

 If we do not know that, it seems difficult to have this conversation at
 this point.

 2014-09-13 21:51 GMT+02:00 James Heald <j.heald(a)ucl.ac.uk>:

  "Let the ops worry about time" is not an answer.

 We're talking about something we're hoping to turn into a world-class
 mass-use image bank, and its front-line public-facing search capability.

 That's on an altogether different scale to WDQ running a few hundred
 searches a day.

 Moreover, we're talking about a public-facing search capability, where
 your user clicks a tag and wants an updated result set *instantly*
 -- their sitting around while the server makes a cup of tea, or declares
 the query is too complex and goes into a sulk, is not an option.

 If the user wants a search on "palace" and "soldier", there simply is not
 time for the server to first recursively build a list of every palace it
 knows about, then every image related to each of those palaces, then every
 soldier it knows about, every image related to each of those soldiers, then
 intersect the two (very big) lists before it can start delivering any image
 hits at all.  That is not acceptable.  A random internet user wants those
 hits straight away.

 The only way to routinely be able to deliver that is denormalisation.
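
 To illustrate the difference at search time, here is a minimal sketch,
 assuming each file already carries its full, pre-computed set of topics.
 The file names, topic identifiers and function names are invented for the
 example. With that data in place, a "palace" and "soldier" search reduces
 to intersecting two ready-made lists, with no walk over the class hierarchy.

```python
from collections import defaultdict

# Hypothetical pre-computed topic sets, as they would be stored per file.
FILE_TOPICS = {
    "File:Guard_at_palace.jpg": {"Q_palace", "Q_soldier", "Q_building", "Q_person"},
    "File:Palace_garden.jpg": {"Q_palace", "Q_building"},
    "File:Soldier_portrait.jpg": {"Q_soldier", "Q_person"},
}


def build_index(file_topics):
    """Build an inverted index: topic -> set of files tagged with it."""
    index = defaultdict(set)
    for filename, topics in file_topics.items():
        for topic in topics:
            index[topic].add(filename)
    return index


def search(index, *topics):
    """Return the files tagged with all of the given topics."""
    if not topics:
        return set()
    # Start from the shortest posting list to keep the intersection cheap.
    postings = sorted((index.get(t, set()) for t in topics), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


index = build_index(FILE_TOPICS)
hits = search(index, "Q_palace", "Q_soldier")
```

 A production search backend would keep such posting lists in a dedicated
 index rather than in memory, but the lookup has the same shape.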

 It's not a question of just buying some more blades and filling up some
 more racks.  That doesn't get you a big enough factor of speedup.

 What we have is a design challenge, which needs a design solution.

    -- James.

  Let the ops worry about time; I have not heard them complain about a
 search dystopia yet. Even Wikidata Query has a reasonable response time
 compared to the power it offers in its queries. And that is on wmflabs,
 not a production server.
 You're saying that even when we make the effort to get structured linked
 data, we should not exploit the single most important advantage it offers.
 It does not make sense.
 It is almost like just repeating the category system again, but with
 different software (albeit one that offers multilinguality).

 /Jan

 _______________________________________________
 Multimedia mailing list
 Multimedia(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/multimedia

  _______________________________________________
 Wikidata-l mailing list
 Wikidata-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikidata-l