2014-09-13 20:15 GMT+02:00 James Heald j.heald@ucl.ac.uk:
On 13/09/2014 18:15, Jan Ainali wrote:
2014-09-13 18:14 GMT+02:00 James Heald j.heald@ucl.ac.uk:
*Where will topics be stored?*
On the question of where the list of topics will be stored, the initial thoughts of the Structured Data team would seem to be clear: they are to be stored on the new CommonsData wikibase.
See eg: https://commons.wikimedia.org/w/index.php?title=File%3AStructured_Data_-_Slides.pdf&page=17 ("topic links")
I read that as topics being stored on Wikidata. That is, on Commons you say that the file DouglasAdams.jpg is about topic Q42, which refers to an item on Wikidata. Everything about Q42 is stored on Wikidata.
Yes, I imagine you would store, say,
Q42182 (pointing to Buckingham Palace), probably with P180 ("depicts", as opposed to "signature of" or "chemical structure for").
But I suspect you would also store eg
Q16560 (palace), etc., even though this is implied by Buckingham Palace.
*What about the natural hierarchical structure?*
eg
Leonardo da Vinci --> Mona Lisa --> Files depicting the Mona Lisa
Shouldn't the fact that it was Leonardo that painted the Mona Lisa only be stored in one place, on Mona Lisa, (or perhaps on Leonardo); but *not* multiple times, separately on every single depiction file?
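To make the question concrete, here is a minimal Python sketch of the two layouts. It assumes a hypothetical in-memory store keyed by plain-text labels rather than real Wikidata item IDs, and the file names are made up. In the normalized layout the "painted by" fact lives once on the Mona Lisa item and a file's creator is derived by following its "depicts" link; in the denormalized layout every depiction file carries its own copy of the fact.

```python
# Hypothetical in-memory stores; plain labels stand in for Wikidata item IDs.

# Normalized: the creator fact is stored once, on the artwork item.
ITEMS = {
    "Mona Lisa": {"creator": "Leonardo da Vinci"},
}
FILES_NORMALIZED = {
    "File:Mona_Lisa_1.jpg": {"depicts": "Mona Lisa"},
    "File:Mona_Lisa_copy.jpg": {"depicts": "Mona Lisa"},
}

# Denormalized: every depiction file repeats the creator fact.
FILES_DENORMALIZED = {
    "File:Mona_Lisa_1.jpg": {"depicts": "Mona Lisa", "creator": "Leonardo da Vinci"},
    "File:Mona_Lisa_copy.jpg": {"depicts": "Mona Lisa", "creator": "Leonardo da Vinci"},
}


def creator_normalized(file_name):
    """Derive the creator at query time by following the file's 'depicts' link."""
    artwork = FILES_NORMALIZED[file_name]["depicts"]
    return ITEMS[artwork].get("creator")


def creator_denormalized(file_name):
    """Read the creator copied directly onto the file record."""
    return FILES_DENORMALIZED[file_name].get("creator")
```

Both functions give the same answer. The difference is that correcting the attribution takes one edit in the normalized layout but one edit per file in the denormalized one, while the normalized lookup pays for an extra hop on every query.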
*A*: Probably not, for several reasons.
If the topics are on Wikidata, you will have this "for free", meaning that the hierarchy is already there, ready to be exploited.
Yes the hierarchy is there, ready to be exploited.
But exploiting it costs time.
The point I'm making in my post is that, especially when the user request is a combination search on two quite general topics, you don't want to be hanging around *waiting* while the system works out how to exploit it.
Instead you want the answer then and there -- and that means denormalisation.
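The trade-off for a combination search on two general topics can be sketched in Python as below. This is an illustration under stated assumptions, not how the Structured Data project stores or queries anything. The topic hierarchy, the file names and the "London" links are hypothetical; labels stand in for item IDs (the Buckingham Palace to palace link mirrors Q42182 and Q16560 from the thread). `search_at_query_time` walks the hierarchy for every file on every query, while `search_denormalized` answers from a topic-to-files index expanded once ahead of time.

```python
from collections import defaultdict

# Hypothetical topic hierarchy: topic -> its broader topics. Labels stand in
# for Wikidata item IDs (e.g. Buckingham Palace is Q42182, palace is Q16560).
BROADER = {
    "Buckingham Palace": {"palace", "London"},
    "Tower Bridge": {"bridge", "London"},
    "palace": {"building"},
    "bridge": set(),
    "building": set(),
    "London": {"England"},
    "England": set(),
}

# Hypothetical files, each tagged only with its most specific topic.
FILE_TOPICS = {
    "File:Buckingham_Palace_front.jpg": {"Buckingham Palace"},
    "File:Tower_Bridge_night.jpg": {"Tower Bridge"},
}


def expand(topic):
    """Return `topic` plus every topic reachable from it via broader links."""
    seen, stack = set(), [topic]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(BROADER.get(current, ()))
    return seen


def all_topics(direct_topics):
    """Union of the expansions of a file's directly tagged topics."""
    result = set()
    for topic in direct_topics:
        result |= expand(topic)
    return result


def search_at_query_time(*wanted):
    """Walk the hierarchy for every file, on every query."""
    wanted = set(wanted)
    return {f for f, topics in FILE_TOPICS.items() if wanted <= all_topics(topics)}


# Denormalized index, built once ahead of time: topic -> set of files.
INDEX = defaultdict(set)
for file_name, topics in FILE_TOPICS.items():
    for topic in all_topics(topics):
        INDEX[topic].add(file_name)


def search_denormalized(*wanted):
    """Intersect precomputed posting sets; expects at least one topic."""
    return set.intersection(*(INDEX.get(t, set()) for t in wanted))
```

For example, `search_denormalized("palace", "London")` and `search_at_query_time("palace", "London")` both return only the Buckingham Palace file. The precomputed index costs extra storage and must be rebuilt whenever the hierarchy changes; in return, it answers without walking the hierarchy at query time.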
Let the ops worry about time; I have not heard them complain about a search dystopia yet. Even Wiki Data Query has a reasonable response time compared to the power its queries offer, and that runs on wmflabs, not on a production server. You're saying that even when we make the effort to get structured linked data, we should not exploit the single most important advantage it offers. That does not make sense. It is almost like repeating the category system again, just with other software (albeit software that offers multilinguality).
/Jan