"Let the ops worry about time" is not an answer.
We're talking about something we're hoping to turn into a world-class mass-use image bank, and its front-line public-facing search capability.
That's on an altogether different scale to WDQ running a few hundred searches a day.
Moreover, we're talking about a public-facing search capability, where your user clicks a tag and wants an updated results set *instantly* -- having them sit around while the server makes a cup of tea, or declares the query too complex and goes into a sulk, is not an option.
If the user wants a search on "palace" and "soldier", there simply is not time for the server to first recursively build a list of every palace it knows about, then every image related to each of those palaces, then every soldier it knows about, then every image related to each of those soldiers, and then intersect the two (very big) lists before it can start delivering any image hits at all. That is not acceptable. A random internet user wants those hits straight away.
The only way to routinely be able to deliver that is denormalisation.
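To make the contrast concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the `PARENT` and `DEPICTS` data, the file names and the function names are hypothetical and not part of any existing Commons or Wikidata code. It only shows the shape of the two approaches: walking the hierarchy for every image at query time, versus storing each image under every tag it implies at write time so that a search is a lookup of precomputed sets.

```python
from collections import defaultdict

# Hypothetical toy data: a tiny hierarchy and what each image depicts.
PARENT = {
    "Buckingham Palace": "palace",
    "Winter Palace": "palace",
    "grenadier": "soldier",
    "hussar": "soldier",
}
DEPICTS = {
    "img1.jpg": ["Buckingham Palace", "grenadier"],
    "img2.jpg": ["Winter Palace"],
    "img3.jpg": ["hussar"],
    "img4.jpg": ["Winter Palace", "hussar"],
}


def ancestors(item):
    """Return the item plus everything above it in the hierarchy."""
    out = [item]
    while item in PARENT:
        item = PARENT[item]
        out.append(item)
    return out


def search_normalised(*tags):
    """Query-time approach: walk the hierarchy for every image, per tag."""
    results = None
    for tag in tags:
        hits = {
            img
            for img, items in DEPICTS.items()
            if any(tag in ancestors(i) for i in items)
        }
        results = hits if results is None else results & hits
    return results or set()


def build_denormalised_index():
    """Write-time approach: file each image under every tag it implies."""
    index = defaultdict(set)
    for img, items in DEPICTS.items():
        for item in items:
            for tag in ancestors(item):
                index[tag].add(img)
    return index


INDEX = build_denormalised_index()


def search_denormalised(*tags):
    """Query time is one lookup per tag plus an intersection of stored sets."""
    sets = [INDEX.get(t, set()) for t in tags]
    return set.intersection(*sets) if sets else set()
```

Both functions return the same images for "palace" and "soldier"; the difference is where the hierarchy walk happens. The denormalised version still intersects sets, but they are already built, so the recursive expansion is paid once when an image is tagged rather than on every click.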
It's not a question of just buying some more blades and filling up some more racks. That doesn't get you a big enough factor of speedup.
What we have is a design challenge, which needs a design solution.
-- James.
Let the ops worry about time; I have not heard them complain about a search dystopia yet. Even the Wiki Data Query has a reasonable response time compared to the power it offers in its queries. And that is on wmflabs, not a production server. You're saying that even when we make the effort to get structured linked data, we should not exploit the single most important advantage it offers. That does not make sense. It is almost like repeating the category system again, just with different software (albeit software that offers multilinguality).
/Jan
Multimedia mailing list Multimedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/multimedia