Hi Markus,
Any query is
possible (given a completely implemented engine),
but some of them just take a lot of time.
Sorry, but this is really not what I am seeing. The queries I have tried all failed
entirely. They were not merely slow: they either completed with no results, timed out,
or looped without ever succeeding.
"Given a completely implemented engine" is crucial there.
I expect errors in the console or something.
It is trivial to prove that a TPF client
can evaluate any SPARQL query with full completeness
with respect to a finite number of sources,
since one can obtain a full data dump with TPF.
(It is equally trivial to prove that some will be very slow.)
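The completeness argument can be sketched in a few lines. This is only an illustration with a hypothetical in-memory stub standing in for a real TPF server: a client that exhaustively pages through the unconstrained pattern { ?s ?p ?o } ends up with the full dump, over which any query can then be evaluated locally.

```python
# Sketch of the completeness argument: paging through the ?s ?p ?o fragment
# yields the full dataset, so any query becomes a local computation.
# The "server" here is a made-up in-memory stub, not a real TPF interface.

PAGE_SIZE = 2

def fragment_pages(triples, page_size=PAGE_SIZE):
    """Stand-in for a TPF server: serve the ?s ?p ?o fragment page by page."""
    for i in range(0, len(triples), page_size):
        yield triples[i:i + page_size]

def full_dump(pages):
    """Client side: follow next-page links until the fragment is exhausted."""
    dump = []
    for page in pages:
        dump.extend(page)
    return dump

data = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "age", "42"),
]

dump = full_dump(fragment_pages(data))
# With the complete dump in hand, any triple pattern is a local filter:
who_knows_bob = [s for (s, p, o) in dump if p == "knows" and o == "bob"]
```

Of course, this is exactly the "very slow" end of the spectrum: downloading everything is always possible, just rarely desirable.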
No need to translate DBpedia queries.
Sure, they were just an example
of the kind of queries that are interesting.
We have a large number (over 300) of user-written
example queries for Wikidata:
I'll try to autorun them after the holidays
and get timings for each of them.
I am afraid that blocking operators are rather
essential in practice, though.
It all depends on the use cases and the constraints.
Running things on the Web, as opposed to on a closed database,
completely changes the setting.
The kind of queries that work great with TPF
are the kind of queries that also work great on the open Web.
An example I've given in the past
(http://www.slideshare.net/RubenVerborgh/the-lonesome-lod-cloud/47):
a query with ORDER BY doesn't make sense on the open Web.
You'll just wait forever, because the first answer can't be given
since we can never be sure there won't be another one before it.
So instead, don't have ORDER BY in the query;
rather have an interface that dynamically
reorders results as they stream in.
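That client-side reordering can be sketched as follows; the city results and sort key are invented for illustration. Instead of a server-side ORDER BY that blocks until all results are known, the interface keeps its displayed list sorted as each result streams in.

```python
import bisect

def reorder_stream(results, key=lambda r: r):
    """Maintain a sorted view of results as they arrive, instead of relying
    on a blocking server-side ORDER BY."""
    view = []  # what the interface currently shows
    keys = []  # parallel list of sort keys, kept sorted for bisect
    for result in results:
        k = key(result)
        pos = bisect.bisect_left(keys, k)
        keys.insert(pos, k)
        view.insert(pos, result)
        yield list(view)  # snapshot of the display after each new result

# Results arrive in arbitrary order, as from a streaming client:
incoming = [("Ghent", 260), ("Antwerp", 520), ("Bruges", 118)]
snapshots = list(reorder_stream(incoming, key=lambda r: r[0]))
final_view = snapshots[-1]  # alphabetical by city name
```

Each snapshot is immediately usable, so the first answer appears as soon as it arrives; the ordering simply refines itself over time.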
In the queries I tried, no results were streaming in
whatsoever.
The default query streams (http://ldfclient.wmflabs.org/).
It is increasingly annoying to get late results that
shift your page layout, and to be unsure whether more will come later.
Or exciting, depending on how you look at it ;-)
I believe users want to see a stable result, basically
instantaneously. They don't want to witness the computation process in slow motion.
Very well, but then they'll have to pay for it I'm afraid.
I'm very happy with Wikimedia, Wikidata, Wikipedia, DBpedia,
but these are unfortunately the exception.
We cannot expect to obtain multi-source information from the Web
without paying for this in some way.
It might be with privacy and diversity (e.g., Facebook),
it might be with money (e.g., paying APIs),
it might be with time (e.g., TPF).
So in absence of a business model
for federated querying on the Web,
I focus on reducing publishers' costs,
such that we pay for information only with time.
Which is what we did on the Web in the old days.
Point taken.
I just wish the community produced Web-grade solutions ;-)
I don't get what you mean by this. It sounds like a marketing term to me that has no
serious meaning. What exactly is your problem with the current Wikidata or DBpedia SPARQL
endpoints? Why are they not "Web grade" in your view?
As I said above, Wikidata and DBpedia are the exceptions.
There is some philanthropy behind these (and I'm happy for that).
They are only Web-grade because somebody is willing to pay the bill
without reaping immediate benefits from that.
As we know, that's not how the rest of the Web/world works.
Think about all the publishers that make their data freely available.
They have been doing so for years and years,
through websites and, later, Web APIs.
Can we really expect them to provide an API
that is more expressive/expensive than any other API out there?
Can we really expect publishers who already give data away for free
to also pay the entire cost of executing queries over that data?
I don't think so.
I think the only feasible option for free/public data
is to provide lightweight APIs.
For some, "lightweight" might mean just a download;
for others, like Wikipedia, it means one page per subject.
But for none of them (except the philanthropy-driven ones),
it will mean "let me execute a custom query".
It's just not feasible on the Web.
So a Web-grade solution, to me, is one that publishers can offer
at the same cost at which they currently publish on the Web.
And of course, everything changes if data consumers carry the cost,
either through privacy, money, or other means.
But that's not primarily what I'm talking about.
Eventually, I want machines to access open information on the Web just like I can,
from different sources, nicely combined.
As long as they're faster than me, and I don't have to do it, I'm a happy
man.
The fast solutions come at a price, always.
Best,
Ruben