On 28.10.2014 11:26, Martynas Jusevičius wrote:
> And why do you think practical cases cannot be implemented using RDF? What is the justification for ignoring the whole standard and implementation stack? What makes you think Wikidata can do better than RDF?
We don't ignore the standard; from the start, we have considered RDF one way to represent the data we have in Wikidata. The fact that the RDF mapping is currently only partially implemented is annoying, but perhaps understandable, since other features (e.g. the query capability you mention) are more urgent.
However, the data model used by Wikibase/Wikidata is different from RDF on a conceptual level. In fact, it's much closer to ISO Topic Maps, if you want a standard.
The point is that in RDF, you typically have statements like "X is a Y" or "The z of X has value n". These are "facts" that can easily be used by an inference engine and queried using SPARQL. Wikidata, however, does not contain such facts. It contains *claims*. Wikidata statements have the form "X is a Y according to a and b, in the context of C" (e.g. "The population of Berlin is 3.5 million, according to survey 768234, in 2011"). "Deep" statements like this *can* be modeled in RDF, but the result is rather inconvenient to work with, and quite useless for SPARQL queries; just because you *can* encode this email as a single number does not mean it is useful to do so - and even if it is useful for some use case, that doesn't mean it's a good idea to use that number as the primary, internal representation of this email.
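To make the difference concrete, here's a rough sketch in Turtle (the prefixes and property names are made up for illustration; this is not the actual Wikidata RDF vocabulary):

  @prefix ex:  <http://example.org/entity/> .
  @prefix exp: <http://example.org/prop/> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  # The "flat" fact that inference engines and SPARQL queries like:
  ex:Berlin exp:population "3500000"^^xsd:integer .

  # The claim Wikidata actually stores: an intermediate statement node
  # carries the value together with its source and temporal context.
  ex:Berlin exp:populationStatement ex:Statement42 .
  ex:Statement42 exp:value       "3500000"^^xsd:integer ;
                 exp:source      ex:Survey768234 ;
                 exp:pointInTime "2011"^^xsd:gYear .

Every hop through the statement node is one more join in every query that wants to touch the value.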
Even if you ignore the "depth" of the statements, the "top level" claims (population of Berlin = 3.5 million) are too vague, too loosely connected and too prone to errors to be useful for SPARQL-based processing. DBpedia's SPARQL endpoint is pretty useless for all but trivial questions, because the data is too dirty, even though the DBpedia team does a commendable job at cleaning the data via a variety of heuristics.
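To illustrate with DBpedia (property names from memory, so treat this as a sketch rather than a working query): what should be a single triple pattern quickly grows UNIONs and datatype checks, and still misses values that only exist as free-text infobox strings:

  PREFIX dbo: <http://dbpedia.org/ontology/>
  PREFIX dbp: <http://dbpedia.org/property/>

  SELECT ?city ?pop WHERE {
    ?city a dbo:City .
    { ?city dbo:populationTotal ?pop }    # mapped, typed values
    UNION
    { ?city dbp:populationTotal ?pop }    # raw infobox values
    FILTER( isNumeric(?pop) && ?pop > 1000000 )
  }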
Another issue is scale. Because of the depth of the statements, we would need hundreds of millions of triples to represent even the data we have today. Expect this to be billions in a year or two. Few, if any, triple stores scale to that size.
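For a rough sense of the numbers (purely illustrative, not actual counts): if each of, say, 50 million statements expands into a statement node plus value, rank, a qualifier or two and a reference, that's easily 5-10 triples per statement - 250-500 million triples before labels, descriptions and sitelinks are even counted.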