On 28.10.2014 11:26, Martynas Jusevičius wrote:
> And why do you think practical cases cannot be implemented using RDF? What is the justification for ignoring the whole standard and implementation stack? What makes you think Wikidata can do better than RDF?
We don't ignore the standard; from the start, we have considered RDF one way to represent the data we have in Wikidata. The fact that the RDF mapping is currently only partially implemented is annoying, but perhaps understandable, since other features (e.g. the query capability you mention) are more urgent.
However, the data model used by Wikibase/Wikidata is different from RDF on a conceptual level. In fact, it's much closer to ISO Topic Maps, if you want a standard.
The point is that in RDF, you typically have statements like "X is a Y" or "The z of X has value n". These are "facts" that can easily be used by an inference engine and queried using SPARQL. Wikidata, however, does not contain such facts. It contains *claims*. Wikidata statements have the form "X is a Y according to a and b, in the context of C" (e.g. "The population of Berlin is 3.5 million, according to survey 768234, in 2011"). "Deep" statements like this *can* be modeled in RDF, but the result is rather inconvenient to work with, and quite useless for SPARQL queries; just because you *can* encode this email as a single number does not mean it is useful to do so - and even if it is useful for some use case, that doesn't mean it's a good idea to use that number as the primary, internal representation of this email.
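To make the difference concrete, here's a rough sketch in Turtle (the prefixes and property names are made up for illustration; this is not the actual Wikidata RDF vocabulary):

  @prefix ex:  <http://example.org/entity/> .
  @prefix exp: <http://example.org/prop/> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  # The "flat" fact that inference engines and SPARQL queries like:
  ex:Berlin exp:population "3500000"^^xsd:integer .

  # The claim Wikidata actually stores: an intermediate statement node
  # carries the value together with its source and temporal context.
  ex:Berlin exp:populationStatement ex:Statement42 .
  ex:Statement42 exp:value       "3500000"^^xsd:integer ;
                 exp:source      ex:Survey768234 ;
                 exp:pointInTime "2011"^^xsd:gYear .

Every hop through the statement node is one more join in every query that wants to touch the value.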
Even if you ignore the "depth" of the statements, the "top level" claims (population of Berlin = 3.5 million) are too vague, too loosely connected and too prone to errors to be useful for SPARQL-based processing. DBpedia's SPARQL endpoint is pretty useless for all but trivial questions, because the data is too dirty, even though the DBpedia team does a commendable job at cleaning the data via a variety of heuristics.
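To illustrate with DBpedia (property names from memory, so treat this as a sketch rather than a working query): what should be a single triple pattern quickly grows UNIONs and datatype checks, and still misses values that only exist as free-text infobox strings:

  PREFIX dbo: <http://dbpedia.org/ontology/>
  PREFIX dbp: <http://dbpedia.org/property/>

  SELECT ?city ?pop WHERE {
    ?city a dbo:City .
    { ?city dbo:populationTotal ?pop }    # mapped, typed values
    UNION
    { ?city dbp:populationTotal ?pop }    # raw infobox values
    FILTER( isNumeric(?pop) && ?pop > 1000000 )
  }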
Another issue is scale. Because of the depth of the statements, we would need hundreds of millions of triples to represent even the data we have today. Expect this to be billions in a year or two. Few, if any, triple stores scale to that size.
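For a rough sense of the numbers (purely illustrative, not actual counts): if each of, say, 50 million statements expands into a statement node plus value, rank, a qualifier or two and a reference, that's easily 5-10 triples per statement - 250-500 million triples before labels, descriptions and sitelinks are even counted.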