Would the Word "assertion" be a possible replacement for the neonym "Snak"?
Hi,
When I glanced over the data model description and found the word 'Snaks' [1] used for an entity or unit of facts, it caused some confusion. The Semantic Web already uses fairly abstract language to describe entity concepts. If possible, please don't introduce another term just to describe a new concept, and if one is necessary, please choose a descriptor that is more self-explanatory.
[1] http://meta.wikimedia.org/wiki/Wikidata/Data_model#Snaks
Cheers
On Thu, Apr 5, 2012 at 7:37 AM, Gregor Hagedorn g.m.hagedorn@gmail.com wrote:
Would the Word "assertion" be a possible replacement for the neonym "Snak"?
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Hey all,
it doesn't look like reuse of existing concepts and standards is a priority for this project. One cannot build a Semantic Web application by ignoring its main building block, which is the RDF data model. Right now it makes no sense to call Wikidata a "semantic" application.
In my opinion, Wikidata should be heading towards providing user-friendly, multilingual read-write interfaces for DBpedia. That would be a true Semantic Web application for free and open knowledge. Too bad people cannot get over the wiki mentality. It has worked fine for a while, but it's time to move on.
Martynas graphity.org
On Thu, Apr 5, 2012 at 5:51 AM, James HK jamesin.hongkong.1@gmail.com wrote:
Hi,
When I glanced over the data model description and found the word 'Snaks' [1] used for an entity or unit of facts, it caused some confusion. The Semantic Web already uses fairly abstract language to describe entity concepts. If possible, please don't introduce another term just to describe a new concept, and if one is necessary, please choose a descriptor that is more self-explanatory.
[1] http://meta.wikimedia.org/wiki/Wikidata/Data_model#Snaks
Cheers
Gregor, James, I don't know if you are familiar with OWL and other semantic web standards, but if you are then the following explanation might be useful for you:
The most precise general term for Snak in Semantic Web speak would be "axiom". The term "assertion" is more specific, since an assertion in ontology languages is an axiom that expresses instance-level information about individuals and literals. You may also have heard of the related terminology "ABox" that is used in description logics, again referring to instance-level knowledge. Snaks, in contrast, could also express some schema level statements, so calling them assertions would be misleading for people who are familiar with OWL and similar languages.
On the other hand, "axiom" would also be a poor choice of name. For one thing, it is not certain that all Snaks will have an easy reading as OWL axioms, and there are certainly many OWL axioms that cannot be written as Snaks. Moreover, the word "axiom" already has a variety of meanings in other contexts, none of which is what we mean here. Since Snaks are a purely technical construct in Wikidata that will mainly be seen by developers, we have thus given them a name that does not suggest anything specific.
Markus
On 05/04/12 12:20, Martynas Jusevicius wrote:
Hey all,
it doesn't look like reuse of existing concepts and standards is a priority for this project. One cannot build a Semantic Web application by ignoring its main building block, which is the RDF data model. Right now it makes no sense to call Wikidata a "semantic" application.
In my opinion, Wikidata should be heading towards providing user-friendly, multilingual read-write interfaces for DBpedia. That would be a true Semantic Web application for free and open knowledge. Too bad people cannot get over the wiki mentality. It has worked fine for a while, but it's time to move on.
Please rest assured that we are not ignoring RDF or OWL. The similarities and differences between Wikidata and DBpedia are explained on a wiki page that DBpedia and Wikidata people have been working out together in order to avoid any confusion [1].
Markus
[1] http://meta.wikimedia.org/wiki/Wikidata/Notes/DBpedia_and_Wikidata
2012/4/5 Martynas Jusevicius martynas@graphity.org
Too bad people cannot get over the wiki mentality. It has worked fine for a while, but it's time to move on.
Dear Martynas,
with Wikidata, we do not want to "get over the wiki mentality", but actually embrace it. I thought that our name *Wiki*data was quite a give-away regarding that point.
Cheers, Denny
Dear Martynas,
if you try to model the following statement in RDF
"The population density of France, as of a 2012 estimate, is 116 per square kilometer, according to the "Bilan demographique 2010"."
you might notice that RDF requires a reification of the statement. The data model that you have seen provides us with an abstract and concise way to talk about these reifications (i.e. via the statement model, just as in RDF).
We still have not finished the document describing how to map our data model to OWL/RDF, but we have thought about this the whole time while discussing the data model.
But if you find a simpler and more RDFish way to express the above statement, please feel free to enlighten me. I would indeed be very interested.
Cheers, Denny
2012/4/5 Martynas Jusevicius martynas@graphity.org
it doesn't look like reuse of existing concepts and standards is a priority for this project. One cannot build a Semantic Web application by ignoring its main building block, which is the RDF data model.
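For readers unfamiliar with the reification mentioned above, here is a minimal sketch in Python (plain tuples, no RDF library) of how the France statement could be written with the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The `ex:` names, the blank node label `_:stmt1` and the `reify` helper are hypothetical and chosen only for illustration; they are not Wikidata or DBpedia vocabulary.

```python
# Minimal sketch: RDF reification of one qualified statement, using plain tuples.
# All "ex:" names and the blank node label "_:stmt1" are hypothetical.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_id, subject, predicate, obj, annotations):
    """Describe the triple (subject, predicate, obj) as an rdf:Statement node
    and attach one extra triple per annotation to that node."""
    triples = [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]
    for prop, value in annotations.items():
        triples.append((statement_id, prop, value))
    return triples


france_density = reify(
    "_:stmt1",
    "ex:France",
    "ex:populationDensity",
    "116",
    {"ex:asOf": "2012", "ex:source": "Bilan demographique 2010"},
)

for triple in france_density:
    print(triple)
```

The point of the sketch is only that the date and the source cannot be attached to the plain triple itself; they need a separate node that stands for the statement.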
Many thanks for the explanations Markus!
I still feel uneasy about the hard-to-remember-neonym. I cannot prove it, but believe the term snak will have to be learned by anyone who interacts with the system through the API, any form of import mechanism, etc. This is far wider than the developers in the sense of coders. I may be wrong here.
I guess you have considered broadening the concept of statement. Why does this not work? My feeling is that it is a statement that a property is not applicable ("has not value"). Naively speaking, such a statement does require a source and in many respects is similar to other forms of statements.
Gregor
On 5 April 2012 21:04, Markus Krötzsch markus@semantic-mediawiki.org wrote:
Gregor, James, I don't know if you are familiar with OWL and other semantic web standards, but if you are then the following explanation might be useful for you:
The most precise general term for Snak in Semantic Web speak would be "axiom". The term "assertion" is more specific, since an assertion in ontology languages is an axiom that expresses instance-level information about individuals and literals. You may also have heard of the related terminology "ABox" that is used in description logics, again referring to instance-level knowledge. Snaks, in contrast, could also express some schema level statements, so calling them assertions would be misleading for people who are familiar with OWL and similar languages.
On the other hand, "axiom" would also be a poor choice of name. For one thing, it is not certain that all Snaks will have an easy reading as OWL axioms, and there are certainly many OWL axioms that cannot be written as Snaks. Moreover, the word "axiom" already has a variety of meanings in other contexts, none of which is what we mean here. Since Snaks are a purely technical construct in Wikidata that will mainly be seen by developers, we have thus given them a name that does not suggest anything specific.
Markus
2012/4/5 Gregor Hagedorn g.m.hagedorn@gmail.com
I still feel uneasy about the hard-to-remember-neonym.
It was strange to me too, and I had to read up on it. You may remember it as bit --> byte --> snack: growing pieces of food.
I cannot prove it, but believe the term snak will have to be learned by anyone who interacts with the system through the API, any form of import mechanism, etc.
Well, and what then? They will learn. I once thought of namespaces as a programming word and concept, but then I became a Wikipedian and understood that they are a basic concept of editing. Every Wikipedian must know the difference between the article, user and project namespaces, and they are not afraid of the word even if they have no real knowledge of namespaces in programming. People must understand concepts and ideas, and for the majority of non-English-speaking, non-programmer people it will be much the same whatever name the new concept has. Moreover, a sna(c)k fits the everyday concepts of an average person better than an assertion does, doesn't it?
Hey Denny,
I gave it a shot:
<http://dbpedia.org/resource/France> <http://dbpedia.org/ontology/PopulatedPlace/populationDensity> "116"^^<http://dbpedia.org/datatype/inhabitantsPerSquareKilometre> <http://wikidata.org/graphs/France2012> .
<http://dbpedia.org/resource/France> <http://dbpedia.org/ontology/populationDensity> "116"^^<http://www.w3.org/2001/XMLSchema#double> <http://wikidata.org/graphs/France2012> .
<http://wikidata.org/graphs/France2012> <http://purl.org/dc/terms/date> "2012"^^<http://www.w3.org/2001/XMLSchema#year> <http://wikidata.org/graphs/France2012> .
<http://wikidata.org/graphs/France2012> <http://purl.org/dc/terms/source> _:source <http://wikidata.org/graphs/France2012> .
_:source <http://purl.org/dc/terms/published> "2010"^^<http://www.w3.org/2001/XMLSchema#year> <http://wikidata.org/graphs/France2012> .
_:source <http://purl.org/dc/terms/title> "Bilan demographique"@fr <http://wikidata.org/graphs/France2012> .
The syntax is N-Quads. It does not use reification, but instead named graphs for provenance. The necessary concepts were already present in DBPedia.
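The named-graph idea can be illustrated with a minimal Python sketch that treats each quad as a plain 4-tuple (subject, predicate, object, graph). The IRIs are taken from the quads above, but literal datatypes and language tags are dropped for brevity, and the helper name `describe_graph` is hypothetical. The sketch separates the triples stored in a graph from the triples whose subject is the graph itself, which is where the provenance lives in this modelling.

```python
# Minimal sketch: quads as plain 4-tuples (subject, predicate, object, graph).
# Literal datatypes and language tags are omitted here for brevity.
GRAPH = "http://wikidata.org/graphs/France2012"
DCT = "http://purl.org/dc/terms/"

quads = [
    ("http://dbpedia.org/resource/France",
     "http://dbpedia.org/ontology/populationDensity", "116", GRAPH),
    (GRAPH, DCT + "date", "2012", GRAPH),
    (GRAPH, DCT + "source", "_:source", GRAPH),
    ("_:source", DCT + "published", "2010", GRAPH),
    ("_:source", DCT + "title", "Bilan demographique", GRAPH),
]


def describe_graph(quads, graph):
    """Split one named graph into the triples it contains and the
    (predicate, object) pairs that describe the graph itself."""
    data, provenance = [], []
    for s, p, o, g in quads:
        if g != graph:
            continue  # quad belongs to another graph
        if s == graph:
            provenance.append((p, o))
        else:
            data.append((s, p, o))
    return data, provenance


data, provenance = describe_graph(quads, GRAPH)
print(provenance)
```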
As you might know, temporal provenance is not the strongest point of RDF. However conventions and solutions are available, and I am sure implementing them would require far less effort than creating a custom data model from scratch, not to mention the benefits of potential reuse. There's quite some research done on RDF provenance, which is worth looking into if provenance is really a key feature for Wikidata from day one. I see it as something that should work transparently behind the scenes, and therefore could be rolled-out later on.
You would get much better and more extensive advice than mine on semantic-web@w3.org -- the only prerequisite is willingness to cooperate.
RDF's strength is that it solves data integration problems by pivotal conversion, reducing the number of model transformations from quadratic to linear: http://en.wikipedia.org/wiki/Data_conversion#Pivotal_conversion
A custom data model brings up questions which already have an answer in the Semantic Web stack:
# can data from different Wikidata instances be merged or interlinked natively?
# is there a native query language? In case of SQL, how performant will it be given many JOINs and the planned use of provenance?
# what and how many custom serialization formats and API mechanisms will have to follow?
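The "quadratic to linear" claim can be checked by simple counting. A minimal sketch, assuming one-directional converters between n data models and a pivot model that is separate from those n: converting directly between every ordered pair needs n(n-1) converters, while going through the pivot needs 2n (one into and one out of the pivot per model). The function names are illustrative only.

```python
def direct_converters(n):
    """One converter per ordered pair of distinct models."""
    return n * (n - 1)


def pivot_converters(n):
    """One converter into the pivot and one out of it for each model."""
    return 2 * n


for n in (3, 5, 10):
    print(n, direct_converters(n), pivot_converters(n))
```

For 10 models this is 90 direct converters against 20 via the pivot; the gap grows with every model added.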
Stacking one custom solution on top of another can eventually result in huge costs. I honestly think the energy of Wikidata could be directed in a more productive way.
Martynas graphity.org
2012/4/5 Denny Vrandečić denny.vrandecic@wikimedia.de:
Dear Martynas,
if you try to model the following statement in RDF
"The population density of France, as of a 2012 estimate, is 116 per square kilometer, according to the "Bilan demographique 2010"."
you might notice that RDF requires a reification of the statement. The data model that you have seen provides us with an abstract and concise way to talk about these reifications (i.e. via the statement model, just as in RDF).
We still have not finished the document describing how to map our data model to OWL/RDF, but we have thought about this the whole time while discussing the data model.
But if you find a simpler and more RDFish way to express the above statement, please feel free to enlighten me. I would indeed be very interested.
Cheers, Denny
-- Project director Wikidata Wikimedia Deutschland e.V. | Eisenacher Straße 2 | 10777 Berlin Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Hey again,
"getting over" wasn't meant to be harsh. I was hoping, though, that "wiki" in "Wikidata" stands for the broader concept of "free and open", and not a particular syntax for encoding knowledge.
I'm all for free and open knowledge, but I think sticking to the same mechanisms which helped bootstrap Wikipedia is not a good idea when building the next-generation semantic Wikidata. It prohibits out-of-the-box thinking and evaluating new, state-of-the-art approaches in this area. Wouldn't it be desirable for the tasks on Wikidata to be done via an intuitive user interface rather than via wiki syntax? I would certainly think so, and I also think it is doable.
Martynas graphity.org
On Fri, Apr 6, 2012 at 2:28 PM, Martynas Jusevicius martynas@graphity.org wrote:
Hey again,
"getting over" wasn't meant to be harsh. I was hoping, though, that "wiki" in "Wikidata" stands for the broader concept of "free and open", and not a particular syntax for encoding knowledge.
I'm all for free and open knowledge, but I think sticking to the same mechanisms which helped bootstrap Wikipedia is not a good idea when building the next-generation semantic Wikidata. It prohibits out-of-the-box thinking and evaluating new, state-of-the-art approaches in this area. Wouldn't it be desirable for the tasks on Wikidata to be done via an intuitive user interface rather than via wiki syntax? I would certainly think so, and I also think it is doable.
There will be forms and the like to input information. We're trying to make it as intuitive as possible.
Cheers Lydia
Martynas,
what you are proposing below is not W3C recommended RDF but an extension of triples to quads. As far as I know, this extension is not compatible yet with existing standards such as SPARQL and OWL. Named graphs work with SPARQL, but are mostly used in another way than you suggest. Most RDF database tools would be *very* unhappy to get millions of named graphs in combination with queries that use variables as graph names. The syntax you use is not a W3C standard either.
This does not say that N-Quads aren't a good idea if one can get them to work with the rest of the Semantic Web stack, but it really defeats your own arguments. We are committed to supporting *existing* standards (as we have said many times already), but we will not base our software design on a non-standard RDF-variant that works with neither OWL nor SPARQL.
Markus
On 06/04/12 13:09, Martynas Jusevicius wrote:
Hey Denny,
I gave it a shot:
<http://dbpedia.org/resource/France> <http://dbpedia.org/ontology/PopulatedPlace/populationDensity> "116"^^<http://dbpedia.org/datatype/inhabitantsPerSquareKilometre> <http://wikidata.org/graphs/France2012> .
<http://dbpedia.org/resource/France> <http://dbpedia.org/ontology/populationDensity> "116"^^<http://www.w3.org/2001/XMLSchema#double> <http://wikidata.org/graphs/France2012> .
<http://wikidata.org/graphs/France2012> <http://purl.org/dc/terms/date> "2012"^^<http://www.w3.org/2001/XMLSchema#year> <http://wikidata.org/graphs/France2012> .
<http://wikidata.org/graphs/France2012> <http://purl.org/dc/terms/source> _:source <http://wikidata.org/graphs/France2012> .
_:source <http://purl.org/dc/terms/published> "2010"^^<http://www.w3.org/2001/XMLSchema#year> <http://wikidata.org/graphs/France2012> .
_:source <http://purl.org/dc/terms/title> "Bilan demographique"@fr <http://wikidata.org/graphs/France2012> .
The syntax is N-Quads. It does not use reification, but instead named graphs for provenance. The necessary concepts were already present in DBPedia.
As you might know, temporal provenance is not the strongest point of RDF. However conventions and solutions are available, and I am sure implementing them would require far less effort than creating a custom data model from scratch, not to mention the benefits of potential reuse. There's quite some research done on RDF provenance, which is worth looking into if provenance is really a key feature for Wikidata from day one. I see it as something that should work transparently behind the scenes, and therefore could be rolled-out later on.
You would get much better and more extensive advice than mine on semantic-web@w3.org -- the only prerequisite is willingness to cooperate.
RDF's strength is that it solves data integration problems by pivotal conversion, reducing the number of model transformations from quadratic to linear: http://en.wikipedia.org/wiki/Data_conversion#Pivotal_conversion
A custom data model brings up questions which already have an answer in the Semantic Web stack:
# can data from different Wikidata instances be merged or interlinked natively?
# is there a native query language? In case of SQL, how performant will it be given many JOINs and the planned use of provenance?
# what and how many custom serialization formats and API mechanisms will have to follow?
Stacking one custom solution on top of another can eventually result in huge costs. I honestly think the energy of Wikidata could be directed in a more productive way.
Martynas graphity.org
On 06.04.2012 14:28, Martynas Jusevicius wrote:
Hey again,
"getting over" wasn't meant to be harsh. I was hoping, though, that "wiki" in "Wikidata" stands for the broader concept of "free and open", and not a particular syntax for encoding knowledge.
Dear Martynas
As stated in the proposal, we do not plan to use wiki syntax to encode the data. The proposal identifies JSON as the most likely candidate for serializing data records, but that is just an implementation detail of the backend; the user will never see it. All input will be done using forms.
For export, I expect several formats to be available, at least one of them based on a mapping to RDF.
Personally, I believe RDF is a very important standard for allowing data from different sources to be combined and re-used. That does not mean, however, that it is necessarily the best data model to use *inside* an application. I believe this is often not the case. RDF is an intentionally simple model. This makes it easy to mix and match data from different sources using different standards, but it also makes it hard to represent certain types of data efficiently or conveniently.
Regards, Daniel
Markus et al,
what you are saying is true. However... the RDF Working Group that is currently in operation will, hopefully, come up with a proposed syntax (probably based on TriG) and, more importantly, some sort of a semantics for named graphs, hopefully in alignment with SPARQL. I cannot say, of course, when this will be finalized and how it will align with the timing of the Wikidata project. But it is worth knowing about it and, actually, possibly to keep an eye on it and contact the WG if the (obviously important!) Wikidata use case does not align with what the WG is doing.
(And, of course, I am happy to do the go-between when and if the time comes:-)
Cheers
Ivan
On Apr 6, 2012, at 17:31 , Markus Krötzsch wrote:
Martynas,
what you are proposing below is not W3C recommended RDF but an extension of triples to quads. As far as I know, this extension is not compatible yet with existing standards such as SPARQL and OWL. Named graphs work with SPARQL, but are mostly used in another way than you suggest. Most RDF database tools would be *very* unhappy to get millions of named graphs in combination with queries that use variables as graph names. The syntax you use is not a W3C standard either.
This does not say that N-Quads aren't a good idea if one can get them to work with the rest of the Semantic Web stack, but it really defeats your own arguments. We are committed to supporting *existing* standards (as we have said many times already), but we will not base our software design on a non-standard RDF-variant that works with neither OWL nor SPARQL.
Markus
---- Ivan Herman, W3C Semantic Web Activity Lead Home: http://www.w3.org/People/Ivan/ mobile: +31-641044153 FOAF: http://www.ivan-herman.net/foaf.rdf
On 07/04/12 10:37, Ivan Herman wrote:
Markus et al,
what you are saying is true. However... the RDF Working Group that is currently in operation will, hopefully, come up with a proposed syntax (probably based on TriG) and, more importantly, some sort of a semantics for named graphs, hopefully in alignment with SPARQL. I cannot say, of course, when this will be finalized and how it will align with the timing of the Wikidata project. But it is worth knowing about it and, actually, possibly to keep an eye on it and contact the WG if the (obviously important!) Wikidata use case does not align with what the WG is doing.
Yes, we are aware of this activity and will be watching the outcome. One rationale behind our abstract data model is that it makes it easy to adopt future standards. Even if named graphs are not available yet at the end of the project, it would be easy to write a new exporter later. In this case, the old (triple) export and the new export would most likely not have the same formal semantics, since they encode data in different structures. But this is not a problem since we have a technology neutral data model, that may well be faithfully represented in different, mutually incompatible formats. In essence, this is our extension strategy for supporting future W3C (and other) standards.
(And, of course, I am happy to do the go-between when and if the time comes:-)
Thanks. Once we have concrete proposals for an RDF/OWL export format, we can also discuss how to improve it to make the best use of the available language standards.
Best regards,
Markus
On Apr 6, 2012, at 17:31 , Markus Krötzsch wrote:
Martynas,
what you are proposing below is not W3C recommended RDF but an extension of triples to quads. As far as I know, this extension is not compatible yet with existing standards such as SPARQL and OWL. Named graphs work with SPARQL, but are mostly used in another way than you suggest. Most RDF database tools would be *very* unhappy to get millions of named graphs in combination with queries that use variables as graph names. The syntax you use is not a W3C standard either.
This is not to say that N-Quads aren't a good idea if one can get them to work with the rest of the Semantic Web stack, but it really defeats your own arguments. We are committed to supporting *existing* standards (as we have said many times already), but we will not base our software design on a non-standard RDF variant that works with neither OWL nor SPARQL.
Markus
On 06/04/12 13:09, Martynas Jusevicius wrote:
Hey Denny,
I gave it a shot:
<http://dbpedia.org/resource/France> <http://dbpedia.org/ontology/PopulatedPlace/populationDensity> "116"^^<http://dbpedia.org/datatype/inhabitantsPerSquareKilometre> <http://wikidata.org/graphs/France2012> .
<http://dbpedia.org/resource/France> <http://dbpedia.org/ontology/populationDensity> "116"^^<http://www.w3.org/2001/XMLSchema#double> <http://wikidata.org/graphs/France2012> .
<http://wikidata.org/graphs/France2012> <http://purl.org/dc/terms/date> "2012"^^<http://www.w3.org/2001/XMLSchema#year> <http://wikidata.org/graphs/France2012> .
<http://wikidata.org/graphs/France2012> <http://purl.org/dc/terms/source> _:source <http://wikidata.org/graphs/France2012> .
_:source <http://purl.org/dc/terms/published> "2010"^^<http://www.w3.org/2001/XMLSchema#year> <http://wikidata.org/graphs/France2012> .
_:source <http://purl.org/dc/terms/title> "Bilan demographique"@fr <http://wikidata.org/graphs/France2012> .
The syntax is N-Quads. It does not use reification, but instead named graphs for provenance. The necessary concepts were already present in DBPedia.
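As a rough illustration of how named graphs can carry provenance, the sketch below holds a few of the quads as plain Python tuples and looks up what is said about the graph that contains a given claim. The tuple representation and the `provenance_of` helper are assumptions made for this example; a real system would use an RDF store and a query language instead.

```python
FRANCE = "http://dbpedia.org/resource/France"
DENSITY = "http://dbpedia.org/ontology/populationDensity"
GRAPH = "http://wikidata.org/graphs/France2012"
DCT = "http://purl.org/dc/terms/"

# (subject, predicate, object, graph) tuples; datatypes omitted for brevity.
QUADS = [
    (FRANCE, DENSITY, "116", GRAPH),
    (GRAPH, DCT + "date", "2012", GRAPH),
    (GRAPH, DCT + "source", "_:source", GRAPH),
    ("_:source", DCT + "title", "Bilan demographique", GRAPH),
]


def provenance_of(quads, subject, predicate):
    """Map each graph holding a matching claim to the facts stated about that graph."""
    graphs = {g for s, p, _, g in quads if s == subject and p == predicate}
    return {g: [(p, o) for s, p, o, _ in quads if s == g] for g in graphs}


result = provenance_of(QUADS, FRANCE, DENSITY)
```

Here the graph name doubles as the handle for the date and source of the claim, so no reification node is needed.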
As you might know, temporal provenance is not the strongest point of RDF. However, conventions and solutions are available, and I am sure implementing them would require far less effort than creating a custom data model from scratch, not to mention the benefits of potential reuse. There has been quite some research on RDF provenance, which is worth looking into if provenance is really a key feature for Wikidata from day one. I see it as something that should work transparently behind the scenes, and therefore could be rolled out later on.
You would get much better and more extensive advice than mine on semantic-web@w3.org -- the only prerequisite is willingness to cooperate.
RDF's strength is that it solves data integration problems by pivotal conversion, reducing the number of model transformations from quadratic to linear: http://en.wikipedia.org/wiki/Data_conversion#Pivotal_conversion
A custom data model brings up questions which already have an answer in the Semantic Web stack:
# can data from different Wikidata instances be merged or interlinked natively?
# is there a native query language? In case of SQL, how performant will it be given many JOINs and the planned use of provenance?
# what and how many custom serialization formats and API mechanisms will have to follow?
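The quadratic-versus-linear claim about pivotal conversion can be checked with a small count. This sketch assumes every converter is one-directional and that the pivot model is not itself one of the n formats; the function names are made up for the example.

```python
def direct_converters(n):
    """Converters needed when each of n formats converts directly to every other."""
    return n * (n - 1)


def pivot_converters(n):
    """Converters needed when each of n formats converts only to and from a pivot."""
    return 2 * n


# Compare both strategies for a growing number of formats.
counts = {n: (direct_converters(n), pivot_converters(n)) for n in (3, 5, 10)}
```

For ten formats this gives 90 direct converters against 20 via a pivot, which is the saving the argument relies on.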
Stacking one custom solution on top of another can eventually result in huge costs. I honestly think the energy of Wikidata could be directed in a more productive way.
Martynas graphity.org
2012/4/5 Denny Vrandečić denny.vrandecic@wikimedia.de:
Dear Martynas,
if you try to model the following statement in RDF
"The population density of France, as of a 2012 estimate, is 116 per square kilometer, according to the 'Bilan demographique 2010'."
you might notice that RDF requires a reification of the statement. The data model that you have seen provides us with an abstract and concise way to talk about these reifications (i.e. via the statement model, just as in RDF).
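For readers unfamiliar with reification, the sketch below shows what the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) would look like for the example sentence. The `reify` helper, the blank node label, and the choice of Dublin Core properties for the date and source are assumptions for illustration only, not the mapping the Wikidata team has in mind.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
DCT = "http://purl.org/dc/terms/"


def reify(node, subject, predicate, obj, annotations):
    """Return N-Triples lines describing a statement as a resource of its own."""
    lines = [
        f"{node} <{RDF}type> <{RDF}Statement> .",
        f"{node} <{RDF}subject> <{subject}> .",
        f"{node} <{RDF}predicate> <{predicate}> .",
        f"{node} <{RDF}object> {obj} .",
    ]
    # Annotations hang off the statement node, not off the subject itself.
    lines += [f"{node} <{p}> {o} ." for p, o in annotations]
    return lines


lines = reify(
    "_:stmt1",
    "http://dbpedia.org/resource/France",
    "http://dbpedia.org/ontology/populationDensity",
    f'"116"^^<{XSD}double>',
    [
        (DCT + "date", '"2012"'),
        (DCT + "source", '"Bilan demographique 2010"'),
    ],
)
```

The single sentence expands to six triples, four of which exist only to describe the statement itself.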
We still have not finished the document describing how to map our data model to OWL/RDF, but we have thought about this the whole time while discussing the data model.
But if you find a simpler and more RDFish way to express the above statement, please feel free to enlighten me. I would indeed be very interested.
Cheers, Denny
2012/4/5 Martynas Jusevicius martynas@graphity.org
it doesn't look like reuse of existing concepts and standards is a priority for this project. One cannot build a Semantic Web application by ignoring its main building block, which is the RDF data model.
-- Project director Wikidata Wikimedia Deutschland e.V. | Eisenacher Straße 2 | 10777 Berlin Tel. +49-30-219 158 26-0 | http://wikimedia.de
Dear Martynas,
I understand your concern. Markus already commented on the quads and the fact that they are nowhere used in the way you suggest. I am thankful to you for critically accompanying our design phase, but I would again ask you for some patience until we have the RDF mapping drafted.
We are very well aware of the advantages and importance of using Semantic Web standards. And I know well what a Semantic Web application is. Heck, check my CV. I have twice been a finalist for the Semantic Web Challenge. And I know many people involved in the provenance working group, as well as those who have been working on standards like RDF and OWL. So even though this doesn't mean much -- people should be judged by their actions, not by their pedigree or social network -- I at least hope that you will be able to wait until we have drafted our RDF mapping document. I would hope that, as soon as we do that, you will again critically review our work.
Cheers, Denny