Hey all,
I've been reading some of the technical notes on Wikidata, for example:
http://meta.wikimedia.org/wiki/Wikidata/Notes/Data_model
http://meta.wikimedia.org/wiki/User:Nikola_Smolenski/Wikidata#Query_language
Statements like "[data model] similar to RDF, but allows qualified property values" and "should there be a query language that will enable querying of the data?" concern me a great deal about the future of the whole Wikidata project.
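Just to make the first point concrete: qualified property values can already be expressed in standard RDF by introducing an intermediate statement node, no custom data model required. A rough sketch of what I have in mind (my own toy example, with invented URIs and figures, written with Python's rdflib):

    # Toy example: a "qualified" statement (Berlin's population, with a
    # point-in-time qualifier and a source) in plain RDF, using an
    # intermediate statement node. All URIs and figures are made up.
    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import RDF, XSD

    EX = Namespace("http://example.org/")              # invented vocabulary
    g = Graph()

    stmt = URIRef("http://example.org/statement/42")   # the qualified statement
    g.add((EX.Berlin, EX.populationStatement, stmt))
    g.add((stmt, RDF.type, EX.PopulationStatement))
    g.add((stmt, EX.value, Literal(3460725, datatype=XSD.integer)))
    g.add((stmt, EX.pointInTime, Literal("2010-12-31", datatype=XSD.date)))
    g.add((stmt, EX.source, URIRef("http://example.org/source/some-census")))

    print(g.serialize(format="turtle"))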
It seems to me that whoever is making these technical decisions does not fully realize the price of reinventing the wheel -- or in this case, reinventing data models, formats, and standards. Having designed and implemented production-grade applications on RDBMSs, XML, and RDF, I strongly suggest you base Wikidata on standard RDF.
I know some or most of you are coming from a wiki background, which might be hard to move beyond, but if Wikidata is to become a free and open knowledge base on the (Semantic) Web, then RDF is the free and open industry standard for exactly that. Whatever little advantage you would get from developing a custom, non-standard data model, think how many man-years of standardization and tool development you would lose. Isn't building knowledge all about standing on the shoulders of giants? RDF comes with everything necessary to build Wikidata: mature specifications, a variety of tools, and, in DBpedia, a very solid proof of concept (which I also think should be better integrated with this project). With SPARQL Update, a full read/write RDF roundtrip is possible (and works in practice). It also makes the notion of a separate API rather obsolete, since SPARQL Update (and related mechanisms) is the only generic API method one has to deal with.
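To sketch what I mean by the read/write roundtrip (a toy example in which an in-memory rdflib graph stands in for a real SPARQL store; the URIs and numbers are invented, and in production the same Update and Query strings would simply be sent to a SPARQL endpoint over HTTP):

    # SPARQL 1.1 as the one generic read/write mechanism -- no bespoke API.
    from rdflib import Graph

    g = Graph()

    # Write: SPARQL Update instead of a custom write API.
    g.update("""
        PREFIX ex: <http://example.org/>
        INSERT DATA {
            ex:Berlin ex:population 3460725 ;
                      ex:country    ex:Germany .
        }
    """)

    # Read: SPARQL Query instead of a custom read API.
    for row in g.query("""
        PREFIX ex: <http://example.org/>
        SELECT ?p ?o WHERE { ex:Berlin ?p ?o }
    """):
        print(row.p, row.o)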
To sum up: I think failure to realize the potential of RDF for Wikidata would be a huge waste of resources for this project, for Wikipedia, and for the general public.
Martynas graphity.org
Obviously, you all expect me to agree with Martynas, in view of my job, and I do :-). But I do so not only because I work for W3C: I genuinely believe that reinventing things here may be far too costly in the long term...
Note also that W3C may start a new group later this year that would look at a 'lower' level HTTP protocol to manage (read and write) RDF data without necessarily using SPARQL. This may be useful for the project as well.
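Very roughly, the idea would be that a resource's RDF can be read and written with nothing but HTTP verbs and content negotiation. A purely illustrative sketch in Python (the endpoint URL is made up; no such service exists today):

    # Managing RDF over plain HTTP, without SPARQL: read with GET plus
    # content negotiation, write the modified representation back with PUT.
    # The resource URL is hypothetical.
    import requests

    resource = "http://data.example.org/resource/Berlin"

    # Read the current representation as Turtle.
    resp = requests.get(resource, headers={"Accept": "text/turtle"})
    turtle = resp.text

    # ... modify the Turtle locally, e.g. append one invented triple ...
    turtle += "\n<http://data.example.org/resource/Berlin> <http://example.org/population> 3460725 .\n"

    # Write the updated representation back.
    requests.put(resource, data=turtle.encode("utf-8"),
                 headers={"Content-Type": "text/turtle"})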
Ivan
----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
I would suggest that at this stage we need a comprehensive understanding of all the possible options.
1. Has someone published a page with the best *summary* documenting and arguing for each of the existing options? I suppose we will want a table indicating which existing solution answers which listed need.
2. Since we have a W3C expert: what is the best document or book for a comprehensive and clear (but not too massive) introduction to the Semantic Web?
3. What is our relation to JTC1/SC32/WG2? Is investing time in their documents appropriate?
Thank you! jfc
Hi Martynas,
that is a good observation! First, rest assured -- there are a number of people involved in Wikidata with very intimate knowledge of RDF, OWL, and SPARQL; some of us have actually worked on these standards :) We fully understand their value.
We will export our data in RDF. But this does not mean that our internal data model has to be RDF. Think about Drupal or Semantic MediaWiki: both export a lot of their data in RDF, but their internal data models are very different. And still, they are great citizens of the Web of Data, I'd reckon. Or even think about Wikipedia: obviously, articles of Wikipedia are "exported" as HTML, so that browsers can display them. But the internal mark-up language to create, edit, and maintain the articles is not HTML, but MediaWiki syntax.
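Purely as an illustration (this is not our actual data model or export code, and the vocabulary below is invented), the point is that an internal statement record, qualifiers and all, only gets translated into RDF triples at export time:

    # Illustration only: a hypothetical internal statement record is kept in
    # whatever form is convenient (here, a plain dict) and mapped to RDF
    # triples when exporting.
    from rdflib import Graph, Namespace, Literal, BNode

    EX = Namespace("http://example.org/")    # invented export vocabulary

    statement = {
        "subject": "Berlin",
        "property": "population",
        "value": 3460725,
        "qualifiers": {"pointInTime": "2010-12-31"},
    }

    def export_to_rdf(stmt):
        g = Graph()
        node = BNode()                       # one node per qualified statement
        g.add((EX[stmt["subject"]], EX[stmt["property"] + "Statement"], node))
        g.add((node, EX.value, Literal(stmt["value"])))
        for q, v in stmt["qualifiers"].items():
            g.add((node, EX[q], Literal(v)))
        return g

    print(export_to_rdf(statement).serialize(format="turtle"))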
I hope this helps with your concerns :)
Cheers, Denny
Hey Denny,
now this is a much more subjective remark than my previous one, but in my experience data model conversions are some of the most unnecessary and buggy parts of Web application architecture. I don't know how familiar you are with RDF, but if you're used to traditional Web frameworks, you might not even imagine how simple, pure, and generic a fully RDF-based architecture can be. I mean an architecture where not only the data from different data sources is linked via RDF, but where every component of the architecture itself is addressed and stored as RDF.
This was precisely the topic of our position paper for the Linked Enterprise Data Patterns workshop at W3C last December; take a look if you have time: http://www.w3.org/2011/09/LinkedData/ledp2011_submission_1.pdf
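To give a flavour in a few lines (a toy sketch with an invented vocabulary, nothing like real production code): both the domain data and the application's own components can live in one RDF graph, and the application drives itself by querying that graph.

    # Toy sketch: domain data and an application "view" description share
    # one RDF graph; the app reads its own configuration with SPARQL just
    # like it reads the data. All URIs are invented.
    from rdflib import Graph

    g = Graph()
    g.update("""
        PREFIX ex:  <http://example.org/>
        PREFIX app: <http://example.org/app#>
        INSERT DATA {
            # domain data
            ex:Berlin a ex:City ; ex:population 3460725 .

            # an application component, described in the same model
            app:CityView app:forClass      ex:City ;
                         app:showsProperty ex:population ;
                         app:template      "city.xhtml" .
        }
    """)

    for row in g.query("""
        PREFIX app: <http://example.org/app#>
        SELECT ?prop ?tmpl WHERE {
            app:CityView app:showsProperty ?prop ; app:template ?tmpl .
        }
    """):
        print("render", row.tmpl, "showing", row.prop)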
Martynas graphity.org