Dear Wikipedia-Wizards,
we are a group of four researchers building "Semantic Wikipedia", which is technically a MediaWiki extension.
The project is described here: http://meta.wikimedia.org/wiki/Semantic_MediaWiki
A demo is available here: http://wiki.ontoworld.org
In short, it lets users assign types to links, which creates semantic metadata in the form (page name, link type, link target). Attributes can be annotated in a similar fashion. If this project were deployed on Wikipedia, a huge amount of machine-processable data could be generated. We will provide an RDF export per page and a SPARQL query endpoint for the whole Semantic Wikipedia (SPARQL is like SQL, but better adapted to the data model of RDF, a building block of the semantic web).
Currently, we have two problems and would be glad if you could help us:
1. The tool stack in the semantic web community is mainly built on Java. For C, there is only one "triple store" (which is needed for efficient RDF storage and querying). That only candidate, "3store", is not very mature, whereas many Java stores are. In particular, the open-source system "Sesame" (openrdf.org) would be our choice for the implementation. But as far as I understand Wikipedia, Java is not open source enough, because there is no open-source implementation of Java itself. Is this true or just a rumor?
2. Syntax. We had to extend the syntax slightly to enable annotation of links and data values. For relations, we have currently settled on
[[link type::link target|optional alternate label]]
Sample, on page "London": ... is in [[located in::England]] ... Renders as: ... is in England ... (England = linked)
and for attributes on
[[attribute type:=data value with unit|optional alternate label]]
Sample, on page "London": ... rains on [[rain:=234 days/year]] ... Renders as: ... rains on 234 days/year (nothing linked)
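[Editor's sketch] The two annotation forms above can be illustrated with a small parser. This is a minimal sketch in Python, not the extension's actual implementation; the regular expression and the names ANNOTATION, extract_annotations and render are assumptions made for illustration. The render function only produces plain text and does not create the link that a relation target would get on a real page.

```python
import re

# Hypothetical pattern covering both forms described in the message:
#   [[link type::link target|optional label]]        -> relation
#   [[attribute type:=data value|optional label]]    -> attribute
ANNOTATION = re.compile(
    r"\[\[(?P<type>[^\[\]|:]+?)(?P<sep>::|:=)"
    r"(?P<value>[^\[\]|]+)(?:\|(?P<label>[^\[\]]*))?\]\]"
)

def extract_annotations(page_name, wikitext):
    """Return (page, kind, type, value) tuples for every annotation found."""
    results = []
    for m in ANNOTATION.finditer(wikitext):
        kind = "relation" if m.group("sep") == "::" else "attribute"
        results.append(
            (page_name, kind, m.group("type").strip(), m.group("value").strip())
        )
    return results

def render(wikitext):
    """Replace each annotation with its label, or with its value if no label is given."""
    return ANNOTATION.sub(lambda m: m.group("label") or m.group("value"), wikitext)

text = "... is in [[located in::England]] ... rains on [[rain:=234 days/year]] ..."
triples = extract_annotations("London", text)
plain = render(text)
```

Here triples holds one relation and one attribute for the page "London", and plain is the sentence with the annotations replaced by their visible text.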
For a full explanation of why and what we are trying to do, you can also have a look at a paper we wrote for a conference: http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation_english?publ_...
BTW: I promised Jimmy (in San Diego) to explain to him what the semantic web is. I am still working on that :-)
Thanks a lot in advance,
Kind regards,
Max Völkel
--
Dipl.-Inform. Max Völkel
University of Karlsruhe, AIFB, Knowledge Management Group
mvo@aifb.uni-karlsruhe.de   +49 721 608-4754   www.xam.de
On 11/18/05, Max Voelkel max@xam.de wrote:
The tool stack in the semantic web community is mainly built on Java. For C, there is only one "triple store" (which is needed for efficient RDF storage and querying). That only candidate, "3store", is not very mature, whereas many Java stores are. In particular, the open-source system "Sesame" (openrdf.org) would be our choice for the implementation. But as far as I understand Wikipedia, Java is not open source enough, because there is no open-source implementation of Java itself. Is this true or just a rumor?
Look at [[GCJ]] on enwiki. There are free implementations but they are not good enough for all applications.
Perhaps I'm hopelessly out of touch with the times, but why can't this information be stored in a normal SQL database? I'm sure a parser could be written to rewrite SPARQL queries into efficiently executing SQL queries.
On my on-and-off-again analysis copy of Wikipedia I use the hstore module in PostgreSQL to store name=value pairs for every revision. If you are only talking about storing metadata for the current versions of articles, it would likely be quite efficient to carry a relation containing (source, type, dest) tuples for each of your links. Even MySQL could do a pretty good job of this.
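[Editor's sketch] The (source, type, dest) relation suggested above can be tried out with any SQL engine. The following minimal sketch uses Python's built-in sqlite3 module rather than PostgreSQL or MySQL; the table name semantic_link, the index, the helper sources_with and the sample rows are assumptions made for illustration. The lookup is roughly what a single SPARQL triple pattern with a fixed link type and target would ask for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per typed link on the current version of a page (hypothetical schema).
conn.execute(
    "CREATE TABLE semantic_link ("
    " source TEXT NOT NULL, type TEXT NOT NULL, dest TEXT NOT NULL)"
)
# Index for "which pages have link type T pointing at D?" lookups.
conn.execute("CREATE INDEX idx_type_dest ON semantic_link (type, dest)")

# Illustrative sample rows, not taken from any real wiki dump.
rows = [
    ("London", "located in", "England"),
    ("Manchester", "located in", "England"),
    ("Paris", "located in", "France"),
]
conn.executemany("INSERT INTO semantic_link VALUES (?, ?, ?)", rows)

def sources_with(conn, link_type, dest):
    """Return all source pages that carry a link of link_type to dest."""
    cur = conn.execute(
        "SELECT source FROM semantic_link"
        " WHERE type = ? AND dest = ? ORDER BY source",
        (link_type, dest),
    )
    return [row[0] for row in cur]

in_england = sources_with(conn, "located in", "England")
```

Whether such a table scales to the full link graph of a large wiki is not shown by this sketch; it only demonstrates the shape of the relation.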
Syntax. We had to extend the syntax slightly to enable annotations of links and data values. Currently we settled down to use
[[link type::link target|optional alternate label]] Sample, on page "London": ... is in [[located in::England]] ... Renders as: ... is in England .... (England = Linked)
for relations, and for attributes.
[[attribute type:=data value with unit|optional alternate label]] Sample, on page "London": ... rains on [[rain:=234 days/year]] .... Renders as .... rains on 234 days/year (nothing linked)
For a full explanation of why and what we try to do, you can also have a look at a paper, which we wrote for a conference: http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation_english?publ_...
There would appear to be some potential for confusion with namespaces. It might be advisable to use a character which currently cannot appear in an internal link. It might also be advisable not to make them look like internal links. We've already overloaded [[]] quite a bit with things which are not quite the same as internal links, conceptually or syntactically (categories, images).
Can semantic links carry an additional attribute beyond their type, e.g. on [[Bill Clinton]]: [[us presidential succession::George W. Bush:=next]], or would it have to be [[next us president::George W. Bush]]?
If it's just the latter case, then all it is is a typed directed graph, and there is a wealth of fast algorithms available for searching and storing the relationships.
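[Editor's sketch] To make the "typed directed graph" reading concrete, here is a minimal Python sketch that stores typed edges in nested dictionaries and follows one link type with a breadth-first search. The class TypedGraph and its methods are hypothetical names, and the second edge in the example is added purely for illustration.

```python
from collections import defaultdict, deque

class TypedGraph:
    """Directed graph whose edges carry a link type (hypothetical helper)."""

    def __init__(self):
        # edges[source][link_type] -> list of destination pages
        self.edges = defaultdict(lambda: defaultdict(list))

    def add(self, source, link_type, dest):
        self.edges[source][link_type].append(dest)

    def follow(self, start, link_type):
        """Return pages reachable from start via edges of one type, in BFS order."""
        seen = {start}
        order = []
        queue = deque([start])
        while queue:
            node = queue.popleft()
            # .get avoids creating empty entries for pages without outgoing edges
            for nxt in self.edges.get(node, {}).get(link_type, []):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order

g = TypedGraph()
g.add("Bill Clinton", "next us president", "George W. Bush")
g.add("George H. W. Bush", "next us president", "Bill Clinton")  # illustrative extra edge
successors = g.follow("George H. W. Bush", "next us president")
```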
On 11/18/05, Max Voelkel max@xam.de wrote:
Syntax. We had to extend the syntax slightly to enable annotations of links and data values. Currently we settled down to use
The problem with your extension of the syntax is that it conflicts with existing titles, both in theory and in practice: pages with double colons, though rare, do exist, for example the Code::Blocks article on enwiki and numerous user pages on other wikis.
Having said that, you could either break compatibility with such titles or use some of the characters currently not allowed in titles, which are:
* +
* <
* >
* [
* ]
* {
* |
* }
[] aren't practical since they already delimit the link (unless you wanted horrors like [[located in[England]]), {} are already used for templates, and | would be ambiguous. That leaves you with <> and +, and [[location>England]] or [[location=>England]] doesn't look all that bad.
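[Editor's sketch] The argument above can be checked mechanically: a separator is safe from title collisions if it contains at least one character that titles may not contain. The following Python sketch assumes the list of forbidden characters given in this message is complete; the names FORBIDDEN_IN_TITLES, can_collide_with_titles and split_annotation are hypothetical.

```python
# Characters listed in the message as not allowed in page titles.
FORBIDDEN_IN_TITLES = set("+<>[]{|}")

def can_collide_with_titles(separator):
    """True if every character of the separator may also appear in a title."""
    return not any(ch in FORBIDDEN_IN_TITLES for ch in separator)

def split_annotation(inner, separator):
    """Split the text inside [[...]] at the first separator; None if it is absent."""
    left, found, right = inner.partition(separator)
    return (left, right) if found else None

# "::" misreads an existing title as a typed link, "=>" does not.
misparsed = split_annotation("Code::Blocks", "::")
untouched = split_annotation("Code::Blocks", "=>")
typed = split_annotation("location=>England", "=>")
```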
[[attribute type:=data value with unit|optional alternate label]] Sample, on page "London": ... rains on [[rain:=234 days/year]] .... Renders as .... rains on 234 days/year (nothing linked)
Say I also wanted to link an assigned value, for instance make a link to [[234 days/year]]; how would I do that? [[[[rain:=234 days/year]]]] doesn't work.
Max Voelkel wrote:
Syntax. We had to extend the syntax slightly to enable annotations of links and data values. Currently we settled down to use
[[link type::link target|optional alternate label]] Sample, on page "London": ... is in [[located in::England]] ... Renders as: ... is in England .... (England = Linked)
for relations, and for attributes.
[[attribute type:=data value with unit|optional alternate label]] Sample, on page "London": ... rains on [[rain:=234 days/year]] .... Renders as .... rains on 234 days/year (nothing linked)
Why not make a nice extension, and wrap it in templates?
<semantic_attribute>
key=rain
value=234
unit=days/year
label=sometext
</semantic_attribute>
which can be generated by
{{attribute|rain|234|days/year|sometext}}
The extension can hold a table of default units and labels, in case they are omitted:
{{attribute|rain|234| | }} uses "unit=days/year" and "label=rains on VALUE UNIT", because of key "rain".
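[Editor's sketch] The default-table idea could look roughly like this in Python. This is only a sketch of the lookup logic, not of a MediaWiki extension; the DEFAULTS table, the fallback pattern "VALUE UNIT" for unknown keys, and the function expand_attribute are assumptions made for illustration.

```python
# Hypothetical defaults table: key -> (default unit, default label pattern).
DEFAULTS = {
    "rain": ("days/year", "rains on VALUE UNIT"),
}

def expand_attribute(key, value, unit="", label=""):
    """Fill in omitted unit and label from the defaults table and build the text."""
    default_unit, default_label = DEFAULTS.get(key, ("", "VALUE UNIT"))
    unit = unit.strip() or default_unit      # blank parameter counts as omitted
    label = label.strip() or default_label
    text = label.replace("VALUE", value).replace("UNIT", unit).strip()
    return {"key": key, "value": value, "unit": unit, "text": text}

# Corresponds to {{attribute|rain|234| | }} with unit and label left blank.
with_defaults = expand_attribute("rain", "234", " ", " ")
# Corresponds to {{attribute|rain|234|days/year|sometext}}.
explicit = expand_attribute("rain", "234", "days/year", "sometext")
```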
An extension could also extract the attribute list and display it in the sidebar, like language links, instead or in addition to displaying them inline.
Instead of the special link syntax, a similar extension called by {{linkto|target|type|label}} could be used.
Magnus
On 11/20/05, Magnus Manske magnus.manske@web.de wrote:
Why not make a nice extension, and wrap it in templates?
<semantic_attribute> key=rain value=234 unit=days/year label=sometext </semantic_attribute>
which can be generated by
{{attribute|rain|234|days/year|sometext}}
In the current parser that would not be possible: you can't pass template arguments to extensions. Also, if it's inline, it's likelier to get updated.
Ævar Arnfjörð Bjarmason wrote:
In the current parser that would not be possible, you can't pass template arguments to extensions. Also, if it's inline it's likelier to get updated.
Funny, last time I checked on my citation feature it worked quite well. That was a few weeks ago, though.
Magnus
On 11/20/05, Magnus Manske magnus.manske@web.de wrote:
Ævar Arnfjörð Bjarmason wrote:
In the current parser that would not be possible, you can't pass template arguments to extensions. Also, if it's inline it's likelier to get updated.
Funny, last time I checked on my citation feature it worked quite well. That has been a few weeks ago, though.
When you make a template at Template:Extension with the contents:
"""
<hook>
arg = {{{1}}}
</hook>
"""
and call it on a page with {{Extension|myarg}}, the output is:
"""
arg = {{{1}}}
"""
I.e. the {{{1}}} is not interpolated. Calling the parser on it won't work either, as you'll be dealing with a new instance of the parser, which won't replace those variables because, as far as it's concerned, it hasn't been called with any.
Now, your idea of having a template wrap it as {{attribute|rain|...}} would presumably require a template at Template:Attribute with contents like:
"""
<attr>
weather = {{{1}}}
....
</attr>
"""
And as I've demonstrated that doesn't work, so what exactly did work the last time you checked it?
Ævar Arnfjörð Bjarmason wrote:
And as I've demonstrated that doesn't work, so what exactly did work the last time you checked it?
Damn, I distinctly remember I ported the fix for this, based on something I found on bugzilla.
Maybe someone turned it off again because of unpleasant side effects? I'll keep you posted if I can find it again.
Magnus
wikitech-l@lists.wikimedia.org