On 11/18/05, Max Voelkel <max(a)xam.de> wrote:
Dear Wikipedia-Wizards,
we are a group of four researchers building an extension for
Wikipedia, called the "Semantic Wikipedia", which is technically a
MediaWiki extension.
As a short summary, it allows users to type links, which leads to
the creation of semantic metadata (page name, link type, link
target). In a similar fashion we allow for the annotation of
attributes. If this project is deployed on Wikipedia, a huge
amount of machine-processable data could be generated. We will
provide an RDF export per page and a SPARQL-query endpoint for the
whole Semantic Wikipedia (SPARQL is like SQL, but more adapted to
the data model of RDF, a building block of the semantic web).
Currently, we have two problems and would be glad if you could help us:
1.
The tool stack in the semantic web community is mainly built on
Java. For C, there is only one "triple store" (which is needed for
efficient RDF storage & querying). The only candidate we have,
"3store", is not very mature, but many Java stores are. In particular,
the open-source system "Sesame" (openrdf.org) would be our choice
for implementation. But, as far as I understand Wikipedia, Java is
not open source enough, as there is no open source implementation of
Java itself? Is this true or just a rumor?
Look at [[GCJ]] on enwiki. There are free implementations but they
are not good enough for all applications.
Perhaps I'm hopelessly out of touch with the times, but why can't this
information be stored in a normal SQL database? I'm sure a parser
could be written to rewrite SPARQL queries into efficiently executing
SQL queries.
On my on-and-off-again analysis copy of Wikipedia I use the hstore
module in PostgreSQL to store name=value pairs for every revision. If
you were only talking about storing metadata for the current versions
of articles, it would likely be quite efficient to carry a relation
which contains (source, type, dest) tuples for each of your links. Even
MySQL could do a pretty good job of this.
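
To make the (source, type, dest) idea concrete, here is a minimal sketch
using Python's sqlite3 module in place of PostgreSQL or MySQL. The table
name semantic_link, the column name link_type, the index, the match()
helper and the sample rows are all made up for illustration; nothing here
is taken from the Semantic Wikipedia code. It only shows that a single
triple pattern with wildcards maps directly onto a WHERE clause.

```python
import sqlite3

# Hypothetical schema: one row per typed link on the current revision of a page.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE semantic_link ("
    " source TEXT NOT NULL,"
    " link_type TEXT NOT NULL,"
    " dest TEXT NOT NULL,"
    " PRIMARY KEY (source, link_type, dest))"
)
# Extra index so lookups by (link_type, dest) do not scan the whole table.
conn.execute("CREATE INDEX idx_type_dest ON semantic_link (link_type, dest)")

# Illustrative sample rows only.
conn.executemany(
    "INSERT INTO semantic_link VALUES (?, ?, ?)",
    [
        ("London", "located in", "England"),
        ("Manchester", "located in", "England"),
        ("England", "located in", "United Kingdom"),
    ],
)


def match(conn, source=None, link_type=None, dest=None):
    """Return all tuples matching one triple pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("source", source), ("link_type", link_type), ("dest", dest)):
        if value is not None:
            clauses.append(column + " = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = (
        "SELECT source, link_type, dest FROM semantic_link"
        + where
        + " ORDER BY source, link_type, dest"
    )
    return conn.execute(sql, params).fetchall()


# "Which pages are located in England?"
in_england = match(conn, link_type="located in", dest="England")
```

A query with several patterns sharing a variable would become a self-join
on this table; that part is left out of the sketch.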
2.
Syntax. We had to extend the syntax slightly to enable annotations
of links and data values. Currently we have settled on
[[link type::link target|optional alternate label]]
Sample, on page "London": ... is in [[located in::England]] ...
Renders as: ... is in England .... (England = Linked)
for relations, and for attributes:
[[attribute type:=data value with unit|optional alternate label]]
Sample, on page "London": ... rains on [[rain:=234 days/year]] ....
Renders as: ... rains on 234 days/year ... (nothing linked)
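
As a rough illustration of how the two forms above could be picked out
of wikitext, here is a small Python sketch. The regular expression and
the helper names extract_annotations() and render() are assumptions made
for this example, not the extension's actual parser, and render() only
produces the visible text (it does not create the link for relations).

```python
import re

# Matches [[name::target|label]] (relation) or [[name:=value|label]] (attribute).
# The "|label" part is optional.
ANNOTATION = re.compile(
    r"\[\[(?P<name>[^\[\]|:]+)(?P<op>::|:=)(?P<value>[^\[\]|]+)"
    r"(?:\|(?P<label>[^\[\]]*))?\]\]"
)


def extract_annotations(page, wikitext):
    """Return (page, kind, name, value) tuples for every annotation found."""
    result = []
    for m in ANNOTATION.finditer(wikitext):
        kind = "relation" if m.group("op") == "::" else "attribute"
        result.append((page, kind, m.group("name").strip(), m.group("value").strip()))
    return result


def render(wikitext):
    """Replace each annotation by its label, or by the target/value if no label."""
    return ANNOTATION.sub(lambda m: m.group("label") or m.group("value"), wikitext)


text = "... is in [[located in::England]] ... rains on [[rain:=234 days/year]] ..."
triples = extract_annotations("London", text)
plain = render(text)
```

For the London samples this yields one relation tuple and one attribute
tuple, each carrying the page name as the subject.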
For a full explanation of what we are trying to do and why,
you can also have a look at a paper which we wrote for a
conference:
http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikation_english?publ…
There would appear to be some potential confusion with namespaces. It
might be advisable to use a character which could currently not appear
in an internal link.
It might also be advisable to not make them look like internal links.
We've already overloaded [[]] quite a bit with things which are not
quite the same as internal links conceptually, or syntactically
(categories, images).
Can semantic links carry an additional attribute beyond their type, e.g.
on [[Bill Clinton]], [[us presidential succession::George W.
Bush:=next]], or would it have to be [[next us president::George W.
Bush]]?
If it's just the latter case, then it is simply a typed directed graph,
and there are a wealth of fast algorithms available for searching and
storing the relationships.
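
To illustrate the typed directed graph view, here is a minimal Python
sketch: edges are stored per source and per link type, and a
breadth-first search follows only edges of one type. The class name
TypedGraph, its methods and the sample edges are invented for this
example.

```python
from collections import defaultdict, deque


class TypedGraph:
    """Directed graph whose edges carry a type: source -> type -> {dest}."""

    def __init__(self):
        self._out = defaultdict(lambda: defaultdict(set))

    def add(self, source, link_type, dest):
        self._out[source][link_type].add(dest)

    def reachable(self, start, link_type):
        """All nodes reachable from start using only edges of link_type (BFS)."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            # .get() avoids creating empty entries in the defaultdicts.
            for nxt in self._out.get(node, {}).get(link_type, ()):
                if nxt not in seen and nxt != start:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


g = TypedGraph()
g.add("Bill Clinton", "next us president", "George W. Bush")
g.add("London", "located in", "England")
g.add("England", "located in", "United Kingdom")

containers = g.reachable("London", "located in")
successors = g.reachable("Bill Clinton", "next us president")
```

An extra attribute on the link itself (the ":=next" form) would not fit
this structure directly; each edge would then need its own record.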