On Sun, Jan 17, 2010 at 9:20 AM, Denny Vrandecic
<denny.vrandecic(a)kit.edu> wrote:
> the use you may need seems to be a lot like what
> Semantic MediaWiki is offering.
> I don't know if Wikisource would consider it, but adding user-curated metadata
> using a user-generated vocabulary, and being able to query it internally (as
> well as exporting it externally) is pretty much what we do.

The major problem with SMW in the past has been, AFAIK, that it's an
enormous amount of code written totally separately from MediaWiki by
different people, and would need to be reviewed in its entirety by
someone like Tim Starling before it could be enabled on any Wikimedia
site. I recall Tim looking briefly at the code and taking a few
minutes to find an XSS exploit. There are also likely to be major
performance issues scaling to Wikipedia (correct me if I'm wrong). So
I wouldn't bet on any progress here anytime soon, especially since
we're way behind on reviewing even existing core code, let alone large
new extensions.

A much more likely way to make progress would be to try committing
more modest features incrementally to core, or to small
special-purpose extensions. I don't think it would be very hard at
all to have the API output a machine-readable summary of the template
parameters used on a given page. I might do that today as a
proof-of-concept. If I do, then someone familiar with RDF and PHP
could probably write a fairly simple patch to turn this code into RDF
output. From there it would be pretty simple to write a maintenance
script to output RDF for the template parameters on all pages on a
wiki, and we could see about incorporating that into the regular
Wikipedia data dump.
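
To make the idea concrete, here is a rough sketch of the two steps (summarize template parameters, then emit RDF), written in Python rather than PHP purely for illustration. Everything in it is an assumption on my part: the function names, the example.org vocabulary URIs, the choice of N-Triples as the output format, and the naive regex that only handles non-nested templates with no piped links inside them. It is not how MediaWiki's parser or API actually works.

```python
import re
from urllib.parse import quote

# Naive pattern: matches {{...}} with no nested braces (an assumption;
# real wikitext needs the MediaWiki parser to handle nesting properly).
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def template_params(wikitext):
    """Return a list of (template_name, {param: value}) pairs."""
    result = []
    for match in TEMPLATE_RE.finditer(wikitext):
        parts = [p.strip() for p in match.group(1).split("|")]
        name, params, position = parts[0], {}, 0
        for part in parts[1:]:
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip()] = value.strip()
            else:
                # Unnamed parameters are numbered 1, 2, 3, ...
                position += 1
                params[str(position)] = part
        result.append((name, params))
    return result


def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(page_uri, templates, vocab="http://example.org/template/"):
    """Emit one triple per template parameter (hypothetical vocabulary)."""
    lines = []
    for name, params in templates:
        for key, value in params.items():
            predicate = vocab + quote(name.replace(" ", "_")) + "/" + quote(key)
            lines.append(
                '<%s> <%s> "%s" .' % (page_uri, predicate, escape_literal(value))
            )
    return lines


if __name__ == "__main__":
    text = "{{Infobox book|author=Jane Austen|title=Emma|1816}}"
    for line in to_ntriples("http://example.org/wiki/Emma", template_params(text)):
        print(line)
```

A maintenance script would do the same thing in a loop over every page, writing the lines to a dump file. Note that every value comes out as a plain string literal; nothing here tries to interpret the data.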

Notably, this doesn't try to actually use the data on the wiki, so
it should have no scalability issues. It should also be small enough to
put in core with no problems, so all MW wikis could be outputting RDF
for their template parameters out of the box. My understanding is
that it's expected that data providers may output RDF in whatever
format is convenient to them, and someone will have to write OWL to
turn this into more conventional formats. But we can output the raw
data reasonably easily, at least.