On Sun, Jan 17, 2010 at 9:20 AM, Denny Vrandecic <denny.vrandecic@kit.edu> wrote:
> the use you may need seems to be a lot like what Semantic MediaWiki is offering. I don't know if Wikisource would consider it, but adding user-curated metadata using a user-generated vocabulary, and being able to query it internally (as well as exporting it externally) is pretty much what we do.
The major problem with SMW in the past has been, AFAIK, that it's an enormous amount of code written totally separately from MediaWiki by different people, and it would need to be reviewed in its entirety by someone like Tim Starling before it could be enabled on any Wikimedia site. I recall Tim looking briefly at the code and needing only a few minutes to find an XSS vulnerability. There are also likely to be major performance issues scaling to Wikipedia (correct me if I'm wrong). So I wouldn't bet on any progress here anytime soon, especially since we're way behind on reviewing even existing core code, let alone large new extensions.
A much more probable method of progress would be to try committing more modest features incrementally to core, or to small special-purpose extensions. I don't think it would be very hard at all to have the API output a machine-readable summary of the template parameters used on a given page. I might do that today as a proof-of-concept. If I do, then someone familiar with RDF and PHP could probably write a fairly simple patch to turn this code into RDF output. From there it would be pretty simple to write a maintenance script to output RDF for the template parameters on all pages on a wiki, and we could see about incorporating that into the regular Wikipedia data dump.
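To make the idea concrete, here is a rough sketch in Python (not PHP, and not actual MediaWiki code) of the two steps: summarizing the template parameters on a page, then turning that summary into RDF as N-Triples. Everything in it is an assumption for illustration: the function names, the example.org base URI, and the choice of one predicate per template parameter. It also only handles flat template calls, so nested templates and piped links inside parameters are ignored.

```python
import re
from urllib.parse import quote

# Hypothetical base URI, used only for illustration.
BASE = "http://example.org/wiki/"

# Matches a template call that contains no nested braces, e.g. {{Name|a=1|b}}.
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def template_params(wikitext):
    """Return [(template_name, {param: value}), ...] for flat template calls."""
    results = []
    for match in TEMPLATE_RE.finditer(wikitext):
        parts = match.group(1).split("|")
        name = parts[0].strip()
        params = {}
        positional = 0
        for part in parts[1:]:
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip()] = value.strip()
            else:
                # Unnamed parameters are numbered 1, 2, 3, ...
                positional += 1
                params[str(positional)] = part.strip()
        results.append((name, params))
    return results


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(page_title, wikitext):
    """Emit one triple per template parameter: page, template#param, value."""
    subject = "<" + BASE + quote(page_title.replace(" ", "_")) + ">"
    lines = []
    for name, params in template_params(wikitext):
        template = quote(name.replace(" ", "_"))
        for key, value in params.items():
            predicate = "<" + BASE + "Template:" + template + "#" + quote(key) + ">"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    sample = "{{Infobox book|author=Jane Austen|year=1813}} Some text. {{stub}}"
    for line in to_ntriples("Pride and Prejudice", sample):
        print(line)
```

A maintenance script would presumably just loop something like to_ntriples() over every page and append the lines to a dump file. A real version would want to reuse the parser's own knowledge of template calls rather than a regex.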
Notably, this doesn't try to actually use the data on the wiki, so it should have no scalability issues. It should also be small enough to put in core with no problems, so all MW wikis could be outputting RDF for their template parameters out of the box. My understanding is that data providers are expected to output RDF in whatever format is convenient for them, and someone will have to write OWL to turn this into more conventional formats. But we can output the raw data reasonably easily, at least.