On Jan 17, 2010, at 16:11, Aryeh Gregor wrote:
On Sun, Jan 17, 2010 at 9:20 AM, Denny Vrandecic <denny.vrandecic@kit.edu> wrote:
The use you may need seems to be a lot like what Semantic MediaWiki is offering. I don't know if Wikisource would consider it, but adding user-curated metadata using a user-generated vocabulary, and being able to query it internally (as well as export it externally), is pretty much what we do.
The major problem with SMW in the past has been, AFAIK, that it's an enormous amount of code written totally separately from MediaWiki by different people, and would need to be reviewed in its entirety by someone like Tim Starling before it could be enabled on any Wikimedia site. I recall Tim looking briefly at the code and taking a few minutes to find an XSS exploit. There are also likely to be major performance issues scaling to Wikipedia (correct me if I'm wrong). So I wouldn't bet on any progress here anytime soon, especially since we're way behind on reviewing even existing core code, let alone large new extensions.
I was not talking about Wikipedia. Our scalability tests suggest that SMW could work there, but it is hard to say in advance without testing on the actual WMF server farm. I am merely talking about Wikisource, and wondering whether SMW could be used to solve the problems they have right now.
Furthermore, the code has had some peer review by now, and it is used by sites such as Wikia. Our code base is getting smaller, and we are incorporating review comments. It would be great to get further reviews.
So, as I said, I am only talking about Wikisource. I think SMW could be a viable solution for them.
Notably, this doesn't try to actually use the data on the wiki, so it should have no scalability issues. It should also be small enough to put in core with no problems, so all MW wikis could output RDF for their template parameters out of the box. My understanding is that data providers are expected to output RDF in whatever format is convenient for them, and someone will then have to write OWL to map this onto more conventional vocabularies. But we can output the raw data reasonably easily, at least.
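To make "RDF for template parameters" a bit more concrete, here is a minimal Python sketch that turns one template call on a page into N-Triples, one triple per parameter. The base URIs, the function names and the one-triple-per-parameter mapping are all hypothetical assumptions for illustration; this is not the actual output format of the RDFa extension.

```python
from typing import Dict, List
from urllib.parse import quote

# Hypothetical namespaces, chosen only for this sketch.
PAGE_BASE = "https://example.org/wiki/"
PARAM_BASE = "https://example.org/template-param/"


def escape_literal(value: str) -> str:
    """Escape a string so it can appear inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def template_params_to_ntriples(page: str, template: str,
                                params: Dict[str, str]) -> List[str]:
    """Emit one triple per parameter: <page> <template/param> "value" ."""
    # MediaWiki-style titles: spaces become underscores, the rest is percent-encoded.
    subject = "<" + PAGE_BASE + quote(page.replace(" ", "_")) + ">"
    triples = []
    for name, value in params.items():
        predicate = ("<" + PARAM_BASE + quote(template.replace(" ", "_"))
                     + "/" + quote(name.replace(" ", "_")) + ">")
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    for line in template_params_to_ntriples(
            "Some Book/Chapter 1", "Header", {"author": "X", "year": "1923"}):
        print(line)
```

Since the values are emitted as plain string literals, deciding that "year" is really an integer or that "author" points to a person is exactly the mapping work that would be left to whoever writes the vocabulary layer on top.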
Since for Wikisource's requirements it seems helpful for the wiki itself to store and use the data (e.g. "give me all the chapters, in order, of that book written by X between 1920 and 1940"), I was wondering whether an extension that does that could help. It is obviously entirely possible to have the metadata generated by the RDFa extension, harvested by an external tool, queried by an external tool, and the results uploaded back to the wiki. It may be somewhat easier for Wikisource if the wiki did it, since that could enable more users to perform these tasks.
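The example query above boils down to a filter plus a sort over chapter metadata. Here is a minimal Python sketch of what either an external tool or a wiki extension would have to compute, assuming the metadata has already been harvested into records. The `ChapterRecord` fields (book, author, year, chapter number) are hypothetical and would in practice come from whatever parameters the Wikisource header templates carry.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ChapterRecord:
    # Hypothetical fields, as if harvested from a header template's parameters.
    page: str
    book: str
    author: str
    year: int
    chapter: int


def chapters_by_author(records: Iterable[ChapterRecord], author: str,
                       start_year: int, end_year: int) -> List[ChapterRecord]:
    """Return the chapters of books by `author` written between start_year and
    end_year (inclusive), ordered by book and then by chapter number."""
    hits = [r for r in records
            if r.author == author and start_year <= r.year <= end_year]
    return sorted(hits, key=lambda r: (r.book, r.chapter))


if __name__ == "__main__":
    records = [
        ChapterRecord("B/Chapter 2", "B", "X", 1925, 2),
        ChapterRecord("B/Chapter 1", "B", "X", 1925, 1),
        ChapterRecord("C/Chapter 1", "C", "X", 1950, 1),
        ChapterRecord("D/Chapter 1", "D", "Y", 1930, 1),
    ]
    for r in chapters_by_author(records, "X", 1920, 1940):
        print(r.page)
```

The computation itself is trivial. The difference between the two setups is where it runs: inside the wiki, ordinary users could write such queries directly, instead of depending on someone to run the harvest-query-upload cycle externally.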
In the case of Wikisource I'd further suggest switching off SMW's additional annotation syntax and going for a mode where the templates do all of the annotation, but that again is an implementation detail that has to be decided by the Wikisource community.
Cheers, denny