Christopher Sahnwaldt wrote:
I'm pretty new to MediaWiki and I'm not sure if I understand this correctly... Here's my attempt at spelling it out in a bit more detail:
When a user edits a page and sends the new text to the server, the server / the RDF extension parses the text, extracts the desired data, and saves it in an RDF store.
I hope I got that about right - please correct me if not!
More or less - the parser parses the text and hands the bit that is RDF (Turtle) to the RDF extension for analysis. It analyzes the statements and would save them to the database (this is not yet implemented).
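To make that flow concrete, here is a minimal sketch of the "analyze and save" step, assuming a Turtle parser such as Python's rdflib and a plain triple table as the store - the actual extension is PHP, and as said, its storage step is not implemented yet:

    import rdflib
    import sqlite3

    # Hypothetical Turtle block that the parser handed over from the page text.
    turtle = """
    @prefix ex: <http://example.org/property/> .
    <http://example.org/page/Berlin> ex:population "3400000" ;
                                     ex:country <http://example.org/page/Germany> .
    """

    # Analysis: rdflib decomposes the Turtle into individual statements (triples).
    graph = rdflib.Graph()
    graph.parse(data=turtle, format="turtle")

    # Saving: write each (subject, predicate, object) triple to a database table.
    db = sqlite3.connect("rdf_store.db")
    db.execute("CREATE TABLE IF NOT EXISTS triples (s TEXT, p TEXT, o TEXT)")
    for s, p, o in graph:
        db.execute("INSERT INTO triples VALUES (?, ?, ?)", (str(s), str(p), str(o)))
    db.commit()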
Now when I think about the pros and cons of running this process inside MediaWiki versus on a different server, a few questions come up... again, I'm new to MediaWiki, so these may be newbie questions... :-)
How much parsing does MediaWiki currently do when it stores new text for an article? Are templates expanded / transcluded?
There is a preprocessor that expands all templates recursively. After that, the real "parser" (read: munger) is invoked to turn wiki text into HTML.
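Here is a toy illustration of that preprocessor step (not MediaWiki's real implementation, which is considerably more involved) - expanding {{...}} calls recursively against a set of template definitions:

    import re

    # Hypothetical template definitions; {{{name}}} marks a parameter slot.
    TEMPLATES = {
        "Infobox city": "'''{{{name}}}''' has {{{population}}} inhabitants.",
    }

    def expand(text, depth=0):
        """Recursively replace {{Template|key=value|...}} calls with their expansion."""
        if depth > 40:  # MediaWiki enforces a similar expansion depth limit
            return text

        def replace(match):
            parts = match.group(1).split("|")
            name, args = parts[0].strip(), parts[1:]
            params = dict(a.split("=", 1) for a in args if "=" in a)
            body = TEMPLATES.get(name, "")
            for key, value in params.items():
                body = body.replace("{{{%s}}}" % key.strip(), value.strip())
            return expand(body, depth + 1)  # the expansion may contain further templates

        return re.sub(r"\{\{([^{}]*)\}\}", replace, text)

    # prints: '''Berlin''' has 3,400,000 inhabitants.
    print(expand("{{Infobox city|name=Berlin|population=3,400,000}}"))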
In the case of a "semantified" infobox, the substitution process would generate RDF/Turtle statements using the template parameters. These would in turn be handed to the RDF extension, which would write the resulting triples to the database.
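For example (the template, property names and URIs here are made up, just to show the shape), an infobox call like

    {{Infobox city|name=Berlin|population=3400000|country=Germany}}

might be substituted into a Turtle block such as

    @prefix prop: <http://example.org/property/> .
    <http://example.org/resource/Berlin>
        prop:population "3400000" ;
        prop:country <http://example.org/resource/Germany> .

which reaches the RDF extension as two triples to be stored.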
How are updates distributed? Do subscribers regularly poll the server for recent changes? Or is there some kind of store-and-forward / publish-subscribe?
There is the RSS/Atom feed (human-readable, not easy to parse) and an OAI-PMH interface (a "live update feed"). There's also the web API for polling data in a machine-readable form, and there's the RC ("recent changes") channel on IRC (human-readable, can't be parsed reliably). True XMPP-based pubsub is being worked on, see http://brightbyte.de/page/RecentChanges_via_Jabber.
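For the polling route, a sketch of reading the recent changes list from the web API - the endpoint URL is just an example, the parameters are those of the standard api.php recentchanges module, and the interval is arbitrary:

    import json
    import time
    import urllib.parse
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"  # any MediaWiki api.php works here
    PARAMS = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|timestamp|user",
        "rclimit": "10",
        "format": "json",
    }

    while True:
        url = API + "?" + urllib.parse.urlencode(PARAMS)
        with urllib.request.urlopen(url) as response:
            data = json.load(response)
        for change in data["query"]["recentchanges"]:
            print(change["timestamp"], change["user"], change["title"])
        time.sleep(30)  # plain polling; nothing is pushed, unlike XMPP pubsub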
-- daniel