On 12/1/05, Erik Moeller erik_moeller@gmx.de wrote: [snip]
My main challenge at the moment is to find a scalable model for versioning multiple tables with complex relations between them. Versioning is necessary for the "wiki" in "Wikidata", but at some level of complexity, it gets very tricky.
If anyone is interested in working on Wikidata, please contact me again (you may have done so before - sorry if something fell through). We're making substantial progress and are also exploring new sources of funding for future development.
Eh, I don't see why it's hard. Just do it like MediaWiki.
Every 'table' must have an identifying key which is non-null and immutable, and which has all the history attached.
For every Wikidata table there are actually two tables in the database: an item table and a revision table. In the item table, the key is constrained to be unique and non-null; in the revision table it is merely non-null (one row per revision).
Generally just follow MediaWiki for the fields, but rather than text, have your data fields.
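A minimal sketch of that item/revision pair, using SQLite for illustration; the table and column names here are my own assumptions, not MediaWiki's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (
    item_key   TEXT NOT NULL UNIQUE  -- immutable identifying key, unique here
);
CREATE TABLE item_revision (
    rev_id        INTEGER PRIMARY KEY,
    item_key      TEXT NOT NULL,     -- non-null but NOT unique: one row per revision
    rev_timestamp TEXT,
    rev_user      TEXT,
    data          TEXT               -- your data fields here, instead of MediaWiki's text
);
""")

# Two revisions of the same item share one item_key; history is just
# "all revision rows with that key".
conn.execute("INSERT INTO item VALUES ('Q1')")
conn.execute("INSERT INTO item_revision (item_key, rev_timestamp, rev_user, data) "
             "VALUES ('Q1', '2005-12-01', 'erik', 'v1')")
conn.execute("INSERT INTO item_revision (item_key, rev_timestamp, rev_user, data) "
             "VALUES ('Q1', '2005-12-02', 'erik', 'v2')")
rows = conn.execute(
    "SELECT data FROM item_revision WHERE item_key = 'Q1' ORDER BY rev_id"
).fetchall()
print([r[0] for r in rows])  # ['v1', 'v2']
```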
What will generally be a problem is that many forms of wiki data will want the ability to associate arbitrary name-value pairs with items, like successor="George W. Bush".
A pure SQL approach would be to have another (pair of) tables (due to versioning) with item_key, name, and value, but the performance of that approach would be poor because of locality issues. If MySQL supported clustering tables on a field (does it?), then you could cluster on item_key, and carry indexes on item_key and on (name, value).
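The name-value side table could be sketched like this (again in SQLite, with assumed names; the versioning twin table is omitted for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_attribute (
    item_key TEXT NOT NULL,
    name     TEXT NOT NULL,
    value    TEXT
);
-- The two indexes suggested above: one answers "all pairs of this item",
-- the other answers "all items where name=value".
CREATE INDEX attr_by_item       ON item_attribute (item_key);
CREATE INDEX attr_by_name_value ON item_attribute (name, value);
""")

conn.execute("INSERT INTO item_attribute VALUES ('Bill_Clinton', 'successor', 'George W. Bush')")
row = conn.execute(
    "SELECT item_key FROM item_attribute WHERE name = 'successor' AND value = 'George W. Bush'"
).fetchone()
print(row[0])  # Bill_Clinton
```

Without clustering on item_key, fetching all attributes of one item can touch a scattered set of pages, which is the locality problem mentioned above.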
If MySQL had an indexed hstore datatype like PostgreSQL's, the name-value data could be stored without additional tables, which would be simpler and would perform well as long as the name-value data wasn't too big.
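SQLite has no hstore either, but storing the pairs inline in the item row (here as JSON, with an expression index standing in for hstore's index) illustrates the same idea, assuming a build with the JSON1 functions:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Name-value data lives in the item row itself: no side table, good locality.
conn.execute("CREATE TABLE item (item_key TEXT NOT NULL UNIQUE, attrs TEXT)")
# Expression index so name=value lookups don't scan every row.
conn.execute("CREATE INDEX item_successor ON item (json_extract(attrs, '$.successor'))")

conn.execute("INSERT INTO item VALUES ('Bill_Clinton', ?)",
             (json.dumps({"successor": "George W. Bush"}),))
row = conn.execute(
    "SELECT item_key FROM item WHERE json_extract(attrs, '$.successor') = 'George W. Bush'"
).fetchone()
print(row[0])  # Bill_Clinton
```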