On Mon, 12 Sep 2005 01:56:53 -0700, Brion Vibber wrote:
Netocrat wrote:
What are the reasons for storing the full text of page revisions in the database, as opposed to some form of differential?
Expedience; it hasn't been written yet.
Am I correct in assuming that speed has been given priority over storage space requirements, and if so, has any benchmarking been done to find out how much overhead would be added by storing revisions as diffs and how much space would be saved?
See Tim's presentation from 21C3: http://zwinger.wikimedia.org/berlin/
That's exactly the sort of info I was looking for. Was any attempt made to compress the diffs? I would be interested to know how the result compared, in compression ratio and overall speed, with the compressed concatenated revisions.
The three main reasons to find an improvement to rcs diffs were stated as:
* moved paragraphs
* reverted edits
* minor changes within a line
The 1st and 3rd could be handled by a customised diff format and the 2nd could be handled by links in the database - have those possibilities been considered and what pros/cons are there to this approach vs the current compression scheme?
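To make the "links in the database" idea for reverted edits concrete, here is a minimal sketch. It is not how MediaWiki stores anything; the RevisionStore class, its row layout, and the use of a SHA-1 lookup are all assumptions made up for illustration. The point is only that a revert to an earlier text can be stored as a pointer to the earlier row rather than as a second copy.

```python
import hashlib


class RevisionStore:
    """Hypothetical in-memory store: a revert is saved as a link, not a copy."""

    def __init__(self):
        # Each row holds either the full text or a link to an earlier row.
        self.rows = []
        # Maps the SHA-1 of a text to the id of the first revision holding it.
        self.by_hash = {}

    def add(self, text):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        rev_id = len(self.rows)
        target = self.by_hash.get(digest)
        # Compare the text as well, so a hash collision cannot alias two pages.
        if target is not None and self.rows[target]["text"] == text:
            self.rows.append({"text": None, "link": target})
        else:
            self.by_hash[digest] = rev_id
            self.rows.append({"text": text, "link": None})
        return rev_id

    def get(self, rev_id):
        row = self.rows[rev_id]
        if row["link"] is not None:
            # Links always point at a row that holds full text.
            return self.rows[row["link"]]["text"]
        return row["text"]


store = RevisionStore()
good = store.add("The quick brown fox.")
vandal = store.add("lol")
revert = store.add("The quick brown fox.")  # stored as a link to `good`
```

A revert then costs one small row regardless of page size, at the price of an extra lookup on read.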
The disadvantage of the current compression scheme, it seems to me, is that the wiki software must work on the full text of a whole set of revisions at a time (i.e. once uncompressed).
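A rough sketch of the comparison I have in mind follows. It uses Python's zlib and difflib as stand-ins, and the synthetic page history, the NUL separator, and all function names are my own assumptions, not anything taken from the wiki code or from the presentation. It packs the same history two ways and shows that reading one revision out of the concatenated blob means decompressing all of it.

```python
import difflib
import zlib


def make_revisions(n=20):
    """Build a synthetic history: each revision edits one paragraph of the last."""
    lines = [f"Paragraph {i}: some article text that rarely changes.\n"
             for i in range(50)]
    revs = []
    for r in range(n):
        lines = list(lines)
        lines[r % len(lines)] = f"Paragraph {r}: edited in revision {r}.\n"
        revs.append("".join(lines))
    return revs


def pack_concatenated(revs):
    """Compress all revisions together, separated by NUL bytes."""
    return zlib.compress("\x00".join(revs).encode("utf-8"))


def read_revision(blob, index):
    """Fetching one revision requires decompressing the whole set."""
    return zlib.decompress(blob).decode("utf-8").split("\x00")[index]


def pack_diffs(revs):
    """Compress the first revision in full, then each later one as a diff."""
    chunks = [zlib.compress(revs[0].encode("utf-8"))]
    for prev, cur in zip(revs, revs[1:]):
        diff = "".join(difflib.unified_diff(
            prev.splitlines(keepends=True), cur.splitlines(keepends=True)))
        chunks.append(zlib.compress(diff.encode("utf-8")))
    return chunks


revs = make_revisions()
raw_size = sum(len(r.encode("utf-8")) for r in revs)
concat_size = len(pack_concatenated(revs))
diff_size = sum(len(c) for c in pack_diffs(revs))
print(raw_size, concat_size, diff_size)
```

Which of the two comes out smaller or faster on real page histories is exactly what I would like to see benchmarked; the synthetic data here proves nothing on that point.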
Also, has there been any discussion of the possibility of branching a page (as is possible in e.g. a CVS repository)?
Not really. Tagging of revisions is likely to happen soonish, branching not so likely.
Being able to specify a particular revision in a link would be useful - I presume that's why tagging is being considered.