Does the MediaWiki software simply store every old version (1, 2, 3, ...) side by side in the "old" database table, or is there some "diff" involved in the stored format (like RCS does)?
Did any of this change between Phase II and Phase III?
Was it a non-issue or was there any (documented) design decision or discussion of this?
Lars Aronsson wrote:
> Does the MediaWiki software simply store every old version (1, 2, 3, ...) side by side in the "old" database table, or is there some "diff" involved in the stored format (like RCS does)?
> Did any of this change between Phase II and Phase III?
> Was it a non-issue or was there any (documented) design decision or discussion of this?
AFAIK, it is still the same mechanism as in Phase II, but Brion is working on on-the-fly compression for the old texts, since the old table is getting quite large.
Magnus
Magnus Manske wrote:
> AFAIK, it is still the same mechanism as in Phase II, but Brion works on on-the-fly compression for the old texts, as the old table gets quite large.
> Magnus
The big win is not in compressing each individual version (say, version 6 of the article "London"), but in compressing the entire sequence of versions of each article, since so much text is shared between versions 6, 7, and 8 of the same article.
This optimization is what RCS does by storing the current text in full and keeping only the diffs needed to reproduce each earlier version. Now, RCS has its roots in the 1970s and does this (1) in a plain text file, and (2) as one long sequence of diffs, which makes it very slow to extract version 1 of a text whose current version is 2314. I think some of the more modern version control systems (?? aegis, arch, bitkeeper, darcs, perforce, subversion, ??) play around with hierarchical schemes where every Nth version is stored in full.
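The every-Nth-snapshot idea could be sketched like this (a toy illustration, not what MediaWiki or any of the systems above actually does; the snapshot interval and the line-level delta format are assumptions made up for the example):

```python
import difflib

SNAPSHOT_EVERY = 4  # hypothetical tuning knob

def make_delta(old_lines, new_lines):
    """Encode new_lines as line-level edits against old_lines."""
    sm = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    # Keep only the opcodes that change something.
    return [(i1, i2, new_lines[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != 'equal']

def apply_delta(old_lines, delta):
    """Replay a delta produced by make_delta."""
    out, pos = [], 0
    for i1, i2, repl in delta:
        out.extend(old_lines[pos:i1])  # unchanged prefix
        out.extend(repl)               # inserted/replacement lines
        pos = i2                       # skip deleted/replaced lines
    out.extend(old_lines[pos:])
    return out

class History:
    """Store every Nth revision in full, the rest as deltas."""
    def __init__(self):
        self.rows = []  # plays the role of the "old" table

    def save(self, text):
        lines = text.splitlines(keepends=True)
        if len(self.rows) % SNAPSHOT_EVERY == 0:
            self.rows.append(('full', lines))
        else:
            prev = self._lines(len(self.rows) - 1)
            self.rows.append(('delta', make_delta(prev, lines)))

    def _lines(self, v):
        kind, data = self.rows[v]
        return data if kind == 'full' else apply_delta(self._lines(v - 1), data)

    def get(self, v):
        # At most SNAPSHOT_EVERY - 1 deltas are ever replayed.
        return ''.join(self._lines(v))
```

Retrieval cost is then bounded by the snapshot interval instead of growing with the total number of revisions, which is exactly the problem with RCS's single long chain.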
Still, when reverting vandalism, versions 6 and 8 might be identical, so storing the two diffs (back and forth) would be less than optimal. Further, when pieces of text are moved between two articles, the best compression would have to consider the whole table. Perhaps MySQL (or the underlying filesystem) should implement the compression.
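A quick back-of-the-envelope check of the sequence-compression point (plain zlib with made-up article text; the exact gain obviously depends on the data):

```python
import zlib

# Three hypothetical revisions of one article: mostly shared text.
v1 = "\n".join(f"Line {i} of the article about London." for i in range(200)).encode()
v2 = v1 + b"\nIt lies on the river Thames."          # a small addition
v3 = v2.replace(b"Line 5 ", b"Line 5 (revised) ")    # a small edit

# Each revision compressed on its own, as the "old" table would hold them:
separately = sum(len(zlib.compress(v)) for v in (v1, v2, v3))

# The whole sequence compressed as one blob: zlib's window can reuse
# the text that consecutive revisions share.
together = len(zlib.compress(b"\0".join((v1, v2, v3))))
```

Here `together` comes out well below `separately`, because versions 2 and 3 cost little more than their small changes once version 1 is in the compressor's window.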
I don't know whether any existing version control system uses a relational database backend (MySQL, PostgreSQL, ...), but that would be an interesting combination independent of Wikipedia, so perhaps it should be developed as a generic component that can be used from Wikipedia as well as from other applications. In particular, the way MediaWiki stores the changelog in a searchable relational table is a great improvement over primitive file-based systems such as RCS and CVS.
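To illustrate what a relational changelog buys you, here is a throwaway sketch (SQLite for brevity; the table and column names are made up and only loosely echo MediaWiki's "old" table, not its real schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: one row per saved revision, metadata alongside text.
db.execute("""CREATE TABLE old (
    old_id    INTEGER PRIMARY KEY,
    title     TEXT,
    user      TEXT,
    comment   TEXT,
    timestamp TEXT,
    text      TEXT)""")
db.executemany(
    "INSERT INTO old (title, user, comment, timestamp, text) VALUES (?, ?, ?, ?, ?)",
    [("London", "Lars",   "initial stub", "2004-02-01", "London is a city."),
     ("London", "Magnus", "expand",       "2004-02-02", "London is the capital of England.")])

# With RCS/CVS you would parse log files; here the changelog is just a query.
history = db.execute(
    "SELECT user, comment FROM old WHERE title = ? ORDER BY timestamp",
    ("London",)).fetchall()
```

Any question you can phrase as SQL (all edits by one user, all edits in a date range, all comments matching a pattern) comes for free, which is the improvement over file-based logs.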
On Mon, Feb 02, 2004 at 01:28:57PM +0100, Lars Aronsson wrote:
> Does the MediaWiki software simply store every old version (1, 2, 3, ...) side by side in the "old" database table, or is there some "diff" involved in the stored format (like RCS does)?
> Did any of this change between Phase II and Phase III?
> Was it a non-issue or was there any (documented) design decision or discussion of this?
It just stores the old versions in both Phase II and Phase III.
And, of course, nobody discussed anything.
wikitech-l@lists.wikimedia.org