The storage of old article revisions on Wikipedia is
taking a few gigs
now, and getting larger.
About how much are we talking here? Space is cheap.
We could store old revisions as binary blobs
compressed with gzip
instead of raw text.
If we do this, we should exclude the last n revisions, so that
recent changes remain quickly available for diffs. I suggest n=5.
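A rough sketch of what I mean, in Python (the function names and the
tuple-based storage format are purely illustrative, not a schema proposal):

```python
import gzip

KEEP_UNCOMPRESSED = 5  # the "last n" cutoff suggested above


def pack_revisions(revisions):
    """Compress all but the newest KEEP_UNCOMPRESSED revision texts.

    revisions: list of revision texts, oldest first.
    Returns a list of (is_compressed, blob) pairs.
    """
    cutoff = max(len(revisions) - KEEP_UNCOMPRESSED, 0)
    packed = []
    for i, text in enumerate(revisions):
        if i < cutoff:
            packed.append((True, gzip.compress(text.encode("utf-8"))))
        else:
            packed.append((False, text.encode("utf-8")))
    return packed


def get_revision(packed, i):
    """Transparently decompress a stored revision."""
    is_compressed, blob = packed[i]
    data = gzip.decompress(blob) if is_compressed else blob
    return data.decode("utf-8")
```

The recent revisions stay as raw text, so diffing them costs nothing extra;
only older history pays the decompression price.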
But yes, I was afraid we would run into this problem sooner or
later, ever since I first realized that we were, in fact, storing every version.
Some 30k articles are edited dozens of times in edit wars. We could, of
course, also use diffs+snapshots every x revisions.
I'm also thinking of a good way to implement my "merge subsequent edits by
one user" proposal to drastically reduce the number of edits.
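The core of the merge idea is just collapsing runs in the history list,
keeping only the last revision of each run (sketch only, hypothetical names):

```python
def merge_consecutive(history):
    """Collapse runs of consecutive edits by the same user.

    history: list of (user, text) pairs, oldest first.
    Only the final revision of each run survives.
    """
    merged = []
    for user, text in history:
        if merged and merged[-1][0] == user:
            merged[-1] = (user, text)  # supersede the previous edit in the run
        else:
            merged.append((user, text))
    return merged
```

For an article where one author saves six times in a row while polishing,
this keeps one revision instead of six, which is where the drastic reduction
would come from.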
Regards,
Erik