On Wed, Jan 19, 2011 at 7:49 PM, Happy-melon <happy-melon(a)live.com> wrote:
"Anthony" <wikimail(a)inbox.org> wrote in message news:AANLkTi=UK+UF3y_B+ZLd57WCfUEF_7rf-Bt8TNvtg+2f@mail.gmail.com...
No, that's not the question. The question is why are you uncompressing
and undiffing (from DiffHistoryBlobs) only to recompress (to bz2) and then
uncompress and recompress (to 7z), when you can get roughly the same
compression by just extracting the blobs and removing any non-public data?
That's probably not nearly as straightforward as it sounds.
I have no idea how straightforward it sounds, so I won't argue with that.
RevDel'd and suppressed revisions are not removed from the text storage;
even Oversighted revisions are left there, and only the entry in the
revision table is removed or altered. I don't know off the top of my head
how regularly the DiffHistoryBlob system stores a 'key frame', or how easy
it would be to break diff chains in order to snip out non-public data from
them, but I'd guess a) not very, and b) that the current code doesn't give
any consideration to doing so because there's no reason for it to. So
refactoring it to incorporate that, while not impossible, is a non-trivial
amount of work.
It wouldn't be trivial, but it wouldn't be particularly hard either.
Most of the work is already being done. It's just being done
inefficiently.
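The diff-chain problem discussed above can be illustrated with a toy model. The sketch below is a hypothetical delta-chain store written only for illustration: it does not reproduce how MediaWiki's DiffHistoryBlob actually encodes revisions, and the names `build_chain`, `expand_chain`, `drop_revisions` and the `keyframe_every` parameter are invented here. Each revision is stored either as a key frame (full text) or as a line diff against the previous revision. Removing a hidden revision means any delta whose base was that revision must be re-diffed against the last public revision (or promoted to a key frame), while every other entry can be copied through unchanged.

```python
import difflib

# Toy delta-chain store (hypothetical; NOT the real DiffHistoryBlob format).
# A chain is a list of entries, one per revision:
#   ("key", lines)   full text of the revision, as a list of lines
#   ("delta", ops)   edit script against the previous revision in the chain


def make_delta(old_lines, new_lines):
    """Encode new_lines as ops: copy a slice of old_lines, or insert literal lines."""
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new_lines[j1:j2]))
        # "delete": old lines are simply not copied, so nothing is emitted
    return ops


def apply_delta(old_lines, ops):
    """Rebuild a revision's lines from its base revision and an edit script."""
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


def build_chain(revisions, keyframe_every=4):
    """Store every keyframe_every-th revision in full, the rest as deltas."""
    chain, prev = [], None
    for n, text in enumerate(revisions):
        lines = text.splitlines(keepends=True)
        if prev is None or n % keyframe_every == 0:
            chain.append(("key", lines))
        else:
            chain.append(("delta", make_delta(prev, lines)))
        prev = lines
    return chain


def expand_chain(chain):
    """Walk the chain from the start and return the full text of every revision."""
    texts, prev = [], None
    for kind, payload in chain:
        lines = payload if kind == "key" else apply_delta(prev, payload)
        texts.append("".join(lines))
        prev = lines
    return texts


def drop_revisions(chain, hidden):
    """Remove revisions whose indices are in `hidden`, re-linking the diff chain
    so that no remaining delta depends on a removed revision."""
    texts = expand_chain(chain)
    out, last_public = [], None
    for i, (kind, payload) in enumerate(chain):
        if i in hidden:
            continue
        lines = texts[i].splitlines(keepends=True)
        if kind == "key" or (i - 1) not in hidden:
            # Key frame, or a delta whose base is still present: copy unchanged.
            out.append((kind, payload))
        elif last_public is None:
            # No public revision before this one: promote it to a key frame.
            out.append(("key", lines))
        else:
            # The base was removed: re-diff against the last public revision.
            out.append(("delta", make_delta(last_public, lines)))
        last_public = lines
    return out


if __name__ == "__main__":
    revs = ["a\nb\n", "a\nb\nc\n", "a\nSECRET\nc\n", "a\nb\nc\nd\n"]
    chain = build_chain(revs)
    cleaned = drop_revisions(chain, hidden={2})
    print(expand_chain(cleaned) == [revs[0], revs[1], revs[3]])  # prints True
```

How much re-diffing this costs in practice depends on how often hidden revisions occur and how far apart key frames are, which this toy model does not attempt to measure.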
On Wed, Jan 19, 2011 at 7:49 PM, Happy-melon <happy-melon(a)live.com> wrote:
And there are lots of lower-priority things that are being done. And lots
of dollars sitting on the sidelines doing nothing.
Low-priority interesting things tend to get done when you have volunteers
doing them. While the value of some of the Foundation's expenditure is
commonly debated, I think you'd struggle to argue that many of the WMF's
dollars are not doing *anything*.
Last I checked there were millions of them sitting in the bank.