On 6/16/06, Tim Starling <t.starling@physics.unimelb.edu.au> wrote:
Deleted text has always been much, much smaller than the non-deleted text, and the old image directories have always been much smaller than the current image directories.
[snip]
Just to interject some data into the discussion.
There are 489,678 image pages on enwiki, and 207,320 distinct image page titles have been deleted. If we presume that deleted image pages have, on average, the same number of image versions and similar file sizes as the surviving ones, then preserving deleted images would have increased our enwiki image storage requirements by roughly 40 to 50% (207,320 / 489,678 ≈ 0.42). Because a substantial chunk of these deletions may have been images that were moved to Commons, past performance may not indicate future results. Still, with 73,214 image pages deleted on enwiki in the first three months of the year (versus about 148k uploads), we can expect the future storage requirements to be non-trivial.
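The back-of-the-envelope estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic: it uses the counts quoted in this message, treats "148k uploads" as exactly 148,000, and assumes (as the text does) that deleted image pages average the same number of versions and file sizes as live ones.

```python
# Counts quoted in the message (enwiki); 148k uploads is an approximation.
live_image_pages = 489_678
deleted_image_titles = 207_320
q1_deleted_pages = 73_214
q1_uploads = 148_000


def storage_growth_if_kept(live: int, deleted: int) -> float:
    """Fractional increase in image storage if deleted images were kept,
    assuming deleted pages average the same versions and sizes as live ones."""
    return deleted / live


def deletions_per_upload(deleted: int, uploads: int) -> float:
    """Rough ratio of deleted image pages to uploads over the same period."""
    return deleted / uploads


growth = storage_growth_if_kept(live_image_pages, deleted_image_titles)
ratio = deletions_per_upload(q1_deleted_pages, q1_uploads)

print(f"extra storage if deleted images were kept: {growth:.0%}")  # 42%
print(f"Q1 deleted image pages per upload: {ratio:.2f}")  # 0.49
```

The deletion-to-upload ratio compares page deletions with file uploads, so it is an indicator of churn rather than an exact measure of bytes that would be retained.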
Commons, on the other hand, has far fewer deleted image pages (~40k), no doubt due to the site's lower profile among uninformed users (leading to fewer problem images), the absence of a migration causing deletions, and less vigorous copyright enforcement.
In any case, even if keeping deleted images were to double our image capacity requirements, it may well be worth it (consider detecting reuploads of previously deleted images, even modified copies of them)... but let's not forget to calculate the real cost of storage (it's not just the $120 that a 320 GB disk costs; think of backups, etc.).
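As one illustration of the reupload-detection benefit, here is a minimal sketch that assumes deleted files are retained and indexed by a content hash. The class and method names are hypothetical, not part of MediaWiki, and SHA-1 is just one reasonable choice of digest. Exact hashing only catches byte-identical reuploads; spotting modified copies would need something like perceptual hashing, which is not shown here.

```python
import hashlib
from typing import Dict, Optional


def file_digest(data: bytes) -> str:
    """Content hash of an uploaded file (SHA-1 chosen for illustration)."""
    return hashlib.sha1(data).hexdigest()


class DeletedImageIndex:
    """Hypothetical index mapping content hashes of deleted files to titles."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, str] = {}

    def record_deletion(self, title: str, data: bytes) -> None:
        # Keep the file's hash so later uploads can be compared against it.
        self._by_hash[file_digest(data)] = title

    def match_upload(self, data: bytes) -> Optional[str]:
        # Return the title of a previously deleted, byte-identical file, if any.
        return self._by_hash.get(file_digest(data))


index = DeletedImageIndex()
index.record_deletion("Image:Example.jpg", b"original file bytes")
print(index.match_upload(b"original file bytes"))  # Image:Example.jpg
print(index.match_upload(b"slightly edited bytes"))  # None
```

The lookup is a single dictionary access per upload, so the cost of this kind of check lies mainly in keeping the deleted files (or at least their hashes) around, which is the storage question raised above.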