[MediaWiki-l] Permanently remove old revisions and unused files?

Mickey Feldman mfeldman at vigil.com
Tue Feb 9 14:31:21 UTC 2016

The wiki is now about 10 GB. It does compress to about 1 GB. Although 
text pages are saved as their differences, as far as I know new versions 
of images are saved in their entirety. Even saving only diffs, hundreds 
of revisions of thousands of pages does add up.

I want to do a daily off-site backup. This wiki is on a shared virtual 
host without command-line access, so I cannot use rsync over SSH, 
which would transfer only the changes. An entire image of the 
system needs to be saved so that it can be restored fairly painlessly. I 
want to shrink this as much as possible, since I am already running into 
problems with the archiving of the system on the host due to size. For 
example, I have had to compress each branch of the images folder 
individually - gzipping it into a single archive fails, despite the 
hosting company having bumped timeouts and memory allowances.
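
For illustration, the per-branch workaround described above could be scripted 
roughly as follows. This is only a sketch in Python, and it assumes a Python 
interpreter can be run against the images folder, which may not hold on a 
shared host without shell access. The function name archive_branches and the 
images-<branch>.tar.gz naming are made up for the example. Files sitting 
directly in the images folder (outside any branch) are not covered.

```python
import tarfile
from pathlib import Path


def archive_branches(images_dir, out_dir):
    """Write one .tar.gz per top-level subdirectory of images_dir.

    Returns the list of archive paths that were created.
    Hypothetical helper, not part of MediaWiki.
    """
    images_dir = Path(images_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for branch in sorted(p for p in images_dir.iterdir() if p.is_dir()):
        archive_path = out_dir / f"images-{branch.name}.tar.gz"
        # Each branch gets its own archive, so no single gzip job has to
        # process the whole images folder in one run.
        with tarfile.open(archive_path, "w:gz") as tar:
            # arcname keeps member paths relative to the images folder, so
            # the archives can be unpacked side by side on restore.
            tar.add(branch, arcname=branch.name)
        created.append(archive_path)
    return created
```

A call would look like archive_branches("/path/to/wiki/images", "/path/to/backups"), 
where both paths are placeholders.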

Moving to a host with complete command line access might be a solution, 
but currently the hosting company deals with security issues (beyond 
allowing only authorized users to log in, of course). If we go to 
something like Rackspace, then security becomes our problem, and I don't 
pretend to have sufficient expertise in that.

Suggestions and alternatives welcome.

> ----------------------------------------------------------------------
> Message: 1
> Date: Fri, 5 Feb 2016 16:28:49 +0000
> From: Daniel Barrett <danb at cimpress.com>
> To: MediaWiki announcements and site admin list
> 	<mediawiki-l at lists.wikimedia.org>
> Subject: Re: [MediaWiki-l] Permanently remove old revisions and unused
> 	files?
> Mickey,
> What business problem are you trying to solve by deleting old revisions of articles? They don’t take up much disk space, and they aren’t visible unless you intentionally go looking for them (with the View History tab).  Is the problem just personal taste -- you don't like seeing so many revisions -- or is there some other business reason? Note: If you don’t want users to see revisions at all, you could hide the View History tab with a line in MediaWiki:Vector.css (or Common.css), at least as a first step:
>     #ca-history { display:none; }
> If the old versions are a security risk, there is a feature to hide (not delete) particular revisions: https://www.mediawiki.org/wiki/Manual:RevisionDelete.
> Regarding removal of unused, uploaded files, here is a SQL query that (I believe) lists all unused files that are more than 90 days old. (Critiques are welcome.) You can then feed the list to the script "maintenance/deleteBatch.php" supplied with MediaWiki to delete them.
> select
>   concat('File:', p.page_title) as 'unused file'
> from
>   wp_page p
>   left outer join wp_imagelinks il on (il.il_to = p.page_title)
>   inner join wp_image i on (i.img_name = p.page_title)
> where
>   p.page_namespace = 6  -- File namespace only
>   and il.il_to is null  -- no page embeds this file
>   and datediff(now(), i.img_timestamp) > 90;
> DanB
> ================
> From: MediaWiki-l [mailto:mediawiki-l-bounces at lists.wikimedia.org] On Behalf Of Mickey Feldman
> Sent: Thursday, February 04, 2016 3:38 PM
> To: mediawiki-l at lists.wikimedia.org
> Subject: [MediaWiki-l] Permanently remove old revisions and unused files?
> I have been looking for an extension or process to remove all revisions
> of pages "older than _date_" or "all but the last _n_", but have not
> found anything close.
> This is a private corporate wiki used for internal documentation. Pages
> evolve, but then generally stabilize and are then only for reference and
> rarely edited. There is no need to keep the hundreds of revisions that grew
> them to their final form.
> Likewise, there are older and unused versions of uploaded files that are
> just clutter.
> Extension:Nuke does not meet this need.
> Extension:DeleteBatch doesn't either.
> Extension:DeletePagePermanently - nope.
> There are maintenance scripts for Deleting Archived revisions and
> purging old text - also not what I'm looking for.
> So far I'm finding no way to do this other than manually, one page at a
> time, which is a no-go. There are tens of thousands of pages.
> I may have to write a new extension from scratch, but I'm finding it
> hard to believe this functionality does not already exist.
> Have I overlooked something obvious? Am I the only one who has wanted
> something like this?
> Thanks in advance.
