In a message dated 6/21/2004 5:02:01 PM Eastern Standard Time, brion@pobox.com writes:
Note that currently we don't have diff-based storage; when you make a change to a page the entire previous revision is stored in whole. (Consider enabling $wgCompressOld if you have zlib support in PHP; this will reduce old text requirements by roughly half.)
-- brion vibber (brion @ pobox.com)
Currently, my group's wiki is small. There are a few of us actively contributing right now, but that will probably change soon. The handful of volunteers that have been putting in content have also been learning the way of the wiki as they do so, making multiple edits on some rather lengthy articles, and innocently eating up storage space.
At the moment, only registered users can contribute to our wiki, and only the sysop can create a registered user account.
We had attempted to research the wiki's overhead requirements in making a judgment as to whether or not to buy more disk space from our provider. During the investigation of overhead storage requirements, we used the 'wikipedia' statistics and charts on space. It never occurred to us that 'wikipedia' was storing full copies of all versions of an article based on the 590MB May 22, 2004 number and considering the high number of articles the db had. We must have been reading the wrong statistics.
Do the 'wikipedia' administrators remove history from their wiki in order to preserve space? If so, how is this done? Is there some sort of 'export only the latest version of each article, etc.' option, clear the db, and then import the latest version back?
Our administrator has set the "$wgCompressRevisions = true;" since your message (above) -- will that take care of only the revisions since the flag was turned on or will there be compression of the previous revisions as well?
I appreciate everyone's patience in this. I'm sort of the go-between right now. Hopefully our administrator will come online with this list and she can pose the questions more 'technically'. :)
Our versions:
MediaWiki: 1.3.0beta2
PHP: 4.3.4 (apache)
MySQL: 4.0.18
Take care,
Debi
AlphabetDP@aol.com wrote:
In a message dated 6/21/2004 5:02:01 PM Eastern Standard Time, brion@pobox.com writes:
Note that currently we don't have diff-based storage; when you make a change to a page the entire previous revision is stored in whole. (Consider enabling $wgCompressOld if you have zlib support in PHP; this will reduce old text requirements by roughly half.)
-- brion vibber (brion @ pobox.com)
<snip>
We had attempted to research the wiki's overhead requirements in making a judgment as to whether or not to buy more disk space from our provider. During the investigation of overhead storage requirements, we used the 'wikipedia' statistics and charts on space. It never occurred to us that 'wikipedia' was storing full copies of all versions of an article based on the 590MB May 22, 2004 number and considering the high number of articles the db had. We must have been reading the wrong statistics.
You might have looked at the cur dump, which holds only the latest revision of each page and none of the old revisions. The compressed SQL dump sizes for the English Wikipedia are:
cur : 269 MB
old : 7608 MB
The database sizes for all the Wikipedias are available at: http://www.wikipedia.org/wikistats/EN/TablesDatabaseSize.htm
In fact they are bigger :o)
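To put the two dump sizes quoted above in proportion, here is a small worked calculation. It is only arithmetic on the 269 MB and 7608 MB figures from this message; the function name history_share is made up for illustration, and the figures are compressed dump sizes, not live database sizes.

```python
from typing import Tuple


def history_share(cur_mb: float, old_mb: float) -> Tuple[float, float]:
    """Return (old/cur size ratio, fraction of the combined size taken by old)."""
    ratio = old_mb / cur_mb
    fraction = old_mb / (cur_mb + old_mb)
    return ratio, fraction


# Compressed SQL dump sizes quoted in this message (English Wikipedia).
CUR_MB = 269.0
OLD_MB = 7608.0

ratio, fraction = history_share(CUR_MB, OLD_MB)
print(f"old is about {ratio:.1f}x the size of cur")
print(f"old revisions are about {fraction:.1%} of the two dumps combined")
```

So a statistic that only counts current revisions can understate total storage by well over an order of magnitude on a heavily edited wiki.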
Do the 'wikipedia' administrators remove history from their wiki in order to preserve space? If so, how is this done? Is there some sort of 'export only the latest version of each article, etc.' option, clear the db, and then import the latest version back?
There is no such option. One could drop older entries from the "old" table, but the page histories would then be lost. The only things deleted from the wikipedia databases are new articles that are vandalism or incorrect data. They are dropped from the "cur" table but are still in "old" (as far as I know).
Our administrator has set the "$wgCompressRevisions = true;" since your message (above) -- will that take care of only the revisions since the flag was turned on or will there be compression of the previous revisions as well?
I think it will apply only to revisions made after the flag was set. I am not sure whether there is a ./maintenance/ script to compress revisions made before the switch.
Hopefully the new diff-based history will save a lot of space.
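As a rough illustration of why diff-based history could save space compared with storing every revision in whole, the sketch below compares the two approaches on a made-up article. It uses Python's difflib purely for demonstration; it is not how MediaWiki stores or plans to store history, and the function names and sample text are hypothetical.

```python
import difflib
from typing import List


def full_storage_bytes(revisions: List[str]) -> int:
    """Bytes needed when every revision is stored in whole."""
    return sum(len(rev.encode("utf-8")) for rev in revisions)


def diff_storage_bytes(revisions: List[str]) -> int:
    """Bytes needed when the first revision is stored in whole and each
    later revision is stored as a unified diff against its predecessor."""
    if not revisions:
        return 0
    total = len(revisions[0].encode("utf-8"))
    for prev, cur in zip(revisions, revisions[1:]):
        delta = "".join(
            difflib.unified_diff(
                prev.splitlines(keepends=True),
                cur.splitlines(keepends=True),
                n=0,  # no context lines, to keep each delta small
            )
        )
        total += len(delta.encode("utf-8"))
    return total


# A made-up lengthy article that receives nine small edits.
lines = [f"Line {i} of a rather lengthy article.\n" for i in range(200)]
revisions = ["".join(lines)]
for edit in range(1, 10):
    lines[edit] = f"Line {edit}, reworded in edit {edit}.\n"
    revisions.append("".join(lines))

print("full copies:", full_storage_bytes(revisions), "bytes")
print("first copy plus diffs:", diff_storage_bytes(revisions), "bytes")
```

The saving depends heavily on editing patterns: many small edits to long pages benefit most, while complete rewrites gain little.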
AlphabetDP@aol.com wrote:
It never occurred to us that 'wikipedia' was storing full copies of all versions of an article based on the 590MB May 22, 2004 number and considering the high number of articles the db had. We must have been reading the wrong statistics.
That sounds like the table of current revisions. The old revisions table for en.wikipedia.org is over 10GB.
Do the 'wikipedia' administrators remove history from their wiki in order to preserve space? If so, how is this done? Is there some sort of 'export only the latest version of each article, etc.' option, clear the db, and then import the latest version back?
No.
Our administrator has set the "$wgCompressRevisions = true;" since your message (above) -- will that take care of only the revisions since the flag was turned on or will there be compression of the previous revisions as well?
That only affects newly saved revisions. There is a 'compressOld.php' script in the maintenance directory which will go through and get the rest.
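The difference between compressing on save and a one-off pass over existing rows can be sketched as follows. This is an illustrative model only, not the code of compressOld.php: the OldRow class, the "gzip" flag value, and the function names are assumptions made for the example, and raw deflate via Python's zlib stands in for whatever PHP's zlib support does.

```python
import zlib
from dataclasses import dataclass
from typing import List


@dataclass
class OldRow:
    """Hypothetical stand-in for one stored old revision."""
    text: bytes
    flags: str = ""  # assumed marker: "gzip" means text holds deflated data


def deflate(data: bytes) -> bytes:
    """Raw deflate (no zlib header), selected by the negative wbits value."""
    packer = zlib.compressobj(9, zlib.DEFLATED, -15)
    return packer.compress(data) + packer.flush()


def read_text(row: OldRow) -> bytes:
    """Return the revision text, inflating it first if the row is flagged."""
    if "gzip" in row.flags:
        return zlib.decompress(row.text, -15)
    return row.text


def compress_uncompressed(rows: List[OldRow]) -> int:
    """One-off pass: compress every row not yet flagged; return the count."""
    changed = 0
    for row in rows:
        if "gzip" in row.flags:
            continue  # already compressed, e.g. saved after the setting was on
        row.text = deflate(row.text)
        row.flags = "gzip"
        changed += 1
    return changed


article = b"Some rather lengthy wiki text. " * 100
rows = [
    OldRow(article),                   # saved before compression was enabled
    OldRow(deflate(article), "gzip"),  # saved after compression was enabled
]
print("rows compressed by the pass:", compress_uncompressed(rows))
```

Because each row carries its own flag, compressed and uncompressed revisions can coexist, which is why turning the setting on does not need to touch existing rows.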
-- brion vibber (brion @ pobox.com)
mediawiki-l@lists.wikimedia.org