Gentlemen, it occurred to me that when making a backup of one's wiki's database, some of the tables dumped have various degrees of temporariness, and thus, though needing to be present in a proper dump, could perhaps be emptied of their values, saving much space in the resulting SQL.bz2 file.
Looking at the mysqldump man page, one finds no perfect option to do so, so instead one makes one's own script:
$ mysqldump my_database | perl -nwle '
    BEGIN { $dontdump = "wiki_(objectcache|searchindex)" }
    s/(^-- )(Dumping data for table `$dontdump`$)/$1NOT $2/;
    next if /^LOCK TABLES `$dontdump` WRITE;$/ .. /^UNLOCK TABLES;$/;
    print;'
Though not myself daring to make any recommendations on http://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki#Tables, I am still curious which tables can always be emptied, which can be emptied if one is willing to remember to run a maintenance script to resurrect their contents, etc.
On Sat, Mar 14, 2009 at 1:33 AM, jidanni@jidanni.org wrote:
Though not myself daring to make any recommendations on http://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki#Tables, I am still curious which tables can always be emptied, which can be emptied if one is willing to remember to run a maintenance script to resurrect their contents, etc.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Really, the only 3 that are needed are page, revision and text. Of course, to keep old versions of stuff you'll need archive, oldimage and filearchive too.
For each table you remove from a dump, that's data you're losing. With some tables (recentchanges, the *links tables) you can repopulate the data; others you can't. I guess it comes down to deciding which data is important for you to back up.
-Chad
Chad wrote:
Really, the only 3 that are needed are page, revision and text. Of course, to keep old versions of stuff you'll need archive, oldimage and filearchive too.
That would be sufficient to keep page content. However, you would probably also want to keep the user and user_groups tables, and the interwiki table, for it cannot be restored and determines the interpretation of link prefixes. The log, too, can't be restored, but whether you need it is another question. The image table can generally be restored by looking at the files, though the result may not be exactly the same as the original.
I think it's better to put it this way: tables with "cache" in their name can safely be truncated, and the same is true for the profiling table (if used at all). Tables with "link" in their name can always be rebuilt, though it may take a while; the same goes for the searchindex.
-- daniel
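To make that rule of thumb concrete, here is a small sketch: a shell helper mapping a truncated table (prefix stripped) to the MediaWiki maintenance script one would typically run to repopulate it. The script names come from a standard MediaWiki checkout; the mapping itself is my own summary of this thread, not an official list:

```shell
# Map a table name (without prefix) to the maintenance script that can
# repopulate it after a restore. Script names assume a standard MediaWiki
# checkout; run the result as e.g. `php maintenance/refreshLinks.php`.
rebuild_script() {
  case "$1" in
    searchindex)   echo "maintenance/rebuildtextindex.php" ;;
    recentchanges) echo "maintenance/rebuildrecentchanges.php" ;;
    pagelinks|templatelinks|categorylinks|imagelinks|externallinks)
                   echo "maintenance/refreshLinks.php" ;;
    objectcache)   echo "none needed: the cache refills itself on demand" ;;
    *)             echo "no known rebuild script: back this table up" ;;
  esac
}

rebuild_script searchindex   # maintenance/rebuildtextindex.php
rebuild_script interwiki     # no known rebuild script: back this table up
```

The catch-all case reflects daniel's point above: anything without a rebuild path (interwiki, logging, user) belongs in the dump.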
On Sat, Mar 14, 2009 at 1:33 AM, jidanni@jidanni.org wrote:
$ mysqldump my_database| perl -nwle 'BEGIN{$dontdump="wiki_(objectcache|searchindex)"} s/(^-- )(Dumping data for table `$dontdump`$)/$1NOT $2/; next if /^LOCK TABLES `$dontdump` WRITE;$/../^UNLOCK TABLES;$/; print;'
Why don't you just do:
$ mysqldump --ignore-table=my_database.wiki_objectcache --ignore-table=my_database.wiki_searchindex my_database
Certainly you can skip objectcache. searchindex can be rebuilt. So can all the *links tables, and the redirect table, and probably some others. Of course, rebuilding all these tables on backup restore might take an awfully long time, which you need to weigh against the convenience of not having to back them up.
AG> Why don't you just do:
AG> $ mysqldump --ignore-table=my_database.wiki_objectcache \
AG>     --ignore-table=my_database.wiki_searchindex my_database
OK, but then the structure of those tables is gone from the dump too, not just their contents. We thus cannot recover from scratch straight from the SQL dump. But perhaps we can then immediately run a maintenance script to recreate the missing tables' structure?
On Sun, Mar 15, 2009 at 2:43 PM, jidanni@jidanni.org wrote:
Then you could just do
$ mysqldump --ignore-table=my_database.wiki_objectcache \
    --ignore-table=my_database.wiki_searchindex my_database > backup.sql
$ mysqldump -d my_database wiki_objectcache wiki_searchindex >> backup.sql
or something similar to that.
Aryeh Gregor wrote:
$ mysqldump --ignore-table=my_database.wiki_objectcache \
    --ignore-table=my_database.wiki_searchindex my_database > backup.sql
$ mysqldump -d my_database wiki_objectcache wiki_searchindex >> backup.sql
Or you can make an extra backup with just the structure:
mysqldump -d yourdatabase > structure.sql
or use maintenance/tables.sql to re-create the structure (if you use a table prefix, you need to run an appropriate search&replace first).
-- daniel
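For that search & replace, something like the following may do. It assumes (from memory, so verify against your version of tables.sql) that the file marks the prefix with a /*$wgDBprefix*/ placeholder comment; the one-line stand-in file here is invented for illustration:

```shell
# Invented one-line stand-in for maintenance/tables.sql; the
# /*$wgDBprefix*/ placeholder is an assumption about that file's format.
printf 'CREATE TABLE /*$wgDBprefix*/objectcache (keyname varbinary(255));\n' \
  > tables-sample.sql

# Substitute the placeholder with the wiki's actual prefix (wiki_ here).
sed 's|/\*\$wgDBprefix\*/|wiki_|g' tables-sample.sql > structure.sql
```

After the substitution the stand-in line reads `CREATE TABLE wiki_objectcache ...`, which could be fed to mysql to recreate the empty table.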
Hoi, With all respect, what is the purpose of this theoretical exercise? You _may_ be able to omit selected tables when you make a backup. This comes at a cost: the time it will take to recreate the data, or the loss of that data. This is fairly obvious. It is equally obvious that a recovery can take a considerable amount of time. When the system remains essentially available for users, this may not be much of a problem. When the system is not available for users for an extended period of time, it may be considered problematic.
The one thing missing in this discussion is a risk assessment and the importance given to maintaining the availability of our infrastructure. When downtime is considered acceptable, omitting tables from a backup can be considered. In all scenarios it makes sense to define a backup and recovery procedure and to test it. I know first hand of backup procedures that were performed regularly and in the end proved to be of no value. Thanks, GerardM
2009/3/15 jidanni@jidanni.org
On Sun, Mar 15, 2009 at 3:02 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
The one thing missing in this discussion is a risk assessment and the importance given to maintaining the availability of our infrastructure.
What's "our"? Jidanni is talking about backing up his own wiki, not any Wikimedia wiki.
Aryeh Gregor wrote:
What's "our"? Jidanni is talking about backing up his own wiki, not any Wikimedia wiki.
On top of that, restoring from dumps with these tables empty doesn't break anything. While the rebuild is in progress, features relying on them (search, whatlinkshere, category views) will show incomplete results, but if that means restoring is faster, I guess that's acceptable.
Roan Kattouw (Catrope)