Advanced MySQL Administration
 
Disaster Recovery Through Backup & Restore
 
Using the Server's Auto-Recovery Capabilities
 
One of your first lines of defense in maintaining database integrity is the MySQL server's ability to perform automatic recovery at startup time. When the server begins executing, it can perform certain types of table checking to help deal with problems resulting from an earlier server or machine crash. MySQL is designed to recover from a variety of problems, so if you do nothing more than restart the server normally, it will make the necessary corrections for you in most cases. The possible actions taken by the server include the following:
 
. If the InnoDB table handler is enabled, it can check for a variety of problems automatically. Committed transactions that are present in the redo logs but not yet flushed to tables are rolled forward (redone). Uncommitted transactions in progress at the time of the crash are rolled back (discarded) using the undo logs. The result is to leave your InnoDB tables in a consistent state, so that their contents reflect all transactions that had been committed up to the point of the crash.
 
. The BDB table handler, if enabled, also attempts auto-recovery based on the contents of its log files.
 
. The MyISAM handler can perform automatic checking and repair operations. This is controlled by the server's --myisam-recover=level option, where level can be empty to disable checking or a comma-separated list of one or more of the following values: DEFAULT (same as specifying no option), BACKUP (create a backup of the table if it is changed), FORCE (force recovery even if more than a row of data will be lost), or QUICK (quick recovery). The --myisam-recover option is available as of MySQL 3.23.25.
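If you want this checking performed every time the server starts, you can list the option in an option file rather than on the command line. The following fragment is a sketch; it assumes your server reads the [mysqld] option group, and the particular recovery levels chosen are only one reasonable combination:

```ini
[mysqld]
myisam-recover=BACKUP,FORCE
```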
 
If InnoDB or BDB auto-recovery fails due to a non-recoverable problem, the server exits after writing a message to the error log.
 
No automatic table checking at startup is available for ISAM tables. Nor is it likely there ever will be; ISAM support in MySQL is essentially frozen because MyISAM tables are preferable. I encourage you to consider converting your ISAM tables to MyISAM tables. To convert a table to MyISAM format, use an ALTER TABLE statement:
 
ALTER TABLE tbl_name TYPE = MYISAM;
 
You can also use the mysql_convert_table_format utility to convert all tables in a database with a single command. This script is written in Perl and requires that you have DBI installed. To see how to use it, invoke it with the --help option.
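For example, a command like the following might convert all tables in the sampdb database to MyISAM format. This is a sketch; verify the exact option spelling against the --help output of your version of the script:

```
% mysql_convert_table_format --type=MYISAM sampdb
```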
 
If you don't want to convert your ISAM tables, you can arrange to check them by invoking isamchk before the server starts up. Also, if your server is older than 3.23.25 (prior to the introduction of --myisam-recover), you can check your MyISAM tables by invoking myisamchk before the server starts up.
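Such a pre-startup check might be scripted as follows. This is a minimal sketch: the data directory location and the myisamchk options shown are assumptions, so adjust them for your installation.

```shell
#!/bin/sh
# Check all MyISAM index files before starting the server.
# DATADIR is an assumption; point it at your server's data directory.
DATADIR=${DATADIR:-/usr/local/mysql/data}
for index in "$DATADIR"/*/*.MYI; do
    [ -e "$index" ] || continue   # glob matched nothing; no tables to check
    myisamchk --silent --force "$index"
done
```

Run the script (and isamchk similarly for ISAM index files) before invoking the server, so that no client can access the tables while they are being checked.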
 
Backing Up and Copying Databases
 
It's important to back up your databases in case tables are lost or damaged. If a serious system crash occurs, you want to be able to restore your tables to the state they were in at the time of the crash with as little data loss as possible. Likewise, a user who issues an unwise DROP DATABASE or DROP TABLE will likely show up at your door requesting that you perform data recovery. Sometimes it's even the MySQL administrator who inadvertently causes the damage, for example, by trying to edit a table file directly using an editor, such as vi or emacs. This is certain to do bad things to the table!
 
The techniques that are used for creating backups are also useful for copying databases to another server. Most commonly, a database is transferred to a server running on another host, but you can also transfer data to another server running on the same host. You might do this if you're running a server for a new release of MySQL and want to test it with some real data from your production server.
 
Another use for backups is to set up a replication server, because one of the first steps in setting up a slave server is to take a snapshot of the master server at a specific point in time. The backup serves as this snapshot, and loading it into the slave server brings it up to date with respect to the master server. Thereafter, updates made on the master server are replicated to the slave server through the standard replication protocol.
 
The two main methods for backing up databases are to use the mysqldump program or to directly copy database files (for example, with mysqlhotcopy, cp, tar, or cpio). Each method has its own advantages and disadvantages:
 
. mysqldump operates in cooperation with the MySQL server. Direct-copy methods involve file copy operations that are done external to the server, and you must take steps to ensure that the server does not modify the tables while you copy them. This is the same problem that occurs if you try to use file system backups to back up databases: If a database table is being updated during the file system backup, the table files that go into the backup may be in an inconsistent state momentarily and will be worthless for restoring the table later. However, whereas with file system backups you may have no control over the backup schedule, when you use direct-copy methods to copy tables, you can take steps to make sure the server leaves the tables alone.
 
. mysqldump is slower than direct-copy techniques because the dump operation involves transferring the information over the network. Direct-copy backup methods operate at the file system level and require no network traffic.
 
. mysqldump generates text files that are portable to other machines, even those with a different hardware architecture. Such files are therefore usable for copying databases. Files generated by direct-copy methods may or may not be portable to other machines, depending on whether they correspond to tables that use a machine-independent storage format. ISAM tables do not satisfy this constraint. For example, copying files from Solaris on SPARC to Solaris on SPARC will work, but copying files from Solaris on SPARC to Solaris on Intel or to Mac OS X will not. MyISAM and InnoDB tables normally are machine independent. For those table types, directly copied files can be moved to a server running on a machine with a different hardware architecture.
 
Whichever backup method you choose, there are certain principles to which you should adhere to assure the best results if you ever need to restore database contents:
 
. Perform backups regularly. Set a schedule and stick to it.
 
. Tell the server to perform binary update logging. The binary logs can help when you need to restore databases after a crash. After you use your backup files to restore the databases to the state they were in at the time of the backup, you can re-apply the changes that occurred after the backup was made by running the queries contained in the logs. This restores the tables in the databases to their state at the time the crash occurred.
 
. Use a consistent and understandable naming scheme for your backup files. Names like backup1, backup2, and so forth are not particularly meaningful; when it comes time to perform a restore operation, you'll waste time figuring out what's in the files. You may find it useful to construct backup filenames using database names and dates. For example:
 
% mysqldump sampdb > /archive/mysql/sampdb.2007-10-02
% mysqldump menagerie > /archive/mysql/menagerie.2007-10-02
 
. Expire your backup files periodically to keep them from filling your disk. Just as you rotate and expire log files, you can apply the same expiration principles to your backup files.
 
. Back up your backup files using file system backups. If you have a complete crash that wipes out not only your data directory but also the disk drive containing your database backups, you'll be in real trouble. Back up your logs, too.
 
. Put your backup files on a different file system than the one you use for your databases. This reduces the likelihood of filling up the file system containing the data directory as a result of generating backups. Also, if this file system is on another drive, you further reduce the extent of damage that can be caused by drive failure, because loss of any one drive cannot destroy both your data directory and your backups.
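A nightly backup script that applies several of these principles at once (date-stamped names, compression, and expiration) might look like the sketch below. The archive location, retention period, and database list are all assumptions; the archive directory is placed under $HOME here only so that the sketch runs unprivileged, and in practice it should live on a different file system and drive than the data directory.

```shell
#!/bin/sh
# Nightly backup sketch: date-stamped, compressed dumps plus expiration.
# ARCHIVE and the database list are assumptions; adjust them for your site.
ARCHIVE=${ARCHIVE:-$HOME/archive/mysql}
mkdir -p "$ARCHIVE"
DATE=$(date +%Y-%m-%d)
for DB in sampdb menagerie; do
    mysqldump --opt "$DB" | gzip > "$ARCHIVE/$DB.$DATE.gz"
done
# Expire compressed backups older than 30 days.
find "$ARCHIVE" -name '*.gz' -type f -mtime +30 -exec rm -f {} \;
```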
 
Using mysqldump to Back Up and Copy Databases
 
When you use the mysqldump program to generate database backup files, the file is written in SQL format by default, consisting of CREATE TABLE statements that create the tables being dumped and INSERT statements containing the data for the rows in the tables. To re-create the database later, you can take the mysqldump output file and use it as input to mysql to reload it into MySQL. (Note that you do not use mysqlimport to read SQL-format mysqldump output!)
 
You can dump an entire database into a single text file as follows:
 
% mysqldump sampdb > /archive/mysql/sampdb.2007-10-02
 
The beginning of the output file will look something like this:
 
-- MySQL dump 9.06
--
-- Host: localhost    Database: sampdb
-- -------------------------------------------------------
-- Server version     4.0.3-beta-log
--
-- Table structure for table 'absence'
--
CREATE TABLE absence (
  student_id int(10) unsigned NOT NULL default '0',
  date date NOT NULL default '0000-00-00',
  PRIMARY KEY  (student_id,date)
) TYPE=MyISAM;
--
-- Dumping data for table 'absence'
--
INSERT INTO absence VALUES (3,'2007-09-03');
INSERT INTO absence VALUES (5,'2007-09-03');
INSERT INTO absence VALUES (10,'2007-09-06');
 
The rest of the file consists of more CREATE TABLE and INSERT statements.
 
Backup files often are large, so you'll likely want to do what you can to make them smaller. One way to do this is to use the --opt option, which optimizes the dump process to generate a smaller file:
 
% mysqldump --opt sampdb > /archive/mysql/sampdb.2007-10-02
 
You can also compress the dump file. For example, to compress the backup as you generate it, use a command like the following:
 
% mysqldump --opt sampdb | gzip > /archive/mysql/sampdb.2007-10-02.gz
 
If you find large dump files difficult to manage, it's possible to dump the contents of individual tables by naming them after the database name on the mysqldump command line. Then mysqldump will dump just the named tables rather than all the tables in the database. This partitions the dump into smaller, more manageable files. The following example shows how to dump some of the sampdb tables into separate files:
 
% mysqldump --opt sampdb student score event absence > gradebook.sql
% mysqldump --opt sampdb member president > hist-league.sql
 
--opt is useful when you're generating backup files that are intended to be used to periodically refresh the contents of another database. That's because it automatically enables the --add-drop-table option, which tells mysqldump to precede each CREATE TABLE statement in the file with a DROP TABLE IF EXISTS statement for the same table. Then, when you take the backup file and load it into the second database, you won't get an error if the tables already exist. If you're running a second test server that's not a replication slave, you can use this technique to periodically reload it with a copy of the data from the databases on your production server.
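With --add-drop-table in effect, each table's section of the dump file begins with a DROP TABLE IF EXISTS statement. For the absence table shown earlier, the output would begin something like this:

```
DROP TABLE IF EXISTS absence;
CREATE TABLE absence (
  student_id int(10) unsigned NOT NULL default '0',
  date date NOT NULL default '0000-00-00',
  PRIMARY KEY  (student_id,date)
) TYPE=MyISAM;
```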
 
If you want to transfer a database to another server, you may not even need to create backup files. Make sure that the database exists on the other host, and then dump the database over the network using a pipe, so that mysql reads the output of mysqldump directly. For example, to copy the sampdb database from the local host to the server on education.ebizel.com, use commands like this:
 
% mysqladmin -h education.ebizel.com create sampdb
% mysqldump --opt sampdb | mysql -h education.ebizel.com sampdb
 
If you don't have a MySQL account on the local host that allows you to access the education.ebizel.com server, but you do have such an account on education.ebizel.com itself, use ssh to remotely invoke MySQL commands on that host:
 
% ssh education.ebizel.com mysqladmin create sampdb
% mysqldump --opt sampdb | ssh education.ebizel.com mysql sampdb
 
Later, if you want to refresh the sampdb database on education.ebizel.com, repeat the mysqldump command.
 
Other mysqldump options you may find useful include the following:
 
. The combination of --flush-logs and --lock-tables is helpful for checkpointing your database. --lock-tables locks all the tables that you're dumping, and --flush-logs closes and reopens the log files. If you're generating sequenced update or binary update logs, the new log will contain only those queries that modify databases subsequent to the checkpoint. This synchronizes your log to the time of the backup. (Locking all the tables is not so good for client access during the backups if you have clients that need to perform updates, however.)
 
If you use --flush-logs to checkpoint the logs to the time of the backup, it's probably best to dump the entire database. During restore operations, it's common to extract log contents on a per-database basis. If you dump individual tables, it's much more difficult to synchronize log checkpoints against your backup files. (There is no option for extracting updates for individual tables, so you'll have to extract them yourself.)
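To re-apply post-backup changes from a binary log, use the mysqlbinlog utility, which converts the log's contents back into text statements that mysql can execute. The following is a sketch; the log filename is an assumption (your server's binary logs will have names of its own choosing), and you should check your version's --help output to see what options it supports for per-database extraction:

```
% mysqlbinlog binlog.001 | mysql sampdb
```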
 
. By default, mysqldump reads the entire contents of a table into memory before writing it out. This isn't really necessary and, in fact, is almost a recipe for failure if you have really large tables. You can use the --quick option to tell mysqldump to write each row as soon as it has been retrieved. To further optimize the dump process, use --opt instead of --quick. The --opt option turns on other options that speed up dumping the data. In addition, the dump file is written in such a way that it can be processed more quickly later when loaded back into the server.
 
Performing backups using --opt is probably the most common method because of the benefits for backup speed. Be warned, however, that the --opt option does have a price; what --opt optimizes is your backup procedure, not access by other clients to the database. The --opt option prevents anyone from updating any of the tables that you're dumping by locking all the tables at once. You can easily see for yourself the effect of this on general database access. Just try making a backup at the time of day when your database is normally most heavily used. It won't take long for your phone to start ringing with people calling to find out what's going on. (I'd appreciate it if you would refrain from asking how it is that I happen to know this.)
 
. An option that has something of the opposite effect of --opt is --delayed. This option causes mysqldump to write INSERT DELAYED statements rather than INSERT statements. If you are loading a dump file into another database and you want to minimize the impact of the operation on other queries that may be taking place in that database, --delayed is helpful for achieving that end.
 
. Normally you name a database on the mysqldump command line, optionally followed by specific table names. To dump several databases at once, use the --databases option. Then mysqldump will interpret all names as database names and dump all the tables in each of them. To dump all of a server's databases, use --all-databases. In this case, you supply no database or table name arguments. Be careful with the --all-databases option if you intend to load the dump output into another server; the dump will include the grant tables in the mysql database, and you may not really want to replace the other server's grant tables.
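For example, the first of these commands dumps two databases in their entirety, and the second dumps every database the server manages (the output filenames here are arbitrary):

```
% mysqldump --opt --databases sampdb menagerie > dump.sql
% mysqldump --opt --all-databases > dump-all.sql
```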
 
. The --compress option is helpful when copying a database to another machine because it reduces the number of bytes traveling over the network:
 
% mysqldump --opt sampdb | mysql --compress -h education.ebizel.com sampdb
 
Notice that the --compress option is given for the program that communicates with the server on the remote host, not the one that communicates with the local host. Compression applies only to network traffic; it does not cause compressed tables to be created in the destination database.
 
. By default, mysqldump dumps both table structure (the CREATE TABLE statements) and table contents (the INSERT statements). To dump just one or the other, use the --no-create-info or --no-data options.
 
mysqldump has many other options as well; invoke it with the --help option to see them.
 
Using Replication to Help Make Backups
 
Making backups is important, but it introduces a conflict of interest into your duties as a MySQL administrator. On the one hand, you want to maximize the availability of your server to the members of your user community, which includes allowing them to make database updates. On the other hand, for recovery purposes, backups are most useful if you make sure your backup file and log file checkpoints are synchronized.
 
These goals conflict because the best way to synchronize backup and log checkpoints is by flushing the logs when you make the backup, combined with making sure no updates occur by either bringing the server down or locking all the tables at once (for example, with the --opt option to mysqldump). Unfortunately, disallowing updates reduces client access to the tables for the duration of the backup.
 
If you have a replication slave server set up, it can help you resolve this conflict. Rather than making backups on the master server, use the slave server instead. Then you need not bring down the master or otherwise make it unavailable to clients during the backup. Instead, suspend replication on the slave server with SLAVE STOP and make a backup from the slave. (If you are using a direct-copy backup method, issue a FLUSH TABLES statement as well.) Afterward, re-enable replication with SLAVE START and the slave will catch up on any updates made by the master server during the backup period. Depending on your backup method, you may not even need to suspend replication. For example, if you're backing up only a single database, you can use mysqlhotcopy or mysqldump with the appropriate options to lock all the tables at once. In that case, the slave server can stay up, but it won't attempt any updates to the locked tables during the backup. When the backup is done and the locks are released, the slave resumes update processing automatically.
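In statement form, the procedure just described looks like this when issued from mysql on the slave server (the FLUSH TABLES statement applies only to direct-copy backups):

```
mysql> SLAVE STOP;
mysql> FLUSH TABLES;
-- make the backup here, then:
mysql> SLAVE START;
```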