MySQL

Query Optimization

Overriding Optimization

It sounds odd, but there may be times when you'll want to defeat MySQL's optimization behavior. Some of the reasons to do this are described in the following list:

To empty a table with minimal side effects. When you need to empty a table completely, it's fastest to have the server just drop the table and re-create it based on the description stored in its .frm file. To do this, use a TRUNCATE TABLE statement:

TRUNCATE TABLE tbl_name;

Prior to MySQL 4, you can achieve the same effect by using a DELETE statement with no WHERE clause:

DELETE FROM tbl_name;

The server's optimization of emptying a table by re-creating it from scratch makes the operation extremely fast because each row need not be deleted individually. However, there are some side effects that may be undesirable under certain circumstances:

. Prior to MySQL 4, DELETE with no WHERE clause may report the number of rows affected as zero, even when the table wasn't empty. TRUNCATE TABLE may do this for any version of MySQL, depending on the table type.

Most of the time this doesn't matter, although it can be puzzling if you don't expect it. But for applications that require an accurate count of the number of deleted rows, a count of zero is not acceptable.

. For MyISAM tables, AUTO_INCREMENT values normally are not reused when rows are deleted. However, emptying a table by re-creating it may reset the sequence to begin over at 1.

If you encounter these side effects and want to avoid them, use an "unoptimized" full-table DELETE statement that includes a trivially true WHERE clause:

DELETE FROM tbl_name WHERE 1;

Adding the WHERE clause forces MySQL to do a row-by-row deletion, because it must evaluate the condition for each row to determine whether or not to delete it. The query executes much more slowly, but it will return the true number of rows deleted, and it will preserve the current AUTO_INCREMENT sequence number for MyISAM tables.

To override the optimizer's table join order. Use STRAIGHT_JOIN to force the optimizer to use tables in a particular order. If you do this, you should order the tables so that the first table is the one from which the smallest number of rows will be chosen. (If you are not sure which table this is, put the table with the most rows first.)

In other words, try to order the tables to cause the most restrictive selection to come first. Queries perform better the earlier you can narrow the possible candidate rows. Make sure to try the query both ways; there may be some reason the optimizer isn't joining tables the way you think it should, and STRAIGHT_JOIN may not actually help.

Another possibility is to use the USE INDEX and IGNORE INDEX modifiers after a table name in the table list of a join to tell MySQL to use or ignore indexes. This may be helpful in cases where the optimizer doesn't make the correct choice.

To retrieve results in random order. As of MySQL 3.23.2, you can use ORDER BY RAND() to sort results randomly. Another technique, which is useful for older versions of MySQL, is to select a column of random numbers and sort on that column. However, if you try writing the query as follows, the optimizer defeats your intent:

SELECT ..., RAND() as rand_col FROM ... ORDER BY rand_col

The problem here is that MySQL sees that the column is a function call, thinks that the value of the column will be a constant, and optimizes the ORDER BY clause right out of the query! You can fool the optimizer by referring to a table column in the expression. For example, if your table has a column named age, you can write the query as follows:

SELECT ..., age*0+RAND() as rand_col FROM ... ORDER BY rand_col;

In this case, the expression value is always equivalent to RAND(). But the optimizer doesn't know that, so it no longer guesses that the column contains a constant value in each row.

To avoid an endless update loop. Prior to MySQL 3.23.2, if you update a column that is indexed, it's possible for the rows that are updated to be updated endlessly if the column is used in the WHERE clause and the update moves the index value into the part of the range that hasn't been processed yet.

Suppose the mytbl table has an integer column key_col that is indexed. Queries such as the following can cause problems:

UPDATE mytbl SET key_col = key_col+1 WHERE key_col > 0;

The solution for this is to use key_col in an expression term in the WHERE clause such that MySQL can't use the index:

UPDATE mytbl SET key_col = key_col+1 WHERE key_col+0 > 0;