MySQL

Query Optimization

Choosing Index Type

The syntax for creating indexes was covered earlier in the tutorial, in "MySQL SQL Syntax and Use." I assume here that you've read that section. But knowing the syntax doesn't in itself help you determine how your tables should be indexed.

That requires some thought about the way you use your tables. This section gives some guidelines on how to identify candidate columns for indexing and how best to set up indexes:

. Index columns that you use for searching, sorting, or grouping, not columns you display as output. In other words, the best candidate columns for indexing are the columns that appear in your WHERE clause, columns named in join clauses, or columns that appear in ORDER BY or GROUP BY clauses. Columns that appear only in the output column list following the SELECT keyword are not good candidates:

    SELECT
        col_a                          -- not a candidate
    FROM
        tbl1 LEFT JOIN tbl2
        ON tbl1.col_b = tbl2.col_c     -- candidates
    WHERE
        col_d = expr;                  -- a candidate

The columns that you display and the columns you use in the WHERE clause might be the same, of course. The point is that appearance of a column in the output column list is not in itself a good indicator that it should be indexed.

Columns that appear in join clauses or in expressions of the form col1 = col2 in WHERE clauses are especially good candidates for indexing. col_b and col_c in the query just shown are examples of this. If MySQL can optimize a query using joined columns, it cuts down the potential table-row combinations quite a bit by eliminating full table scans.
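As a minimal sketch of this guideline, the following statements index the join and WHERE columns of the query while leaving the output column alone. The column types, the index names, the placement of col_d in tbl1, and the literal 10 standing in for expr are all assumptions made for illustration.

```sql
-- Hypothetical definitions; the column types are assumed for illustration.
CREATE TABLE tbl1 (
    col_a VARCHAR(40),   -- displayed only: not indexed
    col_b INT,           -- join column
    col_d INT            -- WHERE column (assumed to live in tbl1)
);
CREATE TABLE tbl2 (
    col_c INT            -- join column
);

-- Index the columns used for joining and searching.
CREATE INDEX idx_col_b ON tbl1 (col_b);
CREATE INDEX idx_col_d ON tbl1 (col_d);
CREATE INDEX idx_col_c ON tbl2 (col_c);

-- EXPLAIN reports which of these indexes, if any, the optimizer picks.
EXPLAIN
SELECT col_a
FROM tbl1 LEFT JOIN tbl2 ON tbl1.col_b = tbl2.col_c
WHERE col_d = 10;
```

The key column of the EXPLAIN output names the index chosen for each table. On empty or very small tables the optimizer may reasonably choose none.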

. Use unique indexes. Consider the spread of values in a column. Indexes work best for columns with unique values and most poorly with columns that have many duplicate values. For example, if a column contains many different age values, an index will differentiate rows readily.

An index probably will not help much for a column that is used to record sex and contains only the two values 'M' and 'F'. If the values occur about equally, you'll get about half of the rows whichever value you search for. Under these circumstances, the index may never be used at all: as a rule of thumb, the query optimizer skips an index in favor of a full table scan if it determines that a value occurs in more than about 30 percent of a table's rows. (Newer MySQL versions base this choice on a cost estimate rather than a fixed percentage, but the effect for such columns is much the same.)
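One way to check the spread of values before creating an index is to count them. The sketch below assumes a hypothetical person table with age and sex columns; the table and column names are illustrative only.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE person (
    person_id INT     NOT NULL,
    age       INT     NOT NULL,
    sex       CHAR(1) NOT NULL
);

-- Compare the number of distinct values with the number of rows.
-- The closer a distinct count is to total_rows, the more selective
-- an index on that column is likely to be.
SELECT COUNT(*)            AS total_rows,
       COUNT(DISTINCT age) AS distinct_ages,
       COUNT(DISTINCT sex) AS distinct_sexes
FROM person;

-- Show how the rows are divided among the values of a low-cardinality column.
SELECT sex, COUNT(*) AS rows_per_value
FROM person
GROUP BY sex;
```

For a column that is already indexed, SHOW INDEX FROM person reports an estimated Cardinality figure that serves a similar purpose.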

. Index short values.

If you're indexing a string column, specify a prefix length whenever it's reasonable to do so. For example, if you have a CHAR(200) column, don't index the entire column if most values are unique within the first 10 or 20 bytes. Indexing the first 10 or 20 bytes will save a lot of space in the index, and probably will make your queries faster as well.

A smaller index involves less disk I/O, and shorter values can be compared more quickly. More importantly, with shorter key values, blocks in the index cache hold more key values, so MySQL can hold more keys in memory at once. This improves the likelihood of locating rows without reading additional index blocks from disk.

(You want to use some common sense, of course. Indexing just the first character from a column isn't likely to be that helpful because there won't be very many distinct values in the index.)
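A rough way to pick a prefix length is to measure how many distinct values remain at each candidate length and then index only that many leading characters. The customer table and its name column below are hypothetical. Note that for a nonbinary string column such as CHAR, MySQL interprets the length in name(10) as a number of characters.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE customer (
    customer_id INT       NOT NULL,
    name        CHAR(200) NOT NULL
);

-- Compare how many distinct values survive at each candidate prefix length.
SELECT COUNT(DISTINCT name)           AS full_values,
       COUNT(DISTINCT LEFT(name, 10)) AS prefix_10,
       COUNT(DISTINCT LEFT(name, 20)) AS prefix_20
FROM customer;

-- If prefix_10 is close to full_values, indexing the first 10 characters
-- gives nearly the same selectivity as indexing the whole column.
CREATE INDEX idx_name_prefix ON customer (name(10));
```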

. Take advantage of leftmost prefixes.

When you create an n-column composite index, you actually create n indexes that MySQL can use. A composite index serves as several indexes because any leftmost set of columns in the index can be used to match rows. Such a set is called a leftmost prefix.

Suppose you have a table with a composite index on columns named state, city, and zip. Rows in the index are sorted in state/city/zip order, so they're automatically sorted in state/city order and in state order as well.

This means that MySQL can take advantage of the index even if you specify only state values in a query or only state and city values. Thus, the index can be used to search the following combinations of columns:

state, city, zip
state, city
state

MySQL cannot use the index for searches that don't involve a leftmost prefix. For example, if you search by city or by zip, the index isn't used.

If you're searching for a given state and a particular Zip code (columns 1 and 3 of the index), the index can't be used for the combination of values, although MySQL can narrow the search using the index to find rows that match the state.
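The cases above can be tried out with EXPLAIN. This is a sketch under assumptions: the address table, its street column (added so that the index does not cover the whole query), the index name, and the sample values are all hypothetical.

```sql
-- Hypothetical table with a three-column composite index.
CREATE TABLE address (
    street VARCHAR(100) NOT NULL,  -- not part of the index
    state  CHAR(2)      NOT NULL,
    city   VARCHAR(50)  NOT NULL,
    zip    CHAR(5)      NOT NULL,
    INDEX idx_state_city_zip (state, city, zip)
);

-- Searches on a leftmost prefix: the index is a candidate for each of these.
EXPLAIN SELECT street FROM address
WHERE state = 'WI' AND city = 'Madison' AND zip = '53703';
EXPLAIN SELECT street FROM address
WHERE state = 'WI' AND city = 'Madison';
EXPLAIN SELECT street FROM address
WHERE state = 'WI';

-- Searches that skip the leftmost column: not a leftmost prefix.
EXPLAIN SELECT street FROM address WHERE city = 'Madison';
EXPLAIN SELECT street FROM address WHERE zip = '53703';

-- Columns 1 and 3 only: the index can narrow the search by state.
EXPLAIN SELECT street FROM address
WHERE state = 'WI' AND zip = '53703';
```

In the EXPLAIN output, the key column shows whether idx_state_city_zip was chosen, and key_len gives a hint about how many of its leading columns were used for the lookup. Results on an empty table may differ from those on a populated one.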

. Don't over-index.

Don't index everything in sight based on the assumption "the more, the better." That's a mistake. Every additional index takes extra disk space and hurts performance of write operations, as has already been mentioned.

Indexes must be updated and possibly reorganized when you modify the contents of your tables, and the more indexes you have, the longer this takes. If you have an index that is rarely or never used, you'll slow down table modifications unnecessarily. In addition, MySQL considers indexes when generating an execution plan for retrievals. Creating extra indexes creates more work for the query optimizer.

It's also possible (if unlikely) that MySQL will fail to choose the best index to use when you have too many indexes. Maintaining only the indexes you need helps the query optimizer avoid making such mistakes.

If you're thinking about adding an index to a table that is already indexed, consider whether the index you're thinking about adding is a leftmost prefix of an existing multiple-column index.

If so, don't bother adding the index because, in effect, you already have it. (For example, if you already have an index on state, city, and zip, there is no point in adding an index on state.)
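As an illustration, the sketch below sets up a hypothetical location table that carries such a redundant index, lists its indexes, and drops the one already provided by the composite index. The table and index names are assumptions.

```sql
-- Hypothetical table with a redundant single-column index.
CREATE TABLE location (
    state CHAR(2)     NOT NULL,
    city  VARCHAR(50) NOT NULL,
    zip   CHAR(5)     NOT NULL,
    INDEX idx_state_city_zip (state, city, zip),
    INDEX idx_state (state)   -- leftmost prefix of the index above
);

-- List the indexes; Seq_in_index shows each column's position in its index.
SHOW INDEX FROM location;

-- idx_state duplicates the first column of idx_state_city_zip,
-- so dropping it saves space and speeds up table modifications.
ALTER TABLE location DROP INDEX idx_state;
```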

. Consider the type of comparisons you perform on a column. Generally, indexes are used for <, <=, =, >=, >, and BETWEEN operations. Indexes are also used for LIKE operations when the pattern has a literal prefix.

If you use a column only for other kinds of operations, such as STRCMP(), there is no value in indexing it. For HEAP tables (called MEMORY tables in later MySQL versions), indexes are hashed by default and are used only for equality comparisons. If you perform a range search (such as a < b) with a HEAP table, a hash index will not help.
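The following sketch contrasts comparisons that can use an index with ones that cannot. The contact and session_cache tables, their columns, and the sample values are hypothetical; ENGINE = MEMORY is the later spelling of the HEAP table type.

```sql
-- Hypothetical table with an ordinary index on last_name.
CREATE TABLE contact (
    contact_id INT         NOT NULL,
    last_name  VARCHAR(50) NOT NULL,
    phone      VARCHAR(20) NOT NULL,
    INDEX idx_last_name (last_name)
);

-- Comparisons for which the index is a candidate.
EXPLAIN SELECT phone FROM contact WHERE last_name = 'Smith';
EXPLAIN SELECT phone FROM contact WHERE last_name BETWEEN 'A' AND 'C';
EXPLAIN SELECT phone FROM contact WHERE last_name LIKE 'Mac%';  -- literal prefix

-- Comparisons for which the index offers no lookup help.
EXPLAIN SELECT phone FROM contact WHERE last_name LIKE '%son';  -- leading wildcard
EXPLAIN SELECT phone FROM contact WHERE STRCMP(last_name, 'Smith') = 0;

-- Hypothetical in-memory table with an explicitly hashed index.
CREATE TABLE session_cache (
    session_id INT NOT NULL,
    INDEX idx_session_hash (session_id) USING HASH
) ENGINE = MEMORY;

-- Equality lookup: the hash index is a candidate.
EXPLAIN SELECT session_id FROM session_cache WHERE session_id = 42;
-- Range search: a hash index cannot help.
EXPLAIN SELECT session_id FROM session_cache WHERE session_id < 42;
```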

. Use the slow-query log to identify queries that may be performing badly. This log can help you find queries that may benefit from indexing. The log is a plain text file; the mysqldumpslow utility summarizes its contents.

If a given query shows up over and over in the slow-query log, that's a clue that it may not be written optimally. You may be able to rewrite it to make it run more quickly. Keep the following points in mind when assessing your slow-query log:

o "Slow" is measured in real time, so more queries will show up in the slow-query log on a heavily loaded server than on a lightly loaded one. You'll need to take this into account.

o If you use the --log-long-format option in addition to enabling slow-query logging, the log also will include queries that execute without using any index. These queries aren't necessarily slow. (No index may be needed for small tables, for example.)
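On servers that expose the slow-query log through system variables (MySQL 5.1 and later, as far as I know), the log can be inspected and switched on from a client session. This is a sketch under that assumption; it requires an account with sufficient privileges, and the two-second threshold is an arbitrary example value.

```sql
-- Check whether the slow-query log is enabled and where it is written.
SHOW VARIABLES LIKE 'slow_query_log%';

-- Check the threshold, in seconds, above which a query counts as slow.
SHOW VARIABLES LIKE 'long_query_time';

-- Enable the log and set the threshold (example value).
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;

-- Also log queries that use no index, regardless of how long they take.
SET GLOBAL log_queries_not_using_indexes = 'ON';
```

A new global long_query_time value applies to connections opened after the change, so existing sessions keep their old threshold.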