
Notes on "High Performance MySQL" -- Ch4: Query Performance Optimization

2012-07-08 12:00

Slow Query Basics: Optimize Data Access

Analyze a poorly performing query in two steps:

- Find out whether your application is retrieving more data than you need. That usually means it’s accessing too many rows, but it might also be accessing too many columns.

- Find out whether the MySQL Server is analyzing more rows than it needs.

Are You Asking the Database for Data You Don’t Need?

Here are some typical mistakes:

- Fetching more rows than needed

- Fetching all columns from a multitable join

- Fetching all columns

Is MySQL Examining Too Much Data?

In MySQL, the simplest query cost metrics are:

- Execution time

- Number of rows examined

- Number of rows returned

All these metrics are logged in the slow query log, so looking at the slow query log is one of the best ways to find queries that examine too much data.

- Execution time

- Rows examined and rows returned

- Rows examined and access types

The access method(s) appear in the type column in EXPLAIN’s output. The access types range from a full table scan to index scans, range scans, unique index lookups, and constants. Each of these is faster than the one before it, because it requires reading less data.

In general, MySQL can apply a WHERE clause in three ways, from best to worst:

- Apply the conditions to the index lookup operation to eliminate nonmatching rows. This happens at the storage engine layer.

- Use a covering index ("Using index" in the Extra column) to avoid row accesses, and filter out nonmatching rows after retrieving each result from the index. This happens at the server layer, but it doesn't require reading rows from the table.

- Retrieve rows from the table, then filter nonmatching rows ("Using where" in the Extra column). This happens at the server layer and requires the server to read rows from the table before it can filter them.
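As a rough illustration of how these cases show up in EXPLAIN (a sketch assuming the standard sakila sample database, where actor has a primary key on actor_id and no index on first_name):

-- Covering-index case ("Using index" in Extra): the query is satisfied from the index alone.
EXPLAIN SELECT actor_id FROM sakila.actor WHERE actor_id > 100;

-- Row-filtering case ("Using where" in Extra): with no usable index on first_name,
-- the server reads rows from the table and filters them afterwards.
EXPLAIN SELECT * FROM sakila.actor WHERE first_name = 'PENELOPE';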

Ways to Restructure Queries

Complex Queries Versus Many Queries

MySQL was designed to handle connecting and disconnecting very efficiently and to respond to small and simple queries very quickly.

Chopping Up a Query

Join Decomposition

Many high-performance web sites use join decomposition. You can decompose a join by running multiple single queries instead of a multitable join, and then performing the join in the application.

- Caching can be more efficient. Many applications cache “objects” that map directly to tables.

- For MyISAM tables, performing one query per table uses table locks more efficiently: the queries will lock the tables individually and relatively briefly, instead of locking them all for a longer time.

- Doing joins in the application makes it easier to scale the database by placing tables on different servers.

- The queries themselves can be more efficient.

- You can reduce redundant row accesses.

- To some extent, you can view this technique as manually implementing a hash join instead of the nested loops algorithm MySQL uses to execute a join.
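A sketch of what this decomposition looks like, along the lines of the book's tag/post example (table names and ID values are illustrative):

-- A single multitable join...
SELECT * FROM tag
  JOIN tag_post ON tag_post.tag_id = tag.id
  JOIN post ON tag_post.post_id = post.id
WHERE tag.tag = 'mysql';

-- ...decomposed into single-table queries, with the join done in the application:
SELECT * FROM tag WHERE tag = 'mysql';
SELECT * FROM tag_post WHERE tag_id = 1234;
SELECT * FROM post WHERE post.id IN (123, 456, 567, 9098, 8904);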

Summary: When Application Joins May Be More Efficient

You cache and reuse a lot of data from earlier queries

You use multiple MyISAM tables

You distribute data across multiple servers

You replace joins with IN() lists on large tables

A join refers to the same table multiple times

Query Execution Basics

The MySQL Client/Server Protocol

The client sends a query to the server as a single packet of data. This is why the max_allowed_packet configuration variable is important if you have large queries.

Query states

Each MySQL connection, or thread, has a state that shows what it is doing at any given time. There are several ways to view these states, but the easiest is to use the SHOW FULL PROCESSLIST command (look at the Command and State columns).

- Sleep: The thread is waiting for a new query from the client

- Query: The thread is either executing the query or sending the result back to the client

- Locked: The thread is waiting for a table lock to be granted at the server level.

- Analyzing and statistics: The thread is checking storage engine statistics and optimizing the query.

- Copying to tmp table [on disk]

- Sorting result

- Sending data: This can mean several things: the thread might be sending data between stages of the query, generating the result set, or returning the result set to the client.

The Query Cache

The cache lookup is a case-sensitive hash lookup: the incoming query must match the cached query byte for byte to get a cache hit.
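For example, because the lookup hashes the statement text byte for byte, these two logically identical queries cannot share a cache entry (table name is illustrative):

SELECT * FROM sakila.film;
select * from sakila.film;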

If MySQL does find a match in the query cache, it must check privileges before returning the cached query. This is possible without parsing the query, because MySQL stores table information with the cached query.

The Query Optimization Process

The parser and the preprocessor

The query optimizer

MySQL uses a cost-based optimizer. The unit of cost is a single random 4K data page read. You can see how expensive the optimizer estimated a query to be by running the query, then inspecting the Last_query_cost session status variable:
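For example (the query itself is arbitrary; the sakila sample database is assumed here):

SELECT SQL_NO_CACHE COUNT(*) FROM sakila.film_actor;
SHOW STATUS LIKE 'Last_query_cost';
-- The value shown is the optimizer's estimate, in units of random 4K page reads.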



The optimizer does not include the effects of any type of caching in its estimates – it assumes every read will result in a disk I/O operation.

There are two basic types of optimizations, which we call static and dynamic. Static optimizations can be performed simply by inspecting the parse tree. Static optimizations are independent of values, such as the value of a constant in a WHERE clause. They can be performed once and will always be valid, even when the query is reexecuted with different values. In contrast, dynamic optimizations are based on context and can depend on many factors, such as which value is in a WHERE clause or how many rows are in an index.

IN() list comparisons

In many database servers, IN() is just a synonym for multiple OR clauses, because the two are logically equivalent. Not so in MySQL, which sorts the values in the IN() list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists).
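For example, these two queries are logically equivalent, but MySQL evaluates the first one with a binary search over the sorted list (table t and column id are hypothetical):

SELECT * FROM t WHERE id IN (23, 5, 99, 42);
SELECT * FROM t WHERE id = 23 OR id = 5 OR id = 99 OR id = 42;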

Table and index statistics

MySQL’s join execution strategy

MySQL considers every query a join – not just every query that matches rows from two tables, but every query, period (including subqueries, and even a SELECT against a single table).

MySQL treats every join as a nested-loop join.

MySQL executes UNION queries with temporary tables, and it rewrites all RIGHT OUTER JOIN queries to equivalent LEFT OUTER JOINs.

MySQL doesn’t support FULL OUTER JOIN

The execution plan

If you execute EXPLAIN EXTENDED on a query, followed by SHOW WARNINGS, you’ll see the reconstructed query.
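For example (sakila assumed; any query will do):

EXPLAIN EXTENDED SELECT * FROM sakila.actor WHERE actor_id = 1;
SHOW WARNINGS;
-- The Message column of SHOW WARNINGS contains the query as the optimizer reconstructed it.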

The join optimizer

The most important part of the MySQL query optimizer is the join optimizer, which decides the best order of execution for multitable queries.

STRAIGHT_JOIN

Sort optimizations

MySQL can sort in memory or on disk, but it always calls this process a filesort, even if it doesn't actually use a file.

There are two filesort algorithms:

- Two passes (old)

Reads row pointers and ORDER BY columns, sorts them, and then scans the sorted list and rereads the rows for output.

- One pass (new)

Reads all the columns needed for the query, sorts them by the ORDER BY columns, and then scans the sorted list and outputs the specified columns.

MySQL allocates a fixed-size record for each tuple it will sort. These records are large enough to hold the largest possible tuple, including the full length of each VARCHAR column. Also, if you're using UTF-8, MySQL allocates 3 bytes for each character.
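A small sketch of what that implies for sort-buffer usage (the table is hypothetical):

CREATE TABLE notes (
  id   INT NOT NULL,
  note VARCHAR(100) CHARACTER SET utf8
) ENGINE=InnoDB;

-- A filesort on this column reserves 3 * 100 = 300 bytes per sort record for note,
-- even if the actual values are only a few characters long:
SELECT id, note FROM notes ORDER BY note;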

The Query Execution Engine

Returning Results to the Client

Limitations of the MySQL Query Optimizer

Correlated Subqueries

When a correlated subquery is good

UNION limitations

MySQL sometimes can’t “push down” conditions from the outside of a UNION to the inside, where they could be used to limit results or enable additional optimizations.
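For example, to fetch 20 rows from a union of two large tables, you can push the LIMIT into each branch yourself so that the temporary table holds at most 40 rows instead of both tables' full contents (a sketch along the lines of the book's sakila illustration):

(SELECT first_name, last_name FROM sakila.actor ORDER BY last_name LIMIT 20)
UNION ALL
(SELECT first_name, last_name FROM sakila.customer ORDER BY last_name LIMIT 20)
ORDER BY last_name
LIMIT 20;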

Index merge optimizations

Index merge algorithms let MySQL use more than one index per table in a query.

There are three variations on the algorithm: union for OR conditions, intersection for AND conditions, and unions of intersections for combinations of the two.
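For example (a sketch against sakila.film_actor, which has indexes covering both columns in the OR condition):

EXPLAIN SELECT film_id, actor_id FROM sakila.film_actor
WHERE actor_id = 1 OR film_id = 1;
-- Expect type: index_merge and something like
-- Extra: Using union(PRIMARY,idx_fk_film_id); Using where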

Equality propagation

Parallel execution

MySQL can’t execute a single query in parallel on many CPUs.

Hash joins

MySQL can't do true hash joins at the time of this writing.

Loose index scans

MySQL has historically been unable to do loose index scans, which scan noncontiguous ranges of an index. MySQL’s index scans generally require a defined start point and a defined end point in the index, even if only a few noncontiguous rows in the middle are really desired for the query. MySQL will scan the entire range of rows within these end points.

Beginning in MySQL 5.0, loose index scans are possible in certain limited circumstances, such as queries that find maximum and minimum values in a grouped query.

MIN() and MAX()
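A sketch in the spirit of the book's sakila example: first_name has no index, so MIN(actor_id) scans the whole table, but because actor_id is the (clustered) primary key, the first matching row found in primary-key order already has the minimum actor_id, so the query can be rewritten as a LIMIT 1 lookup:

SELECT MIN(actor_id) FROM sakila.actor WHERE first_name = 'PENELOPE';

==>

SELECT actor_id FROM sakila.actor USE INDEX(PRIMARY)
WHERE first_name = 'PENELOPE' LIMIT 1;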






SELECT and UPDATE on the same table

MySQL doesn't let you SELECT from a table while simultaneously running an UPDATE on it.
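For example, an UPDATE that selects from its own target table is rejected; the usual workaround is a derived table, which MySQL materializes as a temporary table first (a sketch with a hypothetical table tbl):

-- Fails with ERROR 1093: You can't specify target table 'outer_tbl' for update in FROM clause
UPDATE tbl AS outer_tbl
   SET cnt = (SELECT COUNT(*) FROM tbl AS inner_tbl
              WHERE inner_tbl.type = outer_tbl.type);

-- Works: the GROUP BY runs against the derived (temporary) table der
UPDATE tbl
  INNER JOIN (SELECT type, COUNT(*) AS cnt FROM tbl GROUP BY type) AS der USING(type)
   SET tbl.cnt = der.cnt;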

Optimizer Specific Types of Queries

Optimizing COUNT() Queries

COUNT() counts values and rows. A value is a non-NULL expression (NULL is the absence of a value).

When you want to know the number of rows in the result, you should always use COUNT(*). This communicates your intention clearly and avoids poor performance.
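A sketch of this kind of rewrite, using the book's world.City example: when most rows satisfy ID > 5, count the few rows that don't and subtract that from the unconditional COUNT(*), which is very cheap (effectively free on MyISAM):

SELECT COUNT(*) FROM world.City WHERE ID > 5;

==>

SELECT (SELECT COUNT(*) FROM world.City) - COUNT(*)
FROM world.City WHERE ID <= 5;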






Optimizing JOIN Queries

- Make sure there are indexes on the columns in the ON or USING clauses. In general, you need to add indexes only on the second table in the join order, unless they're needed for some other reason.

- Try to ensure that any GROUP BY or ORDER BY expression refers only to columns from a single table, so MySQL can try to use an index for that operation.

Optimizing Subqueries

You should usually prefer a join where possible.
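For example, an IN() subquery can usually be rewritten as a join (sakila assumed); on the MySQL versions the book covers, the subquery form is executed as a correlated subquery, once per row of film:

SELECT * FROM sakila.film
WHERE film_id IN (SELECT film_id FROM sakila.film_actor WHERE actor_id = 1);

-- Usually much better written as a join:
SELECT film.*
FROM sakila.film
  INNER JOIN sakila.film_actor USING(film_id)
WHERE film_actor.actor_id = 1;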

Optimizing GROUP BY and DISTINCT

MySQL has two kinds of GROUP BY strategies when it can't use an index: it can use a temporary table or a filesort to perform the grouping. You can force the optimizer to choose one method or the other with the SQL_BIG_RESULT and SQL_SMALL_RESULT optimizer hints.

MySQL automatically orders grouped queries by the columns in the GROUP BY clause, unless you specify an ORDER BY clause explicitly. If you don't care about the order and you see this causing a filesort, you can use ORDER BY NULL to skip the automatic sort. You can also add an optional DESC or ASC keyword right after the GROUP BY clause to order the result in the desired direction by the clause's columns.
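For example (sakila assumed), the grouped query below would normally be sorted by the GROUP BY columns; ORDER BY NULL suppresses that implicit sort:

SELECT actor.first_name, actor.last_name, COUNT(*)
FROM sakila.film_actor
  INNER JOIN sakila.actor USING(actor_id)
GROUP BY actor.first_name, actor.last_name
ORDER BY NULL;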



Optimizing LIMIT and OFFSET

One simple technique to improve efficiency is to do the offset on a covering index, rather than the full rows. You can then join the result to the full row and retrieve the additional columns you need. This can be much more efficient.

SELECT film_id, description FROM film ORDER BY title LIMIT 50, 5;

==> (if the table is very large, this query is better written as follows:)

SELECT film_id, description FROM film
INNER JOIN (SELECT film_id FROM film ORDER BY title LIMIT 50, 5) AS lim USING(film_id);

If you really need to optimize pagination systems, you should probably use precomputed summaries.

Optimizing SQL_CALC_FOUND_ROWS

Optimizing UNION

MySQL always executes UNION queries by creating a temporary table and filling it with the UNION results, then reading the rows back out again, even when that isn't really necessary. You might have to help the optimizer by manually "pushing down" WHERE, LIMIT, ORDER BY, and other conditions into each branch of the UNION.

Query Optimizer Hints

- HIGH_PRIORITY and LOW_PRIORITY

These hints are effective on storage engines with table-level locking, but you should never need them on InnoDB or other engines with fine-grained locking and concurrency control. Be careful when using them on MyISAM, because they can disable concurrent inserts and greatly reduce performance.

- DELAYED

- STRAIGHT_JOIN

- SQL_SMALL_RESULT and SQL_BIG_RESULT

SQL_SMALL_RESULT tells the optimizer that the result set will be small and can be put into indexed temporary tables to avoid sorting for the grouping, whereas SQL_BIG_RESULT indicates that the result will be large and that it will be better to use temporary tables on disk with sorting.

- SQL_BUFFER_RESULT

This hint tells the optimizer to put the results into a temporary table and release table locks as soon as possible.

- SQL_CACHE and SQL_NO_CACHE

- SQL_CALC_FOUND_ROWS

This hint tells MySQL to calculate a full result set when there's a LIMIT clause, even though it returns only LIMIT rows. You can retrieve the total number of rows it found via FOUND_ROWS() (see the sketch after this list).

- FOR UPDATE and LOCK IN SHARE MODE

When using these hints with InnoDB, be aware that they may disable some optimizations, such as covering indexes. InnoDB can't lock rows exclusively without accessing the primary key, which is where the row version information is stored.

- USE INDEX, IGNORE INDEX, and FORCE INDEX
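As referenced in the SQL_CALC_FOUND_ROWS item above, a minimal sketch of the hint and its companion function (sakila assumed):

SELECT SQL_CALC_FOUND_ROWS film_id, title
FROM sakila.film
ORDER BY title
LIMIT 10;

SELECT FOUND_ROWS();  -- total number of rows the previous query matched, ignoring the LIMIT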

User-Defined Variables
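The queries discussed in the rest of this section are a row-numbering experiment with a user-defined variable; a sketch, assuming a small test table t (the notes below imply it has seven rows) with hypothetical columns id and name:

SET @rownum := 0;

SELECT id, @rownum := @rownum + 1 AS cnt
FROM t
WHERE @rownum <= 1;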





The WHERE clause is evaluated before the SELECT list; that's why two records are returned.
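Adding an ORDER BY to the same sketch changes the behavior, as described next:

SET @rownum := 0;

SELECT id, @rownum := @rownum + 1 AS cnt
FROM t
WHERE @rownum <= 1
ORDER BY name;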





The query returns every row in the table, because the ORDER BY added a filesort and the WHERE is evaluated before the filesort.

(The @rownum := @rownum + 1 in the SELECT list is evaluated last, which is why the values in the cnt column are still in order.)

The solution to this problem is to assign and read in the same stage of query execution.
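A sketch of that fix, continuing the same hypothetical table:

SET @rownum := 0;

SELECT id, @rownum AS cnt
FROM t
WHERE (@rownum := @rownum + 1) <= 1;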



Note that the @rownum is 7 after the query!

(Though only one row is returned, @rownum := @rownum + 1 is evaluated for every row in the table, because MySQL doesn't know when it can stop evaluating it!)
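The same fix with an ORDER BY added, which is what the final observation below refers to:

SET @rownum := 0;

SELECT id, @rownum AS cnt
FROM t
WHERE (@rownum := @rownum + 1) <= 1
ORDER BY name;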



Note that the cnt column is 7 this time because we added an ORDER BY clause: the SELECT list is "executed" after the sort operation, so @rownum already holds its final value.