Here are 10 tips for getting great performance out of MySQL.
MySQL performance tip No. 1: Profile your workload
The best way to understand how your server spends its time is to profile the server’s workload. By profiling your workload, you can expose the most expensive queries for further tuning. Here, time is the most important metric because when you issue a query against the server, you care very little about anything except how quickly it completes.
The best way to profile your workload is with a tool such as MySQL Enterprise Monitor’s query analyzer or the pt-query-digest from the Percona Toolkit. These tools capture queries the server executes and return a table of tasks sorted by decreasing order of response time, instantly bubbling up the most expensive and time-consuming tasks to the top so that you can see where to focus your efforts.
Workload-profiling tools group similar queries together into one row, allowing you to see the queries that are slow, as well as the queries that are fast but executed many times.
MySQL performance tip No. 2: Understand the four fundamental resources
To function, a database server needs four fundamental resources: CPU, memory, disk, and network. If any of these is weak, erratic, or overloaded, then the database server is very likely to perform poorly.
Understanding the fundamental resources is important in two particular areas: choosing hardware and troubleshooting problems.
When choosing hardware for MySQL, ensure good-performing components all around. Just as important, balance them reasonably well against each other. Often, organizations will select servers with fast CPUs and disks but that are starved for memory. In some cases, adding memory is cheap way of increasing performance by orders of magnitude, especially on workloads that are disk-bound. This might seem counterintuitive, but in many cases disks are overutilized because there isn’t enough memory to hold the server’s working set of data.
Another good example of this balance pertains to CPUs. In most cases, MySQL will perform well with fast CPUs because each query runs in a single thread and can’t be parallelized across CPUs.
When it comes to troubleshooting, check the performance and utilization of all four resources, with a careful eye toward determining whether they are performing poorly or are simply being asked to do too much work. This knowledge can help solve problems quickly.
MySQL performance tip No. 3: Don’t use MySQL as a queue
Queues and queue-like access patterns can sneak into your application without your knowing it. For example, if you set the status of an item so that a particular worker process can claim it before acting on it, then you’re unwittingly creating a queue. Marking emails as unsent, sending them, then marking them as sent is a common example.
Queues cause problems for two major reasons: They serialize your workload, preventing tasks from being done in parallel, and they often result in a table that contains work in process as well as historical data from jobs that were processed long ago. Both add latency to the application and load to MySQL.
MySQL performance tip No. 4: Filter results by cheapest first
A great way to optimize MySQL is to do cheap, imprecise work first, then the hard, precise work on the smaller, resulting set of data.
For example, suppose you’re looking for something within a given radius of a geographical point. The first tool in many programmers’ toolbox is the great-circle (Haversine) formula for computing distance along the surface of a sphere. The problem with this technique is that the formula requires a lot of trigonometric operations, which are very CPU-intensive. Great-circle calculations tend to run slowly and make the machine’s CPU utilization skyrocket.
Before applying the great-circle formula, pare down your records to a small subset of the total, and trim the resulting set to a precise circle. A square that contains the circle (precisely or imprecisely) is an easy way to do this. That way, the world outside the square never gets hit with all those costly trig functions.
MySQL performance tip No. 5: Know the two scalability death traps
Scalability is not as vague as you may believe. In fact, there are precise mathematical definitions of scalability that are expressed as equations. These equations highlight why systems don’t scale as well as they should.
Take the Universal Scalability Law, a definition that is handy in expressing and quantifying a system’s scalability characteristics. It explains scaling problems in terms of two fundamental costs: serialization and crosstalk.
Parallel processes that must halt for something serialized to take place are inherently limited in their scalability. Likewise, if the parallel processes need to chat with each other all the time to coordinate their work, they limit each other.
Avoid serialization and crosstalk, and your application will scale much better. What does this translate into inside of MySQL? It varies, but some examples would be avoiding exclusive locks on rows. Queues, point No. 3 above, tend to scale poorly for this reason.
MySQL performance tip No. 6: Don’t focus too much on configuration
DBAs tend to spend a huge amount of time tweaking configurations. The result is usually not a big improvement and can sometimes even be very damaging. I’ve seen a lot of “optimized” servers that crashed constantly, ran out of memory, and performed poorly when the workload got a little more intense.
The defaults that ship with MySQL are one-size-fits-none and badly outdated, but you don’t need to configure everything. It’s better to get the fundamentals right and change other settings only if needed. In most cases, you can get 95 percent of the server’s peak performance by setting about 10 options correctly. The few situations where this doesn’t apply are going to be edge cases unique to your circumstances.
In most cases, server “tuning” tools aren’t recommended because they tend to give guidelines that don’t make sense for specific cases. Some even have dangerous, inaccurate advice coded into them — such as cache hit ratios and memory consumption formulas. These were never right, and they’ve gotten even less correct as time has passed.
MySQL performance tip No. 7: Watch out for pagination queries
Applications that paginate tend to bring the server to its knees. In showing you a page of results, with a link to go to the next page, these applications typically group and sort in ways that can’t use indexes, and they employ a LIMIT and offset that causes the server to do a lot of work generating, then discarding rows.
Optimizations can often be found in the user interface itself. Instead of showing the exact number of pages in the results and links to each page individually, you can just show a link to the next page. You can also prevent people from going to pages too far from the first page.
On the query side, instead of using LIMIT with offset, you can select one more row than you need, and when the user clicks the “next page” link, you can designate that final row as the starting point for the next set of results. For example, if the user viewed a page with rows 101 through 120, you would select row 121 as well; to render the next page, you’d query the server for rows greater than or equal to 121, limit 21.
MySQL performance tip No. 8: Save statistics eagerly, alert reluctantly
Monitoring and alerting are essential, but what happens to the typical monitoring system? It starts sending false positives, and system administrators set up email filtering rules to stop the noise. Soon your monitoring system is completely useless.
I like to think about monitoring in two ways: capturing metrics and alerting. It’s very important to capture and save all the metrics you possibly can because you’ll be glad to have them when you’re trying to figure out what changed in the system. Someday, a strange problem will crop up, and you’ll love the ability to point to a graph and show a change in the server’s workload.
MySQL performance tip No. 9: Learn the three rules of indexing
Indexing is probably the most misunderstood topic in databases because there are so many ways to get confused about how indexes work and how the server uses them. It takes a lot of effort to really understand what’s going on.
Indexes, when properly designed, serve three important purposes in a database server:
They let the server find groups of adjacent rows instead of single rows. Many people think the purpose of an index is to find individual rows, but finding single rows leads to random disk operations, which is slow. It’s much better to find groups of rows, all or most of which are interesting, than to find rows one at a time.
They let the server avoid sorting by reading the rows in a desired order. Sorting is costly. Reading rows in the desired order is much faster.
They let the server satisfy entire queries from the index alone, avoiding the need to access the table at all. This is variously known as a covering index or an index-only query.
If you can design your indexes and queries to exploit these three opportunities, you can make your queries several orders of magnitude faster.
MySQL performance tip No. 10: Leverage the expertise of your peers
Don’t try to go it alone. If you’re puzzling over a problem and doing what seems logical and sensible to you, that’s great. This will work about 19 times out of 20. The other time, you’ll go down a rabbit hole that will be very costly and time-consuming, precisely because the solution you’re trying seems to make a lot of sense.
Build a network of MySQL-related resources — and this goes beyond toolsets and troubleshooting guides. There are some extremely knowledgeable people lurking on mailing lists, forums, Q&A websites, and so on. Conferences, trade shows, and local user group events provide valuable opportunities for gaining insights and building relationships with peers who can help you in a pinch.
Databases tend to grow over time as they store more and more information. To ensure your database information can be accessed in the shortest time possible, you need to learn a few tricks to speed up data retrieval. For MySQL databases, you can use indexes and partitions to limit the amount of data that MySQL has to traverse to fetch query results, and use some other optimization tricks to further improve performance.
For a specific type of frequently accessed data, creating an index is the best way to speed things up. For example, if you have a table with 20 columns, of which one column is frequently accessed, you can create an index for that column to speed up lookups on it. If there are no indexes, MySQL performs a full table scan to retrieve data. It examines all the records in the table, one after another, until either the last record is read or the query is satisfied. With an index, however, MySQL can look up data without having to read each record of the table. As soon as a match is found in the index, you’re pointed to the data in the actual table. Think of it like an index in a book. When looking for a particular topic, you can either flip through the pages until you reach the end of the book or find what you’re looking for, or you can visit the index to reveal the page number for the topic you’re interested in.
When you create an index for a field, MySQL collects all the information in that particular column, sorts it, and then stores it in a unique object or file, separate from the table, along with references to the original table with the unsorted data.
Maintaining indexes does require additional resources. If you create an index for a table that is frequently updated, all the DELETE, INSERT, and UPDATE statements must also update the index, in addition to the data table itself. Frequent alterations on indexes place an additional load on the server and may slow down other important processes, so you shouldn’t create indexes for all fields, but only for ones that are often queried for information, and where the table is large enough (with several thousands of rows or more) to necessitate faster retrieval.
The way the indexing works also depends on the type of storage engine you use to create the tables for your MySQL database. Up until version 5.5, MyISAM was the default storage engine, but InnoDB has taken its place as the default as of MySQL 5.5 – though you can define the default storage engine in the my.cnf file or manually specify the engine to use when creating tables.
On top of this, there are several different types of indexes that you can use, such as B-Tree or Hash. Your choice of index depends on the storage engine you use.
You can create an index for a table while creating the table itself, or create an index for existing tables. The following code creates a new table with an index on one of the columns:
CREATE TABLE records (
id_num INT, INDEX (id)
This creates an index named id for the id_num column on the table. For an existing table, the command would be CREATE INDEX id ON records(id_num).
To simultaneously create an index on multiple columns for an existing table, use a command like ALTER TABLE records ADD INDEX id(id_num), ADD INDEX name(name);.
As MySQL is unaware of the exact nature of data stored in a field, other than the data type (whether CHAR or INT), it will create an index using the full length of the data. So, if you have a table of songs, for instance, and you create an index on the 50-character name field, MySQL will store the entire length of the name for every song in the database. In a case like this, it might be wiser to store only the first 20 characters in the index, instead of the full length; that should give you enough characters to help you distinguish between songs. For large databases with very long columns, it would be a waste of space to store the entire name in the index.
You can restrict the length of the data to 20 characters by specifying it along with the name of the column, like so:
ALTER TABLE records ADD INDEX id(id_num), ADD INDEX name(name(20));
An improperly configured index can slow down a server by hoarding its resources, so you need to have a firm grasp of the underlying technology, particularly if you’re using the InnoDB storage engine. Take some time to familiarize yourself with the intricacies of InnoDB and how it works with primary key and secondary key indexes.
Partition MySQL Tables
The second trick, using partitions, is ideal for very large tables with several hundred thousand rows or more. While you can also partition tables with only several thousand rows, you will notice a remarkable difference in lookup times for the larger tables.
You can divide either a table’s rows (called horizontal partitioning) or columns (vertical partitioning) into partitions. When you do so, instead of traversing the entire table, MySQL queries only the relevant partition, which greatly reduces the lookup times.
A partition can have any number of rows or columns, but you should try to keep partitions relatively small compared to the size of the table if you want to see performance gains. Each table can be divided into 1,024 partitions at most.
Before you try to partition a table, first check whether your version of MySQL was built with partitioning support. When compiling MySQL from source, you need to use the -DWITH_PARTITION_STORAGE_ENGINE option to build partitioning support; the MySQL software package distributed in the repositories of most distributions comes with the partitions option built-in. To check, run SHOW PLUGINS; at the MySQL prompt, which will print a tabular list of all plugins and their status. Make sure that partition is listed as one of the plugins; if it isn’t, you can’t use the partitioning feature of MySQL.
You can create partitions when creating a table, or alter an existing table to partition its data set. You can create various types of partitions, such as RANGE, LIST, and HASH. The most commonly used is Range, where each partition is defined to accept a specific range of values for some column. What partition suits your table best will depend on the type of data stored in the table.
You must use the PARTITION BY keyword to define how the partitions are to be effected. So, to create a partition based on a range of data such as year, use the PARTITION BY RANGE(YEAR(order_date)) clause, like so:
CREATE TABLE employees (
order_date DATETIME NOT NULL,
— Other columns omitted
) PARTITION BY RANGE(YEAR(order_date)) (
PARTITION p_2009 VALUES LESS THAN (2009),
PARTITION p_2010 VALUES LESS THAN (2010),
PARTITION p_all VALUES LESS THAN MAXVALUE );
This command will create three partitions on the table employees, one for each of the specified years. Here, the first partition holds data for all years up to 2009, while the last partition, p_all, holds all the records that are left over after the other records are partitioned. The data is partitioned as soon as it matches a partitioning criterion, so no two partitions can have the same data.
When querying data from a partitioned table, you don’t need to specify the partition; the query remains the same whether or not you use partitions. MySQL will automatically deduce which partition to retrieve the information from.
As with indexes, there’s a small performance penalty with partitioning. When you insert new rows into a partitioned table, MySQL has to determine which partition the new data should go into, and this requires resources. While this additional drain on resources isn’t necessarily crippling, you have to take it into account when creating partitions. Depending on the size of the table, it might be wise to use dozens of partitions, but once you start reaching into the hundreds, take due care that using partitions itself doesn’t affect performance.
Identify Slow Queries
One of the best thing about MySQL is that it can help you identify the queries that take up the most resources via its slow-query-log.
MySQL doesn’t record slow queries by default; running mysqladmin var | grep log_slow_queries in the terminal will confirm this for you. To enable the option, edit the my.cnf configuration file and add these lines under the [mysqld] block:
long_query_time = 1
log-slow-queries = /var/log/mysql/mysql-slow-query.log
The default long_query_time is 0, but we’ve set it to 1 here to make MySQL log all queries that take longer than 1 second to execute. Feel free to change this value, as well as the location for the log file. When you’re done, you need to restart the MySQL server with the command /etc/init.d/mysql restart for the changes in the my.cnf file to take effect.
You can then use the mysqldumpslow command-line tool, included with MySQL, to get a summary of the slow-query-log file. If you see a list of queries that are taking too much time to execute, you can try to find out why.
Explain is one of the most widely used MySQL query analysis tool available. When used with a SELECT statement, it reveals a wide array of information, such as the number of tables involved in the statement, how the data is looked up, if there are any subqueries, whether any indexes are used, and much more. You can run Explain on each of of the slow queries to determine why the statements are slow. Once you know the causes, you should be able to figure out how to fix the problems.
Fix That Configuration File
One final tip: MySQL Tuner is a Perl script that you can run to test your MySQL performance. Based on the results, and other observations, the script will then advise what changes you can make in the configuration file to improve performance.
At the terminal, type wget http://mysqltuner.com/mysqltuner.pl to download the script. Make the script executable with the command chmod +x mysqltuner.pl. Run it by typing ./mysqltuner.pl; you will be prompted for your MySQL root password.
The output from the script is divided into sections. The General and Storage Engine Statistics at the top tells you the number of tables in each of the different storage engines. Performance Metrics lists parameters the script checks the system for, such as slow queries, memory usage, and temporary tables. Finally, the Recommendations section lists all the variables that you need to adjust in the configuration files.