Speeding Up Your Website’s Database

Website speed has always been a big issue, and it has become even more important since April 2010, when Google decided to use it in search rankings1. However, the focus of the discussion is generally on minimizing file sizes, improving server settings and optimizing CSS and JavaScript.

The discussion glosses over another important factor: the speed with which your pages are actually put together on your server. Most big modern websites store their information in a database and use a language such as PHP or ASP to extract it, turn it into HTML and send it to the Web browser.

So, even if you get your home page down to 1.5 seconds (Google’s threshold for being considered a “fast” website), you can still frustrate customers if your search page takes too long to respond, or if the product pages load quickly but the “Customer reviews” section takes several seconds to appear.

Google Site Performance2
Google’s threshold for a fast-loading website is about 1.5 seconds. This screenshot comes from Google Webmaster Tools (go to [domain name] → Diagnostics → Site Performance).

This article looks at these sorts of issues and describes some simple ways to speed up your website by optimizing your database. It starts with common knowledge but includes more complex techniques at the end, with links to further reading throughout. The article is intended for fearless database beginners and designers who have been thrown in at the deep end.

What Is A Database? What Is SQL?

A database is basically a collection of tables of information, such as a list of customers and their orders. It could be a filing cabinet, a bunch of spreadsheets, a Microsoft Access file or Amazon’s 40 terabytes of book and customer data3.

A typical database for a blog has tables for users, categories, posts and comments. WordPress includes these and a few other4 starter tables. A typical database for an e-commerce website has tables for customers, products, categories, orders and order items (for the contents of shopping baskets). The open-source e-commerce software Magento includes these and many others5. Databases have many other uses — such as for content management, customer relations, accounts and invoicing, and events — but these two common types (i.e. for a blog and an e-commerce website) will be referenced throughout this article.

Some tables in a database are connected to other tables. For example, a blog post can have many comments, and a customer can make multiple orders (these are one-to-many relationships). The most complicated type of database relationship is a many-to-many relationship. One such relationship is at the core of all e-commerce databases: an order can contain many products, and a single product can be added to many different orders. This is where the “order items” table comes in: it sits between the products and the orders, and it records every time a product is added to an order. This will be relevant later on in the article, when we look at why some database queries are slow.

The word database also refers to the software that contains all this data, as in “My database crashed while I was having breakfast,” or “I really need to upgrade my database.” Popular database software includes Microsoft Access 2010, Microsoft SQL Server, MySQL, PostgreSQL and Oracle Database 11g.

The acronym SQL comes up a lot when dealing with databases. It stands for “structured query language” and is pronounced “sequel” or “es-cue-el.” It’s the language used to ask and tell a database things — exciting things like SELECT lastname FROM customers WHERE city='Brighton'. This is called a database query because it queries the database for data. There are other types of database statements: INSERT for putting in new data, UPDATE for updating existing data, DELETE for deleting things, CREATE TABLE for creating tables, ALTER TABLE and many more.
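For example, here is roughly what the other main statement types look like, using the customers table from the example above (assuming it also has a firstname column and a numeric id):

INSERT INTO customers (firstname, lastname, city) VALUES ('Ann', 'Jones', 'Brighton');
UPDATE customers SET city = 'Hove' WHERE id = 2;
DELETE FROM customers WHERE id = 2;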

How Can A Database Slow Down A Website?

A brand new empty website will run very fast, but as it grows and ages, you may notice some sluggishness on certain pages, particularly pages with complicated bits of functionality. Suppose you wanted to show “Customers who bought this product also bought…” at the bottom of a page of products. To extract this information from the database, you would need to do the following:

  1. Start with the current product,
  2. See how many times the product has recently been added to anyone’s shopping basket (the “order items” table from above),
  3. Look at the orders related to those shopping baskets (for completed orders only),
  4. Find the customers who made those orders,
  5. Look at other orders made by those customers,
  6. Look at the contents of those orders’ baskets (the “order items” again),
  7. Look up the details of those products,
  8. Identify the products that appear the most often and display them.

You could, in fact, do all of that in one massive database query, or you could split it up over several different queries. Either way, it might run very quickly when your database has 20 products, 12 customers, 18 orders and 67 order items (i.e. items in shopping baskets). But if it is not written and programmed efficiently, then it will be a lot slower with 500 products, 10,000 customers, 14,000 orders and 100,000 order items, and it will slow down the page.
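Just to give a flavour, here is a rough sketch of what that one massive query might look like, using the orders, order items and products tables described above (the exact table and column names, the “recently” filter and the definition of a completed order are all assumptions):

SELECT oi2.productid, COUNT(*) AS timesbought
FROM orderitems oi1
INNER JOIN orders o1 ON o1.id = oi1.orderid AND o1.status > 1
INNER JOIN orders o2 ON o2.customerid = o1.customerid AND o2.status > 1 AND o2.id != o1.id
INNER JOIN orderitems oi2 ON oi2.orderid = o2.id AND oi2.productid != oi1.productid
WHERE oi1.productid = 12
GROUP BY oi2.productid
ORDER BY timesbought DESC
LIMIT 5;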

This is a very complicated example, but it shows what kind of stuff goes on behind the scenes and why a seemingly innocuous bit of functionality can bring a website to a halt.

A website could slow down for many other reasons: the server running low on memory or disc space; another website on the same server consuming resources; the server sending out a lot of emails or churning away at some other task; a software, hardware or network fault; a misconfiguration. Or it may have suddenly become a popular website. The next two sections, therefore, will look at speed in more detail.

Is It My Database?

There are now several ways to analyze your website’s speed, including the Firebug plug-in6 for Firefox, the developer tools in Google Chrome (press Shift + Control + I, and then go to Resources → Enable Resource Tracking) and Yahoo YSlow7. There are also websites such as WebPagetest8, where you can enter a URL, and it will time it from your chosen location.

All of these tools will show you a diagram of all of the different resources (HTML, images, CSS and JavaScript files) used by your page, along with how long each took to load. They will also break down the time taken to perform a DNS lookup (i.e. to convert your domain name into an IP address), the time taken to connect to your server, the time spent waiting for your server to reply (aka “time to first byte”), and the time spent receiving (i.e. downloading) the data.

Many Web pages are constructed in their entirety by the Web server (including by the PHP that accesses the database) and then sent to the browser all at once, so any database delays would show up as a long waiting time, while the receiving/downloading time would be proportional to the amount of data sent. So, if your 20 kB HTML page has a quick connection, a waiting time of 5 seconds and a download time of 0.05 seconds, then the delay would occur on the server, as the page is being built.

Not all Web pages are like this, though. The PHP flush function forces the server to send the HTML that it has already built to the browser right away. Any further delays would then be in the receiving time, rather than the waiting time.
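For instance, here is a minimal sketch of how flush changes things (the page content is hypothetical, and note that output buffering on the server can delay the actual send):

echo '<html><body>Header and navigation go here';
flush(); //send everything built so far to the browser immediately
sleep (3); //stands in for slow database queries
echo 'Rest of the page</body></html>';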

Either way, you can compare the waiting/receiving time for your suspected slow and complicated Web page to the waiting time for a similarly sized HTML page (or image or other static resource) on the same server at the same time. This would rule out the possibility of a slow Internet connection or an overloaded server (both of which would cause delays) and allow you to compare the times taken to construct the pages. This is not an exact science, but it should give you some indication of where things are being held up.

The screenshots below show the analysis provided by Google Chrome’s Developer Tools of a 20 kB Web page versus a 20 kB image. The Web page waited 130 milliseconds (ms) and downloaded for 22 ms. The image waited for 51 ms and downloaded for 11 ms. The download/receiving times are about the same, as expected, but the server is spending about 80 ms extra on processing and constructing the Web page, which entails executing the PHP and calling the database.

When performing these tests, analyze the static resource by itself and click “Refresh,” so that you are not getting a quick cached version. Also, run each test a few times to ensure that you’re not looking at a statistical anomaly. The third screenshot below shows that WebPagetest reported almost double the time that Google Chrome did for the same page at the same time, demonstrating that using the same environment for all tests is important.

Screenshot9
Resource analysis using Google Chrome’s Developer Tools, showing a 130-ms wait time for a Web page.

Waiting and receiving time for an image10
The same tool, showing a 51-ms wait time for an image of about the same size.

Screenshot11
Resource analysis of the same page from WebPagetest, with a 296-ms wait time and a 417-ms total time.

How To Time A Database Query In PHP And MySQL

The approach above was general; we can now get very specific. If you suspect that your database might be slowing down your website, then you need to figure out where the delay is coming from. I will define a couple of timing functions, and then use them to time every single database query that is run by a page. The code below is specific to PHP and MySQL, but the method could be used on any database-driven website:

function StartTimer ($what='') {
 global $MYTIMER; $MYTIMER=0; //global variable to store time
 //if ($_SERVER['REMOTE_ADDR'] != '127.0.0.1') return; //only show for my IP address

 echo '<p style="border:1px solid black; color: black; background: yellow;">';
 echo "About to run <i>$what</i>. "; flush(); //output this to the browser
 //$MYTIMER = microtime (true); //in PHP5 you need only this line to get the time

 list ($usec, $sec) = explode (' ', microtime());
 $MYTIMER = ((float) $usec + (float) $sec); //set the timer
}
function StopTimer() {
 global $MYTIMER; if (!$MYTIMER) return; //no timer has been started
 list ($usec, $sec) = explode (' ', microtime()); //get the current time
 $MYTIMER = ((float) $usec + (float) $sec) - $MYTIMER; //the time taken in seconds
 echo 'Took ' . number_format ($MYTIMER, 4) . ' seconds.</p>'; flush();
}

StartTimer starts the timer and also prints whatever you are trying to time. The second line is a check of your IP address. This is very useful if you are doing this (temporarily) on a live website and don’t want everyone in the world to see the timing messages. Uncomment the line by removing the initial //, and replace the 127.0.0.1 with your IP address12. StopTimer stops the timer and displays the time taken.

Most modern websites (especially well-programmed open-source ones) have a lot of PHP files but query the database in only a handful of places. Search through all of the PHP files for your website for mysql_db_query or mysql_query. Many software development packages such as BBEdit have functions to perform searches like this; or, if you are familiar with the Linux command line, try this:
grep mysql_query `find . -name '*.php'`

You may find something like this:

mysql_query ($sql);

For WordPress 3.0.4, this is on line 1112 of the file wp-includes/wp-db.php. You can copy and paste the functions above into the top of this file (or into any PHP file that is included by every page), and then add the timer before and after the mysql_query line. It will look like this:

StartTimer ($query);
$this->result = @mysql_query( $query, $dbh );
StopTimer();

Below is a partial screenshot of this being done on a brand new WordPress installation. It is running about 15 database queries in total, each taking about 0.0003 seconds (0.3 ms); so, less than 5 ms in total, which is to be expected for an empty database.

Screenshot13
This shows and times all of the database queries that WordPress runs.

If you have found this line in other commonly used systems, please share this information by adding to the comments for this article.

You can also do other interesting things with it: you can see how fast your computer is compared to mine. Counting to 10 million takes my computer 2.9420 seconds. My Web server is a bit faster at 2.0726 seconds:

StartTimer ('counting to 10000000');
for ($i=0; $i<10000000; $i++); //count to a high number
StopTimer();

Notes on the Results

This technique gives you only comparative results. If your server was very busy at that moment, then all of the queries would be slower than normal. But you should have at least been able to determine how long a fast query takes on your server (maybe 1 to 5 ms), and therefore identify the slow-ish ones (200+ ms) and the really slow ones (1+ second). You can run the test a few times over the course of an hour or day (but not immediately after — see the section below about the database cache) to make sure you’re not getting a fluke.

This will also most likely severely mess up the graphical presentation of the page. It may also give you PHP warnings like “Cannot modify header information. Headers already sent by…” This is because the timing messages are interfering with cookie and session headers. As long as the page still displays below the warnings, you can ignore them. If the page does not display at all, then you may need to put the StartTimer and StopTimer around specific blocks of code, rather than around mysql_query.

This technique is essentially a quick hack to show some rough results. It should not be left on a live website.

What Else Could It Be?

If your database queries are not particularly slow, but the construction of your Web page is, then you might just have poorly written code. You can put the timer statements above around bigger and bigger blocks of code to see if and where the delay is occurring. It could be that you are looping through 10,000 full rows of product information, even if you are displaying only 20 product names.
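For example, if the page displays only 20 product names, it is much quicker to ask the database for just those 20 rows than to fetch everything and filter in PHP. Here is a hypothetical fix, assuming a products table with a name column:

$result = mysql_query ('SELECT name FROM products ORDER BY name LIMIT 20'); //fetch only the 20 rows being displayed
while ($row = mysql_fetch_array ($result)) echo '<li>' . htmlspecialchars ($row[0]) . '</li>';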

Profiling

If you are still baffled and/or want more complete and accurate information about what’s happening in your code, you could try a debugging and profiling tool such as Xdebug14, which analyzes a local copy of your website. It can even visually show where bottlenecks are occurring.

Indexing Database Tables

The experiment above may have surprised you by showing just how many database queries a page on your website is running, and hopefully, it has helped you identify particularly slow queries.

Let’s look now at some simple improvements to speed things up. To do this, you’ll need a way to run database queries on your database. Many server administration packages (like cPanel or Plesk) provide phpMyAdmin for this task. Alternatively, you could upload something like phpMiniAdmin15 to your website; this single PHP file enables you to look at your database and run queries. You’ll need to enter your database name, user name and password. If you don’t know these, you can usually find them in your website’s configuration file, if it has one (in WordPress, it’s wp-config.php).

Among the database queries that your page runs, you probably saw a few WHERE conditions. This is SQL’s way of filtering out results. For instance, if you are looking at an “Account history” type of page on your website, there is probably a query like the following to look up all of the orders a customer has placed:

SELECT * FROM orders WHERE customerid = 2;

This retrieves all orders placed by the customer with the database ID 2. On my computer, with 100,000 orders in the database, running this took 0.2158 seconds.

Columns like customerid (ones that are used in a lot of WHERE conditions with =, < or > and that have many possible values) should be indexed. This is like the index at the back of a book: it helps the database quickly retrieve indexed data. This is one of the quickest ways to speed up database queries.

What to Index

In order to know which columns to index, you need to understand a bit about how your database is being used. For example, if your website is often used to look up categories by name or events by date, then these columns should be indexed.

SELECT * FROM categories WHERE name = 'Books';
SELECT * FROM events WHERE startdate >= '2011-02-07';

Each of your database tables should already have an ID column (often called id, but sometimes ID or articleid or the like) that is listed as a PRIMARY KEY, as in the wp_posts screenshot below. These PRIMARY KEYs are automatically indexed. But you should also index any columns that refer to ID numbers in other tables, such as customerid in the example above. These are sometimes referred to as FOREIGN KEYs.

SELECT * FROM orders WHERE customerid = 2;
SELECT * FROM orderitems WHERE orderid = 231;

If a lot of text searches are being done, perhaps for descriptions of products or article content, then you can add another type of index called a FULL TEXT index16. Queries using a FULL TEXT index can be done over multiple columns and are initially configured to work only with words of four or more letters. They also exclude certain common words17 like about and words that appear in more than 50% of the rows being searched. However, to use this type of index, you will need to change your SQL queries. Here are two versions of a typical text search, the first without and the second with a FULL TEXT index:

SELECT * FROM products WHERE name LIKE '%shoe%' OR description LIKE '%shoe%';
SELECT * FROM products WHERE MATCH(name,description) AGAINST ('shoe');

It may seem that you should go ahead and index everything. However, while indexing speeds up SELECTs, it slows down INSERTs, UPDATEs and DELETEs. So, if you have a products table that hardly ever changes, you can be more liberal with your indexing. But your orders and order items tables are probably being modified constantly, so you should be more sparing with them.

There are also cases where indexing may not help18; for example, if most of the entries in a column have the same value. If you have a stock_status column that stores a value of 1 for “in stock,” and 95% of your products are in stock, then an index wouldn’t help someone search for in-stock products. Imagine if the word the was indexed at the back of a reference book: the index would list almost every page in the book.

SELECT * FROM products WHERE stock_status = 1;

How to Index

Using phpMyAdmin or phpMiniAdmin, you can look at the structure of each database table and see whether the relevant columns are already indexed. In phpMyAdmin, click the name of the table and browse to the bottom where it lists “Indexes.” In phpMiniAdmin, click “Show tables” at the top, and then “sct” for the table in question; this will show the database query needed to recreate the table, which will include any indices at the bottom — something like KEY `orderidindex` (`orderid`).

Screenshot19
Using phpMiniAdmin to check for indices in the WordPress wp_posts table.

If the index does not exist, then you can add it. In phpMyAdmin, below the index, it says “Create an index on 1 columns”; click “Go” here, enter a useful name for the index (like customeridindex), choose the column on the next page, and press “Save,” as seen in this screenshot:

Screenshot20
Indexing a column using phpMyAdmin.

In phpMiniAdmin, you’ll have to run the following database statement directly in the large SQL query box at the top:

ALTER TABLE orders ADD INDEX customeridindex (customerid);

Running the query again after indexing takes only 0.0019 seconds on my computer, 113 times faster.

Adding a FULL TEXT index is a similar process. When you run searches against this index, you must list the same columns:

ALTER TABLE articles ADD FULLTEXT(title,author,articletext);
SELECT * FROM articles WHERE MATCH(title,author,articletext) AGAINST ('mysql');

Back-Ups and Security

Before altering your database tables in any way, make a back-up of the whole database. You can do this using phpMyAdmin or phpMiniAdmin by clicking “Export.” Especially if your database contains customer information, keep the back-ups in a safe place. You can also use the command mysqldump to back up a database via SSH:

mysqldump --user=myuser --password=mypassword \
 --single-transaction --add-drop-table mydatabase \
 > backup`date +%Y%m%d`.sql
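To restore such a back-up later, feed the file back into the mysql command (the dated file name here is hypothetical):

mysql --user=myuser --password=mypassword mydatabase < backup20110323.sql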

These scripts also represent a security risk, because they make it much easier for someone to steal all of your data. While phpMyAdmin is often provided securely through your server management software, phpMiniAdmin is a single file that is very easy to upload and forget about. So, you may want to password-protect it or remove it after usage.

Optimizing Tables

MySQL and other kinds of database software have built-in tools for optimizing their data. If your tables get modified a lot, then you can run the tools regularly to make the database tables smaller and more efficient. But they take some time to run (from a few seconds to a few minutes or more, depending on the size of the tables), and they can block other queries from running on the table during optimization, so doing this at a non-busy time is best. There’s also some debate21 about how often to optimize, with opinions ranging from never to once in a while to weekly.

To optimize a table, run database statements such as the following in phpMyAdmin or phpMiniAdmin:

OPTIMIZE TABLE orders;

For example, before I optimized my orders table with 100,000 orders, it was 31.2 MB in size and took 0.2676 seconds to run SELECT * FROM orders. After its first ever optimization, it shrank to 30.8 MB and took only 0.0595 seconds.

The PHP function below will optimize all of the tables in your database:

function OptimizeAllTables() {
 $tables = mysql_query ('SHOW TABLES'); //get all the tables
 while ($table = mysql_fetch_array ($tables))
  mysql_query ('OPTIMIZE TABLE ' . $table[0]); //optimize each one
}

Before calling this function, you have to connect to your database. Most modern websites will connect for you, so you don’t need to worry about it, but the relevant MySQL calls are shown here for the sake of completeness:

mysql_connect (DB_HOST, DB_USER, DB_PASSWORD);
mysql_select_db (DB_NAME);
OptimizeAllTables();

Making Sure To Use The Cache

Just as a Web browser caches copies of pages you visit, database software caches popular queries. As above, the query below took 0.0019 seconds when I ran it the first time with an index:

SELECT * FROM orders WHERE customerid=2;

Running the same query again right away takes only 0.0004 seconds. This is because MySQL has remembered the results and can return them a second time without looking them up again.
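If you want to check whether the query cache is switched on, and watch it working, these standard MySQL statements (runnable in phpMyAdmin or phpMiniAdmin) show the relevant settings and statistics:

SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache%';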

However, many news websites and blogs might have queries like the following to ensure that articles are displayed only after their published date:

SELECT * FROM posts WHERE publisheddate <= CURDATE();
SELECT * FROM articles WHERE publisheddate <= NOW();

These queries cannot be cached because they depend on the current time or date. In a table with 100,000 rows, a query like the one above took about 0.38 seconds every time I ran it against an unindexed column on my computer.

If these queries are run on every page of your website, thousands of times per minute, it would speed things up considerably if they were cacheable. You can force queries to use the cache by replacing NOW or CURDATE with an actual time, like so:

SELECT * FROM articles WHERE publisheddate <= '2011-01-17 17:00';

You can use PHP to make sure the time changes every five minutes or so:

$time = time();
$currenttime = date ('Y-m-d H:i', $time - ($time % 300));
mysql_query ("SELECT * FROM articles WHERE publisheddate <= '$currenttime'");

The percentage sign is the modulus operator. Subtracting $time % 300 rounds the time down to the previous multiple of 300 seconds (5 minutes), so every request within the same five-minute window asks for the same $currenttime and can be answered from the cache.

There are other uncacheable MySQL functions22, too, like RAND.

Outgrowing Your Cache

Outgrowing your MySQL cache can also make your website appear to slow down. The more posts, pages, categories, products, articles and so on that you have on your website, the more related queries there will be. Take a look at this example:

SELECT * FROM articles WHERE publisheddate <= '2011-01-17 17:00' AND categoryid=12

It could be that when your website had 500 categories, queries like this one all fit in the cache together and all returned in milliseconds. But with 1,000 regularly visited categories, they keep knocking each other out of the cache and returning much slower. In this case, increasing the size of the cache might help. But giving more server RAM to the cache leaves less for everything else, so consider this carefully. Plenty of advice is available about turning on and improving the efficiency of your cache23 by setting server variables.

When Caching Doesn’t Help

A cache is invalidated whenever a table changes. When a row is inserted, updated or deleted, all queries relying on that table are effectively cleared from the cache. So, if your articles table is updated every time someone views an article (perhaps to count the number of views), then the improvement suggested above might not help much.

In such cases, you may want to investigate an application-level cache, such as Memcached24, or read the next section for ideas on making your own ad-hoc cache. Both require much bigger programming changes than anything discussed up to now.

Making Your Own Cache

If a particularly vicious database query takes ages but the results don’t change often, you can cache the results yourself.

Let’s say you want to show the 20 most popular articles on your website in the last week, using an advanced formula that takes into account searches, views, saves and “Send to a friend” hits. And you want to show these on your home page in an unordered (<ul>) HTML list.

It might be easiest to use PHP to run the database query once an hour or once a day and save the full list to a file somewhere, which you can then include on your home page.

Once you have written the PHP to create the include file, you could take one of a couple of approaches to scheduling it. You could use your server’s scheduler (in Plesk 8, go to Server → Scheduled Tasks) to call a PHP page every hour, with a command like this:

wget -O /dev/null -q http://www.mywebsite.co.uk/runhourly.php
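If you have shell access instead, the equivalent crontab entry (using the same hypothetical URL) would run at the start of every hour:

0 * * * * wget -O /dev/null -q http://www.mywebsite.co.uk/runhourly.php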

Alternatively, you could get PHP to check whether the file is at least an hour old before running the query — something like this, where 3600 is the number of seconds in an hour:

$filestat = stat ('includes/complicatedfile.html'); //look up information about the file
if ($filestat['mtime'] < time()-3600) RecreateComplicatedIncludeFile(); //over 1 hour old, so recreate
readfile ('includes/complicatedfile.html'); //include the file into the page
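The RecreateComplicatedIncludeFile function isn’t shown above; a minimal sketch of it might look like this, with a simple hypothetical popularity query standing in for the advanced formula:

function RecreateComplicatedIncludeFile() {
 $result = mysql_query ('SELECT name FROM articles ORDER BY views DESC LIMIT 20'); //hypothetical popularity query
 $fp = fopen ('includes/complicatedfile.html', 'w'); //overwrite the old cached file
 fwrite ($fp, "<ul>\n");
 while ($row = mysql_fetch_array ($result))
  fwrite ($fp, '<li>' . htmlspecialchars ($row[0]) . "</li>\n");
 fwrite ($fp, "</ul>\n");
 fclose ($fp);
}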

Returning to the involved example above for “Customers who bought this product also bought…,” you could also cache items in a new database column (or table). Once a week or so, you could run that long set of queries for each and every product, to figure out which other products customers are buying. You could then store the resulting product ID numbers in a new database column as a comma-separated list. Then, when you want to select the other products bought by customers who bought the product with the ID 12, you can run this query:

SELECT * FROM products WHERE FIND_IN_SET(12,otherproductids);
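The weekly job could create and fill in that column with statements along these lines (the otherproductids column comes from the query above; the column size and ID numbers are hypothetical):

ALTER TABLE products ADD otherproductids VARCHAR(255);
UPDATE products SET otherproductids = '3,7,21' WHERE id = 12;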

Reducing The Number Of Queries By Using JOINs

Somewhere in the management and control area of your e-commerce website is probably a list of your orders with the names of the customers who made them.

This page might have a query like the following to find all completed orders (with a status value indicating whether an order has been completed):

SELECT * FROM orders WHERE status>1;

And for each order it comes across, it might look up the customer’s details:

SELECT * FROM customers WHERE id=1;
SELECT * FROM customers WHERE id=2;
SELECT * FROM customers WHERE id=3;
etc

If this page shows 100 orders at a time, then it has to run 101 queries. And if each of those customers looks up their delivery address in a different table, or looks for the total charge for all of their orders, then the time delay will start to add up. You can make it much faster by combining the queries into one using a JOIN. Here’s what a JOIN looks like for the queries above:

SELECT * FROM orders INNER JOIN customers
ON orders.customerid = customers.id WHERE orders.status>1;

Here is another way to write this, without the word JOIN:

SELECT * FROM orders, customers
WHERE orders.customerid = customers.id AND orders.status>1;

Restructuring queries to use JOINs can get complicated because it involves changing the accompanying PHP code. But if your slow page runs thousands of database statements, then it may be worth a look. For further information, Wikipedia offers a good explanation of JOINs25. The columns with which you use a JOIN (customerid in this case) are also prime candidates for being INDEXed.

You could also ask MySQL to EXPLAIN26 a database query. This tells you which tables it will use and provides an “execution plan.” Below is a screenshot showing the EXPLAIN statement being used on one of the more complex WordPress queries from above:

Screenshot27
Using the EXPLAIN statement to explain how MySQL plans to deal with a complex query.

The screenshot shows which tables and indices are being used, the JOIN types, the number of rows analyzed, and a lot more information. A comprehensive page on the MySQL website explains what the EXPLAIN explains28, and another much shorter page goes over how to use that information to optimize your queries29 (by adding indices, for instance).
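To try it yourself, just prefix a SELECT with the keyword EXPLAIN; for example, for the JOIN from the previous section:

EXPLAIN SELECT * FROM orders INNER JOIN customers
ON orders.customerid = customers.id WHERE orders.status>1;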

…Or Just Cheat

Finally, returning again to the advanced example above for “Customers who bought this product also bought…,” you could also simply replace the functionality with something less complicated for starters. You could call it “Recommended products” and just return a few other products from the same category, or return some hand-picked recommendations.

Conclusion

This article has shown a number of techniques for improving database performance, ranging from simple to quite complex. While all well-built websites should already incorporate most of these techniques (particularly the database indices and JOINs), the techniques do get overlooked.

There is also a lot of debate on forums around the Web about the effectiveness and reliability of some of these techniques (i.e. measuring speed, indexing, optimization, how best to use the cache, etc.), so the advice here is not definitive, but hopefully it gives you an overview of what’s available.

If your website starts to mysteriously slow down after a few months or years, you will at least have a starting point for figuring out what’s wrong.

Footnotes

  1. http://googlewebmastercentral.blogspot.com/2010/04/using-site-speed-in-web-search-ranking.html
  2. http://coding.smashingmagazine.com/wp-content/uploads/2011/02/Site-performance-full.png
  3. http://top-10-list.org/2010/02/16/top-10-largest-databases-list/
  4. http://codex.wordpress.org/Database_Description
  5. http://www.magentocommerce.com/wiki/2_-_magento_concepts_and_architecture/magento_database_diagram
  6. http://getfirebug.com/
  7. http://developer.yahoo.com/yslow/
  8. http://www.webpagetest.org
  9. http://coding.smashingmagazine.com/wp-content/uploads/2011/02/Google-Chrome-image-full.png
  10. http://coding.smashingmagazine.com/wp-content/uploads/2011/02/Google-Chrome-image-full.png
  11. http://coding.smashingmagazine.com/wp-content/uploads/2011/02/Web-page-test-full1.png
  12. http://whatismyipaddress.com/
  13. http://coding.smashingmagazine.com/wp-content/uploads/2011/02/Wordpress-timed-full.png
  14. http://www.xdebug.org/
  15. http://phpminiadmin.sourceforge.net/
  16. http://www.devarticles.com/c/a/MySQL/Getting-Started-With-MySQLs-Full-Text-Search-Capabilities/1/
  17. http://dev.mysql.com/doc/refman/5.0/en/fulltext-stopwords.html
  18. http://www.mysqlperformanceblog.com/2007/08/28/do-you-always-need-index-on-where-column/
  19. http://coding.smashingmagazine.com/wp-content/uploads/2011/02/Show-create-table-full.png
  20. http://coding.smashingmagazine.com/wp-content/uploads/2011/02/Adding-index-full.png
  21. http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/
  22. http://dev.mysql.com/doc/refman/5.0/en/query-cache-operation.html
  23. http://www.databasejournal.com/features/mysql/article.php/3808841/Optimizing-the-MySQL-Query-Cache.htm
  24. http://memcached.org/
  25. http://en.wikipedia.org/wiki/Join_%28SQL%29
  26. http://dev.mysql.com/doc/refman/5.0/en/explain.html
  27. http://coding.smashingmagazine.com/wp-content/uploads/2011/02/Mysql-explain-full.png
  28. http://dev.mysql.com/doc/refman/5.5/en/explain-output.html
  29. http://dev.mysql.com/doc/refman/5.5/en/using-explain.html


Paul Tero is an experienced PHP programmer and server administrator. He developed the Stockashop ecommerce system in 2005 for Sensable Media. He now works part-time maintaining and developing Stockashop, and the rest of the time freelancing from a corner of his living room, and sleeping, eating, having fun, etc. He has also written numerous other open sourcish scripts and programs.

Comments

  1. 1

    Hey,

    Excellent article, well written. I’ve just finished creating a blog website that’s based on a MySQL database, and there are definitely some tips here worth implementing.

    Cheers

  2. 2

    Sphinx (http://sphinxsearch.com/) is an excellent way to increase speed on sites which require a lot of database queries.

  3. 3

    Very awesome article, lots of useful information. I’ve actually been wondering about caching queries, because I started creating a tinymvc framework that I could use for small applications and I wanted to learn how to cache past queries.

  4. 4

    Fernando Agüero

    March 23, 2011 2:46 am

    Epic Article! Lots of useful information. Thank you very much Paul ;)

    Greetings from Spain.

  5. 6

    Interesting article, but the Google Webmaster Tools do not display the time a website took to generate (well, they do show you that metric, but not in this slow/fast graph). It is the time the website took to render, and that depends mainly on the number of files, the speed of the connection and JavaScript code.

    Optimizing the database isn’t something bad, but if your theme/layout isn’t speedy enough, no amount of saved milliseconds will notably speed up your website ;-)

    • 7

      First, I’m pretty sure GWT *does* show the total load time for the pages, i.e. response time + download + rendering.

      Your last paragraph is kind of correct, but the opposite statement is exactly as correct – that’s what this article was trying to say. There has been such a big onus on the front-end recently that it’s easy to forget about the back-end.

      On my website the front end runs about as fast as it could do, but I realized I could cache nearly all the pages instead of going to the database every time and that made a big difference. And then we got a huge spike in visitors a few weeks ago and needed to make some more changes to Apache itself to allow more concurrent connections. So no, it’s not all about the front-end.

  6. 8

    Good stuff. If databases don’t seem very exciting or sexy, it’s because they are not. When starting projects from scratch, check out other databases that are part of the NoSQL movement to find places to put your data that match the structure of W3C web layout and programming. I use eXist and MarkLogic.

  7. 10

    Thanks so much! I spent half my day yesterday trying to track down a sluggish page load, and was left with the database as the culprit after several different load tests. Your article is so timely!

  8. 11

    Fantastic article!
    Very useful for web optimization!!!!

  9. 12

    One single query to find “customers who bought this also bought this” takes 86ms on one of my clients database with 400 products, 20,000 customers, 120,000 basket items, 40,000 orders. Using PostgreSQL, Onxshop eCommerce and this query:

    SELECT DISTINCT product_variety.product_id AS related_product_id, product_variety_id, count(product_variety_id) AS count
    FROM ecommerce_basket_content basket_content
    LEFT OUTER JOIN ecommerce_product_variety product_variety ON (product_variety.id = product_variety_id)
    LEFT OUTER JOIN ecommerce_product product ON (product.id = product_variety.product_id)
    LEFT OUTER JOIN ecommerce_basket basket ON (basket.id = basket_content.basket_id)
    WHERE product.publish = 1 AND basket.id IN
    (
    SELECT basket_content.basket_id
    FROM ecommerce_basket_content basket_content
    LEFT OUTER JOIN ecommerce_product_variety product_variety ON (basket_content.product_variety_id = product_variety.id)
    WHERE product_variety.product_id = $product_id
    )
    AND product.id != $product_id
    GROUP BY product_id, product_variety_id
    ORDER BY count DESC LIMIT $limit

    86ms is not much, but I’m using SQL query cache and front-end caching to speed this up anyway.

    Database schema is here: http://www.onxshop.com/share/docs/Onxshop.pdf

    • 13

      That is quick, though that query includes a sub-query to find the other orders containing each item – which is similar to breaking it up into 2 separate queries. Thanks for the interesting comment.

  10. 14

    Great article idea, it’s scary how many people out there don’t know this already; no doubt, there will be 500 comments from all of them within the space of a week. Something else not mentioned here is analysis of the data you want to store and creating the correct field types in the db tables to start with. For example, if you have a db of vehicle manufacturers, store the manufacturer field in a varchar(13) field, as the longest vehicle manufacturer name required is usually ‘Mercedes-Benz’.

    Also, sometimes you can save on queries by, say, storing info you can get elsewhere. I.e. you could do a query to fetch blog with id #2 from your blog table and comments with blog id #2 from your comments table, but the second query isn’t required if you have no comments. Therefore the extra effort of storing a comment count along with the blog entry in the blog table will save you having to do an extra query when you need to know the comment count for that blog.

    Basic examples, I know, but you never know when you’re gonna make something that spreads like wildfire and your site’s gonna struggle.

  11. 15

    Great article, thanks. Do you know of a Xdebug equivalent for .Net?

  12. 17

    Yeah, most programmers, and almost all designers should go ahead and stay the hell away from database work if at all possible. Leave it to the DBAs, there is a reason people specialize in that. The section on indexing in particular is a primitive understanding of the inner workings of indices at best.

    • 18

      Can you expand upon why this was a primitive understanding of indexing? I am not a DBA by trade, though I have been working with databases for a long time, and did research this article well, so I think it is valid advice. And it is not intended to be very in-depth, it is a list of the quickest and broadest fixes. A blog or small ecommerce website cannot afford a DBA, so the designer or programmer needs to have some tips up their sleeve.

  13. 19

    Long time lurker – first time commenter here! This is the type of article I love. Just when I’ve had enough “20 new css tricks!” articles to consider looking for a new site, you come out with a wonderful, hard-hitting article on database optimization and caching grounded in solid computer science.

    And, @dave, I think databases are way exciting. Undoubtedly, the database is the core of any meaningful web app. Which is more useful: a presentation with no data, or raw data? (or more $$valuable: Facebook’s slick ajax presentation framework, or their Library-of-congress sized database of likes, photos, and connections?) If the core of your product (in all of its raw, abstract glory) doesn’t strike you as sexy or exciting, you may need to rethink that particular product.

  14. 20

    Another alternative would be using a NoSQL database (mongodb, couchdb, cassandra).

    If you’re already stuck with MySQL, installing Percona Server (a drop-in replacement for MySQL made by several original authors) is a good idea too.

  15. 21

    A really interesting article: it’s always a good practice to optimize and reduce rdbms access. Please note that some queries include extra markup (<strong>) that has to be removed in order to make them work.

    Thank you!

    • 22

      Thanks for pointing that out – the extra markup was in error and has now been removed.

  16. 23

    octavian purice

    March 23, 2011 6:51 am

    A very useful article… Very nice intro into DB optimization :)

  17. 24

    One important note to make your queries even better:

    If you have a table with columns “id, title, body, post_date”

    SELECT id, title, body, post_date

    will be more efficient than

    SELECT *

    Specifying the column names hints to MySQL which indexes to use.

    I work for a site that gets roughly 2-2.5 million unique visits per day.

    • 25

      Very nice article which makes great recommendations.

      In addition to specifying column names, only select the columns you actually need in the query. If you don’t need the data, don’t make the database do the work to retrieve it.

    • 26

      Logically, that makes sense (greater specificity reduces wildcard search). But I wonder if in practice it’s actually faster, especially on tables that only have just those 4 columns. Something I’ll have to play around with today.

      • 27

        Today, your table only has those 4 columns – consider future state.
        Also, there is the possibility of field name collisions from different tables in a join.

  18. 28

    railsperformance

    March 23, 2011 8:31 am

    Great post!

    You covered a lot of ground very nicely, and did it for a broad audience. (something I find almost impossible to do well)

    I think your advice is spot on, and follows what I recommend for tuning:
    1. tune the front-end (compress, combine, minify, defer)
    2. tune the backend (index, tune queries, tune db, queue long-running tasks like sending emails)
    3. tune the page interaction (use ajax instead of reload the whole page, improve the javascript, pre-load items for the next page)

    I think a key component of any tuning strategy is to have a way to measure before and after results, and to monitor the performance in the long-term.

    great post!

  19. 29

    Great article about databases. MySQL clustering and database load balancing are the best ways to handle high traffic with your database. Load balancing, choosing the storage engine type and indexing are important for speeding up the MySQL database.

  20. 30

    I’ve read quite some articles elsewhere and I found out pretty a lot of them suggest not to use JOINs over multiple query runs, as JOINs are computationally intensive.

    What do you say?

    • 31

      I’m not sure exactly what you mean by “over multiple query runs”. Can you perhaps provide a link or title of the article where you read about that? As I think that a 4ms JOINed query would be faster than 100 1ms separate queries. And JOINs are an essential and important part of SQL.

  21. 32

    Not got the time to fully read this article, but I would say that you should cache as much of your generated HTML (content) as possible, even if it’s just for a few minutes, but only once you actually need to, i.e. when you do get heavy usage.

  22. 33

    Does the database automatically cache, or is it something you have to turn on?? Awesome write up though.

    • 34

      I think that the cache is usually automatically on. Using phpMiniAdmin or phpMyAdmin you can run the SQL statement “SHOW VARIABLES;” and look for the query_cache related settings. You can also try it, by running a slow and complicated query twice in a row – and see if the second running is much faster.

  23. 35

    Thanks for the great article. Just noticed that I have Site Performance under Labs instead of Diagnostics in Webmaster Tools.

  24. 36

    Wow – since I’m not a PHP programmer there’s a lot to digest here!
    Very useful tips as far as I can see (I’ve worked a lot with off-line databases before = same principles apply, so I can see the merit.)

    A big CHEERS! :D

  25. 37

    My eyes were drooling while reading this. Useful article. At least now I know what goes on behind the database. Kudos!

  26. 38

    Nice article! Learned a lot from it. Will try it out soon!

  27. 39

    This was one of the best articles I have read from Smashing. Great work, and it’s good to know I’m doing things right.

  28. 40

    Thank you for all the great and positive comments. They are very much appreciated.

  29. 41

    There are so many people out there who buy servers and/or have slow sites because they haven’t built indexes for their queries. Indexes can easily give 10x performance improvements.

  30. 42

    I only skimmed the article, I must admit, but one thing I didn’t notice on the list of advice was “cache your output”. This not only saves on database query time, but on script execution time as well. For example, an in-house CMS I developed would, upon receiving a request for a page that wasn’t cached, do all the usual stuff (DB lookup, build page with Smarty, etc), but once the output was rendered it would save a copy of it to disc. Any subsequent attempt to access the page in question would just load it directly from the cache without even invoking PHP. This all worked by having the 404 page be the PHP script that looked up and built the pages (or returned an actual 404 if the requested page didn’t exist). Editing the content in the CMS would cause the cached copy to be deleted.

    Of course it’s not quite so simple if you need bits of the page to be dynamic, but it’s not that tricky to get around (all cached pages would include() a small PHP script that would run the few queries that would be needed for the dynamic bits of the page). The result was a system where the “waiting” phase was extremely quick unless the page wasn’t cached. Another useful side effect of this approach is that if you’re connecting to the database less often you’re exposing it to potentially dangerous outside input less frequently as well.

  31. 43

    Unfortunately you make no mention of database engines and miss some very important points.

    The default DB engine in MySQL is MyISAM, which is avoided by any good transactional application. InnoDB would be used instead and functions very differently to MyISAM.

    The biggest problems I come across are badly written queries that perform table scans, join too many tables, or return all the data and then process it in PHP when it can be done easily with a little bit of SQL.

    What about transactions, and proper database design?

    • 44

      It’s true – but the article was intended to provide some basic tips to beginners. So I think that looking at database design or transactions would be too much for an introductory article. But thank you for the comment.

  32. 45

    Instead of phpMiniAdmin, you should *really* try out http://www.adminer.org/

  33. 46

    Nice! Thanks

  34. 47

    Good article. Indexes and query optimisation are an art in themselves. In MS SQL there are also things called covering indexes. In general, in any query language you should never use SELECT *, as you will undoubtedly be returning columns you do not need. If you are then selecting (and joining or ordering) on 3 or 4 columns out of a total of say 10 columns in the table, a covering index of those columns will substantially improve performance. Also, the order of columns in an index is very important.

  35. 48

    First off, I’m from the SQL Server world where the functions you are using don’t exist. No big deal, but in that world, date time info is stored in increasing value, so if you modified the following query to use the right SQL Server function, you would also change the “>” to “<” to get PAST dates:
    SELECT * FROM posts WHERE publisheddate < GETDATE();
    --GETUTCDATE?
    It would pick the posts published before today (right now). Most people don’t set publish dates in a DB to some future time. (Maybe a field called ProjectedPublishDate.)
    In SQL Server the query might just as well be WHERE publisheddate IS NOT NULL. Of course this query would get the same results with: WHERE publisheddate < '2500/01/01'. At least, it would for the next 480+ years from now. (Will your code be around even 40 years from now?)

    Are you saying MySQL stores dates in decreasing value as time progresses? Or did you make a mistake in your query?

    You say “These queries cannot be cached because they depend on the current time or date.”
    In SQL Server, they can’t be cached because you are executing a function. Worse, indexing is useless because a function is indeterminate and you must look at every single record in the table to see if it meets the selection criteria with the function. In SQL Server, if you execute “DECLARE @now datetime = GETUTCDATE()” and use @now in your query two times, the second query will act like it is cached because the times are so close; 99%-100% of the data you are looking for is already cached and only the new (if any) selections will have to be found anew.

    This is a neat PHP capability:
    $currenttime = date ('Y-m-d H:i', $time - ($time % 300));
    But even if your first query was just before and the second query just after a 5 minute boundary, the second would get the cache benefit of the first in SQL Server.

    (I was going to use this as proof that your times are stored in ascending order, then realized that if it was in descending order the remainder would be negative, and it’s the same as adding a positive, and this would work as going back in time in a decreasing time value reality.)

    • 49

      No – you’re exactly right – there was an error in the article. It was saying publisheddate>CURDATE(), but now it’s fixed to say publisheddate<=CURDATE(). Thanks for pointing that out. I’m not sure about SQL Server, but in MySQL, queries using predictable functions like SUBSTRING can still be cached. It’s just when they include NOW or RAND or functions like that, that they can’t be cached.

  36. 50

    Interesting article. I like it.
    Greetings from Indonesia

  37. 51

    Take advantage of client-side caching as much as possible as well. If you cache a client-side xml doc with query results and then resolve updates to that data (AJAX is great for this, but not absolutely necessary), then you can start returning smaller result sets for only the data that has changed over time. When those result sets hit a certain size and the savings in response time deteriorate, it’s time to delete the client-side cache and regenerate a fresher one. There are variations on this theme.

  38. 52

    See the clear way this explains databases, and try to be a person who can appreciate it when someone does good work.

    Thanks for your hard work, Paul Tero.

  39. 53

    You said
    The PHP flush function forces the browser to send the HTML that it has already built to the browser right away.

    Don’t you mean the flush function forces the SERVER to send the HTML to the browser right away?

  40. 55

    Lots of very useful notes; would be helpful for my undergrad project. Thanks.

  41. 56

    Good tutorial, but nothing about MS SQL / Oracle. It seems 99% of the posts here are only for PHP and MySQL programmers.

  42. 57

    Hi,

    Nice article !
    Though I guess it lacks an introduction to stored procedures, which can *greatly* reduce DB query times.

    Nice job :)

    EDIT : and updating the mysql server version, which can also drastically improve DB performance

  43. 58

    Extending the indexing example, I’d say the entire database design needs to be optimized. For example, if you have a table with plenty of columns that are rarely used, it *may* be better to have those columns in a separate table and fetch them via a join. Of course, for the WordPress kind of applications this is not a good example, because developers don’t really design the data model.

  44. 59

    Great article Tero !

    Would like to add these
    http://www.devshed.com/c/a/MySQL/MySQL-Optimization-part-2/
    http://www.devshed.com/c/a/MySQL/MySQL-Optimization-part-1/

    Then search for ’20bits mysql optimization’ and read their “10 Tips for Optimizing MySQL Queries”

  45. 60

    May I add Jet Profiler for MySQL to the list of tools? It’s a graphical profiling tool, very easy to use. You can download it for free at:

    http://www.jetprofiler.com/

    It’s really powerful if you have a busy server and want to know what’s going on.

  46. 61

    The post provides a great and very detailed explanation on how your website is affected by the performance of your database, tips on how to speed up your website by optimizing your database, and useful tools to measure database performance. The article also discusses schema considerations, indexing, the importance of using memory cache, and problems you’re faced with when the cache is not large enough.

    At this point, I find that the article begins to drift into a “No-No-Land” from my perspective. It goes on to describe how to make your own cache on the application side, and then proceeds to making concessions on the business side in order to better deal with the performance issue.

    When you’re at the point where you’re thinking of building your own cache or throwing away important business functions, I think this is a good time to move to a more suitable solution for your database. You need a database that will provide efficient performance, stability and scalability; one that is “future ready” for your application’s anticipated growth.

    As an application developer, I think it would be better to focus on your code and not have to deal with database scaling and performance issues, even as your site grows more than you might have anticipated (which is good news!)

    Given limited time and financial resources an out-of-the-box solution, such as MySQL, is rather limited – providing you with only “one degree of scalability”. The article outlines several ways to achieve that one degree of scalability. However, when the need arises for increased scalability, and developers try to achieve the next level of scale, things get too complicated and too fragile. And that’s when you start trying to build your own cache… doing without some of the functionality you planned for… and basically fussing and worrying about IT and your database, instead of improving your application.

    Xeround cloud database is a great solution in this case. It offers great scalability and worry-free service that provides high availability with zero maintenance. What I particularly like about Xeround’s solution is that in most cases, it can help developers avoid even the initial obstacles of setting up their own database and making sure it suits both their initial needs and the needs of future expansion.

    Xeround provides a great solution for small-scale applications that will grow into a larger scale application, starting free of charge and progressing to a “pay per use” pricing model, which ensures your costs are highly correlated to your actual business activity.

  47. 62

    Couple of useful bits I hope to add here. Your function for timing database queries is a little over the top. MySQL has had profiling support built in since one of the 5.0 releases. We have this as part of an “app debug mode” for admins, and it really does make profiling an entire page request effortless:

    http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html

    To answer a question above, the query cache is off for a new MySQL install but will be switched on by basically any hosting provider. If you’re running your own server, there are a few easy SQL commands to establish cache status:

    http://dev.mysql.com/doc/refman/5.1/en/query-cache-configuration.html

    Finally, one of the most useful things I find is to run a query without the help of the cache, using the SQL_NO_CACHE statement. For example, take a simple query:

    “SELECT * FROM posts ORDER BY post_date DESC”

    and change to:

    “SELECT SQL_NO_CACHE * FROM posts ORDER BY post_date DESC”

    and you’ll be seeing the performance of that query without a primed cache. Hope this helps some people.

  48. 63

    GT.M’s efficiency and productivity put it beyond competition with the current databases on the Internet. I have successfully launched the sample http://cgi2m.net.ua/gedit (some things do not work in IE)
    within the processor quota at my hosting, which can’t run ANY SQL database. GT.M’s compiler is the fastest now: the speed of operations in memory, in comparison with the V8 and node.js stuff, is three times faster.

  49. 64

    A simple GUI tool for profiling site queries: Neor Profile SQL, http://www.profilesql.com

