InfraRunBook

    MySQL Slow Query Troubleshooting

    Databases
    Published: Apr 11, 2026
    Updated: Apr 11, 2026

    A practical runbook for diagnosing and fixing MySQL slow queries, covering missing indexes, full table scans, temp tables on disk, lock wait timeouts, and more.


    Symptoms

    Your application starts responding slowly. Users complain. You SSH into sw-infrarunbook-01 and run SHOW FULL PROCESSLIST — half the connections are stuck in "Sending data" or "Sorting result" states, some for 30 seconds or more. The MySQL slow query log is filling up. CPU isn't necessarily pegged, but disk I/O is through the roof. Or maybe the opposite: the server looks idle but queries are still hanging. These are the classic signs of a MySQL performance problem that needs diagnosis right now.

    In my experience the symptoms across all causes look frustratingly similar. Here's what you typically see before you've narrowed anything down:

    • Application HTTP timeouts or 503 errors under otherwise normal load
    • SHOW FULL PROCESSLIST showing queries stuck in "Sending data", "Sorting result", "Waiting for table metadata lock", or "Locked"
    • Slow query log entries with Query_time values well above your threshold
    • Connection pool exhaustion — the application layer queues up waiting for a database connection that never comes back
    • Disk I/O spikes correlating with query execution, suggesting temp tables landing on disk
    • InnoDB status showing elevated lock wait times or a trail of deadlocks

    The challenge is that a query stuck in "Sending data" might be doing a full table scan, writing a temp table to disk, or waiting for a lock held by a completely unrelated transaction. You need real data before you start guessing.
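    Before the slow log narrows things down, a quick triage of the processlist output helps quantify how stuck things are. A minimal sketch: the state names come from the symptom list above, while the (state, seconds) tuple shape and the 5-second threshold are stand-ins for whatever your client library returns.

```python
from collections import Counter

# Quick triage of SHOW FULL PROCESSLIST output. The (state, seconds)
# tuple shape is a stand-in for whatever your client library returns.
STUCK_STATES = {"Sending data", "Sorting result",
                "Waiting for table metadata lock", "Locked"}

def triage(processlist, threshold_s=5):
    """Count connections stuck in suspicious states past a threshold."""
    return dict(Counter(state for state, secs in processlist
                        if state in STUCK_STATES and secs >= threshold_s))

rows = [("Sending data", 31), ("Sending data", 2),
        ("Sorting result", 12), ("Sleep", 300)]
print(triage(rows))  # {'Sending data': 1, 'Sorting result': 1}
```

    A handful of connections briefly in "Sending data" is normal; dozens of them stuck past the threshold is the signal that diagnosis should start.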

    First Step: Enable the Slow Query Log

    If it's not already running, enable the slow query log at runtime — no restart needed:

    mysql -u infrarunbook-admin -p -e "
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL long_query_time = 1;
    SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
    SET GLOBAL log_queries_not_using_indexes = 'ON';
    "

    With long_query_time = 1, anything taking more than one second gets logged. One second is a good starting threshold — go lower if you want to catch subtler degradation before it becomes a fire. The log_queries_not_using_indexes flag is invaluable: it catches queries that skip indexes entirely even when they complete quickly, which is a ticking time bomb on tables that are still small today.

    Once you have data, use mysqldumpslow to aggregate it:

    mysqldumpslow -s t -t 20 /var/log/mysql/slow.log

    This surfaces the top 20 queries sorted by total execution time. The worst offenders rise fast. From there, you take each query and run it through EXPLAIN — which is the tool you'll live in for the rest of this runbook.
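    If mysqldumpslow isn't available where the log lives, its core aggregation is easy to approximate. A rough sketch, not a replacement: it fingerprints each statement by collapsing literals, then ranks by total Query_time (the real tool also handles multi-line statements, SET timestamp lines, and more).

```python
import re
from collections import defaultdict

# Minimal stand-in for mysqldumpslow -s t: fingerprint each query by
# replacing literals, then rank fingerprints by total Query_time.
def aggregate_slow_log(text, top=20):
    totals = defaultdict(float)
    qtime = None
    for line in text.splitlines():
        m = re.match(r"# Query_time: ([\d.]+)", line)
        if m:
            qtime = float(m.group(1))
        elif qtime is not None and line and not line.startswith(("#", "SET timestamp", "use ")):
            fp = re.sub(r"'[^']*'", "'S'", line.strip())   # collapse string literals
            fp = re.sub(r"\b\d+\b", "N", fp)               # collapse numeric literals
            totals[fp] += qtime
            qtime = None
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]

sample = (
    "# Query_time: 31.4  Lock_time: 0.0\n"
    "SELECT * FROM orders WHERE customer_id = 10042;\n"
    "# Query_time: 8.2  Lock_time: 0.0\n"
    "SELECT * FROM orders WHERE customer_id = 99;\n"
)
for query, total in aggregate_slow_log(sample):
    print(f"{total:6.1f}s  {query}")  # both entries collapse into one fingerprint
```

    The point of the fingerprinting step is that two queries differing only in literal values are the same query for tuning purposes; they should be counted together.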


    Root Cause 1: Missing Index

    This is the most common cause I see in production systems, especially on tables that grew organically over time. The table was small when the app launched, nobody thought about indexes, and now it has 50 million rows and a WHERE clause that scans every single one of them.

    Why it happens: When a column used in a WHERE, JOIN ON, or ORDER BY clause has no index, MySQL has no shortcut. It reads every row in the table and evaluates the condition against each one. On a table with millions of rows, this is catastrophic — and the problem compounds as the table grows.

    How to identify it: Run EXPLAIN on the slow query:

    EXPLAIN SELECT * FROM orders WHERE customer_id = 10042\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: orders
       partitions: NULL
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 4823917
         filtered: 10.00
            Extra: Using where

    Two dead giveaways: type: ALL and possible_keys: NULL. MySQL found no candidate indexes and is scanning all 4.8 million rows. The rows column is your cost estimate — that number should make you wince.
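    When auditing many queries at once, these giveaways can be checked mechanically. A sketch, assuming each EXPLAIN row has been fetched into a dict keyed by the column names; the 100,000-row threshold is an arbitrary illustration, not a MySQL constant.

```python
# Mechanical red-flag check for an EXPLAIN row fetched as a dict.
# The row threshold is arbitrary; tune it to your tables.
def explain_red_flags(row, row_threshold=100_000):
    flags = []
    if row.get("type") == "ALL":
        flags.append("full table scan (type: ALL)")
    if row.get("possible_keys") is None:
        flags.append("no candidate index (possible_keys: NULL)")
    if (row.get("rows") or 0) > row_threshold:
        flags.append(f"high row estimate ({row['rows']})")
    extra = row.get("Extra") or ""
    if "Using temporary" in extra:
        flags.append("internal temp table")
    if "Using filesort" in extra:
        flags.append("filesort")
    return flags

row = {"type": "ALL", "possible_keys": None, "key": None,
       "rows": 4823917, "Extra": "Using where"}
print(explain_red_flags(row))  # scan, no candidate index, high row estimate
```

    Wiring a check like this into a CI step or review script turns the "run EXPLAIN on everything" habit into something enforceable.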

    How to fix it:

    ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);

    Run EXPLAIN again after adding the index:

    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: orders
             type: ref
    possible_keys: idx_customer_id
              key: idx_customer_id
          key_len: 4
              ref: const
             rows: 31
         filtered: 100.00
            Extra: NULL

    From 4.8 million rows examined down to 31. That's the difference between a 30-second query and a sub-millisecond one. On large tables in production, add indexes with ALGORITHM=INPLACE, LOCK=NONE to avoid blocking reads and writes during the operation:

    ALTER TABLE orders
      ADD INDEX idx_customer_id (customer_id),
      ALGORITHM=INPLACE,
      LOCK=NONE;

    Root Cause 2: Full Table Scan

    A full table scan is what happens when MySQL reads every row in a table to resolve a query. Missing indexes cause it, but it also occurs when the optimizer decides an existing index isn't worth using — or when query patterns structurally prevent index use.

    Why it happens: The MySQL query optimizer weighs the estimated cost of using an index against the cost of a sequential scan. If it estimates the query will return a large fraction of the rows — often anything above 20–30% — it may decide the full scan is actually cheaper. The optimizer's choice is driven by table statistics (for InnoDB, the persistent statistics stored in mysql.innodb_table_stats). Stale statistics, which happen after bulk loads or large deletes, can push it toward bad decisions even when an index would be clearly better.
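    The trade-off can be illustrated with a toy cost model. The constants here are invented for illustration and are far simpler than MySQL's real costing, but they show why a scan wins once a predicate matches too much of the table: index access pays a higher per-row price (random I/O, then fetching the base row).

```python
# Toy cost model: invented constants, not MySQL's real cost model.
# An index lookup pays a higher per-row cost than a sequential scan.
def prefers_full_scan(total_rows, selectivity,
                      seq_cost_per_row=1.0, index_cost_per_row=4.0):
    full_scan_cost = total_rows * seq_cost_per_row
    index_cost = total_rows * selectivity * index_cost_per_row
    return index_cost >= full_scan_cost

print(prefers_full_scan(12_000_000, 0.40))  # True: low-cardinality predicate
print(prefers_full_scan(12_000_000, 0.01))  # False: selective predicate
```

    With these made-up constants the crossover lands at 25% selectivity, consistent with the 20–30% rule of thumb above; the real optimizer's threshold shifts with row width, buffer pool state, and index structure.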

    How to identify it: Look for type: ALL in EXPLAIN output, especially when you expect an index to qualify:

    EXPLAIN SELECT * FROM events WHERE status IN ('pending', 'processing', 'failed')\G
    *************************** 1. row ***************************
             type: ALL
    possible_keys: idx_status
              key: NULL
             rows: 12450000
            Extra: Using where

    Here's an interesting case: there IS an index on status, but the optimizer chose not to use it. That's because status has very low cardinality — maybe only 5 distinct values across 12 million rows. The optimizer looked at the statistics and concluded it would have to hit the index then fetch 40% of the table anyway, so it's cheaper to just scan. Sometimes it's right. You need to verify.

    Check whether stale statistics are contributing. InnoDB's persistent statistics table records when each table's stats were last updated:

    SELECT table_name, last_update
    FROM mysql.innodb_table_stats
    WHERE database_name = 'myapp'
      AND table_name = 'events';

    ANALYZE TABLE events;

    How to fix it: If statistics were stale, ANALYZE TABLE alone often corrects the plan. If the optimizer is making a genuinely bad call even with fresh statistics, you can hint it — but use hints cautiously, because they override the optimizer on every execution and can backfire as data distribution shifts over time. A more durable fix is usually a covering index that includes the columns you're actually selecting, reducing the cost of the index scan enough that the optimizer picks it naturally:

    ALTER TABLE events ADD INDEX idx_status_covering (status, created_at, id);

    Root Cause 3: Query Not Using Index

    This one bites even experienced developers. The index exists. It looks like it should be used. EXPLAIN shows possible_keys has a value. But key is NULL, or worse, the query is doing a full scan with no possible keys at all despite an index on the column. The optimizer skipped it — usually for a reason you can fix once you understand it.

    Why it happens: Several patterns silently break index usage:

    • A function wraps the indexed column:
      WHERE DATE(created_at) = '2024-01-15'
      — MySQL can't perform a range scan because it doesn't know what raw values will satisfy the function
    • A leading wildcard in LIKE:
      WHERE email LIKE '%@solvethenetwork.com'
      — B-tree indexes work left to right, a leading wildcard makes them useless
    • Implicit type conversion:
      WHERE user_id = '10042'
      when
      user_id
      is an INT — MySQL coerces every row's value rather than using the index
    • The leftmost prefix rule is violated on a composite index — querying on the second column of
      INDEX(a, b)
      without filtering on
      a
      first won't use the index
    • OR conditions across different indexed columns — the optimizer may not merge the index scans efficiently

    How to identify it:

    EXPLAIN SELECT * FROM sessions WHERE DATE(created_at) = '2024-03-10'\G
    *************************** 1. row ***************************
             type: ALL
    possible_keys: NULL
              key: NULL
             rows: 8900000
            Extra: Using where

    The DATE() function wraps the column, preventing the optimizer from using the index on created_at. Even though created_at is indexed, MySQL can't use a range scan because it would have to evaluate DATE() on every row to know which ones qualify.

    How to fix it: Rewrite the query to avoid wrapping the indexed column in a function. Use a range instead:

    -- Before: index unusable
    SELECT * FROM sessions WHERE DATE(created_at) = '2024-03-10';
    
    -- After: index-friendly range scan
    SELECT * FROM sessions
    WHERE created_at >= '2024-03-10 00:00:00'
      AND created_at < '2024-03-11 00:00:00';
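    The range rewrite is mechanical enough to generate in application code. A small helper sketch (the function name and SQL string are hypothetical) that builds the half-open day bounds and keeps the query parameterized, which also prevents the implicit-cast problem from the bullet list above:

```python
from datetime import date, timedelta

# Hypothetical helper: half-open [start, end) bounds for one calendar
# day, so the predicate can hit the index on created_at directly
# instead of wrapping the column in DATE().
def day_range(day):
    return (f"{day} 00:00:00", f"{day + timedelta(days=1)} 00:00:00")

lo, hi = day_range(date(2024, 3, 10))
print(lo, hi)  # 2024-03-10 00:00:00 2024-03-11 00:00:00

# Parameterized, so integers stay integers and no implicit casts appear.
sql = "SELECT * FROM sessions WHERE created_at >= %s AND created_at < %s"
```

    The half-open upper bound (< midnight of the next day) matters: using <= '23:59:59' silently drops rows with sub-second timestamps in the final second of the day.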

    For the implicit type cast issue, fix the application to pass the correct data type — don't quote integer values. For leading wildcard LIKE patterns on large tables, if you genuinely need substring search, consider MySQL's FULLTEXT index or offload that workload to a dedicated search service. A LIKE '%pattern' against a column with millions of rows simply cannot be made fast with a B-tree index.


    Root Cause 4: Temp Table Written to Disk

    This one is sneaky. Indexes look fine, the query plan seems reasonable, but it's still slow and disk I/O spikes during execution. The culprit is often an in-memory temporary table that outgrew its memory budget and spilled to disk.

    Why it happens: MySQL creates internal temporary tables when processing GROUP BY, DISTINCT, ORDER BY on derived tables, and certain join types. These start in memory, limited by tmp_table_size and max_heap_table_size — whichever is smaller. The default for both is typically 16MB. (MySQL 5.7 uses the MEMORY engine for these; 8.0 uses the TempTable engine, which is additionally governed by temptable_max_ram.) If the intermediate result set grows beyond the limit, MySQL silently converts the temp table to a disk-based table and continues. Disk temp tables are orders of magnitude slower. The conversion happens without warning and is completely transparent to the query issuer.
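    A back-of-envelope check can predict the spill before running the query. A sketch only: in-memory temp tables carry per-row overhead and fixed-width storage, so real usage runs higher than raw column widths, and this estimate is a lower bound.

```python
# Rough spill predictor. MySQL uses the smaller of the two limits;
# real per-row overhead makes actual usage higher than estimated here,
# so treat the result as a lower bound on spill risk.
def will_spill_to_disk(groups, bytes_per_row,
                       tmp_table_size=16 * 1024**2,
                       max_heap_table_size=16 * 1024**2):
    limit = min(tmp_table_size, max_heap_table_size)
    return groups * bytes_per_row > limit

# 2M distinct user_ids at ~32 bytes per aggregated row: ~64MB vs a 16MB cap.
print(will_spill_to_disk(2_000_000, 32))  # True
print(will_spill_to_disk(100_000, 32))    # False
```

    The inputs worth estimating are the number of groups (distinct values in the GROUP BY key) and the width of each aggregated row, not the size of the base table.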

    How to identify it: EXPLAIN flags this in the Extra column:

    EXPLAIN SELECT user_id, COUNT(*) AS cnt, SUM(amount)
    FROM transactions
    GROUP BY user_id
    ORDER BY cnt DESC\G
    *************************** 1. row ***************************
             type: ALL
             rows: 22000000
            Extra: Using temporary; Using filesort

    "Using temporary; Using filesort" together is a bad combination. You can also check system-wide counters to understand how frequently this is happening across all queries:

    SHOW GLOBAL STATUS LIKE 'Created_tmp%';
    +-------------------------+----------+
    | Variable_name           | Value    |
    +-------------------------+----------+
    | Created_tmp_disk_tables | 18423    |
    | Created_tmp_files       | 4        |
    | Created_tmp_tables      | 91205    |
    +-------------------------+----------+

    If Created_tmp_disk_tables is more than roughly 10% of Created_tmp_tables, you have a systemic problem worth addressing. I've seen servers where this ratio was above 50% — every other temporary table was hitting disk — and the engineering team had no idea because queries were still technically completing.
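    That ratio check is easy to script against the SHOW GLOBAL STATUS counters; the values below are the ones from the sample output above.

```python
# Systemic disk-spill check from SHOW GLOBAL STATUS counters.
def disk_tmp_ratio(created_tmp_disk_tables, created_tmp_tables):
    """Fraction of internal temp tables that spilled to disk."""
    if created_tmp_tables == 0:
        return 0.0
    return created_tmp_disk_tables / created_tmp_tables

# Counter values from the sample output above.
ratio = disk_tmp_ratio(18423, 91205)
print(f"{ratio:.1%}")  # 20.2%
print(ratio > 0.10)    # True: double the ~10% worry threshold
```

    Since these counters are cumulative since server start, trend the ratio over time windows (sample, wait, sample again, diff) rather than alerting on the lifetime value.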

    How to fix it: First, try to rewrite the query. Can you restructure the GROUP BY or add an index that covers both the filter and the grouping, letting MySQL walk the index instead of materializing a temp table? If the query is unavoidable, increase the memory budget. In a session or globally on sw-infrarunbook-01:

    -- Session-level (for testing)
    SET SESSION tmp_table_size = 67108864;
    SET SESSION max_heap_table_size = 67108864;

    # /etc/mysql/mysql.conf.d/mysqld.cnf on sw-infrarunbook-01
    [mysqld]
    tmp_table_size      = 64M
    max_heap_table_size = 64M

    Don't blindly crank these values high. Both settings apply per-connection — on a busy server handling 500 concurrent connections, 64MB each means up to 32GB in potential temp table memory. Profile the specific queries first, then set these to cover the 95th percentile result set size without over-allocating.
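    The worst-case arithmetic is worth making explicit before touching the setting. A sketch of the 500-connection scenario described above: pessimistic, since not every connection materializes a full-size temp table at once, but it is the right number to sanity-check against available RAM.

```python
# Worst-case temp-table memory if every connection materialized one
# full-size in-memory temp table simultaneously. Pessimistic on purpose.
def worst_case_tmp_memory_gib(connections, per_conn_bytes):
    return connections * per_conn_bytes / 1024**3

# 500 connections x 64MB each, as in the scenario above.
print(worst_case_tmp_memory_gib(500, 64 * 1024**2))  # 31.25
```

    Roughly 31GB of potential temp-table memory on a 500-connection server, which is why the per-connection nature of these settings matters more than the absolute value.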


    Root Cause 5: Lock Wait Timeout

    Lock waits are fundamentally different from the other causes here. The query isn't doing too much work — it's not doing any work. It's just sitting there waiting for another transaction to release a lock it holds. The error when patience runs out is unmistakable:

    ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

    Why it happens: InnoDB uses row-level locking, but long-running transactions hold those locks for their entire duration. While they're held, any other transaction that needs the same rows has to wait. After innodb_lock_wait_timeout seconds — defaulting to 50 — MySQL gives up and returns the error to the waiting query. The holding transaction is not affected at all; it keeps running.

    In my experience, the most common cause is a background job or batch process that opens a transaction, processes a large batch of rows, and takes its time committing — while the OLTP application queries pile up waiting for rows that batch job locked ten minutes ago. The second most common cause is a forgotten or abandoned connection holding an open transaction, sometimes from a client that crashed mid-operation.

    How to identify it: Start by finding who is blocking whom:

    SELECT
      r.trx_id            AS waiting_trx_id,
      r.trx_mysql_thread_id AS waiting_thread,
      r.trx_query         AS waiting_query,
      b.trx_id            AS blocking_trx_id,
      b.trx_mysql_thread_id AS blocking_thread,
      b.trx_query         AS blocking_query,
      b.trx_started       AS blocking_started
    FROM information_schema.innodb_lock_waits w
    INNER JOIN information_schema.innodb_trx b
      ON b.trx_id = w.blocking_trx_id
    INNER JOIN information_schema.innodb_trx r
      ON r.trx_id = w.requesting_trx_id;

    This query is invaluable during an active lock incident. It shows exactly which transaction is blocking which, when it started, and what query it's running — or NULL for the blocking query, which means the transaction did something, stopped issuing queries, and never committed. That sleeping thread holding an open transaction is your culprit. Note that on MySQL 8.0 the information_schema.innodb_lock_waits table was removed; use the sys.innodb_lock_waits view or performance_schema.data_lock_waits instead.

    For deeper context, check InnoDB status:

    SHOW ENGINE INNODB STATUS\G

    Look for the TRANSACTIONS section. It shows the full lock chain, which rows are locked, and how long things have been waiting. If there was a recent deadlock, that section also shows the last detected deadlock with both transactions and the chosen victim.

    How to fix it: In an emergency, kill the blocking thread:

    KILL 42;   -- use the blocking_thread value from the query above

    That unblocks the waiting queries immediately. But killing threads is a band-aid — the real fixes are architectural. Keep transactions short and commit early. Never hold a transaction open while waiting for user input, making an external API call, or doing any non-database work. For batch jobs that process large numbers of rows, chunk the work and commit between batches of 1,000–10,000 rows rather than wrapping the entire job in a single transaction.
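    The chunked-commit pattern looks the same in any DB-API-style client. A sketch using Python's sqlite3 as a stand-in for a MySQL driver: the table, column names, and function name are illustrative, and MySQL drivers use %s placeholders rather than sqlite's ?.

```python
import sqlite3

# Chunked-commit pattern for batch jobs: commit every `chunk` rows so
# row locks are released between batches instead of being held for the
# entire run. sqlite3 stands in for a MySQL driver here.
def archive_in_chunks(conn, ids, chunk=1000):
    cur = conn.cursor()
    for i in range(0, len(ids), chunk):
        batch = ids[i:i + chunk]
        cur.executemany("UPDATE jobs SET archived = 1 WHERE id = ?",
                        [(j,) for j in batch])
        conn.commit()  # lock scope ends here, every `chunk` rows

# Demo against an in-memory database (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, archived INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO jobs (id) VALUES (?)", [(i,) for i in range(1, 5001)])
conn.commit()
archive_in_chunks(conn, list(range(1, 5001)))
print(conn.execute("SELECT COUNT(*) FROM jobs WHERE archived = 1").fetchone()[0])  # 5000
```

    The trade-off is that the job is no longer atomic: a failure mid-run leaves earlier batches committed, so the work must be written to be resumable or idempotent.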

    If you're seeing frequent lock contention between concurrent OLTP queries, review your transaction isolation level. The default REPEATABLE READ uses next-key locks and gap locks that can be surprisingly aggressive. Switching to READ COMMITTED for read-heavy workloads reduces the lock footprint significantly — it doesn't acquire gap locks, which eliminates a whole class of gap-lock contention:

    SET GLOBAL transaction_isolation = 'READ-COMMITTED';

    Test this change in staging first. READ COMMITTED changes the semantics of reads inside transactions, which matters if your application relies on consistent reads across multiple statements.


    Root Cause 6: Sorting Without an Index (Filesort)

    ORDER BY performance is commonly underestimated until a table gets large. When MySQL can't resolve sort order directly from an index, it performs a filesort — collecting all candidate rows into a sort buffer, sorting them in memory (or on disk if the buffer isn't large enough), then returning results. On millions of rows, this is slow.

    Why it happens: A filesort occurs when the ORDER BY columns aren't part of the index being used for the WHERE clause, or when the sort direction mixes ASC and DESC in a way that doesn't match a composite index. If the sort buffer (sort_buffer_size) can't hold all candidate rows, the sort spills to temporary files on disk — which you'll see as disk I/O without any obvious temp table cause.

    How to identify it:

    EXPLAIN SELECT * FROM audit_log
    WHERE tenant_id = 500
    ORDER BY created_at DESC
    LIMIT 25\G
    *************************** 1. row ***************************
             type: ref
              key: idx_tenant_id
             rows: 340000
            Extra: Using index condition; Using filesort

    It's using the index for the WHERE clause but doing a filesort for ORDER BY. With 340,000 candidate rows to sort, that's expensive — and it gets worse as the tenant's data grows.

    How to fix it: A composite index covering both the filter column and the sort column lets MySQL walk the index in the right order without sorting at all:

    ALTER TABLE audit_log
      ADD INDEX idx_tenant_created (tenant_id, created_at DESC);
    

    After adding this index, the filesort disappears from EXPLAIN's Extra column (Using index would additionally appear only if the query selected nothing but indexed columns). The query cost drops from an O(n log n) sort to a simple index range scan with an early LIMIT stop. Note that descending index keys require MySQL 8.0; on 5.7 the optimizer can still avoid the filesort here by scanning an ascending (tenant_id, created_at) index in reverse.


    Prevention

    Fixing slow queries reactively is exhausting and risky. The real goal is catching problems before they make it to production. Here's what actually works.

    Keep the slow query log on permanently. Set long_query_time = 1 and log_queries_not_using_indexes = ON in production and review it weekly — don't wait for an incident. Slow queries rarely appear suddenly; they creep up as data grows. A query that ran in 200ms at 1 million rows can take 30 seconds at 50 million rows once the working set no longer fits in the buffer pool.

    Use pt-query-digest for real analysis. Percona Toolkit's pt-query-digest does what mysqldumpslow can't — it clusters similar queries, shows execution time distributions, and flags queries with inconsistent timing (a sign of lock interference rather than algorithmic slowness):

    pt-query-digest /var/log/mysql/slow.log --limit 20 --since 24h

    Run EXPLAIN before every query that touches a large table. Add this to your code review checklist. Any new query against a table over 100,000 rows should have the EXPLAIN output documented in the PR. This catches issues before they ship, when fixing them is cheap.

    Monitor InnoDB metrics continuously. Set up alerting on Created_tmp_disk_tables, Handler_read_rnd_next (rises with full table scans), and Innodb_row_lock_waits. A slow upward trend in these metrics over days or weeks tells you about degradation that won't show up in point-in-time checks:

    SHOW GLOBAL STATUS LIKE 'Handler_read_rnd_next';
    SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_waits';
    SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';

    Run ANALYZE TABLE after bulk operations. After any large INSERT, UPDATE, or DELETE — data migrations, backfills, batch jobs — run ANALYZE on the affected tables. Stale statistics silently corrupt the optimizer's decisions for weeks until someone notices the query plans are wrong.

    Audit and drop unused indexes. Unused indexes waste write performance and buffer pool space without helping reads. Query the performance schema on sw-infrarunbook-01 to find them:

    SELECT object_schema, object_name, index_name
    FROM performance_schema.table_io_waits_summary_by_index_usage
    WHERE index_name IS NOT NULL
      AND count_star = 0
      AND object_schema NOT IN ('mysql','performance_schema','information_schema')
    ORDER BY object_schema, object_name;

    Note that these stats reset on server restart, so only act on indexes that show zero reads after the server has been running under normal load for at least a few days. Dropping genuinely unused indexes is free performance — fewer indexes means faster writes, smaller buffer pool footprint, and less work for the optimizer.

    The pattern you'll notice across all of these causes: slow queries in MySQL are almost always diagnosable with two tools — the slow query log and EXPLAIN. The failure modes repeat. Once you've seen a missing index bring down a production database at 2 AM, you'll never push a new query without checking its execution plan first.

    Frequently Asked Questions

    How do I enable the MySQL slow query log without restarting the server?

    You can enable it at runtime with SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 1; SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';. This takes effect immediately without a restart and persists until the server is restarted. To make it permanent, add the settings to your my.cnf file.

    What does 'Using temporary; Using filesort' mean in EXPLAIN output?

    It means MySQL had to create an internal temporary table to resolve the query (commonly for GROUP BY or DISTINCT) and then sort the results separately. If the temporary table exceeds tmp_table_size or max_heap_table_size, it spills to disk, causing a significant performance hit. Rewriting the query to use a covering composite index often eliminates both.

    How can I identify which transaction is blocking another in MySQL?

    Query the information_schema: SELECT r.trx_mysql_thread_id AS waiting_thread, b.trx_mysql_thread_id AS blocking_thread, b.trx_query AS blocking_query FROM information_schema.innodb_lock_waits w JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id; This shows you exactly which thread is holding the lock and what query it last ran.

    Why is my query not using an index even though the index exists?

    Several patterns prevent index use: wrapping the indexed column in a function (WHERE DATE(col) = ...), using a leading wildcard in LIKE ('%value'), implicit type mismatches (comparing an INT column to a string), or violating the leftmost prefix rule on a composite index. Rewrite the query to reference the column directly without transformation, and verify the data types match on both sides of the comparison.

    What is the difference between tmp_table_size and max_heap_table_size in MySQL?

    Both settings cap the size of in-memory temporary tables, and MySQL uses whichever is smaller. tmp_table_size controls the maximum size for internal temporary tables created during query execution. max_heap_table_size controls the maximum size for user-created MEMORY tables but also applies to internal temp tables. You must set both to the same value for the intended limit to take effect — setting only tmp_table_size while leaving max_heap_table_size at the default 16MB means internal temp tables are still capped at 16MB.
