InfraRunBook

    MySQL Slow Query Troubleshooting

    Databases
    Published: Apr 11, 2026
    Updated: Apr 11, 2026

    A practical runbook for diagnosing and fixing MySQL slow queries, covering missing indexes, full table scans, temp tables on disk, lock wait timeouts, and more.


    Symptoms

    Your application starts responding slowly. Users complain. You SSH into sw-infrarunbook-01 and run SHOW FULL PROCESSLIST — half the connections are stuck in "Sending data" or "Sorting result" states, some for 30 seconds or more. The MySQL slow query log is filling up. CPU isn't necessarily pegged, but disk I/O is through the roof. Or maybe the opposite: the server looks idle but queries are still hanging. These are the classic signs of a MySQL performance problem that needs diagnosis right now.

    In my experience the symptoms across all causes look frustratingly similar. Here's what you typically see before you've narrowed anything down:

    • Application HTTP timeouts or 503 errors under otherwise normal load
    • SHOW FULL PROCESSLIST showing queries stuck in "Sending data", "Sorting result", "Waiting for table metadata lock", or "Locked"
    • Slow query log entries with Query_time values well above your threshold
    • Connection pool exhaustion — the application layer queues up waiting for a database connection that never comes back
    • Disk I/O spikes correlating with query execution, suggesting temp tables landing on disk
    • InnoDB status showing elevated lock wait times or a trail of deadlocks

    The challenge is that a query stuck in "Sending data" might be doing a full table scan, writing a temp table to disk, or waiting for a lock held by a completely unrelated transaction. You need real data before you start guessing.
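    Before the slow log narrows things down, a quick triage of the processlist output helps quantify how stuck things are. A minimal sketch: the state names come from the symptom list above, while the (state, seconds) tuple shape and the 5-second threshold are stand-ins for whatever your client library returns.

```python
from collections import Counter

# Quick triage of SHOW FULL PROCESSLIST output. The (state, seconds)
# tuple shape is a stand-in for whatever your client library returns.
STUCK_STATES = {"Sending data", "Sorting result",
                "Waiting for table metadata lock", "Locked"}

def triage(processlist, threshold_s=5):
    """Count connections stuck in suspicious states past a threshold."""
    return dict(Counter(state for state, secs in processlist
                        if state in STUCK_STATES and secs >= threshold_s))

rows = [("Sending data", 31), ("Sending data", 2),
        ("Sorting result", 12), ("Sleep", 300)]
print(triage(rows))  # {'Sending data': 1, 'Sorting result': 1}
```

    A handful of connections briefly in "Sending data" is normal; dozens of them stuck past the threshold is the signal that diagnosis should start.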

    First Step: Enable the Slow Query Log

    If it's not already running, enable the slow query log at runtime — no restart needed:

    mysql -u infrarunbook-admin -p -e "
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL long_query_time = 1;
    SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
    SET GLOBAL log_queries_not_using_indexes = 'ON';
    "

    With long_query_time = 1, anything taking more than one second gets logged. One second is a good starting threshold — go lower if you want to catch subtler degradation before it becomes a fire. The log_queries_not_using_indexes flag is invaluable: it catches queries that skip indexes entirely even when they complete quickly, which is a ticking time bomb on tables that are still small today.

    Once you have data, use mysqldumpslow to aggregate it:

    mysqldumpslow -s t -t 20 /var/log/mysql/slow.log

    This surfaces the top 20 queries sorted by total execution time. The worst offenders rise fast. From there, you take each query and run it through EXPLAIN — which is the tool you'll live in for the rest of this runbook.
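    If mysqldumpslow isn't available where the log lives, its core aggregation is easy to approximate. A rough sketch, not a replacement: it fingerprints each statement by collapsing literals, then ranks by total Query_time (the real tool also handles multi-line statements, SET timestamp lines, and more).

```python
import re
from collections import defaultdict

# Minimal stand-in for mysqldumpslow -s t: fingerprint each query by
# replacing literals, then rank fingerprints by total Query_time.
def aggregate_slow_log(text, top=20):
    totals = defaultdict(float)
    qtime = None
    for line in text.splitlines():
        m = re.match(r"# Query_time: ([\d.]+)", line)
        if m:
            qtime = float(m.group(1))
        elif qtime is not None and line and not line.startswith(("#", "SET timestamp", "use ")):
            fp = re.sub(r"'[^']*'", "'S'", line.strip())   # collapse string literals
            fp = re.sub(r"\b\d+\b", "N", fp)               # collapse numeric literals
            totals[fp] += qtime
            qtime = None
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]

sample = (
    "# Query_time: 31.4  Lock_time: 0.0\n"
    "SELECT * FROM orders WHERE customer_id = 10042;\n"
    "# Query_time: 8.2  Lock_time: 0.0\n"
    "SELECT * FROM orders WHERE customer_id = 99;\n"
)
for query, total in aggregate_slow_log(sample):
    print(f"{total:6.1f}s  {query}")  # both entries collapse into one fingerprint
```

    The point of the fingerprinting step is that two queries differing only in literal values are the same query for tuning purposes; they should be counted together.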


    Root Cause 1: Missing Index

    This is the most common cause I see in production systems, especially on tables that grew organically over time. The table was small when the app launched, nobody thought about indexes, and now it has 50 million rows and a WHERE clause that scans every single one of them.

    Why it happens: When a column used in a WHERE, JOIN ON, or ORDER BY clause has no index, MySQL has no shortcut. It reads every row in the table and evaluates the condition against each one. On a table with millions of rows, this is catastrophic — and the problem compounds as the table grows.

    How to identify it: Run EXPLAIN on the slow query:

    EXPLAIN SELECT * FROM orders WHERE customer_id = 10042\G
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: orders
       partitions: NULL
             type: ALL
    possible_keys: NULL
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 4823917
         filtered: 10.00
            Extra: Using where

    Two dead giveaways: type: ALL and possible_keys: NULL. MySQL found no candidate indexes and is scanning all 4.8 million rows. The rows column is your cost estimate — that number should make you wince.
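    When auditing many queries at once, these giveaways can be checked mechanically. A sketch, assuming each EXPLAIN row has been fetched into a dict keyed by the column names; the 100,000-row threshold is an arbitrary illustration, not a MySQL constant.

```python
# Mechanical red-flag check for an EXPLAIN row fetched as a dict.
# The row threshold is arbitrary; tune it to your tables.
def explain_red_flags(row, row_threshold=100_000):
    flags = []
    if row.get("type") == "ALL":
        flags.append("full table scan (type: ALL)")
    if row.get("possible_keys") is None:
        flags.append("no candidate index (possible_keys: NULL)")
    if (row.get("rows") or 0) > row_threshold:
        flags.append(f"high row estimate ({row['rows']})")
    extra = row.get("Extra") or ""
    if "Using temporary" in extra:
        flags.append("internal temp table")
    if "Using filesort" in extra:
        flags.append("filesort")
    return flags

row = {"type": "ALL", "possible_keys": None, "key": None,
       "rows": 4823917, "Extra": "Using where"}
print(explain_red_flags(row))  # scan, no candidate index, high row estimate
```

    Wiring a check like this into a CI step or review script turns the "run EXPLAIN on everything" habit into something enforceable.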

    How to fix it:

    ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);

    Run EXPLAIN again after adding the index:

    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: orders
             type: ref
    possible_keys: idx_customer_id
              key: idx_customer_id
          key_len: 4
              ref: const
             rows: 31
         filtered: 100.00
            Extra: NULL

    From 4.8 million rows examined down to 31. That's the difference between a 30-second query and a sub-millisecond one. On large tables in production, add indexes with ALGORITHM=INPLACE, LOCK=NONE to avoid blocking reads and writes during the operation:

    ALTER TABLE orders
      ADD INDEX idx_customer_id (customer_id),
      ALGORITHM=INPLACE,
      LOCK=NONE;

    Root Cause 2: Full Table Scan

    A full table scan is what happens when MySQL reads every row in a table to resolve a query. Missing indexes cause it, but it also occurs when the optimizer decides an existing index isn't worth using — or when query patterns structurally prevent index use.

    Why it happens: The MySQL query optimizer weighs the estimated cost of using an index against the cost of a sequential scan. If it estimates the query will return a large fraction of the rows — often anything above 20–30% — it may decide the full scan is actually cheaper. The optimizer's choice is driven by table statistics (for InnoDB, the persistent statistics stored in mysql.innodb_table_stats). Stale statistics, which happen after bulk loads or large deletes, can push it toward bad decisions even when an index would be clearly better.
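    The trade-off can be illustrated with a toy cost model. The constants here are invented for illustration and are far simpler than MySQL's real costing, but they show why a scan wins once a predicate matches too much of the table: index access pays a higher per-row price (random I/O, then fetching the base row).

```python
# Toy cost model: invented constants, not MySQL's real cost model.
# An index lookup pays a higher per-row cost than a sequential scan.
def prefers_full_scan(total_rows, selectivity,
                      seq_cost_per_row=1.0, index_cost_per_row=4.0):
    full_scan_cost = total_rows * seq_cost_per_row
    index_cost = total_rows * selectivity * index_cost_per_row
    return index_cost >= full_scan_cost

print(prefers_full_scan(12_000_000, 0.40))  # True: low-cardinality predicate
print(prefers_full_scan(12_000_000, 0.01))  # False: selective predicate
```

    With these made-up constants the crossover lands at 25% selectivity, consistent with the 20–30% rule of thumb above; the real optimizer's threshold shifts with row width, buffer pool state, and index structure.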

    How to identify it: Look for type: ALL in EXPLAIN output, especially when you expect an index to qualify:

    EXPLAIN SELECT * FROM events WHERE status IN ('pending', 'processing', 'failed')\G
    *************************** 1. row ***************************
             type: ALL
    possible_keys: idx_status
              key: NULL
             rows: 12450000
            Extra: Using where

    Here's an interesting case: there IS an index on status, but the optimizer chose not to use it. That's because status has very low cardinality — maybe only 5 distinct values across 12 million rows. The optimizer looked at the statistics and concluded it would have to hit the index then fetch 40% of the table anyway, so it's cheaper to just scan. Sometimes it's right. You need to verify.

    Check whether stale statistics are contributing. InnoDB's persistent statistics table records when each table's stats were last updated:

    SELECT table_name, last_update
    FROM mysql.innodb_table_stats
    WHERE database_name = 'myapp'
      AND table_name = 'events';

    ANALYZE TABLE events;

    How to fix it: If statistics were stale, ANALYZE TABLE alone often corrects the plan. If the optimizer is making a genuinely bad call even with fresh statistics, you can hint it — but use hints cautiously, because they override the optimizer on every execution and can backfire as data distribution shifts over time. A more durable fix is usually a covering index that includes the columns you're actually selecting, reducing the cost of the index scan enough that the optimizer picks it naturally:

    ALTER TABLE events ADD INDEX idx_status_covering (status, created_at, id);

    Root Cause 3: Query Not Using Index

    This one bites even experienced developers. The index exists. It looks like it should be used. EXPLAIN shows possible_keys has a value. But key is NULL, or worse, the query is doing a full scan with no possible keys at all despite an index on the column. The optimizer skipped it — usually for a reason you can fix once you understand it.

    Why it happens: Several patterns silently break index usage:

    • A function wraps the indexed column:
      WHERE DATE(created_at) = '2024-01-15'
      — MySQL can't perform a range scan because it doesn't know what raw values will satisfy the function
    • A leading wildcard in LIKE:
      WHERE email LIKE '%@solvethenetwork.com'
      — B-tree indexes work left to right, a leading wildcard makes them useless
    • Implicit type conversion:
      WHERE user_id = '10042'
      when
      user_id
      is an INT — MySQL coerces every row's value rather than using the index
    • The leftmost prefix rule is violated on a composite index — querying on the second column of
      INDEX(a, b)
      without filtering on
      a
      first won't use the index
    • OR conditions across different indexed columns — the optimizer may not merge the index scans efficiently

    How to identify it:

    EXPLAIN SELECT * FROM sessions WHERE DATE(created_at) = '2024-03-10'\G
    *************************** 1. row ***************************
             type: ALL
    possible_keys: NULL
              key: NULL
             rows: 8900000
            Extra: Using where

    The DATE() function wraps the column, preventing the optimizer from using the index on created_at. Even though created_at is indexed, MySQL can't use a range scan because it would have to evaluate DATE() on every row to know which ones qualify.

    How to fix it: Rewrite the query to avoid wrapping the indexed column in a function. Use a range instead:

    -- Before: index unusable
    SELECT * FROM sessions WHERE DATE(created_at) = '2024-03-10';
    
    -- After: index-friendly range scan
    SELECT * FROM sessions
    WHERE created_at >= '2024-03-10 00:00:00'
      AND created_at < '2024-03-11 00:00:00';
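    The range rewrite is mechanical enough to generate in application code. A small helper sketch (the function name and SQL string are hypothetical) that builds the half-open day bounds and keeps the query parameterized, which also prevents the implicit-cast problem from the bullet list above:

```python
from datetime import date, timedelta

# Hypothetical helper: half-open [start, end) bounds for one calendar
# day, so the predicate can hit the index on created_at directly
# instead of wrapping the column in DATE().
def day_range(day):
    return (f"{day} 00:00:00", f"{day + timedelta(days=1)} 00:00:00")

lo, hi = day_range(date(2024, 3, 10))
print(lo, hi)  # 2024-03-10 00:00:00 2024-03-11 00:00:00

# Parameterized, so integers stay integers and no implicit casts appear.
sql = "SELECT * FROM sessions WHERE created_at >= %s AND created_at < %s"
```

    The half-open upper bound (< midnight of the next day) matters: using <= '23:59:59' silently drops rows with sub-second timestamps in the final second of the day.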

    For the implicit type cast issue, fix the application to pass the correct data type — don't quote integer values. For leading wildcard LIKE patterns on large tables, if you genuinely need substring search, consider MySQL's FULLTEXT index or offload that workload to a dedicated search service. A LIKE '%pattern' against a column with millions of rows simply cannot be made fast with a B-tree index.


    Root Cause 4: Temp Table Written to Disk

    This one is sneaky. Indexes look fine, the query plan seems reasonable, but it's still slow and disk I/O spikes during execution. The culprit is often an in-memory temporary table that outgrew its memory budget and spilled to disk.

    Why it happens: MySQL creates internal temporary tables when processing GROUP BY, DISTINCT, ORDER BY on derived tables, and certain join types. These start in memory, limited by tmp_table_size and max_heap_table_size — whichever is smaller. The default for both is typically 16MB. (MySQL 5.7 uses the MEMORY engine for these; 8.0 uses the TempTable engine, which is additionally governed by temptable_max_ram.) If the intermediate result set grows beyond the limit, MySQL silently converts the temp table to a disk-based table and continues. Disk temp tables are orders of magnitude slower. The conversion happens without warning and is completely transparent to the query issuer.
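    A back-of-envelope check can predict the spill before running the query. A sketch only: in-memory temp tables carry per-row overhead and fixed-width storage, so real usage runs higher than raw column widths, and this estimate is a lower bound.

```python
# Rough spill predictor. MySQL uses the smaller of the two limits;
# real per-row overhead makes actual usage higher than estimated here,
# so treat the result as a lower bound on spill risk.
def will_spill_to_disk(groups, bytes_per_row,
                       tmp_table_size=16 * 1024**2,
                       max_heap_table_size=16 * 1024**2):
    limit = min(tmp_table_size, max_heap_table_size)
    return groups * bytes_per_row > limit

# 2M distinct user_ids at ~32 bytes per aggregated row: ~64MB vs a 16MB cap.
print(will_spill_to_disk(2_000_000, 32))  # True
print(will_spill_to_disk(100_000, 32))    # False
```

    The inputs worth estimating are the number of groups (distinct values in the GROUP BY key) and the width of each aggregated row, not the size of the base table.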

    How to identify it: EXPLAIN flags this in the Extra column:

    EXPLAIN SELECT user_id, COUNT(*) AS cnt, SUM(amount)
    FROM transactions
    GROUP BY user_id
    ORDER BY cnt DESC\G
    *************************** 1. row ***************************
             type: ALL
             rows: 22000000
            Extra: Using temporary; Using filesort

    "Using temporary; Using filesort" together is a bad combination. You can also check system-wide counters to understand how frequently this is happening across all queries:

    SHOW GLOBAL STATUS LIKE 'Created_tmp%';
    +-------------------------+----------+
    | Variable_name           | Value    |
    +-------------------------+----------+
    | Created_tmp_disk_tables | 18423    |
    | Created_tmp_files       | 4        |
    | Created_tmp_tables      | 91205    |
    +-------------------------+----------+

    If Created_tmp_disk_tables is more than roughly 10% of Created_tmp_tables, you have a systemic problem worth addressing. I've seen servers where this ratio was above 50% — every other temporary table was hitting disk — and the engineering team had no idea because queries were still technically completing.
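    That ratio check is easy to script against the SHOW GLOBAL STATUS counters; the values below are the ones from the sample output above.

```python
# Systemic disk-spill check from SHOW GLOBAL STATUS counters.
def disk_tmp_ratio(created_tmp_disk_tables, created_tmp_tables):
    """Fraction of internal temp tables that spilled to disk."""
    if created_tmp_tables == 0:
        return 0.0
    return created_tmp_disk_tables / created_tmp_tables

# Counter values from the sample output above.
ratio = disk_tmp_ratio(18423, 91205)
print(f"{ratio:.1%}")  # 20.2%
print(ratio > 0.10)    # True: double the ~10% worry threshold
```

    Since these counters are cumulative since server start, trend the ratio over time windows (sample, wait, sample again, diff) rather than alerting on the lifetime value.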

    How to fix it: First, try to rewrite the query. Can you restructure the GROUP BY or add an index that covers both the filter and the grouping, letting MySQL walk the index instead of materializing a temp table? If the query is unavoidable, increase the memory budget. In a session or globally on sw-infrarunbook-01:

    -- Session-level (for testing)
    SET SESSION tmp_table_size = 67108864;
    SET SESSION max_heap_table_size = 67108864;

    # /etc/mysql/mysql.conf.d/mysqld.cnf on sw-infrarunbook-01
    [mysqld]
    tmp_table_size      = 64M
    max_heap_table_size = 64M

    Don't blindly crank these values high. Both settings apply per-connection — on a busy server handling 500 concurrent connections, 64MB each means up to 32GB in potential temp table memory. Profile the specific queries first, then set these to cover the 95th percentile result set size without over-allocating.
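    The worst-case arithmetic is worth making explicit before touching the setting. A sketch of the 500-connection scenario described above: pessimistic, since not every connection materializes a full-size temp table at once, but it is the right number to sanity-check against available RAM.

```python
# Worst-case temp-table memory if every connection materialized one
# full-size in-memory temp table simultaneously. Pessimistic on purpose.
def worst_case_tmp_memory_gib(connections, per_conn_bytes):
    return connections * per_conn_bytes / 1024**3

# 500 connections x 64MB each, as in the scenario above.
print(worst_case_tmp_memory_gib(500, 64 * 1024**2))  # 31.25
```

    Roughly 31GB of potential temp-table memory on a 500-connection server, which is why the per-connection nature of these settings matters more than the absolute value.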


    Root Cause 5: Lock Wait Timeout

    Lock waits are fundamentally different from the other causes here. The query isn't doing too much work — it's not doing any work. It's just sitting there waiting for another transaction to release a lock it holds. The error when patience runs out is unmistakable:

    ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

    Why it happens: InnoDB uses row-level locking, but long-running transactions hold those locks for their entire duration. While they're held, any other transaction that needs the same rows has to wait. After innodb_lock_wait_timeout seconds — defaulting to 50 — MySQL gives up and returns the error to the waiting query. The holding transaction is not affected at all; it keeps running.

    In my experience, the most common cause is a background job or batch process that opens a transaction, processes a large batch of rows, and takes its time committing — while the OLTP application queries pile up waiting for rows that batch job locked ten minutes ago. The second most common cause is a forgotten or abandoned connection holding an open transaction, sometimes from a client that crashed mid-operation.

    How to identify it: Start by finding who is blocking whom:

    SELECT
      r.trx_id            AS waiting_trx_id,
      r.trx_mysql_thread_id AS waiting_thread,
      r.trx_query         AS waiting_query,
      b.trx_id            AS blocking_trx_id,
      b.trx_mysql_thread_id AS blocking_thread,
      b.trx_query         AS blocking_query,
      b.trx_started       AS blocking_started
    FROM information_schema.innodb_lock_waits w
    INNER JOIN information_schema.innodb_trx b
      ON b.trx_id = w.blocking_trx_id
    INNER JOIN information_schema.innodb_trx r
      ON r.trx_id = w.requesting_trx_id;

    This query is invaluable during an active lock incident. It shows exactly which transaction is blocking which, when it started, and what query it's running — or NULL for the blocking query, which means the transaction did something, stopped issuing queries, and never committed. That sleeping thread holding an open transaction is your culprit. Note that on MySQL 8.0 the information_schema.innodb_lock_waits table was removed; use the sys.innodb_lock_waits view or performance_schema.data_lock_waits instead.

    For deeper context, check InnoDB status:

    SHOW ENGINE INNODB STATUS\G

    Look for the TRANSACTIONS section. It shows the full lock chain, which rows are locked, and how long things have been waiting. If there was a recent deadlock, that section also shows the last detected deadlock with both transactions and the chosen victim.

    How to fix it: In an emergency, kill the blocking thread:

    KILL 42;   -- use the blocking_thread value from the query above

    That unblocks the waiting queries immediately. But killing threads is a band-aid — the real fixes are architectural. Keep transactions short and commit early. Never hold a transaction open while waiting for user input, making an external API call, or doing any non-database work. For batch jobs that process large numbers of rows, chunk the work and commit between batches of 1,000–10,000 rows rather than wrapping the entire job in a single transaction.
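    The chunked-commit pattern looks the same in any DB-API-style client. A sketch using Python's sqlite3 as a stand-in for a MySQL driver: the table, column names, and function name are illustrative, and MySQL drivers use %s placeholders rather than sqlite's ?.

```python
import sqlite3

# Chunked-commit pattern for batch jobs: commit every `chunk` rows so
# row locks are released between batches instead of being held for the
# entire run. sqlite3 stands in for a MySQL driver here.
def archive_in_chunks(conn, ids, chunk=1000):
    cur = conn.cursor()
    for i in range(0, len(ids), chunk):
        batch = ids[i:i + chunk]
        cur.executemany("UPDATE jobs SET archived = 1 WHERE id = ?",
                        [(j,) for j in batch])
        conn.commit()  # lock scope ends here, every `chunk` rows

# Demo against an in-memory database (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, archived INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO jobs (id) VALUES (?)", [(i,) for i in range(1, 5001)])
conn.commit()
archive_in_chunks(conn, list(range(1, 5001)))
print(conn.execute("SELECT COUNT(*) FROM jobs WHERE archived = 1").fetchone()[0])  # 5000
```

    The trade-off is that the job is no longer atomic: a failure mid-run leaves earlier batches committed, so the work must be written to be resumable or idempotent.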

    If you're seeing frequent lock contention between concurrent OLTP queries, review your transaction isolation level. The default REPEATABLE READ uses next-key locks and gap locks that can be surprisingly aggressive. Switching to READ COMMITTED for read-heavy workloads reduces the lock footprint significantly — it doesn't acquire gap locks, which eliminates a whole class of gap-lock contention:

    SET GLOBAL transaction_isolation = 'READ-COMMITTED';

    Test this change in staging first. READ COMMITTED changes the semantics of reads inside transactions, which matters if your application relies on consistent reads across multiple statements.


    Root Cause 6: Sorting Without an Index (Filesort)

    ORDER BY performance is commonly underestimated until a table gets large. When MySQL can't resolve sort order directly from an index, it performs a filesort — collecting all candidate rows into a sort buffer, sorting them in memory (or on disk if the buffer isn't large enough), then returning results. On millions of rows, this is slow.

    Why it happens: A filesort occurs when the ORDER BY columns aren't part of the index being used for the WHERE clause, or when the sort direction mixes ASC and DESC in a way that doesn't match a composite index. If the sort buffer (sort_buffer_size) can't hold all candidate rows, the sort spills to temporary files on disk — which you'll see as disk I/O without any obvious temp table cause.

    How to identify it:

    EXPLAIN SELECT * FROM audit_log
    WHERE tenant_id = 500
    ORDER BY created_at DESC
    LIMIT 25\G
    *************************** 1. row ***************************
             type: ref
              key: idx_tenant_id
             rows: 340000
            Extra: Using index condition; Using filesort

    It's using the index for the WHERE clause but doing a filesort for ORDER BY. With 340,000 candidate rows to sort, that's expensive — and it gets worse as the tenant's data grows.

    How to fix it: A composite index covering both the filter column and the sort column lets MySQL walk the index in the right order without sorting at all:

    ALTER TABLE audit_log
      ADD INDEX idx_tenant_created (tenant_id, created_at DESC);
    

    After adding this index, the filesort disappears from EXPLAIN's Extra column (Using index would additionally appear only if the query selected nothing but indexed columns). The query cost drops from an O(n log n) sort to a simple index range scan with an early LIMIT stop. Note that descending index keys require MySQL 8.0; on 5.7 the optimizer can still avoid the filesort here by scanning an ascending (tenant_id, created_at) index in reverse.


    Prevention

    Fixing slow queries reactively is exhausting and risky. The real goal is catching problems before they make it to production. Here's what actually works.

    Keep the slow query log on permanently. Set long_query_time = 1 and log_queries_not_using_indexes = ON in production and review it weekly — don't wait for an incident. Slow queries rarely appear suddenly; they creep up as data grows. A query that ran in 200ms at 1 million rows can take 30 seconds at 50 million rows once the working set no longer fits in the buffer pool.

    Use pt-query-digest for real analysis. Percona Toolkit's pt-query-digest does what mysqldumpslow can't — it clusters similar queries, shows execution time distributions, and flags queries with inconsistent timing (a sign of lock interference rather than algorithmic slowness):

    pt-query-digest /var/log/mysql/slow.log --limit 20 --since 24h

    Run EXPLAIN before every query that touches a large table. Add this to your code review checklist. Any new query against a table over 100,000 rows should have the EXPLAIN output documented in the PR. This catches issues before they ship, when fixing them is cheap.

    Monitor InnoDB metrics continuously. Set up alerting on Created_tmp_disk_tables, Handler_read_rnd_next (rises with full table scans), and Innodb_row_lock_waits. A slow upward trend in these metrics over days or weeks tells you about degradation that won't show up in point-in-time checks:

    SHOW GLOBAL STATUS LIKE 'Handler_read_rnd_next';
    SHOW GLOBAL STATUS LIKE 'Innodb_row_lock_waits';
    SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';

    Run ANALYZE TABLE after bulk operations. After any large INSERT, UPDATE, or DELETE — data migrations, backfills, batch jobs — run ANALYZE on the affected tables. Stale statistics silently corrupt the optimizer's decisions for weeks until someone notices the query plans are wrong.

    Audit and drop unused indexes. Unused indexes waste write performance and buffer pool space without helping reads. Query the performance schema on sw-infrarunbook-01 to find them:

    SELECT object_schema, object_name, index_name
    FROM performance_schema.table_io_waits_summary_by_index_usage
    WHERE index_name IS NOT NULL
      AND count_star = 0
      AND object_schema NOT IN ('mysql','performance_schema','information_schema')
    ORDER BY object_schema, object_name;

    Note that these stats reset on server restart, so only act on indexes that show zero reads after the server has been running under normal load for at least a few days. Dropping genuinely unused indexes is free performance — fewer indexes means faster writes, smaller buffer pool footprint, and less work for the optimizer.

    The pattern you'll notice across all of these causes: slow queries in MySQL are almost always diagnosable with two tools — the slow query log and EXPLAIN. The failure modes repeat. Once you've seen a missing index bring down a production database at 2 AM, you'll never push a new query without checking its execution plan first.

    Frequently Asked Questions

    How do I enable the MySQL slow query log without restarting the server?

    You can enable it at runtime with SET GLOBAL slow_query_log = 'ON'; SET GLOBAL long_query_time = 1; SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';. This takes effect immediately without a restart and persists until the server is restarted. To make it permanent, add the settings to your my.cnf file.

    What does 'Using temporary; Using filesort' mean in EXPLAIN output?

    It means MySQL had to create an internal temporary table to resolve the query (commonly for GROUP BY or DISTINCT) and then sort the results separately. If the temporary table exceeds tmp_table_size or max_heap_table_size, it spills to disk, causing a significant performance hit. Rewriting the query to use a covering composite index often eliminates both.

    How can I identify which transaction is blocking another in MySQL?

    Query the information_schema: SELECT r.trx_mysql_thread_id AS waiting_thread, b.trx_mysql_thread_id AS blocking_thread, b.trx_query AS blocking_query FROM information_schema.innodb_lock_waits w JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id; This shows you exactly which thread is holding the lock and what query it last ran.

    Why is my query not using an index even though the index exists?

    Several patterns prevent index use: wrapping the indexed column in a function (WHERE DATE(col) = ...), using a leading wildcard in LIKE ('%value'), implicit type mismatches (comparing an INT column to a string), or violating the leftmost prefix rule on a composite index. Rewrite the query to reference the column directly without transformation, and verify the data types match on both sides of the comparison.

    What is the difference between tmp_table_size and max_heap_table_size in MySQL?

    Both settings cap the size of in-memory temporary tables, and MySQL uses whichever is smaller. tmp_table_size controls the maximum size for internal temporary tables created during query execution. max_heap_table_size controls the maximum size for user-created MEMORY tables but also applies to internal temp tables. You must set both to the same value for the intended limit to take effect — setting only tmp_table_size while leaving max_heap_table_size at the default 16MB means internal temp tables are still capped at 16MB.
