InfraRunBook

    Elasticsearch Index Not Created

    Logging
    Published: Apr 15, 2026
    Updated: Apr 15, 2026

    Elasticsearch refusing to create an index can stem from missing templates, mapping conflicts, disk watermarks, ILM errors, or permission failures. This runbook walks through every root cause with real commands and fixes.


    Symptoms

    You've just pushed data into Elasticsearch — maybe through Logstash, Filebeat, or a direct API call — and nothing shows up. The index simply doesn't exist. Running GET _cat/indices?v returns either a blank list or one that omits the index name you expected. Kibana's Discover tab greets you with "No results found." Your application logs are filling up with connection errors or 400/403 responses from the Elasticsearch REST API.

    The specific error messages vary depending on the root cause, but here are the most common ones you'll encounter in the wild:

    • HTTP 400: mapper_parsing_exception when an incoming document contradicts the index mapping
    • HTTP 403: security_exception with action [indices:admin/create] is unauthorized
    • HTTP 400: illegal_argument_exception around field types or alias configuration
    • No error at all — the write silently fails because Logstash swallowed the exception and kept retrying into a dead end

    Before chasing any specific cause, get a baseline snapshot of your cluster state with these two commands:

    GET _cluster/health?pretty
    GET _cat/indices?v&h=health,status,index,pri,rep,docs.count,store.size

    If cluster health is red, index creation will fail universally. Yellow means some replicas are unassigned — that usually won't block new index creation, but it's worth noting before you proceed. Now let's walk through every root cause I've seen in production environments.
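    The triage logic above is easy to automate. A minimal sketch, assuming you've already fetched and parsed the JSON body of GET _cluster/health (the function name is illustrative, not part of any Elasticsearch client):

```python
def triage_cluster_health(health: dict) -> str:
    """Summarize whether index creation is likely blocked, given the
    parsed JSON body of GET _cluster/health."""
    status = health.get("status")
    if status == "red":
        return "blocked: at least one primary shard is unassigned"
    if status == "yellow":
        return "probably fine: replicas unassigned, primaries healthy"
    return "fine: cluster is green"

# Example response body, trimmed to the fields we actually use:
sample = {"cluster_name": "prod-logs", "status": "red", "unassigned_shards": 4}
print(triage_cluster_health(sample))
# -> blocked: at least one primary shard is unassigned
```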


    Root Cause 1: Index Template Missing

    Why It Happens

    Elasticsearch uses index templates to define settings and mappings for indices that match a given name pattern. When you rely on a template to configure shard counts, replica counts, or field mappings — and that template doesn't exist — Elasticsearch either falls back to system defaults (which may not match your expectations) or, when action.auto_create_index is disabled cluster-wide, refuses to create the index entirely. In my experience, this bites teams hardest right after a cluster migration or snapshot restore where templates weren't included in the restore scope. Someone restores the data but forgets include_global_state: true, and the templates vanish silently.

    How to Identify It

    Check which templates exist and whether any match your target index name pattern:

    GET _index_template?pretty
    GET _index_template/logs-*

    If the second call returns a 404 or an empty result set, your template is gone. You can also probe the legacy template API, which some older pipelines still rely on:

    GET _template?pretty

    If the template exists but the index still isn't being created, verify that the template's index_patterns field actually matches your index name. I've seen cases where someone changed the index naming convention from logs-app-2024.01.01 to app-logs-2024.01.01 and forgot the template pattern only covered logs-*. The template is present, the pattern is wrong, and the index gets no settings applied — or doesn't get created at all.
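    You can sanity-check a pattern against an index name locally before touching the cluster. Elasticsearch index patterns support only the * wildcard, so Python's fnmatch is a close stand-in (note it also honors ? and [seq], which Elasticsearch patterns do not — a loose approximation, not the exact matcher):

```python
from fnmatch import fnmatch

def template_matches(index_patterns, index_name):
    # ES index patterns use * as the only wildcard; fnmatch is a close
    # local approximation for a quick check.
    return any(fnmatch(index_name, p) for p in index_patterns)

print(template_matches(["logs-*"], "logs-app-2024.01.01"))   # True
print(template_matches(["logs-*"], "app-logs-2024.01.01"))   # False
```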

    Simulate template application without actually creating the index to confirm what settings it would receive:

    POST _index_template/_simulate_index/logs-app-2024.01.01

    How to Fix It

    Re-create the template. Here's a minimal working example for a logs index template:

    PUT _index_template/logs-template
    {
      "index_patterns": ["logs-*"],
      "priority": 100,
      "template": {
        "settings": {
          "number_of_shards": 1,
          "number_of_replicas": 1
        },
        "mappings": {
          "properties": {
            "@timestamp": { "type": "date" },
            "message": { "type": "text" },
            "level": { "type": "keyword" },
            "host": { "type": "keyword" }
          }
        }
      }
    }

    After creating the template, trigger a manual index creation to confirm it applies cleanly:

    PUT logs-test-verify-001
    GET logs-test-verify-001/_settings
    DELETE logs-test-verify-001

    If you're running a snapshot-restore workflow, make sure templates are included going forward. The key flag is include_global_state:

    POST _snapshot/my_backup/snapshot_1/_restore
    {
      "include_global_state": true
    }

    Store your templates in version control and apply them via a bootstrap script during cluster provisioning. Treat them like infrastructure code — because they are.
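    A bootstrap script along those lines can be very small. This is a sketch, not a hardened tool — the elasticsearch/templates/ directory layout, the file-stem-as-template-name convention, and the localhost URL are all assumptions you'd adapt:

```python
import json
import pathlib
import urllib.request

def load_templates(template_dir):
    """Yield (template_name, body) for every *.json file in the directory.
    The file stem becomes the template name, e.g. logs-template.json
    maps to PUT _index_template/logs-template."""
    for path in sorted(pathlib.Path(template_dir).glob("*.json")):
        yield path.stem, json.loads(path.read_text())

def apply_templates(template_dir, es_url="http://localhost:9200"):
    # es_url is an assumption -- point it at your cluster, and add
    # authentication headers as your deployment requires.
    for name, body in load_templates(template_dir):
        req = urllib.request.Request(
            f"{es_url}/_index_template/{name}",
            data=json.dumps(body).encode(),
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        with urllib.request.urlopen(req) as resp:
            print(name, resp.status)
```

Run apply_templates("elasticsearch/templates/") from CI after every deployment and the cluster can never drift far from version control.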


    Root Cause 2: Mapping Conflict

    Why It Happens

    Elasticsearch is strict about field types once a mapping is established. If you try to create an index — or write data to an auto-created index — and the inferred or declared field type conflicts with an existing template mapping, you'll get a mapper_parsing_exception and the index creation will be rejected. This happens most often when a field that's been mapped as integer in the template suddenly receives a string value like N/A, or when a field mapped as keyword receives a nested object. Dynamic mapping makes this worse because what worked on your first document might fail on your hundredth when the data shape changes.

    How to Identify It

    The error from Elasticsearch is usually explicit about which field is the problem:

    {
      "error": {
        "root_cause": [
          {
            "type": "mapper_parsing_exception",
            "reason": "failed to parse field [response_code] of type [integer] in document with id '1'. Preview of field's value: 'N/A'"
          }
        ],
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [response_code] of type [integer] in document with id '1'. Preview of field's value: 'N/A'"
      },
      "status": 400
    }
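    When you're triaging a stream of these errors, pulling the offending field name out programmatically beats eyeballing each response. A small sketch that parses the error body shown above (the function name is mine, and the regex relies on the reason string format in current Elasticsearch releases):

```python
import re

def failing_field(error_body: dict):
    """Pull the offending field name out of a mapper_parsing_exception
    response body, or return None if the error is something else."""
    err = error_body.get("error", {})
    if err.get("type") != "mapper_parsing_exception":
        return None
    m = re.search(r"failed to parse field \[([^\]]+)\]", err.get("reason", ""))
    return m.group(1) if m else None

resp = {
    "error": {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [response_code] of type [integer] "
                  "in document with id '1'. Preview of field's value: 'N/A'",
    },
    "status": 400,
}
print(failing_field(resp))  # -> response_code
```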

    To see the current effective mapping that a new index would inherit from a template, use the simulate API and compare it against what your data actually looks like:

    POST _index_template/_simulate_index/logs-app-2024.01.01
    GET logs-existing-index/_mapping

    How to Fix It

    The fix depends on which side of the conflict is wrong. If your data is correct and the template mapping is too strict, update the template to use a compatible type. Switching a numeric field to keyword is a common and safe resolution when the field contains mixed content:

    PUT _index_template/logs-template
    {
      "index_patterns": ["logs-*"],
      "priority": 100,
      "template": {
        "mappings": {
          "properties": {
            "response_code": { "type": "keyword" }
          }
        }
      }
    }

    Note that you can't change an existing field's type on a live index — you can add new fields to a mapping, but changing a type means reindexing the data into a new index with the corrected mapping:

    POST _reindex
    {
      "source": { "index": "logs-app-old" },
      "dest": { "index": "logs-app-corrected" }
    }

    If the data itself is malformed upstream, fix it at the source rather than hacking around it in Elasticsearch. In Logstash, use a mutate filter to cast or sanitize the field. In Filebeat, use a processor. Pushing bad data and relying on the index to tolerate it is a path to more pain later.
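    If your producer is custom code rather than Logstash or Filebeat, the equivalent sanitization is a one-liner before indexing. A sketch for the response_code case from the error above — dropping sentinel strings so the integer mapping is never violated (whether to drop or reroute bad values is a policy choice, not something Elasticsearch dictates):

```python
def sanitize_response_code(value):
    """Coerce mixed response_code values to int; sentinels like 'N/A'
    become None rather than violating the integer mapping."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None  # or route the raw value to a separate keyword field

for raw in [200, "404", "N/A", None]:
    print(raw, "->", sanitize_response_code(raw))
```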


    Root Cause 3: Disk Watermark Reached

    Why It Happens

    Elasticsearch has built-in disk-based shard allocation thresholds that kick in automatically. When any node crosses the high watermark (default: 90% disk used), Elasticsearch stops allocating new shards to that node. When the flood stage watermark is hit (default: 95%), it goes further and enforces a read-only index block on all indices assigned to that node. At that point, new indices can't be created and existing ones can't be written to. The cluster isn't broken — it's protecting your data from corruption due to a full disk — but the effect from the application side looks like a total write outage.

    How to Identify It

    This one's fast to diagnose. Check disk usage across all nodes:

    GET _cat/allocation?v&h=node,disk.used,disk.avail,disk.percent,shards

    Example output showing a node over the threshold:

    node                disk.used  disk.avail  disk.percent  shards
    sw-infrarunbook-01  450gb      50gb        90%           120
    es-data-02          200gb      300gb       40%            80

    Also check whether any index has been placed into read-only mode by the flood stage watermark trigger:

    GET logs-app-2024.01.01/_settings
    
    # You'll see this in the response if it's been locked:
    # "index.blocks.read_only_allow_delete": "true"

    To see what watermark thresholds are currently in effect:

    GET _cluster/settings?include_defaults=true&filter_path=**.watermark
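    The allocation check is worth scripting into your monitoring. A sketch, assuming you fetch GET _cat/allocation?format=json and parse it — the cat API returns disk.percent as a string, and the synthetic UNASSIGNED row carries no disk figures, both of which the function accounts for:

```python
def nodes_over_watermark(allocation_rows, high=90.0):
    """Given parsed rows from GET _cat/allocation?format=json, return
    (node, percent) for every node at or over the high watermark."""
    flagged = []
    for row in allocation_rows:
        pct = row.get("disk.percent")
        if pct is not None and float(pct) >= high:
            flagged.append((row["node"], float(pct)))
    return flagged

rows = [
    {"node": "sw-infrarunbook-01", "disk.percent": "90"},
    {"node": "es-data-02", "disk.percent": "40"},
    {"node": "UNASSIGNED", "disk.percent": None},
]
print(nodes_over_watermark(rows))  # -> [('sw-infrarunbook-01', 90.0)]
```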

    How to Fix It

    First, actually free up disk space — delete old indices, move data to cheaper storage, or expand the disk. Running GET _cat/indices?v&s=store.size:desc will show you the largest indices to target. Once disk usage is back below the high watermark, remove the read-only block:

    # Remove from a specific index
    PUT logs-app-2024.01.01/_settings
    {
      "index.blocks.read_only_allow_delete": null
    }
    
    # Or remove from all indices at once
    PUT _all/_settings
    {
      "index.blocks.read_only_allow_delete": null
    }

    As a temporary measure during an active incident — not a permanent fix — you can raise the watermark thresholds to give yourself breathing room while you sort out the disk situation:

    PUT _cluster/settings
    {
      "transient": {
        "cluster.routing.allocation.disk.watermark.low": "92%",
        "cluster.routing.allocation.disk.watermark.high": "95%",
        "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
      }
    }

    Use transient here deliberately — transient settings don't survive a cluster restart, which is exactly the behavior you want for an emergency override. Reset them once the disk situation is resolved.


    Root Cause 4: ILM Policy Error

    Why It Happens

    Index Lifecycle Management is powerful when it works. When it doesn't, it can silently block index creation in ways that are genuinely confusing to debug. The most common failure mode is a broken rollover configuration: your data stream or rollover alias is attached to an ILM policy, but either the policy doesn't exist, the rollover alias is misconfigured, or the ILM step has entered an error state mid-cycle. When ILM can't roll over the current write index to a new one, writes stall. New documents have nowhere to land, and the index that should have been created for the current time period simply never gets made.

    How to Identify It

    Start by checking the overall ILM status and the specific policy:

    GET _ilm/status
    GET _ilm/policy/logs-policy

    Then use the explain API to see what ILM is actually doing with your index:

    GET logs-app-000001/_ilm/explain

    A stuck ILM phase will show step: ERROR in the response, along with a failed_step and a step_info block explaining the cause:

    {
      "indices": {
        "logs-app-000001": {
          "index": "logs-app-000001",
          "managed": true,
          "policy": "logs-policy",
          "phase": "hot",
          "action": "rollover",
          "step": "ERROR",
          "failed_step": "check-rollover-ready",
          "step_info": {
            "type": "illegal_argument_exception",
            "reason": "index.lifecycle.rollover_alias [logs-app] does not point to index [logs-app-000001]"
          }
        }
      }
    }
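    The explain response above has a stable enough shape that a monitoring script can extract the stuck indices directly. A minimal sketch operating on the parsed JSON:

```python
def stuck_ilm_indices(explain_body: dict):
    """From a GET <target>/_ilm/explain response, return the indices
    whose lifecycle step is ERROR, with the failed step and reason."""
    stuck = {}
    for name, info in explain_body.get("indices", {}).items():
        if info.get("step") == "ERROR":
            stuck[name] = {
                "failed_step": info.get("failed_step"),
                "reason": info.get("step_info", {}).get("reason"),
            }
    return stuck

explain = {
    "indices": {
        "logs-app-000001": {
            "step": "ERROR",
            "failed_step": "check-rollover-ready",
            "step_info": {"reason": "rollover_alias [logs-app] mismatch"},
        },
        "logs-app-000002": {"step": "complete"},
    }
}
print(stuck_ilm_indices(explain))
```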

    How to Fix It

    If the rollover alias is missing or misconfigured, re-create it with the correct is_write_index flag:

    POST _aliases
    {
      "actions": [
        {
          "add": {
            "index": "logs-app-000001",
            "alias": "logs-app",
            "is_write_index": true
          }
        }
      ]
    }

    If the ILM policy itself has a configuration problem, update it with corrected parameters:

    PUT _ilm/policy/logs-policy
    {
      "policy": {
        "phases": {
          "hot": {
            "actions": {
              "rollover": {
                "max_size": "50gb",
                "max_age": "7d"
              }
            }
          },
          "delete": {
            "min_age": "30d",
            "actions": {
              "delete": {}
            }
          }
        }
      }
    }

    After fixing the underlying cause, use the retry API to move the stuck index out of its ERROR step — it re-runs the failed step directly:

    POST logs-app-000001/_ilm/retry

    ILM also runs on a polling interval (default: 10 minutes, controlled by indices.lifecycle.poll_interval), so even a healthy index can take a few minutes to progress. Note that POST _ilm/start does not force a poll — it only restarts the ILM service if it has been stopped, which you can confirm with GET _ilm/status.

    Root Cause 5: Insufficient Permissions

    Why It Happens

    If you're running Elasticsearch with security features enabled — and you should be, especially in any environment that touches real data — every API call requires authentication and authorization. A service account that has read privileges on an index pattern won't be able to create indices under it. This shows up constantly in environments where the Elasticsearch security configuration was recently tightened, or where a new ingestion pipeline was deployed without verifying that its service user actually has the create_index privilege. It also appears after role changes: someone updates the role definition to remove a privilege and forgets that three pipelines depend on it.

    How to Identify It

    The error is usually direct and explicit:

    {
      "error": {
        "root_cause": [
          {
            "type": "security_exception",
            "reason": "action [indices:admin/create] is unauthorized for user [infrarunbook-admin] with roles [logs-read-only] on indices [logs-app-2024.01.01], this action is granted by the index privileges [create_index,manage,all]"
          }
        ],
        "type": "security_exception",
        "status": 403
      }
    }

    Verify what roles a user currently has and what those roles actually grant:

    GET _security/user/infrarunbook-admin
    GET _security/role/logs-read-only

    You can also use the has-privileges API to test permissions directly without guessing:

    POST _security/user/infrarunbook-admin/_has_privileges
    {
      "index": [
        {
          "names": ["logs-*"],
          "privileges": ["create_index", "write", "manage"]
        }
      ]
    }

    The response will tell you exactly which privileges the user has and which they're missing, field by field.
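    Reading that response by hand gets tedious when you're checking several pipelines. A sketch that flattens the index section of a _has_privileges response into the missing (pattern, privilege) pairs — the response shape here matches what I've seen from the API, but verify against your cluster's version:

```python
def missing_index_privileges(resp: dict):
    """From a _has_privileges response body, list (pattern, privilege)
    pairs the principal lacks."""
    missing = []
    for pattern, privs in resp.get("index", {}).items():
        for priv, granted in privs.items():
            if not granted:
                missing.append((pattern, priv))
    return missing

resp = {
    "username": "infrarunbook-admin",
    "has_all_requested": False,
    "index": {"logs-*": {"create_index": False, "write": True, "manage": False}},
}
print(missing_index_privileges(resp))
# -> [('logs-*', 'create_index'), ('logs-*', 'manage')]
```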

    How to Fix It

    Create or update a role with the correct index privileges, then assign it to the user or service account:

    PUT _security/role/logs-writer
    {
      "indices": [
        {
          "names": ["logs-*"],
          "privileges": ["create_index", "write", "manage", "read"]
        }
      ]
    }
    
    PUT _security/user/infrarunbook-admin
    {
      "roles": ["logs-writer"]
    }

    If you're using API keys instead of user accounts — which is the recommended pattern for Filebeat, Logstash, and other pipeline tools — generate a key with explicit role descriptors scoped to the minimum required access:

    POST _security/api_key
    {
      "name": "logstash-ingest-key",
      "role_descriptors": {
        "logs-writer": {
          "indices": [
            {
              "names": ["logs-*"],
              "privileges": ["create_index", "write", "manage", "read"]
            }
          ]
        }
      }
    }
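    The create response returns an id and an api_key; clients authenticate by sending both, base64-encoded as id:api_key, in an ApiKey Authorization header. A small helper sketch (the function name is mine):

```python
import base64

def api_key_header(key_id: str, api_key: str) -> dict:
    """Build the Authorization header Elasticsearch expects for API
    keys: 'ApiKey ' followed by base64(id:api_key)."""
    token = base64.b64encode(f"{key_id}:{api_key}".encode()).decode()
    return {"Authorization": f"ApiKey {token}"}

print(api_key_header("abc123", "secret"))
```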

    Store the returned API key in your secrets manager and configure it in your ingestion pipeline. Don't use the elastic superuser for ingestion pipelines — it's an operational anti-pattern and a serious security risk. If that key is ever leaked, your entire cluster is exposed.


    Root Cause 6: Cluster Health Is Red

    Why It Happens

    A red cluster health means at least one primary shard is unassigned. Elasticsearch won't create new indices when it can't guarantee shard placement across the cluster. This is often a cascading failure — a data node goes down, its primary shards become unassigned, and suddenly nothing new can be written anywhere. The cluster itself is in a degraded state and refuses to take on more work until the shard allocation problem is resolved.

    How to Identify It

    GET _cluster/health?pretty
    GET _cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state

    Look for shards in UNASSIGNED state with a reason of NODE_LEFT or ALLOCATION_FAILED. The cluster allocation explain API gives the most detailed answer on why a specific shard won't assign:

    GET _cluster/allocation/explain
    {
      "index": "logs-app-000001",
      "shard": 0,
      "primary": true
    }

    How to Fix It

    Bring the missing node back online if it's recoverable. If the node is permanently lost and you need to force-allocate the primary shard, be aware this is a data-loss operation — you're telling Elasticsearch to treat a shard as empty rather than wait for the original copy:

    POST _cluster/reroute
    {
      "commands": [
        {
          "allocate_empty_primary": {
            "index": "logs-app-000001",
            "shard": 0,
            "node": "sw-infrarunbook-01",
            "accept_data_loss": true
          }
        }
      ]
    }

    Only run this command when you have confirmed the original node is gone for good and you've exhausted recovery options. For logging data the risk is usually acceptable; for transactional data it is not.


    Prevention

    Most of these failures are entirely preventable with a few operational habits baked into your workflow.

    Monitor disk usage proactively. Set alerts when any node crosses 75% disk utilization — by the time you're debugging at 90%, you're already in crisis mode. A Prometheus alert using the Elasticsearch exporter catches this early:

    - alert: ElasticsearchDiskHigh
      expr: elasticsearch_filesystem_data_available_bytes / elasticsearch_filesystem_data_size_bytes < 0.25
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Elasticsearch node {{ $labels.node }} disk above 75%"

    Keep your index templates and ILM policies in version control. Treat them like code, because they are. Use a CI/CD pipeline to push templates to your cluster on every deployment, so a cluster restore or rebuild never leaves you without the templates your pipelines expect. A simple bootstrap script that does a PUT _index_template/<name> for each file in your elasticsearch/templates/ directory takes thirty minutes to write and saves hours of debugging.

    Use dedicated service accounts for each ingestion pipeline, scoped to the minimum required index patterns and privileges. This makes permission failures trivially traceable — if the filebeat-ingest-key is getting 403s, you know exactly which role to check. Don't reuse the same credentials across multiple pipelines.

    Set up ILM monitoring. The ILM explain API can be polled programmatically to detect stuck lifecycle steps before they cascade into write failures. A cron job that queries GET */_ilm/explain and alerts on any index with step: ERROR will catch rollover failures long before they manifest as missing indices.

    Test your index creation path in staging after any cluster configuration change — security policy update, template modification, ILM policy change, node resize. A quick integration test that writes a document through your normal ingestion path and verifies the index was created takes minutes to run and catches the majority of these failures before they reach production.

    Finally, audit your auto_create_index setting and make it intentional:

    GET _cluster/settings?include_defaults=true&filter_path=**.auto_create_index

    In most production environments, either disable it entirely or constrain it to specific patterns. Allowing unrestricted auto-creation can mask template misconfigurations — your index gets created but with wrong settings — and you don't find out until you run a query that returns garbage data or hits a mapping exception three weeks later.

    Frequently Asked Questions

    Why does Elasticsearch not create an index even though auto_create_index is enabled?

    Auto-creation can still be blocked by disk watermarks, cluster red health, mapping conflicts in an applied template, or security policies that deny the create_index privilege to the requesting user or API key. Check cluster health and disk allocation first, then verify the requesting principal's permissions.

    How do I tell if an Elasticsearch disk watermark is blocking index creation?

    Run GET _cat/allocation?v to see disk usage per node. If any node is above 90%, the high watermark has been breached and new shard allocation is blocked. Check for read-only index blocks with GET <index>/_settings and look for index.blocks.read_only_allow_delete set to true.

    Can a broken ILM policy prevent a new Elasticsearch index from being created?

    Yes. When ILM fails to roll over the current write index to a new one — because of a missing rollover alias, a misconfigured policy, or a step stuck in ERROR state — the new index that should receive writes never gets created. Use GET <index>/_ilm/explain to diagnose the exact failed step.

    What Elasticsearch privileges does a user need to create an index?

    At minimum, the create_index privilege on the target index pattern. The manage privilege also covers it, as does all. You can test what a specific user has with POST _security/user/<username>/_has_privileges, passing the index name and privilege list you want to check.

    How do I fix an Elasticsearch index stuck in read-only mode due to disk watermark?

    First free up disk space by deleting old indices or expanding storage. Then remove the read-only block: PUT _all/_settings with {"index.blocks.read_only_allow_delete": null}. If you need immediate relief, temporarily raise the flood stage watermark using a transient cluster setting and reset it once disk is under control.
