InfraRunBook

    Loki Ingestion Issues

    Logging
    Published: Apr 16, 2026
    Updated: Apr 16, 2026

    Diagnose and fix the most common Loki log ingestion failures, from Promtail not running to rate limit errors, out-of-order entries, and label mismatches.


    Symptoms

    You open Grafana, switch to the Explore view, select your Loki datasource, and run a query that normally returns thousands of lines — and you get nothing. No entries. Not even an error, just silence. Or maybe you are getting an error: `context deadline exceeded` or `no org id` staring back at you. Either way, logs that should be flowing into Loki aren't arriving, and you need to figure out why quickly.

    The most common symptoms when Loki ingestion breaks down:

    • Grafana Explore returns empty results for queries that previously worked fine
    • Log-based alerts stop firing even though the underlying condition is actively occurring
    • Promtail logs fill with repeated error messages like `429 Too Many Requests` or `connection refused`
    • The Loki `/metrics` endpoint shows `loki_distributor_lines_received_total` has stopped incrementing
    • `logcli` returns zero results for streams you know are active
    • Application teams report that their structured logs vanished from dashboards with no obvious explanation

    The tricky thing about Loki ingestion failures is that they're often silent. Promtail keeps running, applications keep writing logs, but nothing makes it through the pipeline. Everything looks healthy on the surface until you actually query for data. This article walks through the most common culprits, how to identify each one precisely, and exactly how to fix them.


    Root Cause 1: Promtail Not Running

    This is the obvious one, but it bites people more often than they'd like to admit. Promtail is the agent that scrapes log files and ships them to Loki. If it's not running, nothing gets ingested — full stop. In my experience, this happens most frequently right after a system reboot that didn't restore the service, after a package update that silently replaced the unit file, or after a config change that introduced a syntax error and prevented the service from starting cleanly.

    To check whether Promtail is actually running on `sw-infrarunbook-01`:

    systemctl status promtail.service

    If it's failed or inactive, you'll see something like:

    ● promtail.service - Promtail log shipping agent
         Loaded: loaded (/etc/systemd/system/promtail.service; enabled; vendor preset: enabled)
         Active: failed (Result: exit-code) since Wed 2026-04-16 09:12:44 UTC; 3min 22s ago
        Process: 4821 ExecStart=/usr/local/bin/promtail -config.file=/etc/promtail/config.yaml (code=exited, status=1/FAILURE)
       Main PID: 4821 (code=exited, status=1/FAILURE)

    Pull the actual error from the journal to understand why it failed:

    journalctl -u promtail.service -n 50 --no-pager

    A config syntax error produces output like this in the journal:

    Apr 16 09:12:44 sw-infrarunbook-01 promtail[4821]: level=error ts=2026-04-16T09:12:44.312Z caller=main.go:67 msg="error creating promtail" err="invalid config: yaml: line 34: mapping values are not allowed in this context"

    Always validate your config before attempting a restart — it saves you from bouncing the service only to have it fail again immediately:

    promtail -config.file=/etc/promtail/config.yaml -check-syntax

    Once the config passes validation, enable and start the service:

    systemctl enable promtail.service
    systemctl start promtail.service
    systemctl status promtail.service

    If Promtail keeps crashing in a loop even with a valid config, check file permissions on the log paths it's trying to tail. The user Promtail runs as must have read access to every file it's watching. A quick sanity check:

    stat /var/log/syslog
    ls -la /var/log/app/*.log

    If Promtail runs as the `promtail` system user and the log files are owned by `root` with mode `640`, it can't read them. Depending on the version, it may fail silently rather than throwing an obvious permission denied error. Either add `promtail` to the appropriate group, or adjust the ACLs on the log directory.
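    As a scripted version of that sanity check, a few lines of Python can list which watched files the current user cannot open (an illustrative helper, run as the same user Promtail runs as; the example paths are assumptions):

```python
import glob
import os

def unreadable_logs(patterns):
    """Return files matching the given globs that the current user
    cannot read. Promtail fails to tail any such file."""
    blocked = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if not os.access(path, os.R_OK):
                blocked.append(path)
    return blocked

# Check the same paths as the manual stat/ls commands above.
print(unreadable_logs(["/var/log/syslog", "/var/log/app/*.log"]))
```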


    Root Cause 2: Wrong Loki Endpoint

    Promtail is running fine, but it's pointed at the wrong address. This is surprisingly common in environments where Loki was recently migrated to a new server, placed behind a reverse proxy, or switched from HTTP to HTTPS. The agent happily attempts to ship logs into the void and, depending on the error, may not complain loudly enough for anyone to notice until there's a gap in the data.

    The Loki push endpoint in Promtail's config lives under the `clients` block:

    clients:
      - url: http://192.168.10.25:3100/loki/api/v1/push

    Common mistakes I've seen: using port `3100` when Loki is actually behind nginx on port `80`, omitting the `/loki/api/v1/push` path suffix and pointing at the root URL, pointing at a dev Loki instance instead of production, or using an IP address that changed after a VM migration or re-IP event. The best way to catch this quickly is to look at Promtail's own logs for connection errors:

    journalctl -u promtail.service --since "10 minutes ago" | grep -iE "error|failed|refused|404|429"

    A wrong endpoint will produce output like:

    level=warn ts=2026-04-16T09:45:11.803Z caller=client.go:349 component=client host=192.168.10.99:3100 msg="error sending batch, will retry" status=0 err="Post \"http://192.168.10.99:3100/loki/api/v1/push\": dial tcp 192.168.10.99:3100: connect: connection refused"

    Or if the host resolves but the path is wrong:

    level=warn ts=2026-04-16T09:45:11.803Z caller=client.go:349 component=client host=192.168.10.25:3100 msg="error sending batch, will retry" status=404 err="server returned HTTP status 404 Not Found"

    Verify Loki is actually reachable at the expected address from the Promtail host before editing any config:

    curl -v http://192.168.10.25:3100/ready

    If Loki is running properly, you'll get back a `200 OK` with the body `ready`. If you're getting a 404 specifically on the push endpoint, also check whether you're running Loki behind a reverse proxy with a path prefix. Some configurations serve Loki at `/loki/` on the proxy, which means the full push URL becomes `http://192.168.10.25:80/loki/loki/api/v1/push` — yes, with `loki` doubled. It looks wrong, but it's correct in that scenario.
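    The prefix arithmetic is easier to see in code. This hedged sketch (not part of Promtail) simply concatenates the pieces the way a reverse proxy would:

```python
def loki_push_url(base, proxy_prefix=""):
    """Build the full push URL from a base address and an optional
    reverse-proxy path prefix such as "/loki"."""
    return base.rstrip("/") + proxy_prefix + "/loki/api/v1/push"

# Direct: http://192.168.10.25:3100/loki/api/v1/push
print(loki_push_url("http://192.168.10.25:3100"))
# Behind a proxy serving Loki at /loki: the doubled segment is expected.
print(loki_push_url("http://192.168.10.25:80", "/loki"))
```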

    Fix the URL in `/etc/promtail/config.yaml`, then restart Promtail and tail the logs to confirm entries are flowing:

    systemctl restart promtail.service
    journalctl -u promtail.service -f

    You should see lines like `msg="successfully sent batch"` appearing within seconds if the endpoint is now correct and Loki is reachable.


    Root Cause 3: Label Mismatch

    Loki's data model is fundamentally label-based. Every log stream is uniquely identified by a set of labels, and those labels define the stream's identity permanently once created. When Promtail ships a batch of logs with a label set that conflicts with the structure of an existing stream in Loki — or when a pipeline stage produces unexpected label combinations — you'll end up with rejected entries, duplicate streams, or query results that don't match what you expect.

    One common scenario: you rename a label in your Promtail pipeline stages partway through the day, say changing `job` to `app`. Loki already has an open stream for that log source with the old label schema. The new batches arrive with a different label fingerprint, and instead of appending cleanly to the existing stream, Loki creates a new one. Old queries using `{job="webserver"}` stop returning new data because the stream has effectively diverged.
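    To make the divergence concrete, here is a toy model of stream identity (a sketch of the concept, not Loki's actual fingerprinting):

```python
def stream_key(labels):
    """A stream's identity is its full, sorted label set; any change
    to a label name or value yields a different stream."""
    return tuple(sorted(labels.items()))

before = stream_key({"job": "webserver", "env": "prod"})
after = stream_key({"app": "webserver", "env": "prod"})
print(before == after)  # False: the rename created a second stream
```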

    Another classic problem is high-cardinality labels: using a user ID, request UUID, pod IP, or any other value that changes per request as a Loki label. This explodes the number of streams Loki has to manage, eventually triggering limits and causing ingestion to stall entirely.

    To see what labels Promtail is actually attaching, enable debug logging temporarily:

    promtail -config.file=/etc/promtail/config.yaml -log.level=debug 2>&1 | grep -i "labels\|stream"

    You can also query Loki directly to see the full label set it has indexed:

    curl http://192.168.10.25:3100/loki/api/v1/labels | jq .
    {
      "status": "success",
      "data": [
        "app",
        "env",
        "host",
        "job"
      ]
    }

    To detect a cardinality problem, check the number of unique values for a label that should be low-cardinality:

    curl "http://192.168.10.25:3100/loki/api/v1/label/host/values" | jq '.data | length'

    If that returns hundreds or thousands for a label that should have a handful, you've found the problem. Fix it in the Promtail pipeline stage by dropping the high-cardinality label and keeping that value inside the log line body as a structured field instead:

    pipeline_stages:
      - json:
          expressions:
            request_id: request_id
            level: level
      - labels:
          app:
          env:
          level:
      # request_id stays in the log body — never make it a label

    After fixing the label config, restart Promtail. Cleaning up existing high-cardinality streams in Loki requires either using the Loki admin delete API or waiting for the retention period to expire and compact them away.


    Root Cause 4: Out of Order Entries

    This one trips up engineers who are shipping logs from multiple sources, replaying archived log data, or dealing with hosts that have drifted clocks. With ordered writes (the behavior in older Loki versions), Loki enforces that within a single stream, log entries must arrive in non-decreasing timestamp order. If an entry arrives with a timestamp older than the most recently accepted entry for that stream, Loki rejects it outright. (Recent Loki releases accept out-of-order writes by default, but only within a bounded window, so sufficiently old entries still hit the same rejection.)

    Why does this happen in practice? NTP drift between hosts producing logs, log files being re-read from the beginning by Promtail after a crash or restart that wiped the position file, log aggregators that buffer and reorder entries before forwarding, and manual imports of archived log data are all common triggers. The error message in Promtail's logs is descriptive when this is the cause:

    level=warn ts=2026-04-16T10:03:22.194Z caller=client.go:349 component=client host=192.168.10.25:3100 msg="error sending batch, will retry" status=400 err="rpc error: code = Code(400) desc = entry for stream '{app=\"webserver\", env=\"prod\", host=\"sw-infrarunbook-01\"}' has timestamp too old: 2026-04-16T09:55:01Z, oldest acceptable timestamp is 2026-04-16T10:03:00Z"

    HTTP status `400` combined with the phrase "timestamp too old" or "entry out of order" makes this unambiguous. Note that unlike rate limit errors (which return 429 and trigger retries), out-of-order rejections are permanent — those log entries are lost unless you can re-deliver them after adjusting Loki's tolerance window.
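    The per-stream acceptance rule can be modeled in a few lines (a simplified sketch; real Loki also dedupes identical entries at equal timestamps and applies per-tenant tolerance windows):

```python
class StreamGate:
    """Per-stream ordering check: reject any entry older than the
    newest timestamp already accepted for this stream."""

    def __init__(self):
        self.newest = None

    def accept(self, ts):
        if self.newest is not None and ts < self.newest:
            return False  # corresponds to the HTTP 400 rejection
        self.newest = ts
        return True

gate = StreamGate()
print(gate.accept(100), gate.accept(105), gate.accept(103))  # True True False
```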

    On the Loki server side, you can widen the acceptance window by adjusting `reject_old_samples_max_age` in `limits_config`:

    limits_config:
      reject_old_samples: true
      reject_old_samples_max_age: 168h

    Be careful — setting this too loosely can mask genuine clock problems and make future debugging harder. Use the narrowest window that fixes the actual symptom.

    On the Promtail side, make sure the position file is on a persistent path so it survives restarts without re-reading log files from the beginning:

    positions:
      filename: /var/lib/promtail/positions.yaml

    If the positions file gets deleted — which happens during certain upgrade procedures or aggressive `/var/lib` cleanups — Promtail starts reading every configured log file from offset zero on next start. For active high-volume logs, this floods Loki with entries that are days old, arriving far out of order relative to what's already been ingested. Restore the positions file from backup if you have one, or add `tail_from_end: true` to your scrape configs to start from the current tail on first read rather than the beginning.

    For NTP drift issues, verify clock sync on all log-producing hosts:

    timedatectl status
    chronyc tracking

    If the offset is more than a second or two, fix NTP synchronization first. Everything downstream of a drifted clock becomes unreliable.


    Root Cause 5: Rate Limit Hit

    Loki ships with ingestion rate limits enabled by default, and they're conservative by design. Under steady-state operations you won't hit them. The moment you deploy a newly verbose microservice, start tailing a debug log that's producing ten thousand lines per second, kick off a batch job that dumps gigabytes of application output, or add ten new hosts all pointing at the same Loki instance simultaneously — you'll slam into the limits and start dropping entries.

    The error in Promtail's logs when rate limits are hit is hard to miss:

    level=warn ts=2026-04-16T11:22:05.441Z caller=client.go:349 component=client host=192.168.10.25:3100 msg="error sending batch, will retry" status=429 err="rpc error: code = Code(429) desc = ingestion rate limit (4194304 bytes/s) exceeded while adding 65536 bytes for user 'fake', reduce log volume or contact your Loki administrator"

    Status `429` with the phrase "ingestion rate limit exceeded" is the key. Unlike 400 errors for out-of-order entries, Promtail will retry 429 responses with backoff — so in some cases the logs will eventually make it through if the rate normalizes. But if the elevated rate persists, Promtail's internal batch buffer fills up, retries accumulate, and you start losing entries permanently.

    Check the current rate limit configuration on the Loki server:

    grep -A 10 'limits_config' /etc/loki/config.yaml
    limits_config:
      ingestion_rate_mb: 4
      ingestion_burst_size_mb: 6
      per_stream_rate_limit: 3MB
      per_stream_rate_limit_burst: 15MB

    The default `ingestion_rate_mb` of 4 MB/s is per tenant in multi-tenant mode, or global in single-tenant mode. For anything beyond a handful of lightly active services, that ceiling is low. Raise it to match your actual volume with some headroom:

    limits_config:
      ingestion_rate_mb: 32
      ingestion_burst_size_mb: 64
      per_stream_rate_limit: 10MB
      per_stream_rate_limit_burst: 30MB
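    If you have an observed peak ingest rate (for example from the distributor metrics), a quick back-of-envelope helper can size the new limit with headroom (illustrative only; pick values that match your environment):

```python
import math

def suggested_ingestion_rate_mb(peak_bytes_per_sec, headroom=2.0):
    """Round the observed peak, times a headroom factor, up to a
    whole number of MB/s suitable for ingestion_rate_mb."""
    return math.ceil(peak_bytes_per_sec * headroom / (1024 * 1024))

# A 16 MB/s observed peak with 2x headroom suggests 32 MB/s.
print(suggested_ingestion_rate_mb(16 * 1024 * 1024))
```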

    Restart Loki after changing its config and confirm it came up cleanly:

    systemctl restart loki.service
    journalctl -u loki.service -n 20 --no-pager

    Raising limits is the fast fix, but you should also find out why the rate spiked in the first place. Identify which stream is responsible by checking Loki's metrics endpoint:

    curl -s http://192.168.10.25:3100/metrics | grep loki_distributor_bytes_received_total

    If one specific app or host label is generating the overwhelming bulk of the volume, consider adding a Promtail pipeline stage to drop unnecessary log levels before they ever reach Loki. This is cleaner than just opening up the rate limit ceiling:

    pipeline_stages:
      - match:
          selector: '{app="verbose-service"}'
          stages:
            - drop:
                expression: ".*level=debug.*"
            - drop:
                expression: ".*level=trace.*"
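    You can sanity-check those drop expressions offline before deploying them by replaying sample lines through the same regexes (a sketch, not Promtail's pipeline engine):

```python
import re

DROP_EXPRESSIONS = [r".*level=debug.*", r".*level=trace.*"]
_patterns = [re.compile(e) for e in DROP_EXPRESSIONS]

def surviving(lines):
    """Return the lines that none of the drop expressions match,
    i.e. what would actually be shipped to Loki."""
    return [l for l in lines if not any(p.match(l) for p in _patterns)]

sample = [
    'ts=1 level=debug msg="cache miss"',
    'ts=2 level=info msg="request served"',
]
print(surviving(sample))  # only the level=info line survives
```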

    Drop noisy log levels at the agent, not at query time. Query-time filtering doesn't reduce your ingestion cost — the data is already in Loki eating storage and bandwidth.


    Root Cause 6: Filesystem Full on the Loki Host

    Simple and brutal. If the disk where Loki stores its chunks and index fills up, ingestion stops. Depending on your Loki version, the error messages can be surprisingly unhelpful — sometimes appearing as generic internal server errors rather than clearly pointing to disk exhaustion. This one tends to cause confusion because the root cause is completely outside the Loki configuration itself.

    Check disk usage on the Loki storage path:

    df -h /var/loki
    du -sh /var/loki/chunks /var/loki/index /var/loki/boltdb-shipper-active
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb1        50G   49G  512M  99% /var/loki

    If you're at 99%, that's your problem. Short-term: check whether retention is configured and actually deleting old data. Long-term: expand the volume or move Loki's storage backend to object storage so you're not constrained by local disk capacity.
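    When deciding between a quick cleanup and a volume expansion, it helps to estimate the remaining runway from the `df` numbers and the observed daily growth (rough illustrative math):

```python
def days_until_full(avail_bytes, growth_bytes_per_day):
    """Days until the filesystem fills at the current growth rate."""
    if growth_bytes_per_day <= 0:
        return float("inf")
    return avail_bytes / growth_bytes_per_day

# 512 MB free, growing 256 MB/day: about 2 days of runway.
print(days_until_full(512 * 1024**2, 256 * 1024**2))
```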

    Verify Loki's retention configuration:

    grep -A 5 'table_manager\|compactor\|retention' /etc/loki/config.yaml
    compactor:
      working_directory: /var/loki/compactor
      shared_store: filesystem
      retention_enabled: true
    
    limits_config:
      retention_period: 336h

    If `retention_enabled` is missing or set to `false`, Loki never cleans up old chunks. Enable it, set a retention period appropriate for your needs (336 hours — 14 days — is a sensible default for most environments), and restart Loki. The compactor will begin cleaning up expired chunks on its next scheduled run. Don't expect immediate disk reclamation; give it an hour and re-check.


    Root Cause 7: Authentication and Tenant ID Errors

    If your Loki deployment has multi-tenancy enabled (the `auth_enabled: true` setting), every push request must include an `X-Scope-OrgID` header. Promtail must be configured to send it. Without it, Loki rejects the push with a 401 or a generic error. I've seen this catch teams off guard when they stand up a new Loki instance and copy a working Promtail config from a single-tenant environment where the header was never required.

    The Promtail config for tenant ID lives under the client block:

    clients:
      - url: http://192.168.10.25:3100/loki/api/v1/push
        tenant_id: infrarunbook-admin

    If you see errors like `no org id` in Loki's logs or Promtail gets back a `401 Unauthorized`, add the `tenant_id` field matching the org ID configured in Loki's tenant configuration. Alternatively, if you're intentionally running single-tenant and don't need auth, set `auth_enabled: false` in Loki's config and restart.


    Prevention

    Most Loki ingestion failures are detectable before they become visible outages, provided you have the right observability in place on the pipeline itself. The single most valuable metric to watch is `loki_distributor_lines_received_total`. Set an alert: if this counter stops incrementing for any stream that should be continuously active, something is wrong. You want to know in two minutes, not two hours.
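    As one possible shape for that alert, here is a hedged Prometheus rule sketch (the rule name, labels, and 5m window are assumptions to adapt to your setup):

```yaml
groups:
  - name: loki-ingestion
    rules:
      - alert: LokiIngestionStalled
        # Fires when the distributor has accepted no lines for 2 minutes.
        expr: sum(rate(loki_distributor_lines_received_total[5m])) == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Loki has stopped receiving log lines"
```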

    For rate limit exposure, alert on the ratio of dropped entries in Promtail. The metric `promtail_dropped_entries_total` should be zero in steady state. A non-zero and rising value means rate limits are being hit, the endpoint is unreachable, or out-of-order rejections are accumulating. Any of those conditions warrants immediate investigation.

    Keep your Promtail position file on a persistent, backed-up volume — never on a tmpfs, a path that gets wiped on reboot, or anywhere that aggressive cleanup scripts might touch. The position file is the only thing standing between you and Promtail flooding Loki with re-read historical data after every restart. Treat it as important operational state.

    Run `promtail -config.file=/etc/promtail/config.yaml -check-syntax` as an explicit validation step in your configuration management pipeline — Ansible, Chef, Puppet, whatever you use. A config file with a syntax error will kill Promtail on the next restart, which kills log ingestion for every service on that host. Adding this check takes thirty seconds and eliminates an entire class of outage.

    Enforce label discipline across your fleet. Establish a fixed schema — `host`, `job`, `app`, `env` — and document it clearly so every team onboarding a new service knows exactly which fields are labels and which stay in the log body. Every high-cardinality value (request IDs, user IDs, trace IDs, session tokens) belongs inside the structured log line, not as a Loki label. Violating this will cause stream explosion, which leads directly to rate limit failures and query performance degradation that's expensive to reverse.

    Finally, build synthetic end-to-end ingestion monitoring. A simple cron job on `sw-infrarunbook-01` that writes a known, timestamped sentinel entry to a file Promtail is watching, then queries Loki for that entry via `logcli` and alerts if it's not found within two minutes, gives you real end-to-end pipeline visibility. If that check goes red, you know immediately that ingestion is broken — and you have a clear starting point for every diagnostic step in this article.
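    The write-and-verify halves of that check can be sketched like this (hypothetical helper functions; the real job would append the marker to a file Promtail tails and run the query through logcli or the HTTP API):

```python
import time

def sentinel_line(hostname="sw-infrarunbook-01"):
    """A unique, timestamped marker to append to a watched log file."""
    return f"loki-ingestion-canary host={hostname} ts={time.time_ns()}"

def sentinel_found(sentinel, returned_lines):
    """True if any line returned by the Loki query contains the marker."""
    return any(sentinel in line for line in returned_lines)

marker = sentinel_line()
print(sentinel_found(marker, ["unrelated line", marker]))  # True
```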

    Frequently Asked Questions

    How do I verify that Promtail is successfully sending logs to Loki?

    Check Promtail's logs with `journalctl -u promtail.service -f` and look for lines containing `msg="successfully sent batch"`. You can also query the Loki metrics endpoint at `http://192.168.10.25:3100/metrics` and watch `loki_distributor_lines_received_total` increment. If it's static, nothing is being ingested.

    What does the 'entry out of order' error mean in Loki?

    Loki requires that log entries within a single stream arrive in strictly increasing timestamp order. If Promtail sends an entry with a timestamp older than the most recent entry already accepted for that stream, Loki rejects it with a 400 error and the message 'timestamp too old'. This commonly happens when Promtail restarts and re-reads a log file from the beginning, or when there is NTP clock drift between log-producing hosts.

    How do I increase the Loki ingestion rate limit?

    Edit the `limits_config` block in your Loki server config at `/etc/loki/config.yaml` and raise the `ingestion_rate_mb` and `ingestion_burst_size_mb` values. A common starting point for moderate production load is 32 MB/s with a 64 MB burst. After saving, restart Loki with `systemctl restart loki.service` and verify the service started cleanly via `journalctl -u loki.service -n 20`.

    Why are my Loki queries returning no results even though logs are being written?

    The most likely causes are: Promtail is not running or pointing at the wrong Loki endpoint, logs are being rejected due to rate limits or out-of-order timestamps, or the label set in your query doesn't match the label set Promtail is actually sending. Start by checking Promtail's journal for errors, verify the endpoint in its config, then query `http://192.168.10.25:3100/loki/api/v1/labels` to confirm Loki is actually receiving streams with the labels you're querying for.

    How can I prevent high-cardinality labels from breaking Loki ingestion?

    Only use low-cardinality values as Loki labels — values like environment name, application name, and hostname that have a small, bounded set of possible values. Never use request IDs, user IDs, trace IDs, or any per-request dynamic value as a label. Keep those values inside the structured log line body where they can be searched at query time using `|= ` or `| json` expressions. High-cardinality labels cause stream explosion that leads to ingestion failures and poor query performance.
