
    Fluent Bit Not Forwarding Logs

    Logging
    Published: Apr 19, 2026
    Updated: Apr 19, 2026

    Fluent Bit running but logs not reaching Elasticsearch, Loki, or your aggregator? This troubleshooting guide covers every common root cause with real commands, error messages, and fixes.


    Symptoms

    You've deployed Fluent Bit across your fleet and pointed it at a downstream destination — Elasticsearch, Loki, Splunk, or a remote syslog receiver — but logs aren't showing up. The service is running, systemctl status fluent-bit reports active, but the destination is silent. Maybe forwarding worked fine last week, and now there's a gap in your dashboards starting at some arbitrary hour nobody can explain.

    Here's what the failure surface usually looks like. The Fluent Bit process is alive but output.errors on the internal metrics endpoint keeps climbing. Or the process is running and metrics look completely flat — zero records in, zero records out — which is its own kind of broken. You tail /var/log/fluent-bit.log and see a wall of retry messages, or worse, nothing at all. Some sources forward fine while others are silently dropped. Running fluent-bit -c /etc/fluent-bit/fluent-bit.conf --dry-run exits clean, making you feel like the config is correct — but that's a trap. Dry-run validates syntax, not runtime behavior.
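
    If you've enabled the built-in HTTP server (covered under Prevention below), a quick first look at the metrics endpoint tells you which kind of broken you're dealing with: flat counters mean nothing is being ingested, climbing error counters mean ingestion works but delivery doesn't. A minimal check, assuming the default listener address:

    # Requires HTTP_Server On in the [SERVICE] block
    curl -s http://127.0.0.1:2020/api/v1/metrics | python3 -m json.tool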

    The root causes below cover the vast majority of forwarding failures I've seen in production. Work through them systematically rather than tweaking random knobs.


    Root Cause 1: Output Plugin Misconfigured

    This is the most common cause of logs-not-forwarding bugs, and it's embarrassing how often it catches experienced engineers. The output plugin block has a typo, a wrong port, or a parameter that silently gets ignored because it's not a recognized key for that plugin.

    Why it happens: Fluent Bit's configuration parser is permissive. Unknown keys don't cause a startup failure — they're silently dropped. So if you write the wrong port number, reference an Elasticsearch 7 parameter against an Elasticsearch 8 endpoint, or misname an option, Fluent Bit starts fine and then either connects to the wrong place or sends malformed payloads the destination rejects without making noise about it.

    How to identify it. First, enable debug logging and pipe it somewhere you can read:

    fluent-bit -c /etc/fluent-bit/fluent-bit.conf -vv 2>&1 | tee /tmp/fb-debug.log

    Look for lines like these in the output:

    [2024/10/14 03:22:17] [ warn] [output:es:es.0] could not connect to 10.10.1.45:9200
    [2024/10/14 03:22:17] [error] [output:es:es.0] HTTP status=400 URI=/fluent-bit/_doc, response:
    {"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"failed to parse"}]}}

    A 400 from Elasticsearch almost always means the index mapping is wrong or you're sending a field Elasticsearch doesn't expect. An HTTP 404 usually means the Index parameter points to a non-existent index or the path is wrong. Connection refused means the host or port is wrong. A common misconfigured block looks like this:

    [OUTPUT]
        Name  es
        Match *
        Host  10.10.1.45
        Port  9300          # Wrong: 9300 is the transport port, not the HTTP port
        Index fluent-bit
        Type  _doc

    How to fix it. Cross-reference the Fluent Bit docs for exact plugin parameter names — they're case-insensitive but spelling matters. For Elasticsearch the HTTP port is 9200. Use curl to independently verify the endpoint is reachable from sw-infrarunbook-01 before blaming Fluent Bit:

    curl -v http://10.10.1.45:9200/_cluster/health
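
    If the debug output showed 404s, it's also worth confirming the target index actually exists. A quick check, reusing the host and index name from the block above:

    # Lists the index if it exists; returns a 404 error body if it doesn't
    curl -s http://10.10.1.45:9200/_cat/indices/fluent-bit?v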

    A corrected output block for Elasticsearch 8.x looks like this:

    [OUTPUT]
        Name              es
        Match             *
        Host              10.10.1.45
        Port              9200
        Index             fluent-bit
        Suppress_Type_Name On

    The Suppress_Type_Name flag is required for Elasticsearch 8.x — without it you'll get 400 errors on every write because type mappings were removed. If you're on ES 7.x, drop that flag.
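
    If you're not sure which major version you're talking to, the root endpoint reports it. A quick check against the same host:

    # Look at version.number to decide whether Suppress_Type_Name applies (8.x) or not (7.x)
    curl -s http://10.10.1.45:9200/ | python3 -m json.tool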


    Root Cause 2: TLS Certificate Not Trusted

    If your log destination uses HTTPS and the certificate is self-signed, issued by an internal CA, or expired, Fluent Bit will refuse the connection. In my experience this is the second most common cause of silent forwarding failure, and it's particularly nasty because the error messages aren't always obvious — depending on retry configuration, Fluent Bit may just keep retrying in the background without prominently surfacing why.

    Why it happens: Fluent Bit validates server certificates against the system trust store. If your internal CA cert isn't in that store — or if the cert has expired — the TLS handshake fails and records queue up until the buffer fills and they start dropping. The service keeps running. Nothing crashes. You just stop getting logs.

    How to identify it. Run with debug verbosity and filter for TLS-related output:

    fluent-bit -c /etc/fluent-bit/fluent-bit.conf -vv 2>&1 | grep -i 'tls\|ssl\|cert'

    The lines you're looking for:

    [2024/10/14 04:11:02] [error] [tls] error: certificate verify failed
    [2024/10/14 04:11:02] [error] [output:http:http.0] could not connect to 10.10.1.50:9243

    Independently verify the certificate chain from sw-infrarunbook-01 before touching Fluent Bit config:

    openssl s_client -connect 10.10.1.50:9243 -CAfile /etc/ssl/certs/ca-bundle.crt 2>&1 | head -30

    If you see Verify return code: 21 (unable to verify the first certificate) or certificate has expired, you've confirmed the problem is the cert chain, not Fluent Bit itself.
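
    To see the expiry dates specifically, pipe the certificate the server presents through openssl x509. A small check against the same endpoint:

    # Prints the notBefore/notAfter dates of the presented certificate
    openssl s_client -connect 10.10.1.50:9243 </dev/null 2>/dev/null | openssl x509 -noout -dates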

    How to fix it. The right approach is to add your internal CA to the system trust store and reference it in your Fluent Bit output block:

    # Add the internal CA to the system trust store
    cp /etc/pki/internal-ca.crt /usr/local/share/ca-certificates/internal-ca.crt
    update-ca-certificates
    
    # Reference it explicitly in the output plugin
    [OUTPUT]
        Name        http
        Match       *
        Host        10.10.1.50
        Port        9243
        tls         On
        tls.verify  On
        tls.ca_file /usr/local/share/ca-certificates/internal-ca.crt

    The wrong approach — and I see it in production configs more often than I'd like — is setting tls.verify Off. That disables certificate validation entirely. You're encrypting the channel but not authenticating the server, which defeats a core purpose of TLS. Don't do this in production. Fix the cert trust instead.

    If the certificate is expired, get it renewed. There's no workaround for an expired cert that isn't a security regression. Set a calendar reminder to check cert expiry 30 days out on every log forwarding endpoint — catching this before Fluent Bit does is considerably less stressful.


    Root Cause 3: Buffer Full

    Fluent Bit uses in-memory and optional filesystem buffers to absorb bursts and handle backpressure. When the downstream destination is slow, down, or rejecting records, Fluent Bit retries and those records sit in the buffer. If the buffer fills up, new incoming records are dropped. The service keeps running, nothing crashes, but your logs stop forwarding and nobody gets paged about it.

    Why it happens: the default Mem_Buf_Limit for most inputs is around 5MB. In a high-volume environment or during a destination outage, that fills up in minutes. Once the limit is hit, Fluent Bit emits a warning and begins pausing inputs — which means new records written to the log file are no longer being read. You don't lose the file data immediately, but Fluent Bit falls behind, and if the pause persists long enough for files to rotate away, you miss records entirely.

    How to identify it. Watch Fluent Bit logs for this specific pattern:

    [2024/10/14 05:43:10] [ warn] [input] pausing tail.0 (mem buf overlimit)
    [2024/10/14 05:43:10] [ warn] [input] resume tail.0 (mem buf overlimit)

    The rapid pausing and resuming cycle is the telltale sign of buffer pressure. Also check the metrics endpoint if you've enabled it:

    curl -s http://127.0.0.1:2020/api/v1/metrics | python3 -m json.tool
    {
      "output": {
        "es.0": {
          "proc_records": 14203,
          "proc_bytes": 8921043,
          "errors": 892,
          "retries": 1204,
          "retries_failed": 156,
          "dropped_records": 156
        }
      }
    }

    Any non-zero dropped_records value means you've already lost data. This is a production incident, not just a config tweak.

    How to fix it. Increase the Mem_Buf_Limit and switch to filesystem buffering so records survive both restarts and destination outages:

    [SERVICE]
        storage.path              /var/lib/fluent-bit/buffer
        storage.sync              normal
        storage.checksum          off
        storage.max_chunks_up     128
    
    [INPUT]
        Name              tail
        Path              /var/log/app/*.log
        Tag               app.*
        Mem_Buf_Limit     50MB
        storage.type      filesystem
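
    Once filesystem storage is enabled, recent Fluent Bit versions can also expose storage-layer metrics on the built-in HTTP server, which is useful for watching how many chunks are queued on disk. A quick check, assuming HTTP_Server On and storage.metrics on in the [SERVICE] block:

    # Reports total chunks, chunks up in memory, and chunks queued on the filesystem
    curl -s http://127.0.0.1:2020/api/v1/storage | python3 -m json.tool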

    Also tune the retry behavior in your output plugin. Setting Retry_Limit to False tells Fluent Bit to retry indefinitely rather than dropping records after a fixed number of attempts, so buffered records are eventually delivered once the destination issue is resolved:

    [OUTPUT]
        Name        es
        Match       *
        Host        10.10.1.45
        Port        9200
        Retry_Limit False

    A full buffer is always a symptom, not the root cause. Something downstream is slow or broken, and Fluent Bit is absorbing the pressure. Fix the destination issue, then tune the buffer to give yourself headroom the next time a downstream hiccup happens.


    Root Cause 4: Parser Not Matching Log Format

    Parsers are the part of Fluent Bit configs that most people set up once and forget — until logs start arriving at Elasticsearch as a single unparsed blob, filter rules stop working, or structured fields you expect to query on don't exist. When a parser doesn't match, Fluent Bit typically passes the raw line through as a single log field. Records are still forwarded, but they're useless for anything beyond archival.

    Why it happens: log formats change. An application update switches the timestamp from ISO 8601 to syslog format, and your regex parser silently stops matching. Someone adds multiline Java stack traces and the tail plugin splits each line into a hundred individual events. Or the parser was written against a test log that didn't represent the full range of real output — missing fields, extra spaces, different HTTP methods — and only fails on production traffic.

    How to identify it. The fastest diagnostic is adding a temporary stdout output plugin to see what Fluent Bit is actually producing:

    [OUTPUT]
        Name   stdout
        Match  *
        Format json_lines

    Run Fluent Bit and watch the output. If you see records like this, the parser is not matching:

    {"date":1728872537.0,"log":"Oct 14 03:22:17 sw-infrarunbook-01 nginx: 10.10.1.10 - infrarunbook-admin [14/Oct/2024:03:22:17 +0000] GET /api/health HTTP/1.1 200 45"}

    The entire log line is dumped into a single log string instead of discrete fields like remote_addr, status, and body_bytes_sent. You can also test a parser directly from the command line (in recent Fluent Bit versions the stdin input accepts a parser property, so you can pipe a sample line straight through):

    echo '10.10.1.10 - infrarunbook-admin [14/Oct/2024:03:22:17 +0000] GET /health HTTP/1.1 200 45' \
      | fluent-bit -R /etc/fluent-bit/parsers.conf -i stdin -p parser=nginx -o stdout -p format=json_lines

    If the parser matches, you'll get structured JSON fields back. If it doesn't, you'll get the raw input passed through unchanged.

    How to fix it. Validate your regex against actual log samples before deploying. A common nginx combined log parser that works against real-world output:

    [PARSER]
        Name        nginx
        Format      regex
        Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)] (?<method>\S+) (?<path>\S+) \S+ (?<code>[^ ]*) (?<size>[^ ]*)
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    For multiline logs like Java stack traces, use the multiline parser feature rather than trying to reassemble them in a filter:

    [MULTILINE_PARSER]
        name          java_multiline
        type          regex
        flush_timeout 1000
        rule          "start_state"   "/^\d{4}-\d{2}-\d{2}/"  "java_after_ts"
        rule          "java_after_ts" "/^(\s+at|\s+\.{3}\s+\d+)/" "java_after_ts"
    
    [INPUT]
        Name              tail
        Path              /var/log/app/service.log
        Tag               app.service
        multiline.parser  java_multiline

    After any parser change, run the command-line test utility against a representative sample of real log data — not just a clean happy-path example you wrote yourself — before rolling it to production.


    Root Cause 5: Permission Denied Reading Log File

    Fluent Bit runs as a non-root user in most production deployments, and log files written by applications often have restrictive permissions. If Fluent Bit can't read a file, it silently skips it. No crash, no retry, no error in the destination — just nothing.

    Why it happens: the tail input plugin tries to open log files at startup and whenever new files match the configured glob. If the Fluent Bit process user doesn't have read permission on the file, or execute permission on a parent directory, the file is simply not watched. This is particularly common when log files are owned by application-specific users like www-data, postgres, or nginx, or when log directories use restrictive permission bits that block traversal by other users.

    How to identify it. Enable debug logging and filter for permission and open errors:

    fluent-bit -c /etc/fluent-bit/fluent-bit.conf -vv 2>&1 | grep -iE 'permission|denied|open|inode'

    You're looking for output like this:

    [2024/10/14 06:01:33] [ warn] [input:tail:tail.0] error opening file /var/log/app/service.log: Permission denied

    Find what user Fluent Bit runs as, then manually test access as that user:

    # Check the systemd service unit
    grep -i user /lib/systemd/system/fluent-bit.service
    
    # Test access as the fluent-bit service user
    sudo -u fluent ls -la /var/log/app/
    sudo -u fluent cat /var/log/app/service.log

    Don't just check the file — check the entire directory path. A file can be world-readable but completely unreachable if a parent directory doesn't allow execute for the Fluent Bit user. Use namei to trace the full permission chain:

    namei -l /var/log/app/service.log
    f: /var/log/app/service.log
    drwxr-xr-x root   root   /
    drwxr-xr-x root   root   var
    drwxr-x--- root   syslog log        <-- Fluent Bit user cannot enter this directory
    drwxr-x--- syslog syslog app
    -rw-r----- syslog syslog service.log

    That drwxr-x--- on /var/log is the culprit — it only allows the syslog group to traverse, so any user not in that group is blocked before even reaching the file.

    How to fix it. The cleanest solution is adding the Fluent Bit service user to the group that owns the log directory and files:

    # If Fluent Bit runs as user 'fluent' and logs are owned by group 'syslog'
    usermod -aG syslog fluent
    systemctl restart fluent-bit
    
    # Verify the fix
    sudo -u fluent ls -la /var/log/app/

    For application logs owned by a dedicated app user where changing group ownership isn't practical, use POSIX ACLs. The default ACL ensures new log files created by log rotation are automatically accessible too:

    # Grant read + directory traversal to the fluent-bit user
    setfacl -R -m u:fluent:rX /var/log/app/
    
    # Apply the same ACL to all future files and directories created under /var/log/app/
    setfacl -R -d -m u:fluent:rX /var/log/app/
    
    # Verify
    getfacl /var/log/app/service.log

    Avoid making log files world-readable (chmod o+r) as a shortcut — logs often contain sensitive data. Group membership or ACLs give you the access you need without oversharing.
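
    If you manage the unit with systemd and would rather not touch the user account directly, a drop-in that grants a supplementary group does the same job. A sketch, assuming the service user is fluent and the logs are group-owned by syslog as above:

    # Create a drop-in with `systemctl edit fluent-bit`, containing:
    [Service]
    SupplementaryGroups=syslog

    # Then apply it
    systemctl daemon-reload
    systemctl restart fluent-bit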


    Root Cause 6: Tag and Match Misconfiguration

    Every record in Fluent Bit carries a tag, and output plugins use Match patterns to decide which records they process. If your tags and match patterns don't align, records are ingested and parsed correctly but never forwarded — they fall into the void at the routing stage and are silently discarded.

    In debug output, you'll see input metrics incrementing while output metrics stay completely flat. The quickest diagnostic is adding a temporary stdout output with Match * — if records appear there but not in your real output, the match pattern is the problem. Then check what tags are actually being generated:

    fluent-bit -c /etc/fluent-bit/fluent-bit.conf -vv 2>&1 | grep engine
    [2024/10/14 06:15:22] [debug] [engine] flush chunk with tag 'app.service.production' to output 'es.0'

    If your output has Match app.*, that tag matches fine. If it has Match application.* or Match app.service without a wildcard that covers app.service.production, those records are dropped. Fix the match pattern or adjust the tag in your input block to align with your existing output rules.
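
    As a concrete illustration, here's an input and output pair whose tag and match line up. The path and tag are placeholders; reuse whatever your real inputs generate:

    [INPUT]
        Name  tail
        Path  /var/log/app/*.log
        Tag   app.service.production

    [OUTPUT]
        Name   es
        Match  app.*
        Host   10.10.1.45
        Port   9200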


    Root Cause 7: Network Connectivity or Firewall Blocking

    Sometimes the problem has nothing to do with Fluent Bit configuration. The destination is unreachable because a firewall rule changed, a security group was tightened, or a DNS record wasn't updated after a migration. Fluent Bit retries endlessly and logs connection errors, but if nobody is watching, it looks like a Fluent Bit problem rather than a network problem.

    Test directly from sw-infrarunbook-01 to confirm connectivity before spending more time in the Fluent Bit config:

    # TCP-level connectivity check
    nc -zv 10.10.1.45 9200
    
    # DNS resolution if using a hostname
    dig +short logs.solvethenetwork.com
    
    # Full HTTP round-trip
    curl -v --max-time 5 http://10.10.1.45:9200/_cluster/health

    A timeout or connection refused from curl confirms the issue is network-level. Check iptables or nftables rules on both the source and destination, review any intermediate firewall policy, and verify routing from the source subnet to the destination IP. Don't fix Fluent Bit config when the real problem is a firewall ticket that needs to be filed.
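
    If you need to check host-level firewall rules while that ticket works its way through, something like this narrows it down (adjust the port to your destination):

    # List any iptables rules that mention the destination port
    sudo iptables -S | grep 9200

    # Or on nftables-based hosts
    sudo nft list ruleset | grep 9200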


    Prevention

    Most of these failures are preventable with a few operational habits that don't require much upfront investment.

    Enable the built-in HTTP server and scrape its metrics with Prometheus or your preferred monitoring stack. A single endpoint at http://127.0.0.1:2020/api/v1/metrics gives you visibility into dropped records, retry counts, and buffer pressure before users notice a gap in their dashboards. Alert on dropped_records > 0 and retries_failed > 0 — these are production incidents masquerading as background noise.

    [SERVICE]
        HTTP_Server  On
        HTTP_Listen  127.0.0.1
        HTTP_Port    2020
        Log_Level    info
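
    If you don't have a Prometheus scrape in place yet, even a cron'd shell check against the JSON endpoint is better than nothing. A minimal sketch, assuming python3 on the host and a Fluent Bit version that reports dropped_records:

    # Exits non-zero if any output plugin reports dropped records
    curl -s http://127.0.0.1:2020/api/v1/metrics | python3 -c 'import json,sys; m=json.load(sys.stdin); bad={k: v.get("dropped_records", 0) for k, v in m.get("output", {}).items() if v.get("dropped_records", 0) > 0}; print(bad or "no dropped records"); sys.exit(1 if bad else 0)'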

    Always run configuration changes through a staging deployment first. Use the stdout plugin to verify that records are being parsed and tagged correctly before touching your production output destination. Treat parser changes the same way you'd treat a schema migration — test against a representative sample of real log data, not just a clean example you wrote to match the parser.

    Use filesystem buffering in every production deployment. The default in-memory buffer disappears on restart and fills up fast under load. Filesystem buffering survives restarts, provides much more headroom during destination outages, and gives you a path to recover records that would otherwise be lost. The storage path needs to be on a filesystem with adequate free space — monitor it the same way you'd monitor any other critical volume.
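
    A trivial check for that, assuming the storage.path from the buffering example earlier:

    # Disk headroom for the buffer path, plus a rough count of queued chunk files
    df -h /var/lib/fluent-bit/buffer
    find /var/lib/fluent-bit/buffer -type f | wc -l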

    Document the service account Fluent Bit runs as, and make log file permissions part of your application deployment and log rotation checklists. Every time a new log path is added or a logrotate config is changed, someone needs to verify Fluent Bit can still read the files. Using default ACLs on log directories — setfacl -d — means new files automatically inherit the right permissions, which eliminates the entire class of post-rotation permission failures.

    Keep your TLS certificate inventory current. If Fluent Bit forwards to an HTTPS endpoint signed by an internal CA, make sure CA cert rotation is part of your certificate lifecycle process. A cert expiring at 2 AM on a Sunday is not when you want to discover Fluent Bit has been silently dropping logs for six hours. Set alerts at 30 and 7 days before expiry on every log forwarding endpoint, and test the full chain with openssl s_client after any cert rotation, not just the endpoint health check.

    Finally, treat Fluent Bit's own log file as a first-class operational signal. Route it through your monitoring stack. If the log goes silent or the error rate climbs, you want to know about it the same way you'd know about any other service degradation — before your users do.

    Frequently Asked Questions

    Why does Fluent Bit start successfully but not forward any logs to the destination?

    Fluent Bit can start without errors even with broken configuration because its parser ignores unknown keys and doesn't validate runtime connectivity. Common causes include a misconfigured output plugin (wrong host, port, or parameter names), a TLS certificate the system doesn't trust, or a tag/match mismatch that routes records to nowhere. Start with <code>fluent-bit -c /path/to/config -vv</code> and look for connection errors or retry messages in the debug output.

    How do I test whether my Fluent Bit parser is matching log lines correctly?

    Use the command-line test: pipe a sample log line into Fluent Bit's <code>stdin</code> input with the parser property set (<code>-i stdin -p parser=NAME</code>, loading your parsers file with <code>-R</code>) and output to stdout in JSON format. If the parser matches, you'll get structured key-value output. If it doesn't, the raw log line comes back in a single <code>log</code> field. You can also add a temporary <code>[OUTPUT] Name stdout Match *</code> block to your running config and watch what records actually look like as they flow through the pipeline.

    What is the difference between Mem_Buf_Limit and filesystem buffering in Fluent Bit?

    <code>Mem_Buf_Limit</code> sets the maximum in-memory buffer per input plugin. When it's full, Fluent Bit pauses the input and begins dropping incoming records. Filesystem buffering (<code>storage.type filesystem</code>) persists chunks to disk instead, which survives process restarts and can absorb much larger backlogs during destination outages. In production, you should use both: set a reasonable <code>Mem_Buf_Limit</code> and enable filesystem buffering so records aren't lost when the destination is unavailable.

    How do I check if Fluent Bit is silently dropping records?

    Enable the built-in HTTP server (<code>HTTP_Server On</code> in the SERVICE block) and query the metrics endpoint: <code>curl -s http://127.0.0.1:2020/api/v1/metrics</code>. Look for <code>dropped_records</code> and <code>retries_failed</code> values greater than zero in the output section. You can also watch Fluent Bit logs for the pattern <code>[warn] [input] pausing tail.X (mem buf overlimit)</code>, which indicates the buffer is full and records are being discarded.

    Can Fluent Bit forward logs to a destination with a self-signed TLS certificate?

    Yes, but you need to explicitly trust the certificate. Add the signing CA to your system trust store with <code>update-ca-certificates</code>, then reference it in your output block using <code>tls.ca_file /path/to/ca.crt</code> with <code>tls.verify On</code>. Avoid setting <code>tls.verify Off</code> in production — this disables certificate validation entirely, which means you're encrypting traffic but not authenticating the server, leaving you vulnerable to interception.
