InfraRunBook

    Node Exporter Metrics Missing

    Monitoring
    Published: Apr 19, 2026
    Updated: Apr 19, 2026

    Step-by-step runbook for diagnosing missing Node Exporter metrics in Prometheus, covering every common cause from a crashed service and firewall blocks to scrape config errors, disabled collectors, and label mismatches.


    Symptoms

    You open Grafana, pull up the node dashboard, and something's wrong. Half the panels are blank. Or you're checking Prometheus directly and notice a host that should be reporting is simply gone from up{job="node"}. The alert fires at 2 AM — NodeDown — and now you're digging through logs trying to figure out why metrics stopped flowing.

    Before diving in, here's what "node exporter metrics missing" typically looks like in practice:

    • The Prometheus Targets page shows a host in DOWN state with a scrape error message
    • up{instance="192.168.10.45:9100"} returns 0 or produces no data at all
    • Grafana panels show "No data" for one or more specific nodes while others report fine
    • Individual metrics like node_cpu_seconds_total or node_memory_MemAvailable_bytes are absent from query results
    • Prometheus scrape errors read something like context deadline exceeded, connection refused, or no such host

    This guide walks through every root cause I've run into in production — from the obvious "the exporter died" to the sneaky label mismatch that once took me the better part of an hour to track down. Let's get into it.
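    If you'd rather start triage from the Prometheus side, the full target list is exposed over its HTTP API. Below is a minimal Python sketch (the address is this guide's example Prometheus at 192.168.10.10:9090, and the function names are my own) that lists every non-UP target in the node job along with its last scrape error:

```python
import json
import urllib.request

PROM_URL = "http://192.168.10.10:9090"  # this runbook's example Prometheus address

def down_targets(payload, job="node"):
    """Return (instance, lastError) for every non-UP target in the given scrape job."""
    return [
        (t["labels"].get("instance", "?"), t.get("lastError", ""))
        for t in payload["data"]["activeTargets"]
        if t["labels"].get("job") == job and t.get("health") != "up"
    ]

def fetch_targets(base_url=PROM_URL):
    """Fetch the live target list from Prometheus's HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)

# Example (against a live server):
#   for instance, err in down_targets(fetch_targets()):
#       print(f"{instance}  DOWN  {err}")
```

    Each entry in the API payload also carries scrapeUrl and lastScrape, which can help spot a wrong port or scheme at a glance.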


    Root Cause 1: Node Exporter Is Not Running

    This is the most common cause, and it's also the easiest to overlook because it feels too obvious to check first. The exporter process crashed, was never started after a reboot, or was stopped during maintenance and nobody restarted it. In my experience, the reboot scenario is especially common after a kernel update — the service is marked enabled but something in the startup sequence fails and nobody notices for hours.

    How to Identify It

    SSH into the target host and check the service state directly:

    ssh infrarunbook-admin@192.168.10.45
    systemctl status node_exporter

    If the exporter has crashed, you'll see output along these lines:

    ● node_exporter.service - Node Exporter
         Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: enabled)
         Active: failed (Result: exit-code) since Sat 2026-04-19 03:12:44 UTC; 2h 14min ago
        Process: 3842 ExecStart=/usr/local/bin/node_exporter (code=exited, status=1/FAILURE)
       Main PID: 3842 (code=exited, status=1/FAILURE)

    Confirm nothing is listening on port 9100:

    ss -tlnp | grep 9100

    No output means nothing is listening on the port. Then pull the journal to understand why it failed — don't skip this step, because the fix depends on the failure reason:

    journalctl -u node_exporter -n 50 --no-pager

    How to Fix It

    Start the service and make sure it's enabled for future reboots:

    systemctl start node_exporter
    systemctl enable node_exporter
    systemctl status node_exporter

    After it starts, confirm the metrics endpoint is responding:

    curl -s http://192.168.10.45:9100/metrics | head -20

    A healthy exporter returns output starting with something like this:

    # HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
    # TYPE go_gc_duration_seconds summary
    go_gc_duration_seconds{quantile="0"} 4.9351e-05
    go_gc_duration_seconds{quantile="0.25"} 6.7598e-05
    go_gc_duration_seconds{quantile="0.5"} 9.1254e-05

    If the service keeps crashing, the journal will tell you why. Usual suspects: a missing or moved binary, a permissions error on a path a collector tries to read, or a startup flag that was valid in an older version of node_exporter but got removed in a recent upgrade.


    Root Cause 2: Firewall Blocking Port 9100

    The exporter is running. The metrics endpoint responds locally. But Prometheus still can't collect anything. Firewall rules are the silent killer here. A teammate tightened iptables during a security hardening sprint and dropped the rule for 9100. Or a cloud security group got updated and nobody noticed node exporter traffic wasn't exempted. I've also seen this happen when a host gets migrated to a new subnet and the firewall policy for that VLAN didn't include a carve-out for monitoring traffic.

    How to Identify It

    From the Prometheus server, try to reach the exporter directly and watch for a timeout rather than an immediate connection refused:

    curl -v --connect-timeout 5 http://192.168.10.45:9100/metrics

    A firewall-blocked connection looks like this — the packet is dropped, not rejected:

    * Trying 192.168.10.45:9100...
    * Connection timeout after 5001ms
    * Closing connection 0
    curl: (28) Connection timed out after 5001 milliseconds

    Compare that to connection refused, which means the port is reachable but nothing is listening. A timeout means the packet never gets a response. On the target host, check the active rules:

    iptables -L INPUT -n -v | grep 9100

    Or if the host uses nftables:

    nft list ruleset | grep 9100

    For hosts running firewalld:

    firewall-cmd --list-all

    If port 9100 isn't listed under ports: or in an explicit allow rule, that's your problem.

    How to Fix It

    For iptables, add a rule scoped to the Prometheus server IP — don't open it to the world:

    iptables -A INPUT -s 192.168.10.10 -p tcp --dport 9100 -j ACCEPT
    iptables-save > /etc/iptables/rules.v4

    For firewalld:

    firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.10.10" port port="9100" protocol="tcp" accept'
    firewall-cmd --reload

    After applying the rule, rerun the curl from the Prometheus host to confirm connectivity is restored before calling it done.
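    If you want to script the timeout-versus-refused distinction, a short Python check works from any host. The classification logic below is a sketch (the function name is mine), and the example target is this runbook's 192.168.10.45:

```python
import socket

def probe(host, port, timeout=5.0):
    """Classify host:port reachability: 'open', 'refused', or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # RST came back: host reachable, nothing listening on the port
    except (socket.timeout, TimeoutError):
        return "timeout"   # no response at all: packet likely dropped by a firewall
    except OSError as exc:
        return f"error: {exc}"

# Example: probe("192.168.10.45", 9100)
#   "timeout" suggests a firewall drop; "refused" suggests the exporter isn't running.
```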


    Root Cause 3: Prometheus Scrape Config Is Wrong

    The exporter is running, the port is open, and Prometheus still isn't collecting. Now you need to look at prometheus.yml itself. A typo in a hostname, an incorrect port number, a missing job entry, a stale IP — any of these will silently leave a host unmonitored. This is more common than it sounds, especially in environments where the Prometheus config is edited by hand rather than generated from a service catalog.

    How to Identify It

    Go to the Prometheus UI at http://192.168.10.10:9090/targets and look for the host in question. If it doesn't appear at all, it's not in any scrape config. If it appears but shows DOWN, click the error — it'll say something like:

    Get "http://192.168.10.45:9100/metrics": dial tcp 192.168.10.45:9100: connect: connection refused

    Or for a DNS resolution failure:

    Get "http://sw-infrarunbook-01.solvethenetwork.com:9100/metrics": dial tcp: lookup sw-infrarunbook-01.solvethenetwork.com: no such host

    Check your scrape config directly:

    cat /etc/prometheus/prometheus.yml

    A correct static node exporter job looks like this:

    scrape_configs:
      - job_name: 'node'
        static_configs:
          - targets:
              - '192.168.10.45:9100'
              - '192.168.10.46:9100'
              - '192.168.10.47:9100'

    Common mistakes: the port is listed as 9190 instead of 9100, the IP has a transposed digit, the host was added to the wrong job block, or the config was edited but Prometheus was never told to reload it. If you suspect the last case, trigger a reload so Prometheus picks up the file on disk (this endpoint requires --web.enable-lifecycle):

    curl -X POST http://192.168.10.10:9090/-/reload

    Or check the last reload timestamp in the Prometheus UI under Status > Runtime & Build Information.

    How to Fix It

    Correct the target entry, validate the file, then reload — always in that order:

    promtool check config /etc/prometheus/prometheus.yml
    systemctl reload prometheus

    Running promtool check config before reloading is non-negotiable. It catches syntax errors before they cause Prometheus to reject the entire config and fall back to whatever it had before — or worse, fail to start after a restart.


    Root Cause 4: Collector Disabled

    Node exporter ships with a large set of collectors — cpu, memory, disk, network, filesystem, and many more. By default, most are enabled, but some aren't. And in some environments I've worked in, teams explicitly disable collectors to reduce metric cardinality or shorten scrape time. If you're looking for a specific metric and it simply isn't there, the collector that exposes it might be disabled — either intentionally or by accident when someone copied a startup config from a different environment.

    How to Identify It

    Check how node_exporter is being launched and what flags are passed to it:

    ps aux | grep node_exporter

    Or inspect the systemd unit file directly:

    cat /etc/systemd/system/node_exporter.service

    You might find something like this:

    [Service]
    ExecStart=/usr/local/bin/node_exporter \
      --no-collector.wifi \
      --no-collector.nfs \
      --no-collector.xfs \
      --no-collector.cpu

    That last flag — --no-collector.cpu — explains why node_cpu_seconds_total is completely absent. You can also hit the metrics endpoint directly and search for the specific metric:

    curl -s http://192.168.10.45:9100/metrics | grep node_cpu_seconds_total

    No output confirms the collector is off. To see all currently enabled collectors and their scrape status, look at the node_scrape_collector_success metric:

    curl -s http://192.168.10.45:9100/metrics | grep node_scrape_collector_success
    node_scrape_collector_success{collector="arp"} 1
    node_scrape_collector_success{collector="bcache"} 1
    node_scrape_collector_success{collector="bonding"} 1
    node_scrape_collector_success{collector="conntrack"} 1
    node_scrape_collector_success{collector="diskstats"} 1
    node_scrape_collector_success{collector="filesystem"} 1
    node_scrape_collector_success{collector="meminfo"} 1

    If cpu doesn't appear in that list, it was disabled at launch.
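    Scripting this check is straightforward, since node_scrape_collector_success is plain text in the exposition format. A small Python sketch (the function name is mine) that separates "disabled at launch" from "enabled but failing":

```python
import re

def collector_status(metrics_text):
    """Map collector name -> 1/0 from node_scrape_collector_success lines."""
    pat = re.compile(r'node_scrape_collector_success\{collector="([^"]+)"\}\s+([01])')
    return {name: int(ok) for name, ok in pat.findall(metrics_text)}

# Feed it the output of `curl -s http://192.168.10.45:9100/metrics`:
#   name missing from the dict entirely -> collector disabled at launch
#   value of 0                          -> collector enabled but failing to collect
```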

    How to Fix It

    Edit the service file to remove the --no-collector.cpu flag (or whichever collector you need to restore), then reload systemd and restart the service:

    systemctl daemon-reload
    systemctl restart node_exporter

    If you need to enable a collector that's off by default, use the --collector.<name> flag. For example, to enable the perf collector:

    ExecStart=/usr/local/bin/node_exporter --collector.perf

    Be deliberate about which collectors you run. Enabling everything in a high-cardinality environment can meaningfully increase scrape duration and Prometheus memory usage. If you're disabling collectors intentionally, document it with a comment in the unit file — undocumented omissions are indistinguishable from bugs.


    Root Cause 5: Label Mismatch

    This one is subtle and easy to miss because the data is actually being scraped — it's in Prometheus — but your query returns nothing, or Grafana shows blank panels. The reason is a label mismatch: your query is filtering on a label value that doesn't match what's actually attached to the metric in storage.

    In my experience, this surfaces most often after someone changes how targets are discovered — switching from static configs to file-based or Consul service discovery, renaming a job, or modifying relabeling rules. The metric is right there in Prometheus. It's just labeled differently than what your dashboard expects.

    How to Identify It

    Run a broad query in Prometheus without any label filters to see what labels are actually attached to the metric you're looking for:

    node_cpu_seconds_total

    Look at what comes back. You might see:

    node_cpu_seconds_total{cpu="0",instance="192.168.10.45:9100",job="linux_nodes",mode="idle"} 12345.67

    Now check your Grafana panel query — it might be filtering on job="node" when the actual job label is job="linux_nodes". That single mismatch means zero results, even though all the data is right there. Also check the relabeling rules in your prometheus.yml, which can silently transform or drop labels during scraping:

    scrape_configs:
      - job_name: 'linux_nodes'
        relabel_configs:
          - source_labels: [__address__]
            target_label: instance
            regex: '(.+):9100'
            replacement: '$1'

    That rule strips the port from the instance label. If your dashboard is querying instance="192.168.10.45:9100", it won't match — the stored label is instance="192.168.10.45". Use the Prometheus UI to inspect exactly which label sets are stored for a given metric and target.
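    You can also pull the stored label sets programmatically through the /api/v1/series endpoint. A rough Python sketch, using this guide's example Prometheus address (the function names are mine):

```python
import json
import urllib.request

def label_values(series_payload, label):
    """Distinct values of one label across the series returned by /api/v1/series."""
    return sorted({s[label] for s in series_payload["data"] if label in s})

def fetch_series(metric, base_url="http://192.168.10.10:9090"):
    """Ask Prometheus for all stored series matching a metric name."""
    url = f"{base_url}/api/v1/series?match[]={metric}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)

# Example (against a live server):
#   payload = fetch_series("node_cpu_seconds_total")
#   print("job:", label_values(payload, "job"))
#   print("instance:", label_values(payload, "instance"))
```

    Comparing that output against the label filters in your dashboards makes a mismatch like job="node" versus job="linux_nodes" obvious immediately.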

    How to Fix It

    You have two options: update your queries and dashboards to match the actual label values, or adjust your relabeling rules to produce the labels your queries expect. In most cases, updating the queries is faster and less risky. In Grafana, go into the panel editor, locate the label filter, and correct it to match what Prometheus actually stores.

    For Prometheus alerting rules that rely on specific labels, update them in your rules files and reload:

    promtool check rules /etc/prometheus/rules/*.yml
    curl -X POST http://192.168.10.10:9090/-/reload

    If the mismatch is widespread — say, you renamed a job and now thirty dashboards are broken — you can temporarily add a metric_relabel_configs rule to attach a backward-compatible label alias while you migrate. Don't leave that workaround in place indefinitely, though. Fix the root definition and update the consumers systematically.


    Root Cause 6: Node Exporter Bound to Loopback Only

    The exporter is running and healthy, but it's only listening on 127.0.0.1 — so nothing outside the host can reach it. I've seen this happen when someone copies a startup script from an old tutorial that explicitly binds to loopback for "security," not realizing it breaks remote scraping entirely. It's also a common artifact of installing node_exporter from a distribution package rather than the official release, where the default unit file may bind to localhost.

    How to Identify It

    Check what address node_exporter is actually listening on:

    ss -tlnp | grep 9100

    If you see loopback only, the exporter is unreachable from outside the host:

    LISTEN 0   128   127.0.0.1:9100   0.0.0.0:*   users:(("node_exporter",pid=4521,fd=3))

    What you want instead — binding to all interfaces:

    LISTEN 0   128   0.0.0.0:9100   0.0.0.0:*   users:(("node_exporter",pid=4521,fd=3))

    How to Fix It

    The flag controlling the bind address is --web.listen-address. Check the service file and update it:

    ExecStart=/usr/local/bin/node_exporter --web.listen-address="0.0.0.0:9100"

    Reload systemd and restart the service, then confirm the binding changed:

    systemctl daemon-reload
    systemctl restart node_exporter
    ss -tlnp | grep 9100

    If your security policy doesn't allow binding to all interfaces, bind to the specific interface that Prometheus uses to reach this host — just not loopback. A host with IP 192.168.10.45 on its primary interface would use --web.listen-address="192.168.10.45:9100".


    Root Cause 7: TLS or Authentication Misconfiguration

    Hardened environments often add TLS and basic authentication to node exporter. When the exporter is configured to require these but Prometheus isn't configured with matching credentials or a trusted CA, scrapes fail — either with an explicit HTTP error or a cryptic connection error that doesn't immediately suggest an auth problem.

    How to Identify It

    On the Prometheus Targets page, check the error column. A missing or wrong credential shows up as:

    server returned HTTP status 401 Unauthorized

    If TLS is required but Prometheus is still using plain HTTP, you'll often see:

    Get "http://192.168.10.45:9100/metrics": EOF

    Check whether the exporter has a web config file that enables TLS or auth:

    cat /etc/node_exporter/web-config.yml
    tls_server_config:
      cert_file: /etc/ssl/node_exporter/node_exporter.crt
      key_file: /etc/ssl/node_exporter/node_exporter.key
    basic_auth_users:
      prometheus: $2y$10$X5jVqN8...

    If this file exists and is being passed to node_exporter via --web.config.file, then Prometheus must be configured to match.

    How to Fix It

    Update your Prometheus scrape job to include the TLS configuration and credentials:

    scrape_configs:
      - job_name: 'node'
        scheme: https
        tls_config:
          ca_file: /etc/prometheus/certs/ca.crt
          insecure_skip_verify: false
        basic_auth:
          username: prometheus
          password_file: /etc/prometheus/node_exporter_password
        static_configs:
          - targets:
              - '192.168.10.45:9100'

    Store the password in a file with restricted permissions, not inline in the config:

    chmod 600 /etc/prometheus/node_exporter_password
    chown prometheus:prometheus /etc/prometheus/node_exporter_password

    Validate and reload Prometheus, then confirm the target transitions to UP on the Targets page.


    Prevention

    Most of these issues are preventable with upfront work. Here's what I keep in place on every environment I manage.

    Start with an alerting rule for the up metric. If node exporter stops reporting, you want to know within minutes — not when someone opens a dashboard and notices blank panels:

    groups:
      - name: node_exporter
        rules:
          - alert: NodeExporterDown
            expr: up{job="node"} == 0
            for: 2m
            labels:
              severity: critical
            annotations:
              summary: "Node exporter down on {{ $labels.instance }}"
              description: "Node exporter on {{ $labels.instance }} has been unreachable for more than 2 minutes."

    Use configuration management — Ansible, Puppet, Salt — to enforce that the node_exporter service is running and enabled. Don't rely on humans to remember to start it after a reboot. A simple Ansible task with state: started and enabled: yes handles the whole class of "forgot to restart after kernel update" incidents.

    Add firewall rule management to the same configuration management role that deploys node exporter. The rule and the service should live together. If you install the exporter, you open the port. If you remove the exporter, the rule goes with it. Treating them as separate tasks is how you end up with an exporter that starts but can't be scraped.
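    As a sketch of that pairing, here's what it could look like in Ansible. The task names and the firewalld parameters below are illustrative, not taken from this runbook; adapt them to whichever firewall backend you standardized on:

```yaml
# Hypothetical Ansible tasks: deploy the service and its firewall rule together.
- name: Ensure node_exporter is running and enabled at boot
  ansible.builtin.systemd:
    name: node_exporter
    state: started
    enabled: yes

- name: Allow the Prometheus server to scrape port 9100
  ansible.posix.firewalld:
    rich_rule: rule family="ipv4" source address="192.168.10.10" port port="9100" protocol="tcp" accept
    permanent: yes
    immediate: yes
    state: enabled
```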

    Validate your prometheus.yml on every change with promtool check config before applying it. If you manage your Prometheus config in a git repo — and you should — run this check in CI. Catching a port typo in a target address before it hits production is far better than hunting it down while an alert is firing.
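    For example, a CI job along these lines (GitHub Actions syntax shown; the workflow name and file paths are hypothetical) runs promtool out of the official Prometheus image so the pipeline needs no local install:

```yaml
# Hypothetical CI workflow: fail the pull request if the config is invalid.
name: validate-prometheus-config
on: [pull_request]
jobs:
  promtool:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: promtool check config
        run: |
          docker run --rm -v "$PWD:/cfg" --entrypoint /bin/promtool \
            prom/prometheus check config /cfg/prometheus.yml
```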

    Document disabled collectors in the unit file. A one-line comment explaining why --no-collector.xfs is set takes ten seconds to write and saves the next engineer (including future you) from assuming it's a bug. Undocumented intentional omissions are indistinguishable from accidents.

    Finally, do a periodic audit of your label structure whenever you modify scrape configs or relabeling rules. Grep your alerting rules and Grafana dashboards for any label values that might be affected by the change. Label drift is one of those problems that compounds silently — dashboards can appear functional while actually showing stale or incomplete data for weeks until someone notices.

    Node exporter is stable, mature software. It rarely breaks on its own. When metrics go missing, something in the surrounding environment changed. Work outward from the basics: is the process running, can Prometheus reach it, is Prometheus configured to look for it, and do the labels on what gets scraped match what your queries expect. That covers the overwhelming majority of cases you'll encounter.

    Frequently Asked Questions

    Why does up{job="node"} return 0 even though node exporter appears to be running?

    The most common reason is that node exporter is listening on loopback (127.0.0.1:9100) instead of all interfaces, or a firewall rule is blocking Prometheus from reaching port 9100. Run `ss -tlnp | grep 9100` on the target host to confirm the bind address, and test connectivity from the Prometheus server using `curl --connect-timeout 5 http://<target-ip>:9100/metrics`.

    How do I reload Prometheus after editing prometheus.yml without restarting it?

    Always validate first with `promtool check config /etc/prometheus/prometheus.yml`, then send a reload signal: `systemctl reload prometheus` or `curl -X POST http://192.168.10.10:9090/-/reload`. The HTTP reload endpoint is convenient but only works if `--web.enable-lifecycle` is passed to Prometheus at startup.

    How can I tell which node exporter collectors are currently enabled?

    Query the `node_scrape_collector_success` metric from the exporter directly: `curl -s http://192.168.10.45:9100/metrics | grep node_scrape_collector_success`. Each label value represents a running collector. Any collector disabled via a `--no-collector.<name>` flag won't appear in this output.

    Why are some node exporter metrics missing after switching from static configs to service discovery?

    This is almost always a label mismatch. Service discovery often attaches different label values — or transforms them via relabeling rules — compared to a static config entry. Run a label-free query for the metric in Prometheus to see what labels are actually stored, then compare them against any filtering your dashboards or alerting rules apply.

    What is the correct port for node exporter, and can I change it?

    Node exporter listens on port 9100 by default. You can change it with the `--web.listen-address` flag, for example `--web.listen-address="0.0.0.0:9200"`. If you change the port, update both your firewall rules and the Prometheus scrape config target entry to match.
