Symptoms
You open Grafana, pull up the node dashboard, and something's wrong. Half the panels are blank. Or you're checking Prometheus directly and notice a host that should be reporting is simply gone from `up{job="node"}`. The alert fires at 2 AM — NodeDown — and now you're digging through logs trying to figure out why metrics stopped flowing.
Before diving in, here's what "node exporter metrics missing" typically looks like in practice:
- The Prometheus Targets page shows a host in DOWN state with a scrape error message
- `up{instance="192.168.10.45:9100"}` returns 0 or produces no data at all
- Grafana panels show "No data" for one or more specific nodes while others report fine
- Individual metrics like `node_cpu_seconds_total` or `node_memory_MemAvailable_bytes` are absent from query results
- Prometheus scrape errors read something like `context deadline exceeded`, `connection refused`, or `no such host`
This guide walks through every root cause I've run into in production — from the obvious "the exporter died" to the sneaky label mismatch that once took me the better part of an hour to track down. Let's get into it.
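Before working through the root causes one by one, two quick queries in the Prometheus expression browser narrow things down considerably. The `job="node"` label here matches the scrape config used throughout this guide; substitute your own job name if it differs:

```promql
# Which hosts in the node job failed their most recent scrape?
up{job="node"} == 0

# Returns 1 only when no series for the metric exists anywhere in the job —
# useful for telling "one host is down" apart from "the metric is gone entirely"
absent(node_cpu_seconds_total{job="node"})
```

If the first query returns results, start with Root Causes 1 through 3 (reachability). If the target is UP but a metric is absent, skip ahead to Root Causes 4 and 5.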
Root Cause 1: Node Exporter Is Not Running
This is the most common cause, and it's also the easiest to overlook because it feels too obvious to check first. The exporter process crashed, was never started after a reboot, or was stopped during maintenance and nobody restarted it. In my experience, the reboot scenario is especially common after a kernel update — the service is marked enabled but something in the startup sequence fails and nobody notices for hours.
How to Identify It
SSH into the target host and check the service state directly:
ssh infrarunbook-admin@192.168.10.45
systemctl status node_exporter
If the exporter has crashed, you'll see output along these lines:
● node_exporter.service - Node Exporter
Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Sat 2026-04-19 03:12:44 UTC; 2h 14min ago
Process: 3842 ExecStart=/usr/local/bin/node_exporter (code=exited, status=1/FAILURE)
Main PID: 3842 (code=exited, status=1/FAILURE)
Confirm nothing is listening on port 9100:
ss -tlnp | grep 9100
No output means the port is dead. Then pull the journal to understand why it failed — don't skip this step, because the fix depends on the failure reason:
journalctl -u node_exporter -n 50 --no-pager
How to Fix It
Start the service and make sure it's enabled for future reboots:
systemctl start node_exporter
systemctl enable node_exporter
systemctl status node_exporter
After it starts, confirm the metrics endpoint is responding:
curl -s http://192.168.10.45:9100/metrics | head -20
A healthy exporter returns output starting with something like this:
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 4.9351e-05
go_gc_duration_seconds{quantile="0.25"} 6.7598e-05
go_gc_duration_seconds{quantile="0.5"} 9.1254e-05
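To script that verification rather than eyeball it, here's a minimal sketch. The sample text is a hypothetical stand-in for live `curl -s http://192.168.10.45:9100/metrics` output:

```shell
# Sample scrape output standing in for the live curl response
metrics='# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67
node_memory_MemAvailable_bytes 8.1e+09'

# Check that a metric you depend on is actually present in the scrape
if printf '%s\n' "$metrics" | grep -q '^node_cpu_seconds_total'; then
  echo "node_cpu_seconds_total present"
else
  echo "node_cpu_seconds_total missing"
fi
```

The same grep against the real endpoint makes a serviceable post-restart smoke test.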
If the service keeps crashing, the journal will tell you why. Usual suspects: a missing or moved binary, a permissions error on a path a collector tries to read, or a startup flag that was valid in an older version of node_exporter but got removed in a recent upgrade.
Root Cause 2: Firewall Blocking Port 9100
The exporter is running. The metrics endpoint responds locally. But Prometheus still can't collect anything. Firewall rules are the silent killer here. A teammate tightened iptables during a security hardening sprint and dropped the rule for 9100. Or a cloud security group got updated and nobody noticed node exporter traffic wasn't exempted. I've also seen this happen when a host gets migrated to a new subnet and the firewall policy for that VLAN didn't include a carve-out for monitoring traffic.
How to Identify It
From the Prometheus server, try to reach the exporter directly and watch for a timeout rather than an immediate connection refused:
curl -v --connect-timeout 5 http://192.168.10.45:9100/metrics
A firewall-blocked connection looks like this — the packet is dropped, not rejected:
* Trying 192.168.10.45:9100...
* Connection timeout after 5001ms
* Closing connection 0
curl: (28) Connection timed out after 5001 milliseconds
Compare that to connection refused, which means the port is reachable but nothing is listening. A timeout means the packet never gets a response. On the target host, check the active rules:
iptables -L INPUT -n -v | grep 9100
Or if the host uses nftables:
nft list ruleset | grep 9100
For hosts running firewalld:
firewall-cmd --list-all
If port 9100 isn't listed under `ports:` or in an explicit allow rule, that's your problem.
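The refused-versus-timeout distinction can be scripted using curl's exit codes: 7 means the connection was actively refused (port reachable, nothing listening), 28 means it timed out (packets likely dropped by a firewall). A sketch, demonstrated here against a localhost port that has no listener; in practice you'd pass the exporter's address:

```shell
# Classify a scrape-target failure by curl exit code
probe() {
  curl -s --connect-timeout 5 -o /dev/null "http://$1/metrics"
  case $? in
    0)  echo "$1: reachable" ;;
    7)  echo "$1: connection refused (port open in firewall, no listener)" ;;
    28) echo "$1: timed out (packets likely dropped by a firewall)" ;;
    *)  echo "$1: other failure" ;;
  esac
}

# Demo target: a localhost port with nothing listening on it
probe 127.0.0.1:9
```

Run `probe 192.168.10.45:9100` from the Prometheus host to classify the failure in one step.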
How to Fix It
For iptables, add a rule scoped to the Prometheus server IP — don't open it to the world:
iptables -A INPUT -s 192.168.10.10 -p tcp --dport 9100 -j ACCEPT
iptables-save > /etc/iptables/rules.v4
For firewalld:
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.10.10" port port="9100" protocol="tcp" accept'
firewall-cmd --reload
After applying the rule, rerun the curl from the Prometheus host to confirm connectivity is restored before calling it done.
Root Cause 3: Prometheus Scrape Config Is Wrong
The exporter is running, the port is open, and Prometheus still isn't collecting. Now you need to look at `prometheus.yml` itself. A typo in a hostname, an incorrect port number, a missing job entry, a stale IP — any of these will silently leave a host unmonitored. This is more common than it sounds, especially in environments where the Prometheus config is edited by hand rather than generated from a service catalog.
How to Identify It
Go to the Prometheus UI at `http://192.168.10.10:9090/targets`. Look for the host in question. If it doesn't appear at all, it's not in any scrape config. If it appears but shows DOWN, click the error — it'll say something like:
Get "http://192.168.10.45:9100/metrics": dial tcp 192.168.10.45:9100: connect: connection refused
Or for a DNS resolution failure:
Get "http://sw-infrarunbook-01.solvethenetwork.com:9100/metrics": dial tcp: lookup sw-infrarunbook-01.solvethenetwork.com: no such host
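For the `no such host` case, confirm the name resolves from the Prometheus host before touching the config. A sketch, demonstrated with localhost so it runs anywhere; substitute the actual target hostname:

```shell
# Verify the scrape target's hostname resolves from this host.
# Demo uses localhost; in practice use the target name, e.g.
# host=sw-infrarunbook-01.solvethenetwork.com
host=localhost

if getent hosts "$host" > /dev/null; then
  echo "$host resolves"
else
  echo "$host does not resolve -- matches the 'no such host' scrape error"
fi
```

If the name doesn't resolve, the fix belongs in DNS (or the target should be listed by IP), not in Prometheus.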
Check your scrape config directly:
cat /etc/prometheus/prometheus.yml
A correct static node exporter job looks like this:
scrape_configs:
- job_name: 'node'
static_configs:
- targets:
- '192.168.10.45:9100'
- '192.168.10.46:9100'
- '192.168.10.47:9100'
Common mistakes: the port is listed as `9190` instead of `9100`, the IP has a transposed digit, the host was added to the wrong job block, or the config was edited but Prometheus was never told to reload it. To force Prometheus to re-read its config (this endpoint only works if Prometheus was started with `--web.enable-lifecycle`):
curl -X POST http://192.168.10.10:9090/-/reload
Or check the last reload timestamp in the Prometheus UI under Status > Runtime & Build Information to confirm which config Prometheus is actually running.
How to Fix It
Correct the target entry, validate the file, then reload — always in that order:
promtool check config /etc/prometheus/prometheus.yml
systemctl reload prometheus
Running `promtool check config` before reloading is non-negotiable. It catches syntax errors before they cause Prometheus to reject the entire config and fall back to whatever it had before — or worse, fail to start after a restart.
Root Cause 4: Collector Disabled
Node exporter ships with a large set of collectors — cpu, memory, disk, network, filesystem, and many more. By default, most are enabled, but some aren't. And in some environments I've worked in, teams explicitly disable collectors to reduce metric cardinality or shorten scrape time. If you're looking for a specific metric and it simply isn't there, the collector that exposes it might be disabled — either intentionally or by accident when someone copied a startup config from a different environment.
How to Identify It
Check how node_exporter is being launched and what flags are passed to it:
ps aux | grep node_exporter
Or inspect the systemd unit file directly:
cat /etc/systemd/system/node_exporter.service
You might find something like this:
[Service]
ExecStart=/usr/local/bin/node_exporter \
--no-collector.wifi \
--no-collector.nfs \
--no-collector.xfs \
--no-collector.cpu
That last flag — `--no-collector.cpu` — explains why `node_cpu_seconds_total` is completely absent. You can also hit the metrics endpoint directly and search for the specific metric:
curl -s http://192.168.10.45:9100/metrics | grep node_cpu_seconds_total
No output confirms the collector is off. To see all currently enabled collectors and their scrape status, look at the `node_scrape_collector_success` metric:
curl -s http://192.168.10.45:9100/metrics | grep node_scrape_collector_success
node_scrape_collector_success{collector="arp"} 1
node_scrape_collector_success{collector="bcache"} 1
node_scrape_collector_success{collector="bonding"} 1
node_scrape_collector_success{collector="conntrack"} 1
node_scrape_collector_success{collector="diskstats"} 1
node_scrape_collector_success{collector="filesystem"} 1
node_scrape_collector_success{collector="meminfo"} 1
If `cpu` doesn't appear in that list, it was disabled at launch.
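The unit-file inspection can be scripted too. A minimal sketch: the here-doc sample stands in for the real `/etc/systemd/system/node_exporter.service`, and the extraction assumes the single-ExecStart layout shown above:

```shell
# Sample unit file standing in for /etc/systemd/system/node_exporter.service
unit=$(mktemp)
cat > "$unit" <<'EOF'
[Service]
ExecStart=/usr/local/bin/node_exporter \
  --no-collector.wifi \
  --no-collector.cpu
EOF

# Pull the binary path from the ExecStart line
bin=$(awk -F= '/^ExecStart=/{print $2}' "$unit" | awk '{print $1}')
# List every collector explicitly disabled by flag
disabled=$(grep -o -- '--no-collector\.[a-z]*' "$unit")

echo "binary: $bin"
echo "disabled collectors:"
echo "$disabled"
rm -f "$unit"
```

Anything in the disabled list explains the corresponding missing metric family.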
How to Fix It
Edit the service file to remove the `--no-collector.cpu` flag (or whichever collector you need to restore), then reload systemd and restart the service:
systemctl daemon-reload
systemctl restart node_exporter
If you need to enable a collector that's off by default, use the `--collector.<name>` flag. For example, to enable the perf collector:
ExecStart=/usr/local/bin/node_exporter --collector.perf
Be deliberate about which collectors you run. Enabling everything in a high-cardinality environment can meaningfully increase scrape duration and Prometheus memory usage. If you're disabling collectors intentionally, document it with a comment in the unit file — undocumented omissions are indistinguishable from bugs.
Root Cause 5: Label Mismatch
This one is subtle and easy to miss because the data is actually being scraped — it's in Prometheus — but your query returns nothing, or Grafana shows blank panels. The reason is a label mismatch: your query is filtering on a label value that doesn't match what's actually attached to the metric in storage.
In my experience, this surfaces most often after someone changes how targets are discovered — switching from static configs to file-based or Consul service discovery, renaming a job, or modifying relabeling rules. The metric is right there in Prometheus. It's just labeled differently than what your dashboard expects.
How to Identify It
Run a broad query in Prometheus without any label filters to see what labels are actually attached to the metric you're looking for:
node_cpu_seconds_total
Look at what comes back. You might see:
node_cpu_seconds_total{cpu="0",instance="192.168.10.45:9100",job="linux_nodes",mode="idle"} 12345.67
Now check your Grafana panel query — it might be filtering on `job="node"` when the actual job label is `job="linux_nodes"`. That single mismatch means zero results, even though all the data is right there. Also check the relabeling rules in your prometheus.yml, which can silently transform or drop labels during scraping:
scrape_configs:
- job_name: 'linux_nodes'
relabel_configs:
- source_labels: [__address__]
target_label: instance
regex: '(.+):9100'
replacement: '$1'
That rule strips the port from the instance label. If your dashboard is querying `instance="192.168.10.45:9100"`, it won't match — the stored label is `instance="192.168.10.45"`. Use the Prometheus UI's Labels explorer to inspect exactly what's stored for a given metric and target.
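A couple of grouping queries make the stored label values explicit, so you're reading them rather than guessing. The metric names here match the examples above:

```promql
# Which job label values actually carry this metric?
count by (job) (node_cpu_seconds_total)

# What format does the stored instance label use (with or without port)?
count by (instance) (up)
```

Compare the returned label values character for character against what the dashboard query filters on.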
How to Fix It
You have two options: update your queries and dashboards to match the actual label values, or adjust your relabeling rules to produce the labels your queries expect. In most cases, updating the queries is faster and less risky. In Grafana, go into the panel editor, locate the label filter, and correct it to match what Prometheus actually stores.
For Prometheus alerting rules that rely on specific labels, update them in your rules files and reload:
promtool check rules /etc/prometheus/rules/*.yml
curl -X POST http://192.168.10.10:9090/-/reload
If the mismatch is widespread — say, you renamed a job and now thirty dashboards are broken — you can temporarily add a metric_relabel_config to add a backward-compatible label alias while you migrate. Don't leave that workaround in place indefinitely, though. Fix the root definition and update the consumers systematically.
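As a sketch of that temporary alias, here's one hypothetical shape, assuming a job that was renamed from `node` to `linux_nodes`. The `job_alias` label name is an invention for this example; with no `source_labels`, the relabel rule unconditionally stamps the replacement value onto every scraped sample:

```yaml
scrape_configs:
  - job_name: 'linux_nodes'
    metric_relabel_configs:
      # Temporary: stamp every sample with the old job name so consumers
      # can filter on job_alias="node" while they migrate. Remove once done.
      - target_label: job_alias
        replacement: 'node'
    static_configs:
      - targets:
          - '192.168.10.45:9100'
```

Dashboards being migrated can match on the alias label in the interim; delete the rule once every consumer queries the new job name.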
Root Cause 6: Node Exporter Bound to Loopback Only
The exporter is running and healthy, but it's only listening on `127.0.0.1` — so nothing outside the host can reach it. I've seen this happen when someone copies a startup script from an old tutorial that explicitly binds to loopback for "security," not realizing it breaks remote scraping entirely. It's also a common artifact of installing node_exporter from a distribution package rather than the official release, where the default unit file may bind to localhost.
How to Identify It
Check what address node_exporter is actually listening on:
ss -tlnp | grep 9100
If you see loopback only, the exporter is unreachable from outside the host:
LISTEN 0 128 127.0.0.1:9100 0.0.0.0:* users:(("node_exporter",pid=4521,fd=3))
What you want instead — binding to all interfaces:
LISTEN 0 128 0.0.0.0:9100 0.0.0.0:* users:(("node_exporter",pid=4521,fd=3))
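Classifying that bind address is easy to script. A sketch, with the sample line standing in for the live `ss -tlnp | grep 9100` output shown above:

```shell
# Sample ss output line standing in for the live command's output
line='LISTEN 0 128 127.0.0.1:9100 0.0.0.0:* users:(("node_exporter",pid=4521,fd=3))'

# Field 4 of ss output is the local address:port the socket is bound to
addr=$(printf '%s\n' "$line" | awk '{print $4}')

case $addr in
  127.0.0.1:*|"[::1]:"*) echo "loopback only: $addr" ;;
  *)                     echo "externally reachable: $addr" ;;
esac
```

Pipe the real ss output through the same awk/case logic to flag loopback-only exporters across a fleet.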
How to Fix It
The flag controlling the bind address is `--web.listen-address`. Check the service file and update it:
ExecStart=/usr/local/bin/node_exporter --web.listen-address="0.0.0.0:9100"
Reload systemd and restart the service, then confirm the binding changed:
systemctl daemon-reload
systemctl restart node_exporter
ss -tlnp | grep 9100
If your security policy doesn't allow binding to all interfaces, bind to the specific interface that Prometheus uses to reach this host — just not loopback. A host with IP `192.168.10.45` on its primary interface would use `--web.listen-address="192.168.10.45:9100"`.
Root Cause 7: TLS or Authentication Misconfiguration
Hardened environments often add TLS and basic authentication to node exporter. When the exporter is configured to require these but Prometheus isn't configured with matching credentials or a trusted CA, scrapes fail — either with an explicit HTTP error or a cryptic connection error that doesn't immediately suggest an auth problem.
How to Identify It
On the Prometheus Targets page, check the error column. A missing or wrong credential shows up as:
server returned HTTP status 401 Unauthorized
If TLS is required but Prometheus is still using plain HTTP, you'll often see:
Get "http://192.168.10.45:9100/metrics": EOF
Check whether the exporter has a web config file that enables TLS or auth:
cat /etc/node_exporter/web-config.yml
tls_server_config:
cert_file: /etc/ssl/node_exporter/node_exporter.crt
key_file: /etc/ssl/node_exporter/node_exporter.key
basic_auth_users:
prometheus: $2y$10$X5jVqN8...
If this file exists and is being passed to node_exporter via `--web.config.file`, then Prometheus must be configured to match.
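For reference, the unit excerpt wiring that file in might look like this (path taken from the example above; adapt to your layout):

```ini
[Service]
ExecStart=/usr/local/bin/node_exporter --web.config.file=/etc/node_exporter/web-config.yml
```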
How to Fix It
Update your Prometheus scrape job to include the TLS configuration and credentials:
scrape_configs:
- job_name: 'node'
scheme: https
tls_config:
ca_file: /etc/prometheus/certs/ca.crt
insecure_skip_verify: false
basic_auth:
username: prometheus
password_file: /etc/prometheus/node_exporter_password
static_configs:
- targets:
- '192.168.10.45:9100'
Store the password in a file with restricted permissions, not inline in the config:
chmod 600 /etc/prometheus/node_exporter_password
chown prometheus:prometheus /etc/prometheus/node_exporter_password
Validate and reload Prometheus, then confirm the target transitions to UP on the Targets page.
Prevention
Most of these issues are preventable with upfront work. Here's what I keep in place on every environment I manage.
Start with an alerting rule for the `up` metric. If node exporter stops reporting, you want to know within minutes — not when someone opens a dashboard and notices blank panels:
groups:
- name: node_exporter
rules:
- alert: NodeExporterDown
expr: up{job="node"} == 0
for: 2m
labels:
severity: critical
annotations:
summary: "Node exporter down on {{ $labels.instance }}"
description: "Node exporter on {{ $labels.instance }} has been unreachable for more than 2 minutes."
Use configuration management — Ansible, Puppet, Salt — to enforce that the node_exporter service is running and enabled. Don't rely on humans to remember to start it after a reboot. A simple Ansible handler with `state: started` and `enabled: yes` handles the whole class of "forgot to restart after kernel update" incidents.
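As a sketch, the corresponding Ansible task (using the standard `ansible.builtin.systemd` module; fit it into your own role layout):

```yaml
- name: Ensure node_exporter is running and enabled
  ansible.builtin.systemd:
    name: node_exporter
    state: started
    enabled: yes
```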
Add firewall rule management to the same configuration management role that deploys node exporter. The rule and the service should live together. If you install the exporter, you open the port. If you remove the exporter, the rule goes with it. Treating them as separate tasks is how you end up with an exporter that starts but can't be scraped.
Validate your prometheus.yml on every change with `promtool check config` before applying it. If you manage your Prometheus config in a git repo — and you should — run this check in CI. Catching a port typo in a target address before it hits production is far better than hunting it down while an alert is firing.
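A minimal CI step might look like this — GitHub Actions syntax shown as one hypothetical example, with repo paths assumed; the promtool invocations are the part that matters:

```yaml
- name: Validate Prometheus config and rules
  run: |
    promtool check config prometheus.yml
    promtool check rules rules/*.yml
```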
Document disabled collectors in the unit file. A one-line comment explaining why `--no-collector.xfs` is set takes ten seconds to write and saves the next engineer (including future you) from assuming it's a bug. Undocumented intentional omissions are indistinguishable from accidents.
Finally, do a periodic audit of your label structure whenever you modify scrape configs or relabeling rules. Grep your alerting rules and Grafana dashboards for any label values that might be affected by the change. Label drift is one of those problems that compounds silently — dashboards can appear functional while actually showing stale or incomplete data for weeks until someone notices.
Node exporter is stable, mature software. It rarely breaks on its own. When metrics go missing, something in the surrounding environment changed. Work outward from the basics: is the process running, can Prometheus reach it, is Prometheus configured to look for it, and do the labels on what gets scraped match what your queries expect. That covers the overwhelming majority of cases you'll encounter.
