Symptoms
You open a Grafana dashboard and instead of graphs, you get nothing. Empty panels stare back at you — some show "No data," others spin indefinitely, and a few throw a red error banner across the top. The time range picker looks right, the datasource appears configured, yet nothing renders. Sometimes one panel loads and the rest don't. Sometimes the whole dashboard is blank from the moment it opens.
In my experience, this is one of those problems that has a dozen different causes but presents almost identically every time. The surface symptom — no data — could mean anything from a misconfigured Prometheus URL to an expired API token to a PromQL query that's technically valid but returns nothing for the selected window. You have to dig methodically.
Before you start hunting, open your browser's developer tools (F12 → Network tab) and keep it visible. A lot of the diagnosis below depends on what HTTP status codes Grafana is actually receiving. Also tail Grafana's server log in a terminal — on most Linux installs that's
journalctl -u grafana-server -f
or /var/log/grafana/grafana.log. Keep both open as you work through each cause below.
Root Cause 1: Datasource Connection Failed
Why It Happens
This is the most common cause I encounter. Grafana simply can't reach the datasource backend — whether that's Prometheus, InfluxDB, Loki, or something else. It happens when the datasource URL is wrong, the service is down, or a firewall rule was quietly added that blocks the connection. It also surfaces after infrastructure migrations where a Prometheus instance moved to a new IP or port and nobody updated Grafana's datasource config to match.
How to Identify It
Go to Configuration → Data sources, select the relevant datasource, and click Save & Test. If the connection is broken you'll see an error like this:
Post "http://192.168.10.45:9090/api/v1/query": dial tcp 192.168.10.45:9090: connect: connection refused
Or if DNS resolution is the issue:
Post "http://prometheus.solvethenetwork.com:9090/api/v1/query": dial tcp: lookup prometheus.solvethenetwork.com: no such host
From the Grafana server itself, test connectivity directly with curl:
curl -v 'http://192.168.10.45:9090/api/v1/query?query=up'
If that returns a connection refused or times out, the problem is at the network or service layer, not in Grafana's configuration. Go check the Prometheus host:
ssh infrarunbook-admin@192.168.10.45
systemctl status prometheus
● prometheus.service - Prometheus
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled)
Active: inactive (dead)
systemctl start prometheus
systemctl status prometheus
● prometheus.service - Prometheus
Active: active (running) since Tue 2026-04-15 09:14:32 UTC; 4s ago
How to Fix It
First confirm the service is actually running. Then verify the URL in Grafana's datasource config is exact — protocol, hostname or IP, port, and any subpath. If Prometheus sits behind a reverse proxy at a path like /prometheus, that path must appear in the URL. After correcting it, click Save & Test and confirm the green success banner appears.
If the service is running but curl still fails from Grafana's host, a firewall rule is likely blocking port 9090. Check it on the Prometheus host:
iptables -L INPUT -n -v | grep 9090
# Or with firewalld:
firewall-cmd --list-all
Add an allow rule if needed:
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.10.50/32" port protocol="tcp" port="9090" accept'
firewall-cmd --reload
Root Cause 2: Incorrect PromQL Query
Why It Happens
This one is sneaky because Grafana often won't show you a hard error — the panel just displays "No data" while the query silently returns an empty result set. It happens when a metric name changes after a Prometheus exporter upgrade, when a label selector is too narrow and matches nothing, or when someone uses a function that doesn't suit the metric type. Dashboards built against one version of an exporter quietly break when that exporter is upgraded and renames its metrics.
How to Identify It
Open the panel editor and look at the raw query. Then go test it directly in Prometheus's expression browser at http://192.168.10.45:9090/graph. If the query returns nothing there either, the query itself is the problem — not the connection.
A classic example: a dashboard queries node_cpu_seconds_total, but after a node_exporter upgrade the metric is now node_cpu_usage_seconds_total. The old name simply doesn't exist anymore:
curl -s 'http://192.168.10.45:9090/api/v1/query?query=node_cpu_seconds_total' | python3 -m json.tool
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": []
  }
}
Empty result. Confirm what's actually available:
curl -s 'http://192.168.10.45:9090/api/v1/label/__name__/values' | python3 -m json.tool | grep node_cpu
"node_cpu_usage_seconds_total"
"node_cpu_guest_seconds_total"
Label mismatches are equally common and equally silent:
# Panel queries job="node_exporter" but the actual label value is "node"
curl -s 'http://192.168.10.45:9090/api/v1/query?query=up%7Bjob%3D%22node_exporter%22%7D'
# Returns: result: []
curl -s 'http://192.168.10.45:9090/api/v1/query?query=up%7Bjob%3D%22node%22%7D'
# Returns actual time series data
How to Fix It
Correct the metric name or label selectors in the panel query. In Grafana's panel editor, use the metric browser dropdown to autocomplete metric names and inspect available labels before committing to a query string. If your dashboards are stored in version control — and they should be — update the exported JSON and redeploy through provisioning. For teams that regularly upgrade exporters, checking the exporter's changelog before upgrading in production takes 5 minutes and prevents hours of dashboard debugging.
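When you script these checks, the empty-result case is easy to test for programmatically. A minimal sketch — the sample JSON below is a stand-in for a live curl against the query API, and the messages are illustrative:

```shell
# Stand-in for: curl -s 'http://192.168.10.45:9090/api/v1/query?query=node_cpu_seconds_total'
response='{"status":"success","data":{"resultType":"vector","result":[]}}'

# Count the series in the result set; 0 means the query matched nothing at all
count=$(echo "$response" | python3 -c 'import sys, json; print(len(json.load(sys.stdin)["data"]["result"]))')

if [ "$count" -eq 0 ]; then
  echo "EMPTY: query returned no series -- check metric name and label selectors"
else
  echo "OK: $count series returned"
fi
```

Run against each dashboard's key metrics after an exporter upgrade, this catches silent renames before anyone opens a panel.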
Root Cause 3: Time Range Mismatch
Why It Happens
Grafana queries data for a specific time range. If there's no data in Prometheus for that range — because the metric just started being scraped an hour ago, because someone set the dashboard to look back 90 days when retention is only 15, or because the system clock on the Grafana or Prometheus host has drifted — you'll get empty panels with no obvious error message pointing you at the real cause.
I've seen this catch teams off guard after a server migration. The new Prometheus instance starts fresh with zero historical data. Someone opens the dashboard with a "Last 7 days" window selected. There's data for the last 20 minutes and nothing before that. They assume the whole dashboard is broken.
How to Identify It
Check the time range selected in Grafana first — make sure it's reasonable relative to when data collection started. Then check how far back Prometheus actually retains data:
systemctl show prometheus | grep ExecStart
ExecStart=/usr/local/bin/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus \
--storage.tsdb.retention.time=15d
Clock skew between hosts is the less obvious variant of this problem. Check NTP status on both the Grafana and Prometheus hosts:
# On sw-infrarunbook-01 (Grafana host)
timedatectl status
Local time: Tue 2026-04-15 09:22:11 UTC
Universal time: Tue 2026-04-15 09:22:11 UTC
System clock synchronized: yes
NTP service: active
# On 192.168.10.45 (Prometheus host)
ssh infrarunbook-admin@192.168.10.45 timedatectl status
System clock synchronized: no
NTP service: inactive
A clock skew of more than a few seconds causes Grafana's query window to be misaligned with the timestamps stored in Prometheus. Queries return empty because the time ranges don't overlap correctly.
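To quantify the skew rather than eyeball it, compare epoch seconds between the two hosts. A small sketch — the check_skew helper, the 5-second threshold, and the sample timestamps are illustrative:

```shell
# check_skew LOCAL_EPOCH REMOTE_EPOCH: print the skew and a verdict
check_skew() {
  local skew=$(( $1 - $2 ))
  local abs=${skew#-}   # absolute value
  if [ "$abs" -gt 5 ]; then
    echo "skew ${skew}s: WARN - clocks differ by more than 5s"
  else
    echo "skew ${skew}s: OK"
  fi
}

# In practice, feed it live values from both hosts:
#   check_skew "$(date +%s)" "$(ssh infrarunbook-admin@192.168.10.45 date +%s)"
check_skew 1765000030 1765000000   # prints: skew 30s: WARN - clocks differ by more than 5s
```

Note that the SSH round trip itself adds a second or two of noise; this is for spotting skew measured in tens of seconds, not milliseconds.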
How to Fix It
If it's a data retention issue, narrow the time range picker to a window within your configured retention period. If data is genuinely missing before a certain date because Prometheus was just stood up, that's expected — there's nothing to do except wait for data to accumulate.
If it's clock skew, fix NTP on every host in your monitoring stack:
ssh infrarunbook-admin@192.168.10.45
systemctl enable --now systemd-timesyncd
timedatectl set-ntp true
timedatectl timesync-status
Server: 162.159.200.123 (time.cloudflare.com)
Poll interval: 32s (min: 32s; max: 34min 8s)
Stratum: 3
Offset: +1.427ms
Delay: 9.716ms
Root Cause 4: CORS Error from Datasource
Why It Happens
Cross-Origin Resource Sharing errors happen when Grafana's frontend JavaScript tries to directly query a datasource from the browser, but the datasource server doesn't include the required Access-Control-Allow-Origin headers in its HTTP response. Modern browsers enforce CORS strictly — they'll block the request entirely and the panel gets no data, often with no visible error in Grafana's UI itself.
This typically surfaces when a datasource is configured with Browser access mode instead of Server access mode. In Server mode, Grafana's backend process makes the query and proxies the result to the browser — no CORS involvement. In Browser mode, the JavaScript in the user's browser makes the request directly to the datasource URL, which triggers full CORS enforcement. It's a configuration choice that's easy to get wrong when first setting up a datasource.
How to Identify It
Open your browser's developer tools (F12), go to the Console tab, and look for errors like this:
Access to XMLHttpRequest at 'http://192.168.10.45:9090/api/v1/query_range'
from origin 'http://192.168.10.50:3000' has been blocked by CORS policy:
No 'Access-Control-Allow-Origin' header is present on the requested resource.
In the Network tab you'll see the preflight OPTIONS request either returning a non-200 status or a 200 with no CORS headers in the response:
Request URL: http://192.168.10.45:9090/api/v1/query_range
Request Method: OPTIONS
Status Code: 200 OK
Response Headers:
Content-Type: text/plain; charset=utf-8
Content-Length: 0
# No Access-Control-Allow-Origin present — browser blocks the follow-up request
How to Fix It
The cleanest fix is switching the datasource access mode from Browser to Server. In Grafana, go to Configuration → Data sources → [your datasource], find the Access dropdown, change it to Server (default), and save. This routes all queries through Grafana's backend proxy and eliminates the CORS problem completely. Don't overthink it — this is almost always the right answer.
If you have an unusual requirement where browser-mode access is genuinely necessary, you'll need to configure the datasource to emit correct CORS headers. For Prometheus, the cleanest approach is to put nginx in front of it and add the headers at the proxy layer:
# /etc/nginx/conf.d/prometheus-cors.conf
server {
    listen 9091;
    location / {
        proxy_pass http://127.0.0.1:9090;
        add_header 'Access-Control-Allow-Origin' 'http://192.168.10.50:3000' always;
        add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS' always;
        add_header 'Access-Control-Allow-Headers' 'Authorization, Content-Type' always;
        if ($request_method = 'OPTIONS') {
            add_header 'Access-Control-Allow-Origin' 'http://192.168.10.50:3000';
            add_header 'Access-Control-Max-Age' 1728000;
            add_header 'Content-Length' 0;
            return 204;
        }
    }
}
Then update the Grafana datasource URL to point at port 9091 instead of 9090. That said — use Server mode. Browser mode is a footgun in practically every scenario and offers no meaningful benefit for typical deployments.
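If you do go the proxy route, confirm the header is actually being emitted before blaming anything else. A sketch — check_cors is an illustrative helper, and the printf line stands in for a live preflight request:

```shell
# check_cors: read raw HTTP response headers on stdin, report whether the
# Access-Control-Allow-Origin header is present
check_cors() {
  if grep -qi '^access-control-allow-origin:'; then
    echo "CORS header present"
  else
    echo "CORS header MISSING - browser will block the request"
  fi
}

# In practice, pipe real preflight response headers in:
#   curl -s -D - -o /dev/null -X OPTIONS \
#     -H 'Origin: http://192.168.10.50:3000' \
#     'http://192.168.10.45:9091/api/v1/query' | check_cors
printf 'HTTP/1.1 204 No Content\nAccess-Control-Allow-Origin: http://192.168.10.50:3000\n' | check_cors
```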
Root Cause 5: Authentication Expired
Why It Happens
Many datasources require authentication — API keys, bearer tokens, basic auth credentials, or OAuth tokens. When these expire or get rotated without anyone updating Grafana's datasource configuration, every query fails with a 401 or 403 and panels go dark simultaneously. This is especially common with cloud-hosted datasources where tokens have short lifetimes, or in environments where a secrets rotation policy runs automatically and the Grafana config isn't part of the rotation workflow.
It's also common after team changes. Someone who managed the credentials leaves, the API key was stored only in Grafana's UI, and eventually the key gets revoked as part of offboarding cleanup. Nobody notices until dashboards stop working.
How to Identify It
Check Grafana's server log for 401 or 403 responses on datasource proxy requests:
journalctl -u grafana-server --since "1 hour ago" | grep -i "401\|403\|unauthorized\|forbidden"
Apr 15 09:31:14 sw-infrarunbook-01 grafana-server[1423]: logger=data-proxy-log userId=1 orgId=1 uname=infrarunbook-admin path=/api/datasources/proxy/1/api/v1/query_range statusCode=401 duration=142ms
Open the panel inspector by clicking the panel title → Inspect → Query, and look at the raw response returned from the datasource:
{
  "status": "error",
  "errorType": "bad_data",
  "error": "401 Unauthorized: invalid or expired API key"
}
You can also test the current credentials manually from the command line to confirm they're the issue:
# Test the bearer token currently configured in the Grafana datasource
curl -v -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9..." \
  'http://192.168.10.45:9090/api/v1/query?query=up'
# If auth is expired or invalid, you'll see:
< HTTP/1.1 401 Unauthorized
{"error": "token expired or invalid"}
How to Fix It
Generate fresh credentials from the datasource side first, then update Grafana. For a Prometheus datasource using basic auth, you can update it via Grafana's REST API without touching the UI:
curl -X PUT \
  -H "Content-Type: application/json" \
  -u infrarunbook-admin:currentadminpassword \
  http://192.168.10.50:3000/api/datasources/1 \
  -d '{
    "id": 1,
    "orgId": 1,
    "name": "Prometheus",
    "type": "prometheus",
    "url": "http://192.168.10.45:9090",
    "access": "proxy",
    "basicAuth": true,
    "basicAuthUser": "prometheus-reader",
    "secureJsonData": {
      "basicAuthPassword": "new-rotated-password-here"
    }
  }'
After updating, run Save & Test in the Grafana UI to confirm the new credentials are accepted. For teams managing Grafana-as-code with Terraform or Grafana's built-in provisioning system, the secret should come from a vault solution (HashiCorp Vault, AWS Secrets Manager, etc.) and be injected at deploy time rather than hardcoded anywhere.
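If the datasource is file-provisioned instead, Grafana expands environment variables in provisioning YAML, which pairs naturally with a secrets manager injecting the value at deploy time. A sketch — the file path follows this doc's layout, and PROM_READER_PASSWORD is an illustrative variable name, not a Grafana built-in:

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://192.168.10.45:9090
    basicAuth: true
    basicAuthUser: prometheus-reader
    secureJsonData:
      # Expanded from the Grafana server's environment at startup;
      # set the variable from your secrets manager, not in the file
      basicAuthPassword: $PROM_READER_PASSWORD
```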
Root Cause 6: Panel Query Timeout
Why It Happens
Sometimes the datasource is perfectly reachable and the query is syntactically valid, but it's too expensive for the configured timeout. A PromQL query scanning millions of time series over a 30-day range will hit Grafana's HTTP timeout before Prometheus finishes evaluating it. The panel shows "No data" or a generic error, while Prometheus is quietly hammered in the background. I've seen this happen after someone copies a dashboard built for a small environment and runs it against a production cluster with 50x the cardinality.
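A quick way to see how close a query is to the limit is to time it from the shell. A sketch — time_query is an illustrative helper, and the 30-second cap roughly mirrors a typical datasource timeout:

```shell
# time_query URL: print the total request time in seconds; -m 30 aborts the
# request after 30s so a runaway query doesn't hang your terminal
time_query() {
  curl -s -o /dev/null -m 30 -w '%{time_total}\n' "$1"
}

# In practice:
#   time_query 'http://192.168.10.45:9090/api/v1/query?query=up'
```

A query that takes 25 seconds today will time out the day cardinality grows; treat anything over a few seconds as a candidate for a recording rule.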
How to Identify It
Look for timeout messages in Grafana's log:
journalctl -u grafana-server | grep -i "timeout\|context deadline"
Apr 15 09:45:02 sw-infrarunbook-01 grafana-server[1423]: logger=tsdb.prometheus error="Post \"http://192.168.10.45:9090/api/v1/query_range\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Check Prometheus's query duration metrics to see how expensive your queries actually are:
curl -s 'http://192.168.10.45:9090/api/v1/query?query=prometheus_engine_query_duration_seconds%7Bquantile%3D%220.9%22%7D' | python3 -m json.tool
How to Fix It
The right fix is optimizing the query, not just raising the timeout. Move expensive aggregations to Prometheus recording rules so the result is pre-computed at scrape time rather than calculated on demand:
# /etc/prometheus/rules/recording_rules.yml
groups:
  - name: node_aggregations
    interval: 1m
    rules:
      - record: job:node_cpu_usage:avg1m
        expr: avg by (job, instance) (rate(node_cpu_usage_seconds_total{mode!="idle"}[1m]))
Then update the panel to query job:node_cpu_usage:avg1m instead of the raw metric. Grafana's datasource settings also let you increase the HTTP timeout under HTTP Settings → Timeout as a short-term measure, but fix the query — a slow query that doesn't time out is still a slow query that degrades Prometheus for everyone.
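Before reloading Prometheus with new rules, validate the file. promtool check rules does this properly when the Prometheus tooling is installed; the sketch below shows a cruder structural fallback — the /tmp path is illustrative, and the rule content mirrors the group above:

```shell
# Write a copy of the rule group to validate before reloading Prometheus.
# With Prometheus tooling installed, the real check is:
#   promtool check rules /tmp/recording_rules.yml
cat > /tmp/recording_rules.yml <<'EOF'
groups:
  - name: node_aggregations
    interval: 1m
    rules:
      - record: job:node_cpu_usage:avg1m
        expr: avg by (job, instance) (rate(node_cpu_usage_seconds_total{mode!="idle"}[1m]))
EOF

# Crude structural check: every record needs a matching expr
records=$(grep -c 'record:' /tmp/recording_rules.yml)
exprs=$(grep -c 'expr:' /tmp/recording_rules.yml)
if [ "$records" -eq "$exprs" ]; then
  echo "rules OK: $records rule(s)"
else
  echo "MISMATCH: $records record(s), $exprs expr(s)"
fi
```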
Root Cause 7: Grafana Provisioning Misconfiguration
Why It Happens
If your Grafana instance uses file-based provisioning — datasource definitions in /etc/grafana/provisioning/datasources/ — a YAML syntax error or wrong field name in the provisioning file can create a silently broken datasource. Grafana starts without errors but the datasource URL or credentials are wrong because the provisioning file wasn't parsed as intended. This is easy to introduce when making manual edits to provisioning files in a hurry.
How to Identify It
cat /etc/grafana/provisioning/datasources/prometheus.yaml
# Check Grafana startup logs for provisioning parse errors
journalctl -u grafana-server -b | grep -i "provision\|error"
Apr 15 08:01:14 sw-infrarunbook-01 grafana-server[1423]: logger=provisioning.datasources level=error msg="Failed to load datasource" error="invalid url: must be absolute"
How to Fix It
Validate the YAML before restarting Grafana, then restart and watch the logs:
python3 -c "import yaml; yaml.safe_load(open('/etc/grafana/provisioning/datasources/prometheus.yaml'))" && echo "YAML OK"
systemctl restart grafana-server
journalctl -u grafana-server -f
If you're managing provisioning files through a config management system like Ansible, add a YAML lint step to your playbook before the restart handler fires. Catching the syntax error in CI is better than catching it on a production Grafana host at midnight.
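That lint step can be a single pass over the whole provisioning tree. A sketch — lint_provisioning is an illustrative helper, not a Grafana tool, and it relies on PyYAML the same way the one-liner above does:

```shell
# lint_provisioning DIR: parse every YAML file under DIR; report broken ones
lint_provisioning() {
  local fail=0 f
  while IFS= read -r f; do
    [ -n "$f" ] || continue
    python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' "$f" 2>/dev/null \
      || { echo "BROKEN: $f"; fail=1; }
  done <<EOF
$(find "$1" -name '*.yaml' -o -name '*.yml' 2>/dev/null)
EOF
  [ "$fail" -eq 0 ] && echo "all provisioning files OK"
  return "$fail"
}

# In practice, gate the restart on the lint passing:
#   lint_provisioning /etc/grafana/provisioning && systemctl restart grafana-server
```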
Prevention
Most of these issues are preventable with straightforward infrastructure hygiene. Here's what I'd put in place on any production Grafana deployment.
Monitor Grafana itself. Add a Prometheus scrape job for Grafana's own metrics endpoint at /metrics. Watch the grafana_datasource_request_total counter broken down by status code. A spike in non-2xx responses is an early warning sign that a datasource is degrading — you want to catch this before users open dashboards and start filing tickets.
Alert on datasource failure rates. Configure a Prometheus alerting rule or a Grafana alert that fires when datasource success rate drops below a threshold over a 5-minute window. Being woken up at 2am by an alert is annoying. Being woken up at 9am by your entire ops team simultaneously is much worse.
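As a starting point, a Prometheus alerting rule along these lines fires when more than 5% of datasource requests fail. The threshold, the severity label, and the assumption that the code label carries the HTTP status are all things to verify against your Grafana version's metrics before deploying:

```yaml
groups:
  - name: grafana_health
    rules:
      - alert: GrafanaDatasourceErrors
        # Ratio of non-2xx datasource requests to all datasource requests
        expr: |
          sum(rate(grafana_datasource_request_total{code!~"2.."}[5m]))
            / sum(rate(grafana_datasource_request_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "More than 5% of Grafana datasource requests are failing"
```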
Document and calendar credential expiry dates. Keep a simple table — datasource name, credential type, expiry date, owner. Set a reminder two weeks before expiry. This one habit eliminates the entire authentication expired class of failures. If you're using a secrets manager, configure automatic rotation and wire Grafana's provisioning to pull the rotated secret at deploy time.
Default to Server access mode. Never configure a production datasource in Browser mode. It's more fragile, CORS-sensitive, and it bypasses Grafana's proxy layer where useful logging and auth handling live. There's almost no scenario where Browser mode is worth the trade-off.
Enforce NTP on every host in the monitoring stack. Every node — Grafana, Prometheus, and exporters — needs to be syncing time consistently. A 30-second clock skew between Grafana and Prometheus produces subtle, maddening query alignment errors. Add NTP enforcement to your base configuration management role and apply it universally.
Test datasources after any infrastructure change. Any time you rotate a secret, upgrade an exporter, change a firewall rule, or migrate a service, run a Grafana datasource connection test immediately afterward. It takes 10 seconds and it's trivially easy. Don't wait for the next person to open a dashboard to find out something broke.
Version-control your dashboards and provisioning files. Store Grafana JSON dashboard exports and provisioning YAML in git alongside your infrastructure code. When a PromQL query breaks after an exporter upgrade, you can diff the metric names against the previous version and trace exactly what changed. Rollbacks become a one-liner.
Grafana dashboard problems almost always have a clear root cause if you know where to look. The browser console, Grafana's server log, and a direct curl to the datasource API will resolve the vast majority of cases in under 10 minutes. Resist the impulse to restart Grafana first — read the logs before you reach for systemctl. The answer is almost always already in the output waiting for you.
