Symptoms
NTP sync failures don't always make themselves obvious. The server keeps running, services stay up, and there's no red alarm going off. What you get instead are subtle, maddening side effects: Kerberos tickets that suddenly stop authenticating, TLS handshakes failing because a certificate looks like it's not yet valid, cron jobs drifting out of coordination across a cluster. I've spent more than one evening chasing what looked like an application bug, only to find that one node's clock was forty-five minutes behind the rest of the fleet.
The direct indicators are easier to spot once you know to look. Running timedatectl status on a Linux system with a sync problem typically shows something like this:
Local time: Sat 2026-04-18 14:32:10 UTC
Universal time: Sat 2026-04-18 14:32:10 UTC
RTC time: Thu 2026-01-01 10:15:43
Time zone: UTC (UTC, +0000)
System clock synchronized: no
NTP service: inactive
RTC in local TZ: no
Two lines to watch: System clock synchronized: no and NTP service: inactive. Either one alone is a problem. Both together means you're flying without instruments. Let's go through the most common reasons this happens and exactly how to fix each one.
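The same check is easy to script for automation. A minimal sketch, assuming the field labels shown above; the here-doc stands in for live timedatectl status output so the snippet runs anywhere:

```shell
# Stand-in for the live command; in practice: status=$(timedatectl status)
status=$(cat <<'EOF'
System clock synchronized: no
              NTP service: inactive
EOF
)
alert=0
# Flag either failure condition independently
echo "$status" | grep -q "System clock synchronized: no" && alert=1
echo "$status" | grep -q "NTP service: inactive" && alert=1
[ "$alert" -eq 1 ] && echo "ALERT: NTP sync problem"
```

The final line's exit status doubles as a pass/fail result for cron-driven checks.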
Root Causes and Fixes
1. ntpd (or chronyd) Is Not Running
This is the most common cause I run into, and it's almost always the first thing to check. The NTP daemon was either never started after install, got killed by an OOM event, crashed due to a config error, or was manually stopped during earlier troubleshooting and nobody re-enabled it. Servers in production have a way of accumulating exactly that kind of technical debt.
To identify this, check the status of the relevant service. Modern distributions typically run either ntpd or chronyd; some use systemd-timesyncd as a lightweight alternative. Check all three if you're unsure what's installed:
systemctl status ntpd
systemctl status chronyd
systemctl status systemd-timesyncd
A stopped or failed service looks like this:
● ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Apr 18 14:30:01 sw-infrarunbook-01 systemd[1]: ntpd.service: Scheduled restart job, restart counter is at 3.
Apr 18 14:30:01 sw-infrarunbook-01 systemd[1]: Stopped Network Time Service.
Note disabled in the Loaded line — that means it won't survive a reboot even if you start it now. Fix both the immediate problem and the persistence issue at the same time:
# For ntpd (Red Hat / Rocky / AlmaLinux)
systemctl enable --now ntpd
# For chronyd (most modern distros)
systemctl enable --now chronyd
# Verify it's running after start
systemctl status chronyd
After starting the service, give it 30–60 seconds and then confirm sync with chronyc tracking or ntpq -p. If it immediately fails again, there's likely a config error driving the restart loop. Check journalctl -u chronyd -n 50 for the actual error message before assuming it's anything more exotic than a typo in the config file.
2. Firewall Blocking UDP Port 123
NTP operates over UDP port 123. If your iptables or nftables rules don't permit outbound UDP 123, the daemon will start cleanly, show no errors in the logs, and simply never sync. This is one of the more frustrating failure modes because the service looks perfectly healthy — it just isn't doing anything useful.
In my experience this happens most often on servers that were hardened with a restrictive default-deny outbound policy, or when the server sits behind a corporate firewall that blocks NTP traffic to external pools. The daemon starts, tries to reach the configured NTP servers, gets no response, and sits in an endless retry loop. No alarm, no error message in the logs — just silence.
First, check your local iptables rules for anything that might be blocking UDP 123:
iptables -L OUTPUT -n -v | grep -E "DROP|REJECT|123"
iptables -L INPUT -n -v | grep -E "DROP|REJECT|123"
If you're using nftables instead:
nft list ruleset | grep -A2 -B2 "123"
For a more definitive test, use tcpdump to watch whether NTP packets are actually leaving the network interface while the daemon is running:
tcpdump -i eth0 -n udp port 123
If you see no packets at all — not even outbound attempts — the daemon isn't sending, which points toward a config or service issue rather than a firewall block. If you see outbound UDP 123 packets but no replies, the firewall is eating the responses, either locally or upstream at a network device.
You can also use ntpdate in debug mode to force a single sync attempt and watch the packet flow in detail:
ntpdate -d 192.168.1.10
The output will show Transmit and Receive events for each packet. Missing Receive lines with no corresponding error mean the packets went out but never came back — classic firewall block signature.
To fix a local iptables block, insert an explicit ACCEPT rule ahead of the DROP:
# Allow outbound NTP
iptables -I OUTPUT -p udp --dport 123 -j ACCEPT
# Allow stateful replies inbound
iptables -I INPUT -p udp --sport 123 -m state --state ESTABLISHED -j ACCEPT
# Persist the rules
iptables-save > /etc/sysconfig/iptables
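If the host runs nftables natively rather than iptables, the equivalent allowances look like the commands below. The table and chain names (inet filter, output, input) are assumptions; check your actual names with nft list ruleset first:

```shell
# Assumed table/chain names; adjust to match: nft list ruleset
nft add rule inet filter output udp dport 123 accept
nft add rule inet filter input udp sport 123 ct state established accept
```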
If the block is at a network firewall and you can't open external UDP 123, the practical fix is to point the local NTP daemon at an internal NTP server that is reachable from this network segment. Something like server ntp.solvethenetwork.com iburst, or an internal RFC 1918 address that's allowed through the perimeter.
3. Wrong NTP Server Configured
This one catches people who provisioned a new server, copied the NTP config from another host, and didn't notice that the configured NTP server is only reachable from a different network segment. Or the internal NTP server's IP changed after a network renumbering. Or someone hand-typed an IP address that has since been decommissioned. The daemon starts fine, polls correctly, and never gets an answer because the servers it's talking to either don't exist or aren't reachable.
Check what's configured in your NTP daemon's config file:
# For chronyd
grep -E "^server|^pool" /etc/chrony.conf
# For ntpd
grep -E "^server|^pool" /etc/ntp.conf
Example output showing an internal server configured:
server 192.168.50.10 iburst
server 192.168.50.11 iburst
Now verify those servers are actually reachable and responding:
ntpdate -q 192.168.50.10
If the server is unreachable you'll get a timeout with no useful output, or this:
18 Apr 14:35:22 ntpdate[12847]: no server suitable for synchronization found
For chronyd, check the real-time status of each configured source with:
chronyc sources -v
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^? 192.168.50.10 0 6 0 - +0ns[ +0ns] +/- 0ns
^? 192.168.50.11 0 6 0 - +0ns[ +0ns] +/- 0ns
A Reach value of 0 means the source has never responded. The ^? prefix confirms the source is unreachable. For ntpd, the equivalent is ntpq -p:
remote refid st t when poll reach delay offset jitter
==============================================================================
192.168.50.10 .INIT. 16 u - 64 0 0.000 0.000 0.000
A stratum of 16 and a refid of .INIT. means ntpd has never successfully contacted that server. Fix it by updating the config to point at working NTP sources. For most environments, that means pointing at your internal NTP hierarchy:
# /etc/chrony.conf
server ntp.solvethenetwork.com iburst prefer
pool 2.pool.ntp.org iburst
Then restart the daemon and watch the sources come online:
systemctl restart chronyd
watch -n2 chronyc sources -v
Within a minute or two you should see ^* next to one source, indicating it's selected as the current sync reference. If all sources still show ^? after two minutes, you're back to a network reachability problem rather than a config problem.
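That check can also be scripted, for provisioning pipelines that need to block until sync is established. A sketch that just parses the source-state column; the here-doc (with illustrative values) stands in for live chronyc sources output:

```shell
# Stand-in for: sources=$(chronyc sources)
sources=$(cat <<'EOF'
MS Name/IP address         Stratum Poll Reach LastRx Last sample
^* 192.168.50.10                 2    6   377    21   +12us[ +15us] +/- 1045us
^- 192.168.50.11                 2    6   377    20  +145us[+145us] +/- 2310us
EOF
)
# ^* in column one means a source has been selected as the sync reference
if echo "$sources" | grep -q '^\^\*'; then
  echo "synced: reference source selected"
else
  echo "not synced yet"
fi
```

In a real wait loop, wrap the grep in until ... do sleep 2; done with a timeout.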
4. Large Time Drift Rejected by the NTP Daemon
Both ntpd and chronyd have a built-in safety mechanism: if the system clock is too far off from what the NTP servers report, they refuse to apply the correction automatically. This is intentional. Abruptly stepping the clock on a running production system can corrupt logs, break database transactions, and confuse any process that measures elapsed time internally. The daemons default to slewing (gradually adjusting) the clock at a rate of about 0.5ms per second, and they won't attempt to slew a massive offset — it would take days.
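The "it would take days" claim is easy to verify with arithmetic. At a 0.5 ms-per-second slew rate, a half-hour offset works out to roughly six weeks of correction time:

```shell
# Seconds of offset divided by the slew rate (0.5 ms/s), converted to days
days=$(awk 'BEGIN { printf "%.1f", 1800 / 0.0005 / 86400 }')
echo "slewing a 1800 s offset would take about $days days"
```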
For ntpd, the default panic threshold is 1000 seconds. Exceed that and ntpd logs an error and exits rather than applying the correction. For chronyd, the default behavior on a running system is to refuse any step larger than 1 second unless explicitly configured to allow it. This means a VM that was suspended for several hours and resumed will be stuck out of sync until someone intervenes manually.
When this happens, you'll see it in the system journal:
Apr 18 14:20:05 sw-infrarunbook-01 ntpd[3421]: time correction of 1847 seconds exceeds sanity limit (1000); set clock manually to the correct UTC time.
Or for chronyd refusing to track a large offset:
Apr 18 14:20:05 sw-infrarunbook-01 chronyd[2981]: System clock wrong by 1847.231456 seconds, not synchronising
Check the current offset directly:
chronyc tracking | grep "System time"
System time : 1847.231456193 seconds slow of NTP time
Just over thirty minutes of drift. The daemon won't touch that automatically. The fix is to manually step the clock first, then let the daemon take over for ongoing discipline. If chronyd is already running, the cleanest way is:
chronyc makestep
That forces an immediate step correction regardless of the offset magnitude. If you're using ntpd, stop it first, use ntpdate to step, then restart:
systemctl stop ntpd
ntpdate -b 192.168.1.10
systemctl start ntpd
To prevent this from happening again on systems that might experience large offsets (VMs, containers, anything that gets suspended and resumed), add the makestep directive to your chrony config:
# /etc/chrony.conf
# Step if the offset exceeds 1.0 s, on any update (-1 = no limit)
makestep 1.0 -1
The -1 means a step is allowed on every update cycle, not just the first N. This is appropriate for VMs. For physical servers running databases or other time-sensitive workloads where abrupt steps are genuinely disruptive, use makestep 1.0 3 instead: a step is only allowed on the first three updates after daemon start, which covers the boot-time catch-up case without permitting runtime steps.
5. Hardware Clock (RTC) Is Wrong
The hardware clock — the Real Time Clock (RTC) — is the battery-backed clock on the motherboard that keeps time when the system is powered off. On boot, the kernel reads the RTC to initialize the system clock before NTP kicks in. If the RTC is significantly wrong, the system starts life far off from real time, and depending on your NTP configuration, the daemon might refuse to correct that large an offset (see above).
I've seen this happen in two main scenarios. First, the CMOS battery is dead or dying. On bare-metal servers that are five or more years old, this is not rare — the battery keeps the RTC running at standby power, and when it fails, the clock either freezes or resets to the epoch on next power cycle. Second, on virtual machines, the hypervisor manages the RTC, and after VM migrations or snapshot restores the guest RTC can be out of sync with the host by hours or more.
To check the hardware clock directly:
hwclock --show --verbose
hwclock from util-linux 2.37.4
System Time: 1776522730.123456
Using the rtc interface to the clock.
Hardware clock is on UTC time
Assuming hardware clock is kept in UTC.
Time read from Hardware Clock: 2026/01/01 10:15:43
2026-01-01 10:15:43.236401+00:00
Compare that RTC timestamp against actual current time. If they're hours or months apart, the RTC is the problem. The timedatectl status output makes this even easier to spot, because it shows system time and RTC time side by side:
Local time: Sat 2026-04-18 14:32:10 UTC
Universal time: Sat 2026-04-18 14:32:10 UTC
RTC time: Thu 2026-01-01 10:15:43
Time zone: UTC (UTC, +0000)
System clock synchronized: no
NTP service: active
RTC in local TZ: no
The NTP service is active but the RTC is showing January 1st while the real date is April 18th. That gap is why things are broken. The fix is to get the system clock right first, then sync it back to the hardware clock:
# Step the system clock from NTP
ntpdate -b 192.168.1.10
# Or with chronyd
chronyc makestep
# Write corrected system time to the hardware clock
hwclock --systohc
# Verify RTC now matches system time
hwclock --show
On a VM, check whether the hypervisor provides time synchronization tools. VMware uses open-vm-tools, KVM guests can use the QEMU guest agent, and cloud instances on AWS and GCP have their own platform time sync drivers. If the hypervisor sync is available and enabled, it's more reliable than NTP on a guest OS because it survives suspend/resume and live migration events. Make sure it's enabled as part of your VM template.
On physical hardware with a confirmed dead CMOS battery, the manual sync with hwclock --systohc will survive reboots as long as AC power is maintained, but the moment you pull the power cord the RTC will reset again. Replace the battery before any planned power cycle. A dead CR2032 in a server that goes down for maintenance and comes back up showing the year 2000 is not an incident you want to explain to anyone.
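For the quarterly comparison, GNU date can turn both readings into epoch seconds and subtract. A sketch using the sample timestamps from this section; on a live system, capture the first from hwclock --show and the second from date -u:

```shell
rtc="2026-01-01 10:15:43"   # sample RTC reading; live: parse from hwclock --show
now="2026-04-18 14:32:10"   # stand-in for $(date -u '+%F %T')
# Convert both to epoch seconds and take the difference
drift=$(( $(date -ud "$now" +%s) - $(date -ud "$rtc" +%s) ))
echo "RTC is $drift seconds ($(( drift / 86400 )) days) behind system time"
```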
6. Conflicting Time Sync Services
This is increasingly common now that systemd-timesyncd ships enabled by default on most distributions. If you install ntpd or chronyd without explicitly disabling timesyncd, you can end up with two daemons competing over the same clock. The result is either one daemon constantly overriding the other's corrections, unpredictable sync behavior, or one service failing silently because the other has already bound the socket it needs.
Check whether multiple time services are active simultaneously:
systemctl is-active systemd-timesyncd ntpd chronyd 2>/dev/null
active
active
inactive
Both systemd-timesyncd and ntpd showing active at the same time is the problem. Pick one daemon and disable the other. For production servers that need proper NTP discipline — multiple sources, stratum awareness, offset tracking, and monitoring integration — chronyd wins every time over the minimal systemd-timesyncd. Disable and mask timesyncd to prevent it from being pulled back in as a dependency:
systemctl disable --now systemd-timesyncd
systemctl mask systemd-timesyncd
systemctl enable --now chronyd
# Confirm only chronyd is active
systemctl is-active systemd-timesyncd chronyd
Masking is the important step here. Without it, certain package operations or other services that list timesyncd as a soft dependency can re-enable it without you noticing, and you're back to the conflict.
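A provisioning-time guard is straightforward: count the active daemons and flag more than one. The embedded sample stands in for the live systemctl is-active output shown above:

```shell
# Stand-in for: states=$(systemctl is-active systemd-timesyncd ntpd chronyd 2>/dev/null)
states="active
active
inactive"
# -x matches whole lines, so "inactive" is not counted as "active"
count=$(echo "$states" | grep -cx "active")
if [ "$count" -gt 1 ]; then
  echo "CONFLICT: $count time daemons active"
fi
```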
Prevention
Most NTP sync issues are preventable with a few consistent practices built into your provisioning and monitoring workflows.
Start with monitoring. Don't wait for symptoms — add an NTP offset check to your monitoring stack from day one. Nagios and Icinga both have check_ntp_time plugins that alert when offset exceeds a configurable threshold. For Prometheus environments, the node_exporter exposes NTP metrics via the timex collector. Set alerts on two conditions: node_timex_sync_status == 0 (sync lost) and abs(node_timex_offset_seconds) > 0.5 (offset too large). Tighten the offset threshold to 0.1 seconds for anything running Kerberos, distributed databases, or financial workloads.
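As a sketch, those two conditions translate into Prometheus alerting rules roughly like this (group name, alert names, and for: durations are illustrative choices, not canonical values):

```yaml
groups:
  - name: ntp
    rules:
      - alert: ClockNotSynchronized
        expr: node_timex_sync_status == 0
        for: 10m
        annotations:
          summary: "Kernel reports the clock is not NTP-synchronized"
      - alert: ClockOffsetHigh
        expr: abs(node_timex_offset_seconds) > 0.5
        for: 10m
        annotations:
          summary: "NTP offset exceeds 0.5 seconds"
```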
Use iburst on all configured NTP sources. Without it, a freshly started daemon waits several full poll intervals before achieving sync, which means the system can run unsynced for several minutes after boot. With iburst, the daemon sends a burst of packets immediately on start and typically achieves sync within a few seconds:
# /etc/chrony.conf
pool 2.pool.ntp.org iburst
server ntp.solvethenetwork.com iburst prefer
If you manage an environment with more than a handful of Linux servers, run an internal NTP hierarchy rather than pointing every host directly at public pools. Set up two or three internal NTP servers at stratum 2 that synchronize against public stratum 1 sources, then point all other hosts at those internal servers. This reduces your dependency on external network reachability, gives you a single place to monitor and control, and keeps your NTP traffic off the public internet.
Audit your firewall rules explicitly for UDP 123 and document it in your runbook as a required outbound port for any server that needs NTP. This prevents the silent failure mode where the daemon starts cleanly but can't reach anything. Include a post-provision connectivity test — ntpdate -q <ntp-server> from the new host — as part of your automated provisioning checklist.
For VMs and containers, add makestep 1.0 -1 to your chrony.conf template. It doesn't hurt anything on a stable VM, and it saves you from a post-resume or post-migration sync failure that would otherwise require manual intervention at an inconvenient moment.
Finally, audit your physical servers' CMOS battery status during regular hardware maintenance windows. There's no great automated way to catch a failing CMOS battery before it causes a problem, but including a quick hwclock --show and comparing it to system time in your quarterly server checklist will catch drift early, before the battery fully fails and leaves you with a server booting to 1970.
NTP sync problems are almost never hard to fix once you've found them. The challenge is finding them before they cause damage. Build the visibility in, set the alerts, and you'll spend a lot less time playing clock detective at 2am.
