Introduction
Running a single HAProxy instance creates a critical single point of failure. If that node goes down, every service behind it becomes unreachable. Keepalived solves this by implementing the Virtual Router Redundancy Protocol (VRRP), allowing two or more HAProxy nodes to share a floating Virtual IP (VIP). When the active node fails, Keepalived promotes a standby within seconds, transparently to clients.
This guide walks through every step: installing Keepalived, configuring active-passive failover, extending to active-active with dual VIPs, writing custom health-check scripts, tuning advertisement intervals, preventing split-brain, and verifying the entire setup in production.
Architecture Overview
Active-Passive (Single VIP)
One HAProxy node owns the VIP and serves all traffic. The backup node monitors VRRP advertisements. If advertisements stop, the backup assumes the VIP.
                ┌──────────────────┐
Clients ──────► │ VIP 10.20.30.100 │
                └────────┬─────────┘
                         │
             ┌───────────┴────────────┐
             │                        │
  ┌──────────┴─────────┐   ┌──────────┴─────────┐
  │ lb-infrarunbook-01 │   │ lb-infrarunbook-02 │
  │       MASTER       │   │       BACKUP       │
  │    10.20.30.11     │   │    10.20.30.12     │
  └────────────────────┘   └────────────────────┘
Active-Active (Dual VIPs)
Each node is MASTER for one VIP and BACKUP for the other. DNS round-robin or an upstream router distributes traffic across both VIPs. If one node fails, the survivor owns both VIPs.
DNS: lb.solvethenetwork.com → 10.20.30.100, 10.20.30.101
VIP-A 10.20.30.100 VIP-B 10.20.30.101
MASTER: lb-infrarunbook-01 MASTER: lb-infrarunbook-02
BACKUP: lb-infrarunbook-02 BACKUP: lb-infrarunbook-01
Prerequisites
- Two servers (physical or VM) running Debian 12 / Ubuntu 22.04 / RHEL 9 / AlmaLinux 9
- HAProxy 2.8+ already installed and configured identically on both nodes
- Nodes on the same Layer-2 broadcast domain (required for VRRP multicast or unicast)
- A free IP in the subnet for the VIP (10.20.30.100 in our examples)
- Root or sudo access on both nodes
Step 1 — Install Keepalived
Debian / Ubuntu
sudo apt update
sudo apt install -y keepalived
sudo systemctl enable keepalived
RHEL / AlmaLinux / Rocky
sudo dnf install -y keepalived
sudo systemctl enable keepalived
Step 2 — Allow Non-Local IP Binding
HAProxy must be able to bind to the VIP before it is assigned to the interface. Enable net.ipv4.ip_nonlocal_bind on both nodes:
cat <<'EOF' | sudo tee /etc/sysctl.d/99-haproxy-vip.conf
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 0
EOF
sudo sysctl --system
Verify:
sysctl net.ipv4.ip_nonlocal_bind
# net.ipv4.ip_nonlocal_bind = 1
Step 3 — Configure HAProxy to Bind to the VIP
On both nodes, update the frontend to bind to the VIP address:
frontend ft_web
bind 10.20.30.100:80
bind 10.20.30.100:443 ssl crt /etc/haproxy/certs/solvethenetwork.com.pem alpn h2,http/1.1
default_backend bk_web_servers
backend bk_web_servers
balance roundrobin
option httpchk GET /healthz
http-check expect status 200
server web01 10.20.30.21:8080 check inter 3s fall 3 rise 2
server web02 10.20.30.22:8080 check inter 3s fall 3 rise 2
server web03 10.20.30.23:8080 check inter 3s fall 3 rise 2
Reload HAProxy on both nodes:
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
sudo systemctl reload haproxy
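After the reload, you can confirm that HAProxy is listening on the VIP ports on both nodes (thanks to ip_nonlocal_bind, the listener exists even on the node that does not yet own the VIP). This is a small sketch: the vip_bound helper and the sample ss -tln output are illustrative, not part of HAProxy.

```shell
# Return success if the given ip:port appears in `ss -tln`-style output on stdin
vip_bound() { grep -qF "$1"; }

# Sample output for illustration; in production use:  ss -tln | vip_bound 10.20.30.100:80
SAMPLE='LISTEN 0 4096 10.20.30.100:80  0.0.0.0:*
LISTEN 0 4096 10.20.30.100:443 0.0.0.0:*'

for ep in 10.20.30.100:80 10.20.30.100:443; do
  if echo "$SAMPLE" | vip_bound "$ep"; then
    echo "$ep bound"
  else
    echo "$ep NOT bound"
  fi
done
```

If either port reports NOT bound on the standby node, re-check the ip_nonlocal_bind sysctl from Step 2.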
Step 4 — Configure Keepalived: Active-Passive (Single VIP)
MASTER — lb-infrarunbook-01 (10.20.30.11)
cat <<'EOF' | sudo tee /etc/keepalived/keepalived.conf
global_defs {
router_id lb-infrarunbook-01
script_user root
enable_script_security
vrrp_garp_master_delay 5
vrrp_garp_master_repeat 3
}
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
weight -30
fall 3
rise 2
}
vrrp_instance VI_1 {
state BACKUP    # initial state BACKUP is required for nopreempt; priority 110 still wins the election
interface eth0
virtual_router_id 51
priority 110
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass Infra$2026!   # VRRP v2 uses only the first 8 characters
}
unicast_src_ip 10.20.30.11
unicast_peer {
10.20.30.12
}
virtual_ipaddress {
10.20.30.100/24 dev eth0 label eth0:vip
}
track_script {
chk_haproxy
}
notify_master "/etc/keepalived/notify.sh MASTER"
notify_backup "/etc/keepalived/notify.sh BACKUP"
notify_fault "/etc/keepalived/notify.sh FAULT"
}
EOF
BACKUP — lb-infrarunbook-02 (10.20.30.12)
cat <<'EOF' | sudo tee /etc/keepalived/keepalived.conf
global_defs {
router_id lb-infrarunbook-02
script_user root
enable_script_security
vrrp_garp_master_delay 5
vrrp_garp_master_repeat 3
}
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
weight -30
fall 3
rise 2
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass Infra$2026!
}
unicast_src_ip 10.20.30.12
unicast_peer {
10.20.30.11
}
virtual_ipaddress {
10.20.30.100/24 dev eth0 label eth0:vip
}
track_script {
chk_haproxy
}
notify_master "/etc/keepalived/notify.sh MASTER"
notify_backup "/etc/keepalived/notify.sh BACKUP"
notify_fault "/etc/keepalived/notify.sh FAULT"
}
EOF
Key Parameters Explained
- virtual_router_id 51 — Must be identical on both nodes and unique per VRRP group on the LAN.
- priority 110 vs 100 — Higher priority wins the election. The designated MASTER starts at 110.
- nopreempt — Prevents a recovered node from automatically reclaiming the VIP, which avoids flapping. It is only honored when state BACKUP is set; with state MASTER, Keepalived always preempts. With nopreempt, both nodes declare state BACKUP and the higher-priority node wins the initial election.
- weight -30 — If chk_haproxy fails, the node's effective priority drops by 30 (110 → 80), below the BACKUP's 100, so the VIP fails over.
- unicast_peer — Uses unicast instead of multicast (224.0.0.18); required in many cloud and virtualized environments where multicast is blocked.
Step 5 — Health-Check Script (Advanced)
The simple killall -0 haproxy check only verifies that the process is alive. A more robust check confirms HAProxy is actually serving traffic, via the stats socket (requires socat) or a test request:
cat <<'SCRIPT' | sudo tee /etc/keepalived/check_haproxy.sh
#!/bin/bash
# Check 1: Process alive
/usr/bin/killall -0 haproxy 2>/dev/null || exit 1
# Check 2: Stats socket responds
echo "show info" | socat stdio /run/haproxy/admin.sock 2>/dev/null | grep -q "Uptime" || exit 1
# Check 3: VIP port accepting connections (note: on the BACKUP node this reaches the current MASTER, since the VIP is not local)
/usr/bin/timeout 2 bash -c 'echo > /dev/tcp/10.20.30.100/80' 2>/dev/null || exit 1
exit 0
SCRIPT
sudo chmod 755 /etc/keepalived/check_haproxy.sh
Update the vrrp_script block on both nodes:
vrrp_script chk_haproxy {
script "/etc/keepalived/check_haproxy.sh"
interval 2
timeout 5
weight -30
fall 3
rise 2
}
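Before trusting fall and rise in production, you can watch how the counters behave with a stub check. The sample_check function below is a stand-in that fails on iterations 3-6; it is not part of Keepalived, just a simulation of its counter logic:

```shell
FALL=3; RISE=2
fails=0; oks=0; state=UP; DOWN_AT=""; UP_AT=""

sample_check() {  # stub: fails on runs 3-6, succeeds otherwise
  case $1 in 3|4|5|6) return 1 ;; *) return 0 ;; esac
}

for run in 1 2 3 4 5 6 7 8 9 10; do
  if sample_check "$run"; then
    oks=$((oks + 1)); fails=0
    if [ "$state" = DOWN ] && [ "$oks" -ge "$RISE" ]; then
      state=UP; UP_AT=$run
    fi
  else
    fails=$((fails + 1)); oks=0
    if [ "$state" = UP ] && [ "$fails" -ge "$FALL" ]; then
      state=DOWN; DOWN_AT=$run
    fi
  fi
  echo "run=$run state=$state"
done
echo "went DOWN at run $DOWN_AT, back UP at run $UP_AT"
```

The script goes DOWN only on the third consecutive failure, so with interval 2 expect roughly six seconds between the first failed check and the priority drop.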
Step 6 — Notification Script
Create a notification script to log state transitions and optionally send alerts:
cat <<'SCRIPT' | sudo tee /etc/keepalived/notify.sh
#!/bin/bash
STATE=$1
HOSTNAME=$(hostname)
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
logger -t keepalived-notify "$HOSTNAME transitioned to $STATE at $TIMESTAMP"
case $STATE in
MASTER)
# Send alert — example using curl to a webhook
curl -sf -X POST https://hooks.solvethenetwork.com/alerts \
-H 'Content-Type: application/json' \
-d "{\"text\": \"[$TIMESTAMP] $HOSTNAME became VRRP MASTER\"}" \
>/dev/null 2>&1 || true
;;
BACKUP)
curl -sf -X POST https://hooks.solvethenetwork.com/alerts \
-H 'Content-Type: application/json' \
-d "{\"text\": \"[$TIMESTAMP] $HOSTNAME became VRRP BACKUP\"}" \
>/dev/null 2>&1 || true
;;
FAULT)
curl -sf -X POST https://hooks.solvethenetwork.com/alerts \
-H 'Content-Type: application/json' \
-d "{\"text\": \"[$TIMESTAMP] $HOSTNAME entered FAULT state\"}" \
>/dev/null 2>&1 || true
;;
esac
exit 0
SCRIPT
sudo chmod 755 /etc/keepalived/notify.sh
Step 7 — Firewall Rules for VRRP
VRRP uses IP protocol 112. If you use unicast, also ensure the peer IP can reach the node on that protocol.
iptables
sudo iptables -A INPUT -p vrrp -s 10.20.30.11 -j ACCEPT
sudo iptables -A INPUT -p vrrp -s 10.20.30.12 -j ACCEPT
sudo iptables -A INPUT -d 224.0.0.18/32 -j ACCEPT
firewalld (RHEL/AlmaLinux)
sudo firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'
sudo firewall-cmd --permanent --add-port=80/tcp --add-port=443/tcp
sudo firewall-cmd --reload
nftables
sudo nft add rule inet filter input ip protocol vrrp accept
Step 8 — Start and Verify
Start Keepalived on both nodes:
# On lb-infrarunbook-01
sudo systemctl start keepalived
# On lb-infrarunbook-02
sudo systemctl start keepalived
Verify VIP ownership on the MASTER
ip addr show eth0
# Look for:
# inet 10.20.30.100/24 scope global secondary eth0:vip
Check Keepalived logs
sudo journalctl -u keepalived -f --no-pager
# Expected on MASTER:
# Keepalived_vrrp[12345]: (VI_1) Entering MASTER STATE
Test failover
# On lb-infrarunbook-01, stop HAProxy to trigger health-check failure
sudo systemctl stop haproxy
# Watch lb-infrarunbook-02 acquire the VIP
sudo journalctl -u keepalived -f --no-pager
# Expected: (VI_1) Entering MASTER STATE
# Verify VIP moved
ip addr show eth0 # on lb-infrarunbook-02
# Restore HAProxy on lb-infrarunbook-01
sudo systemctl start haproxy
# With nopreempt, lb-infrarunbook-02 stays MASTER until manually switched or restarted
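To put a number on the failover gap, log each probe with a timestamp and post-process the log afterwards. The loop and log format here are this guide's own convention (epoch seconds plus HTTP code, 000 meaning connection failure), not something the tools impose:

```shell
# Collect (run on any client):
#   while true; do printf '%s %s\n' "$(date +%s.%N)" \
#     "$(curl -s -o /dev/null -m 1 -w '%{http_code}' http://10.20.30.100/healthz)"; \
#     sleep 0.5; done | tee failover.log

# Post-process a sample log: outage = last failing timestamp - first failing timestamp
SAMPLE='1700000000.0 200
1700000000.5 200
1700000001.0 000
1700000001.5 000
1700000002.0 000
1700000002.5 200'

OUTAGE=$(echo "$SAMPLE" | awk '$2 != 200 { if (!s) s = $1; e = $1 } END { printf "%.1f", e - s }')
echo "approx outage: ${OUTAGE}s"
```

Allow up to one probe interval of slack on each side of the reported window.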
Step 9 — Active-Active Configuration (Dual VIPs)
For active-active, define two VRRP instances. Each node is MASTER for one VIP and BACKUP for the other.
lb-infrarunbook-01
cat <<'EOF' | sudo tee /etc/keepalived/keepalived.conf
global_defs {
router_id lb-infrarunbook-01
script_user root
enable_script_security
}
vrrp_script chk_haproxy {
script "/etc/keepalived/check_haproxy.sh"
interval 2
timeout 5
weight -30
fall 3
rise 2
}
# VIP-A: This node is MASTER
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 110
advert_int 1
authentication {
auth_type PASS
auth_pass Infra$VIP1!
}
unicast_src_ip 10.20.30.11
unicast_peer {
10.20.30.12
}
virtual_ipaddress {
10.20.30.100/24 dev eth0 label eth0:vip1
}
track_script {
chk_haproxy
}
}
# VIP-B: This node is BACKUP
vrrp_instance VI_2 {
state BACKUP
interface eth0
virtual_router_id 52
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass Infra$VIP2!
}
unicast_src_ip 10.20.30.11
unicast_peer {
10.20.30.12
}
virtual_ipaddress {
10.20.30.101/24 dev eth0 label eth0:vip2
}
track_script {
chk_haproxy
}
}
EOF
lb-infrarunbook-02
cat <<'EOF' | sudo tee /etc/keepalived/keepalived.conf
global_defs {
router_id lb-infrarunbook-02
script_user root
enable_script_security
}
vrrp_script chk_haproxy {
script "/etc/keepalived/check_haproxy.sh"
interval 2
timeout 5
weight -30
fall 3
rise 2
}
# VIP-A: This node is BACKUP
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass Infra$VIP1!
}
unicast_src_ip 10.20.30.12
unicast_peer {
10.20.30.11
}
virtual_ipaddress {
10.20.30.100/24 dev eth0 label eth0:vip1
}
track_script {
chk_haproxy
}
}
# VIP-B: This node is MASTER
vrrp_instance VI_2 {
state MASTER
interface eth0
virtual_router_id 52
priority 110
advert_int 1
authentication {
auth_type PASS
auth_pass Infra$VIP2!
}
unicast_src_ip 10.20.30.12
unicast_peer {
10.20.30.11
}
virtual_ipaddress {
10.20.30.101/24 dev eth0 label eth0:vip2
}
track_script {
chk_haproxy
}
}
EOF
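The two files above must stay identical except for a handful of node-specific fields. One way to audit that is to extract just those fields and diff the rest. The node_fields helper below is a sketch under that assumption (the peer IPs inside unicast_peer blocks also differ between nodes and are not caught here):

```shell
# Print only the fields expected to differ between the two nodes
node_fields() {
  awk '/^[[:space:]]*(router_id|state|priority|unicast_src_ip)[[:space:]]/ { $1 = $1; print }' "$1"
}

# Demo on a minimal sample config
TMP=$(mktemp)
cat > "$TMP" <<'EOF'
global_defs {
    router_id lb-infrarunbook-01
}
vrrp_instance VI_1 {
    state MASTER
    priority 110
    unicast_src_ip 10.20.30.11
}
EOF
FIELDS=$(node_fields "$TMP")
echo "$FIELDS"
rm -f "$TMP"
```

Everything the helper does not print should be byte-identical across the two nodes.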
HAProxy frontend for active-active
frontend ft_web
bind 10.20.30.100:80
bind 10.20.30.100:443 ssl crt /etc/haproxy/certs/solvethenetwork.com.pem alpn h2,http/1.1
bind 10.20.30.101:80
bind 10.20.30.101:443 ssl crt /etc/haproxy/certs/solvethenetwork.com.pem alpn h2,http/1.1
default_backend bk_web_servers
DNS Configuration
lb.solvethenetwork.com. 60 IN A 10.20.30.100
lb.solvethenetwork.com. 60 IN A 10.20.30.101
Step 10 — Preventing Split-Brain
Split-brain occurs when both nodes believe they are MASTER simultaneously, typically due to a network partition between the nodes while both can still reach clients.
Strategy 1: Unicast instead of Multicast
Already configured above. Unicast eliminates issues where multicast is blocked by switches or hypervisors.
Strategy 2: Track Interface
vrrp_instance VI_1 {
...
track_interface {
eth0 weight -40
}
}
If eth0 goes down, priority drops by 40, ensuring failover.
Strategy 3: VRRP Sync Groups
When using multiple VRRP instances (active-active), a sync group ensures all instances fail over together:
vrrp_sync_group SG_1 {
group {
VI_1
VI_2
}
notify_master "/etc/keepalived/notify.sh MASTER"
notify_backup "/etc/keepalived/notify.sh BACKUP"
notify_fault "/etc/keepalived/notify.sh FAULT"
}
Warning: Only use sync groups if you want both VIPs to move together. For true active-active (independent VIPs), do NOT use sync groups — let each VRRP instance fail over independently.
Strategy 4: Fencing Script
Add a check that pings the default gateway. If unreachable, the node should drop to FAULT to avoid serving stale responses:
cat <<'SCRIPT' | sudo tee /etc/keepalived/check_gateway.sh
#!/bin/bash
GATEWAY="10.20.30.1"
ping -c 2 -W 1 $GATEWAY &>/dev/null || exit 1
exit 0
SCRIPT
sudo chmod 755 /etc/keepalived/check_gateway.sh
vrrp_script chk_gateway {
script "/etc/keepalived/check_gateway.sh"
interval 5
weight -50
fall 2
rise 2
}
vrrp_instance VI_1 {
...
track_script {
chk_haproxy
chk_gateway
}
}
Step 11 — Preemption and Failback Control
nopreempt
When nopreempt is set, the original MASTER does NOT automatically reclaim the VIP once it recovers. This avoids double-failover flapping: the recovered node stays BACKUP until the current MASTER fails or is restarted.
vrrp_instance VI_1 {
state BACKUP # Both nodes set to BACKUP when using nopreempt
priority 110 # Higher priority wins initial election
nopreempt
...
}
Note: nopreempt only works when state is set to BACKUP. If state MASTER is used, the node will always try to preempt.
preempt_delay
Alternatively, use preempt_delay to wait before reclaiming MASTER, giving the recovered node time to stabilize:
vrrp_instance VI_1 {
state BACKUP
priority 110
preempt_delay 300 # Wait 5 minutes before preempting
...
}
Step 12 — Monitoring and Troubleshooting
View VRRP state
# Signal Keepalived to dump its runtime state, then read the dump
sudo kill -USR1 $(pidof keepalived)
sudo cat /tmp/keepalived.data
View VRRP stats
sudo kill -USR2 $(pidof keepalived)
sudo cat /tmp/keepalived.stats
Verify VIP with arping
arping -I eth0 10.20.30.100
# The MAC address should match the current MASTER's eth0
Watch failover in real time
# Terminal 1: Watch VIP on lb-infrarunbook-01
watch -n1 'ip -4 addr show eth0 | grep vip'
# Terminal 2: Continuous curl to VIP
while true; do curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" http://10.20.30.100/healthz; sleep 0.5; done
# Terminal 3: Stop HAProxy on current MASTER
sudo systemctl stop haproxy
Common issues
- Both nodes MASTER: Check that virtual_router_id and auth_pass match on both nodes. Verify VRRP traffic (protocol 112) is not blocked.
- VIP not assigned: Ensure ip_nonlocal_bind = 1 and that the interface name in the config matches your actual interface (check ip link show).
- Script not running: Verify enable_script_security is set and the script owner matches script_user.
- Cloud environments: AWS, GCP, and Azure do not support VRRP multicast. Use unicast peers and consider Elastic/Floating IP APIs instead of gratuitous ARP.
Step 13 — Systemd Service Dependency
Ensure Keepalived starts after HAProxy. Use Wants= rather than Requires= so that stopping HAProxy does not also take Keepalived down — the health-check script, not systemd, should drive failover:
sudo systemctl edit keepalived
Add the override:
[Unit]
After=haproxy.service
Wants=haproxy.service
sudo systemctl daemon-reload
sudo systemctl restart keepalived
Step 14 — Production Hardening Checklist
- ✅ Use unicast peers in environments where multicast is unreliable
- ✅ Set unique virtual_router_id per VRRP group — avoid collisions with other VRRP on the same VLAN
- ✅ Use strong auth_pass (up to 8 characters in VRRP v2)
- ✅ Enable nopreempt or preempt_delay to avoid flapping
- ✅ Monitor keepalived with your existing monitoring stack (Prometheus node_exporter, Zabbix agent, etc.)
- ✅ Alert on VRRP state transitions via the notify scripts
- ✅ Test failover monthly as part of your operational runbook
- ✅ Keep HAProxy configs identical on both nodes — use Ansible/Puppet/Chef for config management
- ✅ Log Keepalived to a dedicated syslog facility for easier auditing
Dedicated syslog facility
The syslog facility is a daemon command-line option (-S / --log-facility), not a keepalived.conf keyword, so set it in the service's defaults file:
# Debian/Ubuntu: /etc/default/keepalived
DAEMON_ARGS="-D --log-facility=5"    # LOG_LOCAL5
# RHEL/AlmaLinux: /etc/sysconfig/keepalived
KEEPALIVED_OPTIONS="-D --log-facility=5"
# /etc/rsyslog.d/keepalived.conf
local5.* /var/log/keepalived.log
sudo systemctl restart rsyslog keepalived
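With all Keepalived logs in one file, state transitions are easy to audit. The log lines below mirror the journalctl output shown earlier in this guide (exact wording can vary between Keepalived versions):

```shell
# Count VRRP state transitions in a keepalived log
# In production:  grep -c 'Entering .* STATE' /var/log/keepalived.log
SAMPLE='Nov  1 10:00:01 lb-01 Keepalived_vrrp[1234]: (VI_1) Entering MASTER STATE
Nov  1 10:05:42 lb-01 Keepalived_vrrp[1234]: (VI_1) Entering BACKUP STATE
Nov  1 10:09:13 lb-01 Keepalived_vrrp[1234]: (VI_1) Entering MASTER STATE'

TRANSITIONS=$(echo "$SAMPLE" | grep -c 'Entering .* STATE')
echo "state transitions: $TRANSITIONS"
```

A healthy pair shows transitions only when you expect them; a climbing count indicates flapping and deserves an alert.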
Complete Reference: Minimal Active-Passive Setup
Below is a compact all-in-one config for quick deployment.
lb-infrarunbook-01 — /etc/keepalived/keepalived.conf
global_defs {
router_id lb-infrarunbook-01
script_user root
enable_script_security
}
vrrp_script chk_haproxy {
script "/etc/keepalived/check_haproxy.sh"
interval 2
timeout 5
weight -30
fall 3
rise 2
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 110
advert_int 1
nopreempt
authentication {
auth_type PASS
auth_pass Infra$HA
}
unicast_src_ip 10.20.30.11
unicast_peer {
10.20.30.12
}
virtual_ipaddress {
10.20.30.100/24 dev eth0 label eth0:vip
}
track_interface {
eth0 weight -40
}
track_script {
chk_haproxy
}
notify_master "/etc/keepalived/notify.sh MASTER"
notify_backup "/etc/keepalived/notify.sh BACKUP"
notify_fault "/etc/keepalived/notify.sh FAULT"
}
lb-infrarunbook-02 — /etc/keepalived/keepalived.conf
global_defs {
router_id lb-infrarunbook-02
script_user root
enable_script_security
}
vrrp_script chk_haproxy {
script "/etc/keepalived/check_haproxy.sh"
interval 2
timeout 5
weight -30
fall 3
rise 2
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass Infra$HA
}
unicast_src_ip 10.20.30.12
unicast_peer {
10.20.30.11
}
virtual_ipaddress {
10.20.30.100/24 dev eth0 label eth0:vip
}
track_interface {
eth0 weight -40
}
track_script {
chk_haproxy
}
notify_master "/etc/keepalived/notify.sh MASTER"
notify_backup "/etc/keepalived/notify.sh BACKUP"
notify_fault "/etc/keepalived/notify.sh FAULT"
}
Frequently Asked Questions
1. What is VRRP and why is it used with HAProxy?
VRRP (Virtual Router Redundancy Protocol) allows multiple nodes to share a single virtual IP address. When paired with HAProxy, it ensures that if the active load balancer fails, a standby automatically takes over the VIP, keeping services available with minimal downtime.
2. What is the difference between active-passive and active-active HAProxy setups?
In active-passive, only one node handles traffic via a single VIP. The backup is idle until failover. In active-active, two VIPs are configured with DNS round-robin, so both nodes handle traffic simultaneously. If one fails, the survivor takes both VIPs.
3. How fast is Keepalived failover?
With the default advert_int 1 and a typical fall 3, failover occurs in approximately 3-4 seconds. Sub-second advertisement intervals (e.g., advert_int 0.1, which requires VRRPv3 in Keepalived 2.x) can reduce this to under 1 second.
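The 3-4 second figure comes from VRRP's master-down interval, which can be computed directly (formula from RFC 3768; the values are the BACKUP node's from this guide):

```shell
ADVERT=1      # advert_int on the MASTER
PRIORITY=100  # priority of the BACKUP doing the detecting

# master_down_interval = 3 * advert_int + skew
# skew = ((256 - priority) / 256) * advert_int
MDI=$(awk -v a="$ADVERT" -v p="$PRIORITY" \
  'BEGIN { printf "%.3f", 3 * a + ((256 - p) / 256) * a }')
echo "BACKUP declares the MASTER dead after ~${MDI}s of silence"
```

Health-check-driven failovers add fall × interval (6 s with this guide's settings) on top of that before the priority even drops.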
4. Does Keepalived work in AWS, GCP, or Azure?
Standard VRRP with gratuitous ARP does not work in most cloud environments because they don't support Layer-2 multicast or ARP manipulation. Instead, use cloud-native floating IP APIs triggered by Keepalived notify scripts, or use cloud load balancers in front of HAProxy pairs.
5. What happens if both nodes think they are MASTER (split-brain)?
Both nodes will respond to the VIP, causing unpredictable routing and duplicate responses. Prevent this by using unicast peers, tracking the gateway interface, and ensuring VRRP traffic (protocol 112) is not blocked by firewalls.
6. Should I use nopreempt or preempt_delay?
Use nopreempt if you want manual failback control — the recovered node stays BACKUP until the current MASTER fails. Use preempt_delay (e.g., 300 seconds) if you want automatic failback after a stabilization period. Both prevent flapping.
7. Can I use more than two HAProxy nodes with Keepalived?
Yes. Add more nodes with decreasing priorities (e.g., 110, 100, 90). Only the highest-priority healthy node becomes MASTER. However, for simplicity, most production setups use exactly two nodes.
8. How do I keep HAProxy configuration in sync across nodes?
Use configuration management tools like Ansible, Puppet, or Chef. A simple approach is an Ansible playbook that templates haproxy.cfg and keepalived.conf with node-specific variables (priority, state, unicast_src_ip) and triggers a reload.
9. What is the maximum length of auth_pass in VRRP?
VRRP v2 authentication supports a maximum of 8 characters. Longer strings are silently truncated. Note that VRRP v3 (RFC 5798) deprecated authentication entirely. The auth_pass provides minimal security — rely on network segmentation for real protection.
10. How do I monitor Keepalived VRRP state transitions?
Use the notify scripts to send alerts to your monitoring system (PagerDuty, Slack, webhook). Additionally, monitor the process with your standard tools (Prometheus process exporter, Zabbix). Parse /var/log/keepalived.log for transition events and set up log-based alerts.
