InfraRunBook

    SSH Brute Force Attack Detection and Mitigation

    Security
    Published: Apr 11, 2026
    Updated: Apr 11, 2026

    Learn how to detect and stop SSH brute force attacks on Linux servers by auditing auth logs, deploying fail2ban, hardening sshd_config, and enforcing key-based authentication.


    Symptoms

    You SSH into sw-infrarunbook-01 for a routine check and something feels off. The login banner takes a beat longer than usual to appear. You run a quick

    tail -f /var/log/auth.log
    and the output is scrolling faster than you can read it. Or maybe you got paged at 2 AM because a disk-usage alert fired — and the culprit turns out to be a 4 GB auth log that didn't exist yesterday.

    Here's what a system under active brute force looks like in practice:

    • /var/log/auth.log (Debian/Ubuntu) or /var/log/secure (RHEL/Rocky/AlmaLinux) is growing at an unusual rate — sometimes gigabytes per day
    • Thousands of "Failed password for invalid user" entries from a handful of rotating source IPs
    • Attempts targeting predictable usernames: root, admin, ubuntu, pi, test, deploy, git, oracle
    • SSH login delays for legitimate users during peak attack windows
    • fail2ban either not running, not installed, or showing zero banned IPs despite the noise
    • CPU and I/O spikes with no obvious process to blame
    • Source IPs tracing back to cloud provider ranges in regions you don't operate in

    This is a brute force attack. Automated scanners sweep the entire routable IPv4 space continuously, probing every host they find on port 22. If your server is reachable, it's being probed — the only question is whether your defenses are configured to absorb or stop it. Let's work through every reason this happens and exactly how to fix it.


    Root Cause 1: Auth Log Shows Repeated Failures and Nobody Is Watching

    Why It Happens

    The auth log is the ground truth for authentication events on a Linux system. Failed SSH attempts, PAM errors, sudo invocations — all of it ends up there. The problem isn't that the log exists; it's that most teams provision a server, open port 22, and never wire up any alerting against auth.log. The log fills silently, an attack runs for days or weeks, and nobody notices until disk fills or a compliance audit surfaces it.

    Attackers know this. Sustained low-and-slow campaigns are specifically designed to stay under the radar — a few hundred attempts per hour from rotating IPs. Each individual source IP stays quiet enough to avoid triggering naive rate-limit rules, but collectively they grind through millions of password combinations around the clock.
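    That collective-vs-individual signature shows up directly in the numbers: total failures divided by distinct source IPs. A distributed campaign has a large IP count and a tiny per-IP average. A minimal sketch of that check, run here against synthetic log lines (the /tmp path and sample entries are illustrative; pipe from /var/log/auth.log in production):

```shell
# Synthetic sample of auth.log "Failed password" lines for illustration.
cat > /tmp/failed_sample.log <<'EOF'
Apr 11 02:13:01 host sshd[101]: Failed password for invalid user admin from 185.224.128.43 port 41122 ssh2
Apr 11 02:13:05 host sshd[102]: Failed password for invalid user admin from 45.33.32.156 port 52310 ssh2
Apr 11 02:13:09 host sshd[103]: Failed password for root from 103.99.0.122 port 40112 ssh2
Apr 11 02:13:12 host sshd[104]: Failed password for invalid user pi from 185.224.128.43 port 41190 ssh2
EOF

# $(NF-3) is the source IP whether or not "invalid user" is present.
total=$(grep -c "Failed password" /tmp/failed_sample.log)
distinct=$(grep "Failed password" /tmp/failed_sample.log \
  | awk '{print $(NF-3)}' | sort -u | wc -l)
echo "total=$total distinct_ips=$distinct avg_per_ip=$((total / distinct))"
```

    Many distinct IPs each contributing only a handful of attempts is the low-and-slow fingerprint; one or two IPs carrying thousands is a blunt single-source attack you can simply ban.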

    How to Identify It

    Start with the raw failure count over the last 24 hours:

    grep "Failed password" /var/log/auth.log | wc -l

    On a healthy server this should be in the single digits to low hundreds. If you see output like this, you have a problem:

    147382

    Get a breakdown of the attacking IPs to understand the scope:

    grep "Failed password" /var/log/auth.log | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn | head -20
      8421 185.224.128.43
      6203 45.33.32.156
      4891 103.99.0.122
      3317 198.51.100.77
      2108 45.142.212.100
      1892 194.165.16.11

    Then check which usernames are being targeted — this tells you whether it's a credential-stuffing campaign or just blind dictionary blasting. Note that existing accounts such as root never appear in "Invalid user" lines; count those separately with grep "Failed password for root":

    grep "Invalid user" /var/log/auth.log | awk '{print $8}' | sort | uniq -c | sort -rn | head -20
      4302 admin
      2198 ubuntu
      1847 pi
       983 test
       741 deploy
       620 git
       507 oracle

    How to Fix It

    Beyond observing the log reactively, you need active monitoring. Forward auth.log to a SIEM, configure logwatch, or use a simple alerting rule that fires when "Failed password" exceeds a threshold in a rolling window. Tools like Loki + Grafana, Graylog, or even a cron job that mails you a daily summary work fine. The key is visibility before the problem becomes a crisis.
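    The "cron job that mails you" option can be a few lines of shell. A minimal sketch; the log path, the threshold, and the alert action are all placeholders to adapt:

```shell
# check_ssh_failures LOGFILE THRESHOLD
# Prints an ALERT line when failures exceed THRESHOLD; wire the ALERT
# branch to mail, a webhook, or your pager instead of echo.
check_ssh_failures() {
    log=$1
    threshold=$2
    count=$(grep -c "Failed password" "$log" 2>/dev/null)
    count=${count:-0}
    if [ "$count" -gt "$threshold" ]; then
        echo "ALERT: $count failed SSH logins in $log (threshold $threshold)"
    else
        echo "OK: $count failed SSH logins in $log"
    fi
}

# Illustrative crontab entry, every 10 minutes:
#   */10 * * * * /usr/local/bin/check_ssh_failures /var/log/auth.log 500
```

    This counts the whole log rather than a rolling window; logrotate's daily rotation keeps that honest enough for a first-pass alert, and a proper SIEM query replaces it as soon as you have one.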

    For immediate investigation when auth.log has already rotated, check compressed archives:

    zgrep "Failed password" /var/log/auth.log.*.gz | wc -l

    To see the attack timeline and identify peak hours (the log is already chronological, so uniq -c groups adjacent entries; append sort -rn | head to rank the busiest hours instead):

    grep "Failed password" /var/log/auth.log | awk '{print $1, $2, substr($3, 1, 2) ":00"}' | uniq -c

    Root Cause 2: fail2ban Not Installed

    Why It Happens

    This one comes up constantly. Someone spins up a VPS, installs OpenSSH, opens port 22, and ships it. fail2ban isn't part of the default installation on any major distribution — you have to install it deliberately. In my experience, this is the single biggest gap between servers that handle brute force gracefully and servers that get hammered into the ground. Without fail2ban, every failed authentication attempt is free. There's no penalty for being wrong a thousand times in a row.

    fail2ban works by tailing log files for patterns and using iptables or nftables to temporarily ban offending IPs once a threshold is crossed. It's not a silver bullet — rotating botnets can exhaust its ban list — but it dramatically raises the cost of an attack and handles the long tail of automated scanners that dominate internet noise.

    How to Identify It

    which fail2ban-client

    No output means it's not installed. You can also check systemd directly:

    systemctl status fail2ban
    Unit fail2ban.service could not be found.

    Or if it's installed but not running or not enabled:

    ● fail2ban.service - Fail2Ban Service
         Loaded: loaded (/lib/systemd/system/fail2ban.service; disabled; vendor preset: enabled)
         Active: inactive (dead)

    How to Fix It

    On Debian/Ubuntu:

    apt install fail2ban -y
    systemctl enable fail2ban
    systemctl start fail2ban

    On RHEL/Rocky/AlmaLinux (fail2ban lives in EPEL):

    dnf install epel-release -y
    dnf install fail2ban -y
    systemctl enable fail2ban
    systemctl start fail2ban

    Don't modify /etc/fail2ban/jail.conf directly — that file gets overwritten on package upgrades. Create a local override instead:

    cat /etc/fail2ban/jail.local

    [DEFAULT]
    # ban for 24 hours after 5 failures within a 10-minute window
    bantime  = 86400
    findtime = 600
    maxretry = 5
    # read auth events from the systemd journal; on hosts that still log
    # to /var/log/auth.log, drop this line and the file-based logpath applies
    backend = systemd

    [sshd]
    enabled = true
    port    = ssh
    logpath = %(sshd_log)s

    This bans any IP that fails 5 times within 10 minutes for a full 24 hours. I push bantime to 86400 on any host with no legitimate reason to see auth failures from unknown sources. After applying the config, restart fail2ban (systemctl restart fail2ban) and verify the jail is active:

    fail2ban-client status sshd
    Status for the jail: sshd
    |- Filter
    |  |- Currently failed: 3
    |  |- Total failed:     142
    |  `- File list:        /var/log/auth.log
    `- Actions
       |- Currently banned: 7
       |- Total banned:     31
       `- Banned IP list:   185.224.128.43 45.33.32.156 103.99.0.122 ...
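    When you need just the banned addresses (to feed a report or cross-reference against threat intel), a small parser over that status output does the job. The sketch below runs against a captured sample line so it is reproducible anywhere; in production, pipe fail2ban-client status sshd into banned_ips:

```shell
# banned_ips: read `fail2ban-client status sshd` output on stdin and
# print only the space-separated banned IP list.
banned_ips() {
    sed -n 's/.*Banned IP list:[[:space:]]*//p'
}

# Captured sample line (illustrative) standing in for live output:
sample='   `- Banned IP list:   185.224.128.43 45.33.32.156 103.99.0.122'
echo "$sample" | banned_ips
```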

    Root Cause 3: MaxAuthTries Too High

    Why It Happens

    The default value of MaxAuthTries in OpenSSH is 6. That means an attacker gets 6 password attempts per TCP connection before the server closes it. This sounds restrictive, but nothing stops the attacker from immediately opening a new TCP connection and trying 6 more. With enough threads, they can run thousands of attempts per minute against a single host even with the default in place.

    Worse, I've seen environments where an admin bumped MaxAuthTries to 20 or higher because they were seeing "Too many authentication failures" errors during debugging — a client-side issue caused by an SSH agent offering too many keys — and never walked it back after resolving the real problem. That setting then sits in production, handing attackers a wide-open window.
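    For reference, the correct client-side fix for that "Too many authentication failures" symptom is to stop the agent from offering every key it holds. A ~/.ssh/config sketch; the host alias and key path are illustrative:

```text
Host sw-infrarunbook-01
    # Offer only the named key instead of everything in the ssh-agent
    IdentitiesOnly yes
    IdentityFile ~/.ssh/id_ed25519
```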

    How to Identify It

    grep -i "maxauthtries" /etc/ssh/sshd_config

    No output means the default of 6 is in effect. If you see something like this, it needs fixing immediately:

    MaxAuthTries 20

    Always check the effective running configuration rather than just the config file, because Include directives can pull in other files:

    sshd -T | grep maxauthtries
    maxauthtries 20

    How to Fix It

    Set MaxAuthTries to 3 in /etc/ssh/sshd_config. Combined with fail2ban, this means an attacker gets 3 guesses per connection, and fail2ban cuts the IP off once it's tripped that threshold enough times:

    MaxAuthTries 3

    Reload sshd without dropping existing sessions:

    systemctl reload sshd

    Verify the change is live:

    sshd -T | grep maxauthtries
    maxauthtries 3

    While you're in sshd_config, also tighten LoginGraceTime. The default is 120 seconds — two full minutes that a half-open unauthenticated connection can hold a slot in the daemon. Dropping it to 30 seconds reduces resource consumption during flood attacks:

    LoginGraceTime 30

    Root Cause 4: Root Login Allowed

    Why It Happens

    Modern OpenSSH defaults PermitRootLogin to "prohibit-password", which only allows root login via public key authentication. But I've seen countless production servers — especially older systems, systems migrated from aging cloud images, or boxes that started life as quick dev environments and quietly became important — where PermitRootLogin is set to "yes". That means root can log in with a password, and since every Linux system has a root account, attackers don't even need to guess a valid username. They already know one.

    The volume this generates is staggering. Even "prohibit-password" carries risk if key management is loose, which is why the safest posture is to disable root login entirely and require admins to SSH as a named user and escalate with sudo.

    How to Identify It

    grep -i "permitrootlogin" /etc/ssh/sshd_config
    PermitRootLogin yes

    Or from the effective running configuration:

    sshd -T | grep permitrootlogin
    permitrootlogin yes

    Confirm just how much noise root login is generating:

    grep "Failed password for root" /var/log/auth.log | wc -l
    53847

    That number alone explains why this matters.

    How to Fix It

    Before making this change, verify you have a non-root user with sudo access and a working SSH key. If you lock yourself out here, recovery requires console access — not fun at 3 AM. On sw-infrarunbook-01, confirm the admin account first:

    id infrarunbook-admin
    uid=1001(infrarunbook-admin) gid=1001(infrarunbook-admin) groups=1001(infrarunbook-admin),27(sudo)
    cat /home/infrarunbook-admin/.ssh/authorized_keys

    Once confirmed, edit /etc/ssh/sshd_config and set:

    PermitRootLogin no

    Reload sshd:

    systemctl reload sshd

    From a separate terminal session, confirm you can still log in as infrarunbook-admin with your key before closing the root session. This step is not optional — I have seen engineers lock themselves out of cloud VMs by skipping it.
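    To make this class of edit harder to get wrong, script the change with a backup and a guarded validation step. A sketch under stated assumptions: the helper name and .bak convention are mine, and the sed pattern handles both commented and active PermitRootLogin lines:

```shell
# harden_root_login FILE: back up FILE, force PermitRootLogin no,
# and validate when editing the live config on a host that has sshd.
harden_root_login() {
    cfg=$1
    cp "$cfg" "$cfg.bak" || return 1
    if grep -q '^[#[:space:]]*PermitRootLogin' "$cfg"; then
        sed -i 's/^[#[:space:]]*PermitRootLogin.*/PermitRootLogin no/' "$cfg"
    else
        echo 'PermitRootLogin no' >> "$cfg"
    fi
    # sshd -t validates the real config path, so only run it there.
    if [ "$cfg" = /etc/ssh/sshd_config ] && command -v sshd >/dev/null; then
        sshd -t || { cp "$cfg.bak" "$cfg"; return 1; }
    fi
}
```

    Keeping the .bak next to the file means a console session can restore it with a single cp even after SSH access is broken.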


    Root Cause 5: Key-Based Auth Not Enforced

    Why It Happens

    This is the architectural root of most SSH brute force problems. Password authentication over SSH means that as long as an attacker can reach your SSH port, they can attempt logins indefinitely — rate limits and fail2ban only slow them down. Key-based authentication eliminates the attack surface entirely. Without the private key, no amount of password guessing succeeds, period.

    Password auth ships enabled by default in OpenSSH because it's easier for new users to get started. But "easier" cuts both ways — it's easier for attackers too. Every server with a public IP should have PasswordAuthentication set to no. There's no legitimate argument for leaving it enabled on a production host.

    How to Identify It

    sshd -T | grep passwordauthentication
    passwordauthentication yes

    That single line means a brute force attack against your server has a chance of succeeding. Also check ChallengeResponseAuthentication, which can act as an alternate path to password-based auth on some PAM configurations. OpenSSH 8.7 renamed this option to KbdInteractiveAuthentication, so on newer versions sshd -T reports it under that name:

    sshd -T | grep -iE "challengeresponse|kbdinteractive"
    challengeresponseauthentication yes

    Both need to be disabled.

    How to Fix It

    Confirm your key-based login is working from a separate terminal before making this change. This is non-negotiable. Open /etc/ssh/sshd_config and set:

    PasswordAuthentication no
    ChallengeResponseAuthentication no
    UsePAM yes

    UsePAM yes stays enabled because PAM handles account restrictions, session setup, and access controls — it doesn't re-enable password auth when PasswordAuthentication is explicitly set to no.

    Reload and verify (newer OpenSSH reports the second option as kbdinteractiveauthentication):

    systemctl reload sshd
    sshd -T | grep -E "passwordauthentication|challengeresponse|kbdinteractive"
    passwordauthentication no
    challengeresponseauthentication no

    Now test from a machine without an authorized key. You should see:

    infrarunbook-admin@192.168.10.50: Permission denied (publickey).

    That's exactly right. Password auth is gone. The brute force attack is still generating log noise, but it can't succeed — and fail2ban will silence even the noise once IPs cross the retry threshold.


    Root Cause 6: No AllowUsers or AllowGroups Restriction

    Why It Happens

    Even with password auth disabled, an unconstrained sshd_config allows any account on the system to attempt key-based login. If a compromised CI/CD pipeline accidentally writes an authorized key to a service account's home directory, or if an attacker gains a foothold through another vulnerability and plants a key, there's nothing at the sshd layer blocking that account from gaining SSH access.

    AllowUsers and AllowGroups enforce an explicit allowlist. Only accounts you name can SSH in — everything else is rejected before any other check runs. It's a cheap control with a high-value payoff.

    How to Identify It

    sshd -T | grep -E "allowusers|allowgroups"

    No output means there's no allowlist in effect. Any account on the system can attempt SSH authentication.

    How to Fix It

    In /etc/ssh/sshd_config, add an explicit allowlist. For a single admin account:

    AllowUsers infrarunbook-admin

    For team environments, group-based control scales better. Create the group and add your admins first:

    groupadd sshusers
    usermod -aG sshusers infrarunbook-admin

    Then reference the group in /etc/ssh/sshd_config:

    AllowGroups sshusers

    After reload, any SSH attempt from an account not in the allowlist returns:

    Permission denied (publickey).

    Attackers probing common usernames like ubuntu, pi, or deploy will hit this wall immediately, even if somehow those accounts exist on the system.
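    To audit who would actually pass an AllowGroups gate, check group membership the way sshd effectively does. A sketch; the sshusers name matches the example above, and allowed_users is a hypothetical helper for local accounts:

```shell
# can_ssh USER GROUP: succeed when USER belongs to GROUP, i.e. when an
# "AllowGroups GROUP" directive would let that account attempt login.
can_ssh() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# allowed_users GROUP: enumerate local accounts that would be admitted
# (getent covers /etc/passwd; adjust for LDAP/SSSD-backed fleets).
allowed_users() {
    getent passwd | cut -d: -f1 | while read -r u; do
        can_ssh "$u" "$1" && echo "$u"
    done
}
```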


    Root Cause 7: No Firewall-Level Rate Limiting

    Why It Happens

    fail2ban reacts after the fact. It reads the log, detects a pattern, then issues a ban. During the window between the first failed attempt and the ban being applied, the attack continues unimpeded. Against a large botnet with thousands of IPs, fail2ban may be banning addresses faster than it can process them while the log still grows. Firewall-level rate limiting is a proactive layer that throttles new TCP connections before they even reach sshd — before any log entry is written, before any auth attempt is processed.

    How to Identify It

    iptables -L INPUT -n --line-numbers | grep -i ssh

    No rate-limiting rules means new connection attempts hit sshd unbounded. Every bot on the internet gets full, unthrottled access to your auth stack.

    How to Fix It

    Using iptables with the recent module, add a rate limit for port 22:

    iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --set --name SSH
    iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --update \
      --seconds 60 --hitcount 10 --name SSH -j DROP

    This drops any source IP that makes more than 10 new SSH connections in 60 seconds. Legitimate users won't notice — they don't open 10 fresh connections per minute. Brute force bots stall immediately.

    On systems using nftables (most modern distros ship with it), the equivalent throttles new connections to 5 per minute per source; the rules assume an existing inet filter table with an input chain:

    nft add rule inet filter input tcp dport 22 ct state new \
      meter ssh_meter { ip saddr limit rate 5/minute } accept
    nft add rule inet filter input tcp dport 22 ct state new drop

    Persist these rules through reboots using iptables-save, or manage them through your distro's preferred mechanism — ufw on Ubuntu, firewalld on RHEL-family systems. Don't set this and forget it; verify after any firewall management tool update that the rules survived.


    Root Cause 8: SSH Exposed on Default Port Without Obscurity Controls

    Why It Happens

    Port 22 is scanned continuously. Within minutes of a new public IP address appearing on the internet, automated scanners have already probed it for SSH. Moving SSH to a non-standard port won't stop a determined attacker — a full port scan will find it — but it eliminates the enormous volume of automated noise that targets port 22 specifically. In my experience, moving SSH off port 22 drops auth.log failure counts by 90% or more overnight. It's security through obscurity and shouldn't be your only layer, but as one component of a defense-in-depth posture it's a free noise reduction you'd be foolish to skip.

    How to Identify It

    sshd -T | grep ^port
    port 22

    How to Fix It

    Edit /etc/ssh/sshd_config:

    Port 2222

    On SELinux-enabled systems, tell SELinux about the new port before reloading sshd:

    semanage port -a -t ssh_port_t -p tcp 2222

    Update your firewall to allow the new port:

    # ufw
    ufw allow 2222/tcp
    ufw delete allow 22/tcp
    
    # firewalld
    firewall-cmd --add-port=2222/tcp --permanent
    firewall-cmd --remove-service=ssh --permanent
    firewall-cmd --reload

    Reload sshd, then update your SSH client config (~/.ssh/config), any automation that connects to this host, jump host definitions, and Ansible inventory. If sw-infrarunbook-01 functions as a bastion, update downstream jump configurations on every client that routes through it. A port change touches more config than it looks like — take a few minutes to audit everything before the change lands.


    Prevention

    Defense in depth is the only real answer. No single control is sufficient. A well-hardened SSH configuration on sw-infrarunbook-01 looks like this in aggregate — this is the baseline I'd apply to any host facing the internet:

    # /etc/ssh/sshd_config — hardened baseline for sw-infrarunbook-01
    Port 2222
    # Protocol 2 is the only protocol modern OpenSSH speaks; the directive
    # is deprecated on current versions and only matters on very old ones
    Protocol 2
    PermitRootLogin no
    MaxAuthTries 3
    LoginGraceTime 30
    PasswordAuthentication no
    ChallengeResponseAuthentication no
    UsePAM yes
    AllowUsers infrarunbook-admin
    PubkeyAuthentication yes
    AuthorizedKeysFile .ssh/authorized_keys
    X11Forwarding no
    AllowAgentForwarding no
    AllowTcpForwarding no
    PrintMotd no
    AcceptEnv LANG LC_*
    Subsystem sftp /usr/lib/openssh/sftp-server

    Always validate sshd_config syntax before reloading — a syntax error here will prevent sshd from starting on next restart, which can lock you out permanently on a headless system:

    sshd -t && echo "Config OK"

    Beyond sshd_config, a complete prevention posture covers several additional areas. Key rotation: rotate SSH keys whenever a team member leaves. Audit all authorized_keys files across your fleet quarterly — keys accumulate faster than anyone tracks them, and forgotten keys are a persistent access risk.

    Centralized log shipping: get auth.log into a SIEM or centralized logging system. Correlating failed logins across multiple hosts lets you spot distributed campaigns that stay below per-host thresholds. A botnet spreading 50 attempts per host across 500 hosts won't trip fail2ban anywhere, but it lights up immediately in a cross-host view.
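    Even before a SIEM is in place, the cross-host view can be approximated by pulling auth logs from each host and aggregating per-IP counts. A sketch over synthetic per-host files; the /tmp layout and sample lines are illustrative stand-ins for logs collected from the fleet:

```shell
# Synthetic per-host logs standing in for collected fleet auth logs.
mkdir -p /tmp/fleet-logs
cat > /tmp/fleet-logs/host1.log <<'EOF'
Apr 11 02:13:01 host1 sshd[101]: Failed password for root from 185.224.128.43 port 41122 ssh2
Apr 11 02:14:01 host1 sshd[102]: Failed password for root from 45.33.32.156 port 41123 ssh2
EOF
cat > /tmp/fleet-logs/host2.log <<'EOF'
Apr 11 02:13:30 host2 sshd[201]: Failed password for root from 185.224.128.43 port 51122 ssh2
EOF

# Combined per-IP totals: an IP that stays quiet on every individual
# host still surfaces at the top of the fleet-wide count.
grep -h "Failed password" /tmp/fleet-logs/*.log \
  | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn
```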

    Two-factor authentication: for bastion hosts and any server with elevated access, consider adding TOTP via libpam-google-authenticator or a similar module. Even a compromised private key can't authenticate without the second factor. This is particularly valuable for the hosts an attacker would most want to reach.
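    As a rough sketch of the TOTP setup with libpam-google-authenticator: install the package, run google-authenticator as each user, then apply changes along these lines (exact PAM details vary by distro, so test on a non-critical host first):

```text
# /etc/pam.d/sshd — add near the top
auth required pam_google_authenticator.so

# /etc/ssh/sshd_config — require the key AND the TOTP code
# (on OpenSSH older than 8.7 the first option is spelled
# ChallengeResponseAuthentication)
KbdInteractiveAuthentication yes
AuthenticationMethods publickey,keyboard-interactive
```

    Depending on the distro you may also need to comment out PAM's common-auth include in /etc/pam.d/sshd so a plain password prompt doesn't reappear behind the TOTP challenge.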

    SSH Certificate Authority: in environments managing more than a handful of hosts, replace static authorized_keys with an internal SSH CA. Issue short-lived certificates with defined principals and expiry windows. Revocation becomes a centralized operation rather than a file-editing exercise across every host in your fleet. OpenSSH has built-in CA support and it's more straightforward to deploy than people expect.
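    The core of that CA workflow is a handful of ssh-keygen invocations. A sketch run in a scratch directory; the key names, certificate identity, and principal are illustrative:

```shell
set -e
dir=$(mktemp -d)
cd "$dir"

# 1. CA keypair: the private half is the crown jewels, guard it.
ssh-keygen -q -t ed25519 -N '' -f ca -C "infrarunbook-ssh-ca"

# 2. A user keypair, generated as the engineer normally would.
ssh-keygen -q -t ed25519 -N '' -f user_key

# 3. Sign the user's public key: 8-hour validity, and the principal
#    names the account the certificate may log in as.
ssh-keygen -q -s ca -I "alice@example" -n infrarunbook-admin -V +8h user_key.pub

# Inspect the issued certificate (identity, principals, validity).
ssh-keygen -L -f user_key-cert.pub
```

    Server side, trusting the CA is one directive: TrustedUserCAKeys /etc/ssh/ca.pub. From then on, granting and expiring access is purely a signing operation, with no authorized_keys edits on any host.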

    Source IP allowlisting: if your SSH hosts don't need to be reachable from arbitrary internet IPs, restrict port 22 (or your custom port) to known source ranges at the firewall or cloud security group level. A management network, VPN egress range, or office IP block combined with an explicit deny-all is far stronger than any application-layer defense. Application-layer tools like fail2ban exist to handle the cases where network-level allowlisting isn't feasible — but network-level allowlisting should always be your first option when it is.

    The goal is to make your SSH surface so hostile to automated attacks that scanners move on to easier targets — and to make any meaningful breach attempt detectable before it becomes a crisis. With key-only auth enforced, fail2ban actively banning, root login disabled, MaxAuthTries at 3, and centralized log monitoring in place, you've addressed the vast majority of real-world brute force scenarios. Stack network-level controls on top of that and you're in genuinely good shape.

    Frequently Asked Questions

    How do I quickly check if my server is actively under SSH brute force attack?

    Run `grep 'Failed password' /var/log/auth.log | wc -l` to get the raw failure count. Anything in the tens of thousands indicates active brute forcing. Follow up with `grep 'Failed password' /var/log/auth.log | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn | head -20` to identify the top attacking IPs and assess the scope.

    Will disabling password authentication break anything on my server?

    Only if you haven't set up key-based authentication first. Before setting PasswordAuthentication no, confirm that your SSH key is in ~/.ssh/authorized_keys for your admin user and that you can log in successfully from a separate terminal. Automated tools like Ansible, CI/CD pipelines, and monitoring agents that connect via SSH will need their keys added too. Test everything before applying the change to production.

    What is the difference between fail2ban and firewall rate limiting for SSH?

    fail2ban is reactive — it reads the log after failed attempts happen, then bans the offending IP. Firewall rate limiting is proactive — it throttles new TCP connections before they reach sshd, before any log entry is written. Both serve different purposes and complement each other. fail2ban handles persistent offenders over time; firewall rate limiting absorbs high-volume floods in the moment. Use both.

    Is changing the SSH port from 22 to a non-standard port actually worth doing?

    As one layer in a defense-in-depth posture, yes. It won't stop a targeted attacker who runs a full port scan, but it eliminates the enormous volume of automated scanners that exclusively probe port 22. In practice, moving SSH off port 22 typically drops auth.log failure rates by 90% or more. It's a free noise reduction — just don't rely on it as your primary security control.

    How do I unban an IP that fail2ban has banned by mistake?

    Use `fail2ban-client set sshd unbanip <IP_ADDRESS>` to immediately remove a specific IP from the sshd jail. To prevent a trusted IP from being banned in the future, add it to the ignoreip list in /etc/fail2ban/jail.local: `ignoreip = 127.0.0.1/8 192.168.1.0/24`. RFC 1918 ranges for your management network are good candidates for this list.
