Symptoms
You've deployed Traefik, pointed a domain at your server, and instead of a clean padlock you're staring at a certificate warning. Or worse — Traefik is silently failing to renew, and you only find out when a user emails to say the site is broken. The symptoms vary depending on where in the ACME lifecycle things went wrong:
- Browser shows "Your connection is not private" with NET::ERR_CERT_AUTHORITY_INVALID
- Traefik logs contain Unable to obtain ACME certificate for domains or Error obtaining certificate
- The acme.json file is empty or has no certificate entries for your domain
- Traefik dashboard shows the router active, but TLS is falling back to the default self-signed cert
- A freshly deployed service shows Certificate is not yet valid immediately after startup
- Renewals fail silently and the certificate expires without warning
In my experience, nearly every one of these failures traces back to a handful of predictable causes. Let's walk through each one systematically — why it happens, how to confirm it, and how to get past it.
Root Cause 1: ACME Challenge Failing
Why It Happens
Traefik uses the ACME protocol to request certificates from Let's Encrypt. With the default HTTP-01 challenge, Let's Encrypt fires a request to http://solvethenetwork.com/.well-known/acme-challenge/<token> and expects Traefik to serve back the correct token. If that request fails for any reason — a routing misconfiguration, middleware intercepting it, or a redirect chain eating the request before Traefik can respond — the challenge fails and you get no certificate.
I've seen this happen repeatedly on stacks where someone added a global redirect-to-HTTPS middleware at the entrypoint level. The ACME challenge arrives on port 80, gets redirected to 443, but 443 doesn't have a valid cert yet. It's a circular dependency. Traefik handles this with a special internal router called acme-http@internal that intercepts challenge requests before any user-defined middleware — but only if you haven't accidentally overridden it with a catch-all rule.
How to Identify It
Enable debug logging and watch the output during a certificate request:
traefik --log.level=DEBUG 2>&1 | grep -i acme
A failed HTTP-01 challenge looks like this in the logs:
time="2026-04-12T10:14:32Z" level=error msg="Unable to obtain ACME certificate for domains \"solvethenetwork.com\""
reason="acme: Error -> One or more domains had a problem:
[solvethenetwork.com] acme: error: 403 :: urn:ietf:params:acme:error:unauthorized ::
Invalid response from http://solvethenetwork.com/.well-known/acme-challenge/Abc123XYZ..."
You can simulate exactly what Let's Encrypt does from any external machine:
curl -v http://solvethenetwork.com/.well-known/acme-challenge/test
If you get a redirect to HTTPS, a 404, or a connection error, the challenge path isn't reachable the way Let's Encrypt needs it to be.
How to Fix It
Don't apply your HTTPS redirect middleware at the entrypoint level. Apply it only to individual service routers. Traefik's acme-http@internal router handles challenge requests on port 80 automatically — the problem is when you define a global HTTP-to-HTTPS redirect in the static config that catches everything first. This config pattern is the culprit:
# traefik.yml — this blocks ACME challenges when applied globally
entryPoints:
web:
address: ":80"
http:
redirections:
entryPoint:
to: websecure
scheme: https
Remove the redirections block from the entrypoint and move it to the router level instead. Define a dedicated redirect middleware and attach it only to your service routers via Docker labels:
traefik.http.middlewares.redirect-https.redirectscheme.scheme=https
traefik.http.middlewares.redirect-https.redirectscheme.permanent=true
traefik.http.routers.myapp-http.middlewares=redirect-https
This leaves port 80 open for ACME while still redirecting real user traffic to HTTPS.
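Taken together, the labels sit on the service's Compose definition. A minimal sketch of what that might look like — the service name, router names, and image are illustrative, not from the original setup:

```yaml
services:
  myapp:
    image: myapp:latest   # illustrative image
    labels:
      - traefik.enable=true
      # HTTP router: ACME challenges pass through untouched via
      # acme-http@internal; everything else gets redirected.
      - traefik.http.routers.myapp-http.rule=Host(`solvethenetwork.com`)
      - traefik.http.routers.myapp-http.entrypoints=web
      - traefik.http.routers.myapp-http.middlewares=redirect-https
      - traefik.http.middlewares.redirect-https.redirectscheme.scheme=https
      - traefik.http.middlewares.redirect-https.redirectscheme.permanent=true
      # HTTPS router carries the certificate resolver.
      - traefik.http.routers.myapp-secure.rule=Host(`solvethenetwork.com`)
      - traefik.http.routers.myapp-secure.entrypoints=websecure
      - traefik.http.routers.myapp-secure.tls.certresolver=letsencrypt
```

The key design point is the split: redirection lives on the HTTP router, TLS and the resolver on the HTTPS router, and nothing global touches port 80.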
Root Cause 2: DNS Not Propagated
Why It Happens
Let's Encrypt resolves your domain's A record from multiple geographic vantage points before issuing a certificate. If you just pointed solvethenetwork.com at the new IP of sw-infrarunbook-01 and immediately triggered Traefik's ACME flow, there's a real chance Let's Encrypt is still seeing the old IP or getting NXDOMAIN from at least one of its resolvers.
DNS TTLs are the obvious culprit, but there's a less obvious one: some DNS providers have internal propagation delays that exceed their published TTL. I've seen providers advertise a 60-second TTL but take 5–10 minutes to push changes globally. Let's Encrypt will reject the challenge if even one of its resolvers can't reach your server at the resolved address.
How to Identify It
Check what different public resolvers currently see for your domain:
dig @8.8.8.8 solvethenetwork.com A +short
dig @1.1.1.1 solvethenetwork.com A +short
dig @9.9.9.9 solvethenetwork.com A +short
If you get different answers from different resolvers, propagation isn't complete yet. For a more authoritative check, query the nameservers directly:
# First find your authoritative nameservers:
dig solvethenetwork.com NS +short
# Then query one directly:
dig @ns1.provider.net solvethenetwork.com A +short
Also check the remaining TTL on the current record to estimate how long you have to wait:
dig solvethenetwork.com A | grep -i "IN.*A"
In Traefik logs, a DNS-related ACME failure usually appears as a timeout or an authorization error where Let's Encrypt reports it couldn't validate the domain at the expected IP.
How to Fix It
Wait for propagation. That's the real answer. Don't fight the TTL. The best mitigation is proactive: before making DNS changes, lower your TTL to 60 seconds well in advance — ideally 24 hours before the cutover — so that when you do switch the record, propagation completes quickly.
If you're already in this situation, confirm propagation is complete across all three resolvers above before restarting Traefik to trigger a fresh ACME request. Restarting before DNS is ready just wastes rate limit attempts, which brings us to the next problem.
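The "all resolvers agree" gate can be scripted so the restart decision isn't eyeballed. A minimal sketch of the decision logic — the answers dict is illustrative; in practice you'd fill it from the dig queries above:

```python
def propagation_complete(answers, expected_ip):
    """True only when every resolver returns exactly the new A record.

    answers: resolver address -> list of A records from
             `dig @<resolver> solvethenetwork.com A +short`.
    """
    return all(records == [expected_ip] for records in answers.values())

# Illustrative snapshot: one resolver is still serving the old record.
answers = {
    "8.8.8.8": ["203.0.113.10"],
    "1.1.1.1": ["203.0.113.10"],
    "9.9.9.9": ["198.51.100.7"],   # stale cache
}
print(propagation_complete(answers, "203.0.113.10"))  # → False: keep waiting
```

Only restart Traefik once this returns True for every resolver you check.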
Root Cause 3: Rate Limit Hit
Why It Happens
Let's Encrypt enforces rate limits that are surprisingly easy to hit in test environments or during repeated failed deployments. The limits that bite most often are: 5 duplicate certificate orders per week for the same set of domains, and 50 certificates per registered domain per week. If you've been iterating on a Traefik config — restarting the container, watching it fail, adjusting, restarting again — you can burn through all 5 duplicate-certificate attempts within an hour.
The registered domain limit is per eTLD+1, meaning all subdomains of solvethenetwork.com share the same weekly quota of 50 certificates. If you're managing many services under a single domain, you can approach this ceiling without realizing it.
How to Identify It
The rate limit error in Traefik logs is unmistakable:
time="2026-04-12T11:02:17Z" level=error msg="Unable to obtain ACME certificate"
reason="acme: Error -> One or more domains had a problem:
[solvethenetwork.com] acme: error: 429 :: urn:ietf:params:acme:error:rateLimited ::
Error finalizing order :: too many certificates already issued for exact set of domains"
You can also audit how many certificates Traefik has already obtained by inspecting acme.json:
cat /etc/traefik/acme.json | python3 -m json.tool | grep -c '"domain"'
Cross-reference that count with Let's Encrypt's published rate limit thresholds. The 429 HTTP status code in the ACME error is the definitive signal — once you see it, you're done until the weekly window rolls over.
How to Fix It
Switch to the Let's Encrypt staging environment while you're testing. It has much higher limits and uses a separate CA — you'll get an untrusted certificate, but that's exactly what you want during troubleshooting. Update your resolver in traefik.yml:
certificatesResolvers:
letsencrypt:
acme:
email: infrarunbook-admin@solvethenetwork.com
storage: /etc/traefik/acme.json
caServer: https://acme-staging-v02.api.letsencrypt.org/directory
httpChallenge:
entryPoint: web
Once the staging certificate appears in the browser (even as untrusted), your entire ACME flow is working correctly. Then swap back to the production CA URL, delete acme.json to force a fresh issuance, and restart Traefik. If you're already rate-limited in production, there's no shortcut — you have to wait. The window is rolling and tied to the timestamps of the failed requests in your logs, so check those to estimate when you'll be clear.
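Pulling those timestamps out of the logs can be automated. A minimal sketch following the article's heuristic of a rolling one-week window — the log lines below are illustrative, and only Traefik's default time="..." prefix format is assumed:

```python
import re
from datetime import datetime, timedelta

# Illustrative rate-limited log lines (not from a real deployment):
log_lines = [
    'time="2026-04-12T11:02:17Z" level=error msg="..." reason="... rateLimited ..."',
    'time="2026-04-12T11:20:05Z" level=error msg="..." reason="... rateLimited ..."',
]

def window_clears(lines, window=timedelta(days=7)):
    """Rough estimate: one week after the earliest rate-limited request."""
    stamps = [
        datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%SZ")
        for line in lines
        if "rateLimited" in line and (m := re.search(r'time="([^"]+)"', line))
    ]
    return min(stamps) + window if stamps else None

print(window_clears(log_lines))  # → 2026-04-19 11:02:17
```

Treat the result as a lower bound on the wait, not a guarantee — the authoritative accounting lives on Let's Encrypt's side.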
Root Cause 4: Port 80 Not Accessible
Why It Happens
The HTTP-01 challenge requires port 80 to be publicly reachable from the internet. This sounds obvious but fails in environments where port 80 is blocked at the cloud firewall, not published from the Docker container, or simply not bound because the Traefik entrypoint was never defined. Cloud providers don't open inbound ports by default. Security groups and firewall rules have to be explicitly configured, and this step gets skipped more often than you'd think.
The Docker publishing issue is another common one. The container is listening on port 80 internally, but because the ports mapping is missing from the Compose file, traffic from the internet never reaches it. Traefik appears to be running fine from inside the host, which makes this confusing to diagnose without external testing.
How to Identify It
Test port 80 connectivity from a machine that isn't sw-infrarunbook-01:
nc -zv solvethenetwork.com 80
curl -v --max-time 10 http://solvethenetwork.com/
A connection timeout confirms port 80 isn't reachable externally. On the server itself, verify Traefik is actually listening:
ss -tlnp | grep :80
You should see output like this:
LISTEN 0 128 0.0.0.0:80 0.0.0.0:* users:(("traefik",pid=12345,fd=10))
If nothing appears, Traefik isn't binding port 80 at all. Check the Docker port mappings:
docker inspect traefik | python3 -m json.tool | grep -A5 '"Ports"'
How to Fix It
Make sure your Docker Compose has both ports explicitly published:
services:
traefik:
image: traefik:v3.0
ports:
- "80:80"
- "443:443"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
- /etc/traefik/traefik.yml:/etc/traefik/traefik.yml
- /etc/traefik/acme.json:/etc/traefik/acme.json
Then open the firewall on the host. If you're using ufw:
ufw allow 80/tcp
ufw allow 443/tcp
ufw reload
ufw status verbose
On cloud providers, update the security group or VPC firewall rule to allow inbound TCP port 80 from 0.0.0.0/0. After making these changes, re-run the external connectivity test before triggering another ACME request. Don't waste another rate-limit attempt until you've confirmed port 80 responds.
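If nc and curl aren't handy on the external machine, a plain TCP connect test works too. A minimal sketch — the demo runs against a throwaway local listener so it's self-contained; in practice you'd call port_open("solvethenetwork.com", 80) from a host outside your network:

```python
import socket

def port_open(host, port, timeout=5):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Self-contained demo: OS-assigned local port instead of a real domain.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]
print(port_open("127.0.0.1", port))   # → True: listener is up
srv.close()
print(port_open("127.0.0.1", port))   # → False: nothing listening anymore
```

A True here only proves TCP reachability — the curl test is still worth running to confirm Traefik answers HTTP on that port.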
Root Cause 5: Wrong Email in Config
Why It Happens
The email address in your ACME configuration is used by Let's Encrypt for expiry notifications and account registration. A malformed address causes account registration to fail outright. A valid-looking but wrong address won't block issuance, but it means you'll never receive expiry warnings — which is how certificates quietly expire in production.
This one is subtle because Let's Encrypt doesn't verify email by sending a confirmation link. But it does validate the format during account registration, and I've seen teams copy-paste a template config that still has a placeholder like admin@changeme.local or an empty string, then wonder why the ACME flow fails at the very first step.
How to Identify It
Grep your config for the email field:
grep -i email /etc/traefik/traefik.yml
Expected output:
email: infrarunbook-admin@solvethenetwork.com
Also inspect acme.json to see what email was used when the ACME account was originally registered — this can differ from what's currently in the config if the file predates a config change:
python3 -c "
import json, sys
with open('/etc/traefik/acme.json') as f:
data = json.load(f)
for resolver, content in data.items():
reg = content.get('Account', {}).get('Registration', {})
print(resolver, ':', reg.get('body', {}).get('contact', 'no contact found'))
"
A malformed email causes an explicit error during account registration:
time="2026-04-12T09:45:11Z" level=error msg="Unable to obtain ACME certificate"
reason="acme: error: 400 :: urn:ietf:params:acme:error:invalidEmail ::
Error creating new account :: contact email \"admin@\" is invalid"
How to Fix It
Correct the email in traefik.yml, then wipe acme.json and restart Traefik to force a fresh ACME account registration with the correct address. The file must be truncated rather than deleted if your volume mount expects it to exist:
sudo truncate -s 0 /etc/traefik/acme.json
sudo chmod 600 /etc/traefik/acme.json
Or if you prefer to recreate it cleanly:
sudo rm /etc/traefik/acme.json
sudo touch /etc/traefik/acme.json
sudo chmod 600 /etc/traefik/acme.json
The chmod 600 step isn't optional. Traefik logs a warning and may refuse to use acme.json if the file is world-readable, because it contains private key material. This is correct behavior — treat it as a feature, not an obstacle.
Root Cause 6: acme.json Permission or Ownership Issues
Why It Happens
Even if everything else is configured correctly, Traefik will fail to persist certificates if acme.json has wrong permissions or ownership. Traefik refuses to write to a file that's world-readable because of the private keys it stores. Conversely, if the file is owned by root but Traefik runs as a non-root UID inside the container, write attempts fail silently and the ACME flow appears to succeed in logs but produces nothing on disk.
How to Identify It
ls -la /etc/traefik/acme.json
# Correct output:
-rw------- 1 root root 4096 Apr 12 10:00 /etc/traefik/acme.json
The warning in Traefik logs when permissions are too open:
level=warning msg="The ACME certificate storage file /etc/traefik/acme.json
has been created with permissions 644, please use chmod 600"
How to Fix It
chmod 600 /etc/traefik/acme.json
chown root:root /etc/traefik/acme.json
If Traefik runs as a specific non-root user inside the container, find that UID and set ownership accordingly:
docker exec traefik id
# uid=65532(nonroot) gid=65532(nonroot)
sudo chown 65532:65532 /etc/traefik/acme.json
Restart Traefik after correcting permissions and verify the file size grows as certificates are written.
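A file-size check is coarse; listing the stored domains is more direct. A minimal sketch, assuming the Traefik v2/v3 storage layout (resolver name at the top level, with a Certificates array underneath — real files also carry certificate/key fields with PEM data):

```python
import json
import os
import tempfile

def cert_domains(path):
    """Return the main domain of every certificate persisted in acme.json."""
    with open(path) as f:
        data = json.load(f)
    return [
        cert.get("domain", {}).get("main", "?")
        for resolver in data.values()
        for cert in (resolver.get("Certificates") or [])
    ]

# Illustrative minimal file standing in for /etc/traefik/acme.json:
sample = {"letsencrypt": {"Certificates": [{"domain": {"main": "solvethenetwork.com"}}]}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)

print(cert_domains(tmp.name))  # → ['solvethenetwork.com']
os.unlink(tmp.name)
```

Run it against the real path after the restart: an empty list means Traefik still isn't writing, whatever the logs claim.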
Root Cause 7: Resolver Name Mismatch
Why It Happens
Traefik requires that the certResolver value on your router label or dynamic config exactly matches the resolver name defined under certificatesResolvers in your static config. It's case-sensitive. Define the resolver as letsencrypt in traefik.yml but label your container with certresolver=letsEncrypt, and Traefik will skip certificate issuance for that router entirely without logging a meaningful error. The router shows as active in the dashboard, but TLS falls back to the self-signed default cert.
How to Identify It
Query the Traefik API to inspect the router's TLS config:
curl -s http://sw-infrarunbook-01:8080/api/http/routers | python3 -m json.tool | grep -A10 '"tls"'
A correctly configured router shows:
"tls": {
"certResolver": "letsencrypt"
}
If certResolver is an empty string or the field is absent, the label value didn't match any defined resolver.
How to Fix It
Align the label value exactly with the resolver name in your static config:
# traefik.yml defines:
certificatesResolvers:
letsencrypt:
acme:
email: infrarunbook-admin@solvethenetwork.com
storage: /etc/traefik/acme.json
httpChallenge:
entryPoint: web
# Docker Compose label must match exactly:
traefik.http.routers.myapp-secure.tls.certresolver=letsencrypt
After fixing the label, redeploy the container. Traefik picks up the change and triggers a certificate request on the next router reload.
Prevention
Most of these failures are entirely preventable with a consistent deployment checklist. Here's what I build into every Traefik setup from the start.
Always validate with staging first. Before pointing production traffic anywhere near a new Traefik instance, set the caServer to the Let's Encrypt staging URL and confirm the full ACME flow completes. You'll get an untrusted certificate, but if the browser shows a cert issued by "Fake LE Intermediate X1," every critical path — DNS resolution, port 80 routing, challenge serving, acme.json writes — has been validated without spending production rate limits.
Confirm DNS propagation before deployment. Make it a formal step in your runbook. Query at least three public resolvers and verify they all return the correct IP for solvethenetwork.com before starting Traefik. Lower your TTL 24 hours ahead of a DNS cutover if you have that luxury.
Set acme.json permissions in your provisioning scripts. Don't rely on Traefik to create the file with correct permissions. Create it yourself during host setup:
install -m 600 -o root -g root /dev/null /etc/traefik/acme.json
Monitor certificate expiry proactively. Don't rely on Let's Encrypt expiry emails as your only alert. If Traefik's Prometheus metrics endpoint is enabled, alert on the traefik_tls_certs_not_after gauge:
# Alert when any cert expires in fewer than 14 days:
(traefik_tls_certs_not_after - time()) / 86400 < 14
This gives you visibility across every domain Traefik manages, and you'll catch renewal failures before they become production incidents.
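When the metrics endpoint isn't enabled, the same threshold check can run as a plain cron job. A minimal sketch of the arithmetic — the epoch values are illustrative, and in practice not_after would come from the certificate itself (e.g. openssl x509 -enddate):

```python
import time

def days_until_expiry(not_after_epoch, now=None):
    """Days remaining until a cert's notAfter timestamp (both Unix epochs)."""
    now = time.time() if now is None else now
    return (not_after_epoch - now) / 86400

# Fixed "now" so the example is deterministic:
now = 1_800_000_000
not_after = now + 10 * 86400                     # cert expiring 10 days out
print(days_until_expiry(not_after, now) < 14)    # → True: would fire the alert
```

This mirrors the PromQL expression above: same 86400-second divisor, same 14-day threshold.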
Consider DNS-01 challenges for wildcard certificates. If you're managing many subdomains under solvethenetwork.com, switching to DNS-01 challenges eliminates the port 80 dependency entirely and lets you issue wildcard certs (*.solvethenetwork.com) that cover all subdomains under a single certificate. Most major DNS providers have Traefik-compatible plugins available. The trade-off is that you need API credentials for your DNS provider in the Traefik config, so store those in a secret manager rather than directly in the Compose file.
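On the config side, the switch amounts to swapping httpChallenge for dnsChallenge in the resolver. A minimal sketch, assuming Cloudflare as the provider — substitute your own; each provider reads its API credentials from its own environment variables, which is where the secret-manager injection comes in:

```yaml
# traefik.yml — DNS-01 resolver sketch (provider is illustrative)
certificatesResolvers:
  letsencrypt:
    acme:
      email: infrarunbook-admin@solvethenetwork.com
      storage: /etc/traefik/acme.json
      dnsChallenge:
        provider: cloudflare
```

To actually request the wildcard, the router also needs explicit domains, e.g. traefik.http.routers.myapp-secure.tls.domains[0].main=solvethenetwork.com and traefik.http.routers.myapp-secure.tls.domains[0].sans=*.solvethenetwork.com.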
Review the Traefik changelog after every major upgrade. Certificate resolver configuration has changed between v1, v2, and v3. Fields have moved, defaults have changed, and deprecated keys sometimes stop working silently. After any major Traefik upgrade, validate your static config against the new schema before assuming certificates will continue to renew cleanly in the background.
Certificate failures are frustrating precisely because they're often silent until something breaks for users. Staging validation, pre-deployment DNS checks, locked-down acme.json permissions, and proactive expiry monitoring together eliminate nearly every surprise I've seen in production Traefik deployments.
