What Structured Logging Actually Is
Structured logging is simple in concept: instead of writing log lines as free-form human-readable strings, you emit each log entry as a discrete data object — almost always JSON — where every piece of information lives in a named field. That's it. But the implications of that shift are enormous.
A traditional log line might look like this:
2024-03-15 14:22:31 INFO api-gateway - POST /api/v2/orders completed in 142ms for user 8841 [192.168.10.45]
A structured equivalent looks like this:
{
  "timestamp": "2024-03-15T14:22:31.482Z",
  "level": "info",
  "service": "api-gateway",
  "host": "sw-infrarunbook-01",
  "method": "POST",
  "path": "/api/v2/orders",
  "duration_ms": 142,
  "user_id": 8841,
  "client_ip": "192.168.10.45",
  "status_code": 200
}
Both convey the same information. But only one of them lets you run a query like duration_ms > 500 AND service = "api-gateway" and get meaningful results in seconds. The free-text version requires regex parsing that someone has to write and maintain, and it will break the moment a developer changes the wording of that log message. I've seen this happen more times than I can count, usually during an incident when everyone is already stressed.
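To make the contrast concrete, here's a minimal sketch of both approaches: filtering structured entries needs nothing beyond json.loads and named fields, while the free-text version needs a handwritten regex. The sample lines and field names are illustrative, borrowed from the example above.

```python
import json
import re

structured_lines = [
    '{"service": "api-gateway", "path": "/api/v2/orders", "duration_ms": 142}',
    '{"service": "api-gateway", "path": "/api/v2/orders", "duration_ms": 731}',
]

# Structured: parse once, filter on named fields; no pattern to maintain
slow = [
    entry for entry in map(json.loads, structured_lines)
    if entry["service"] == "api-gateway" and entry["duration_ms"] > 500
]

# Free text: a regex that silently breaks if the wording ever changes
text_line = "POST /api/v2/orders completed in 731ms for user 8841"
match = re.search(r"completed in (\d+)ms", text_line)
duration = int(match.group(1)) if match else None
```

If a developer rewords the message to "finished in 731ms", the regex returns nothing and the structured query is unaffected, which is the whole point.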
How JSON Structured Logging Works
The Core Schema
Every structured log entry should carry a baseline set of fields regardless of context. After running logs through Elasticsearch, Loki, and CloudWatch Logs Insights across several production environments, I've landed on this minimum viable schema as a solid foundation:
{
  "timestamp": "2024-03-15T14:22:31.482Z",
  "level": "info",
  "service": "payments-api",
  "host": "sw-infrarunbook-01",
  "env": "production",
  "message": "Payment authorization succeeded",
  "trace_id": "7f3a9c1b-4e2d-4f8a-b1c3-9d8e7f6a5b4c",
  "span_id": "a3f2c1b9"
}
The timestamp field should always be in ISO 8601 format with UTC timezone. Don't use Unix epoch integers: they're harder to read during an incident and require conversion in most query interfaces. The level field should be lowercase and consistent: use debug, info, warn, error, and fatal. Pick one convention and enforce it across every service. Mixing WARNING with warn and WARN across your stack is a small thing that creates a large amount of pain at query time.
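If you inherit services that already mix conventions, a small normalization shim at the ingestion edge can buy time while you standardize. This is a hypothetical helper, not part of any particular library:

```python
# Map common variants onto the five canonical lowercase levels
_CANONICAL = {
    "debug": "debug",
    "info": "info", "information": "info",
    "warn": "warn", "warning": "warn",
    "error": "error", "err": "error",
    "fatal": "fatal", "critical": "fatal",
}

def normalize_level(raw: str) -> str:
    """Collapse WARNING / warn / WARN etc. onto one canonical spelling.

    Unknown values fall back to "info" rather than failing the pipeline.
    """
    return _CANONICAL.get(raw.strip().lower(), "info")
```

The fallback choice is a judgment call; some teams prefer to route unrecognized levels to a quarantine stream instead of silently defaulting.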
Correlation IDs and Distributed Tracing
The trace_id field is where structured logging starts earning its keep in distributed systems. When a request enters your system at the API gateway running on sw-infrarunbook-01, you generate a trace ID and propagate it through every downstream call via HTTP headers, message queue metadata, gRPC metadata, whatever transport you're using. Every service that touches that request emits its log entries with the same trace_id.
Now when something goes wrong at 2am and you're looking at an alert from your payments service, you can pull all logs for that trace across every service with a single query. In my experience, this single practice cuts incident investigation time by more than half. Without it, you're manually correlating timestamps across different log streams and hoping the clocks are synchronized. They often aren't.
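One common way to implement this in Python is a contextvars-based holder, so every log call in a request's scope picks up the trace ID without threading it through function arguments. A sketch, with illustrative helper names (start_trace, log) that aren't from any specific library:

```python
import contextvars
import json
import uuid
from datetime import datetime, timezone
from typing import Optional

# Holds the current request's trace ID; contextvars keeps it isolated
# per task/thread, which is what you want under async servers.
trace_id_var = contextvars.ContextVar("trace_id", default=None)

def start_trace(incoming: Optional[str] = None) -> str:
    """Reuse the caller's trace ID (e.g. from an X-Trace-Id header) or mint one."""
    tid = incoming or str(uuid.uuid4())
    trace_id_var.set(tid)
    return tid

def log(level: str, message: str, **fields) -> str:
    """Emit one JSON log line that automatically carries the current trace ID."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "trace_id": trace_id_var.get(),
        "message": message,
        **fields,
    }
    return json.dumps(entry)
```

On outbound calls you'd read trace_id_var.get() and attach it to the outgoing header, closing the propagation loop.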
Contextual Fields and Bound Loggers
Good structured logging libraries support the concept of a logging context: a set of fields that get attached to every log entry within a given scope. In Go, this is done with context.Context and a logger like zerolog or zap. In Python, the structlog library handles this cleanly with bound loggers. The pattern looks like this in practice:
# structlog bound logger pattern (Python)
log = structlog.get_logger().bind(
    service="inventory-service",
    host="sw-infrarunbook-01",
    env="production",
    request_id="req-9f3a1c2d",
)
log.info("stock_check_started", sku="WIDGET-42", warehouse_id=7)
log.info("stock_check_completed", sku="WIDGET-42", available_units=143, duration_ms=28)
Both log entries automatically carry service, host, env, and request_id without repeating them in every call. This is the right way to do it. The alternative of passing a dozen fields manually on every log call leads to inconsistency and missing fields under pressure, which is exactly when you need those fields most.
Handling Nested Objects
One question that comes up constantly is whether to nest objects inside your JSON logs. Keep it flat where possible, and use one level of nesting only when it meaningfully groups related data. Here's the contrast:
# Cluttered flat structure
{
  "http_method": "GET",
  "http_path": "/api/orders",
  "http_status": 200,
  "http_duration_ms": 88,
  "db_query": "SELECT * FROM orders WHERE user_id = ?",
  "db_duration_ms": 34,
  "db_rows_returned": 12
}
# One level of grouping — cleaner and still queryable
{
  "http": {
    "method": "GET",
    "path": "/api/orders",
    "status": 200,
    "duration_ms": 88
  },
  "db": {
    "query": "SELECT * FROM orders WHERE user_id = ?",
    "duration_ms": 34,
    "rows_returned": 12
  }
}
Deeper nesting than one level causes headaches. Most log query languages handle one level fine: Elasticsearch flattens it to http.status, and Loki's LogQL can filter on it with | json. But three levels deep and you're fighting every query tool you'll ever use. Keep it sane.
Why It Matters in Production
If you're running a single service on a single host, structured logging is a convenience. Once you cross into multiple services, multiple hosts, or any kind of container orchestration, it becomes a necessity. The difference between diagnosing an incident in ten minutes versus two hours often comes down to whether you can query your logs or whether you're grepping through files.
Consider a real scenario: your monitoring system alerts that error rates on solvethenetwork.com's checkout flow have spiked to 8% over the past five minutes. With structured logs flowing into a centralized system, you can immediately run:
# Loki LogQL — filter errors and extract trace context
{service="checkout-service", env="production"}
| json
| level="error"
| line_format "{{.message}} | user={{.user_id}} | trace={{.trace_id}}"
Within seconds you see that all the errors share a payment_gateway field value of stripe-eu, suggesting a regional gateway issue rather than an application bug. You pivot to the payment service logs filtered by that same gateway value, confirm the pattern, and you're opening a vendor incident ticket, not staring at a wall of unstructured text trying to spot a trend manually.
Indexing Strategy and Storage Cost
There's another angle worth understanding: how your log storage and indexing system handles structured data. Elasticsearch creates inverted indexes on every field, which makes field-level queries fast but also means high-cardinality fields like user_id or trace_id consume significant index memory. This is a real operational concern at scale, not a theoretical one.
The practical solution is being intentional about log levels. Debug-level logs often contain the high-cardinality detail you need for deep debugging — keep them at a sampling rate or suppress them in production unless you're actively investigating. Info-level logs should contain enough context to understand what happened without logging every field of every object in your system.
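Sampling can be as simple as a probabilistic gate in front of debug emission. A sketch, where the 1-in-100 rate is an arbitrary example rather than a recommendation:

```python
import random

DEBUG_SAMPLE_RATE = 0.01  # keep roughly 1 in 100 debug entries

def should_emit(level: str) -> bool:
    """Always emit info and above; sample debug to control volume and index cost."""
    if level != "debug":
        return True
    return random.random() < DEBUG_SAMPLE_RATE
```

In practice you'd make the rate a runtime-tunable setting so you can crank it to 1.0 during an active investigation without a deploy.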
Grafana Loki takes a different approach: it indexes only labels, a small set of fields you designate, and stores the rest as raw log content parsed at query time. This trades some query-time performance for a much smaller index and lower storage cost, and it works well when your label cardinality is low. Knowing which system you're targeting should influence how you structure your fields.
Real-World Examples from Production
Application Error Logging
Error logs deserve special attention because they're the ones you're reading at 3am. They need to contain enough information to reproduce and understand the issue without requiring you to re-run the failing request. Here's what a well-structured error log looks like:
{
  "timestamp": "2024-03-15T03:14:22.819Z",
  "level": "error",
  "service": "order-processor",
  "host": "sw-infrarunbook-01",
  "env": "production",
  "message": "Failed to reserve inventory for order",
  "trace_id": "b2c3d4e5-f6a7-4b8c-9d0e-1f2a3b4c5d6e",
  "span_id": "c3d4e5f6",
  "order_id": "ord-88821",
  "user_id": 44291,
  "sku": "WIDGET-42",
  "requested_qty": 3,
  "error": {
    "type": "InsufficientStockError",
    "message": "Available: 1, Requested: 3",
    "stack": "order_processor.reserve_inventory:142 | order_processor.process_order:88 | worker.handle_message:31"
  }
}
Notice that the stack trace lives inside an error object rather than as a raw multiline string. Multiline strings are the enemy of structured logging. They break line-by-line parsing in most log shippers (Filebeat, Fluentd, Fluent Bit) and require explicit multiline parsing configuration that's easy to get wrong. Collapsing the stack trace into a pipe-delimited single-line string or a structured object avoids that entire class of problem.
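Collapsing a Python traceback into that pipe-delimited shape takes only a few lines using the standard traceback module; the exact format shown is illustrative:

```python
import traceback
from pathlib import Path

def collapse_stack(exc: BaseException) -> str:
    """Render an exception's traceback as one pipe-delimited line,
    e.g. 'order_processor.reserve_inventory:142 | worker.handle_message:31'."""
    frames = traceback.extract_tb(exc.__traceback__)
    return " | ".join(
        f"{Path(frame.filename).stem}.{frame.name}:{frame.lineno}"
        for frame in frames
    )
```

Call it in your top-level exception handler and put the result in error.stack; the shipper then sees one clean line per event.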
Infrastructure and Deployment Logging
System-level logging for infrastructure tooling follows the same principles. Here's a log entry from a deployment automation script running on sw-infrarunbook-01 under the infrarunbook-admin account:
{
  "timestamp": "2024-03-15T10:33:07.002Z",
  "level": "info",
  "service": "deploy-agent",
  "host": "sw-infrarunbook-01",
  "operator": "infrarunbook-admin",
  "action": "deploy_started",
  "target_service": "payments-api",
  "target_version": "v2.14.1",
  "target_hosts": ["10.10.1.21", "10.10.1.22", "10.10.1.23"],
  "strategy": "rolling",
  "canary_pct": 10,
  "deploy_id": "deploy-20240315-103307"
}
That deploy_id field is your correlation handle for grouping all log entries tied to this specific deployment. If something breaks during the rollout, you filter by deploy_id and get the complete picture: which hosts succeeded, which failed, what the error was, and how long each step took. This is the same trace ID concept applied to operational workflows rather than request flows.
Security and Audit Events
Security events need their own structured schema and typically warrant a separate log stream or index. Every authentication event, authorization decision, privilege escalation, and configuration change should log the actor, the action, the target, the outcome, and the source IP:
{
  "timestamp": "2024-03-15T09:14:55.331Z",
  "level": "warn",
  "service": "auth-service",
  "host": "sw-infrarunbook-01",
  "event_type": "authentication_failed",
  "actor": "infrarunbook-admin",
  "source_ip": "192.168.1.85",
  "target_resource": "/admin/api/users",
  "failure_reason": "invalid_mfa_token",
  "attempt_count": 3
}
Structured security logs integrate directly into SIEM correlation rules. When every field is discrete and queryable, alerting on behavioral patterns becomes straightforward: five authentication failures from the same source_ip within sixty seconds triggers an alert, no regex required.
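That sixty-second rule can be sketched as a sliding-window counter keyed by source_ip. The thresholds below mirror the example above, and the class name is illustrative:

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60
THRESHOLD = 5

class FailureWindow:
    """Track authentication failures per source IP in a sliding time window."""

    def __init__(self):
        self._events = defaultdict(deque)  # source_ip -> timestamps

    def record(self, source_ip: str, ts: float) -> bool:
        """Record one failure; return True if the alert threshold is crossed."""
        window = self._events[source_ip]
        window.append(ts)
        # Evict events that have aged out of the window
        while window and ts - window[0] > WINDOW_SECONDS:
            window.popleft()
        return len(window) >= THRESHOLD
```

A real SIEM evaluates rules like this against the indexed field directly; the point is that with discrete fields the detection logic is a counter, not a parser.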
Common Misconceptions
"JSON logs are too verbose and expensive"
This comes up from teams that haven't looked at their actual log volume costs carefully. Yes, JSON has more bytes per log entry than a terse text line. But log shippers like Fluent Bit compress output before forwarding, and most log storage systems compress stored data significantly. In practice, JSON log storage costs maybe 15-20% more than equivalent unstructured logs — and the operational savings from faster incident resolution dwarf that cost within the first week of a real incident. The verbosity argument doesn't survive contact with a production outage.
"We're a small team, we don't need this"
I've heard this right before an on-call engineer spent six hours grepping through text logs trying to find why three users couldn't complete checkout on a Friday evening. Structured logging is a one-time setup cost with permanent returns. The time to implement it is before your first bad incident, not during or after it.
"We log everything so we'll always have what we need"
Logging everything indiscriminately is its own failure mode. It bloats storage, inflates costs, and creates noise that buries the signals you actually care about. Worse, unrestricted logging is a PII and compliance risk — you'll inevitably log something you shouldn't have, like a password field or a raw request body containing payment data. Structure your logging intentionally: define what each log level means for your system, document your core field schema, and treat sensitive data explicitly with redaction before it hits your log pipeline.
"We'll add structure later when we scale"
Later never comes. Retrofitting structured logging into existing services while they're in production is painful. Every service ends up with slightly different conventions, every team uses slightly different field names, and now you have to normalize them at the aggregation layer or live with inconsistency forever. Build the schema first — even if it's just a shared logging library with sensible defaults. It takes a day. The inconsistency it prevents takes months to fix.
Defining a Field Schema Your Team Will Actually Use
The most valuable thing you can do today is define and document a standard field schema for your organization's logs. It doesn't need to be exhaustive. Start with the fields that appear in every service and lock those down. Here's a practical starting template:
# Mandatory fields — every log entry, every service
timestamp         string  ISO 8601 UTC
level             string  debug|info|warn|error|fatal
service           string  Logical service name
host              string  Hostname (e.g., sw-infrarunbook-01)
env               string  production|staging|development

# Strongly recommended
message           string  Human-readable event description
trace_id          string  Distributed trace correlation ID
span_id           string  Current span within the trace

# HTTP context (when applicable)
http.method       string  GET, POST, PUT, DELETE, etc.
http.path         string  URL path only — never log full URL with query params
http.status       int     HTTP response code
http.duration_ms  int     Response time in milliseconds

# Error context (when level is error or fatal)
error.type        string  Error class or type name
error.message     string  Error description
error.stack       string  Collapsed single-line stack trace
Enforce this schema through a shared internal logging library, not documentation alone. Documentation gets ignored. A library that only allows you to create compliant log entries gets used. Write it once, publish it to your internal package registry, make it the default in your service templates, and you've solved the consistency problem permanently.
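A minimal version of such a library can be a thin wrapper that refuses to construct a logger without the mandatory fields and rejects off-schema levels. A sketch against the template above, with an illustrative class name:

```python
import json
from datetime import datetime, timezone

MANDATORY = ("service", "host", "env")
LEVELS = {"debug", "info", "warn", "error", "fatal"}

class SchemaLogger:
    """Emit JSON log lines that always carry the mandatory schema fields."""

    def __init__(self, **base_fields):
        missing = [f for f in MANDATORY if f not in base_fields]
        if missing:
            raise ValueError(f"missing mandatory log fields: {missing}")
        self._base = base_fields

    def log(self, level: str, message: str, **fields) -> str:
        if level not in LEVELS:
            raise ValueError(f"invalid level: {level!r}")
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": level,
            "message": message,
            **self._base,
            **fields,
        }
        return json.dumps(entry)
```

Publish something like this internally and make your service templates import it by default; a constructor that throws on a missing field is far more persuasive than a schema document.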
Structured logging with JSON isn't a nice-to-have. In any system with more than one moving part, it's the difference between running a professional engineering operation and flying blind. The tooling is mature, the libraries cover every major language, and the return on investment shows up the first time something breaks in production and you actually know what happened — and why.
