InfraRunBook

    Structured Logging JSON Best Practices

    Logging
    Published: Apr 8, 2026
    Updated: Apr 8, 2026

    Structured JSON logging transforms your logs from a text archive into a queryable operational tool. Learn the schema design, field conventions, and real-world patterns that make incident response faster and more reliable.


    What Structured Logging Actually Is

    Structured logging is simple in concept: instead of writing log lines as unformatted human-readable strings, you emit each log entry as a discrete data object — almost always JSON — where every piece of information lives in a named field. That's it. But the implications of that shift are enormous.

    A traditional log line might look like this:

    2024-03-15 14:22:31 INFO api-gateway - POST /api/v2/orders completed in 142ms for user 8841 [192.168.10.45]

    A structured equivalent looks like this:

    {
      "timestamp": "2024-03-15T14:22:31.482Z",
      "level": "info",
      "service": "api-gateway",
      "host": "sw-infrarunbook-01",
      "method": "POST",
      "path": "/api/v2/orders",
      "duration_ms": 142,
      "user_id": 8841,
      "client_ip": "192.168.10.45",
      "status_code": 200
    }

    Both convey the same information. But only one of them lets you run a query like duration_ms > 500 AND service = "api-gateway" and get meaningful results in seconds. The free-text version requires regex parsing that someone has to write and maintain — and it will break the moment a developer changes the wording of that log message. I've seen this happen more times than I can count, usually during an incident when everyone is already stressed.
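    The fragility gap is easy to demonstrate. The sketch below is illustrative only (the regex and the field extraction are mine, not from any particular tool): parsing the free-text line requires a pattern that encodes the exact wording, while the JSON line needs nothing but a generic parser.

```python
import json
import re

# The free-text line: extracting a field requires a regex that encodes
# the exact wording, so any copy change silently breaks the parser.
text_line = ("2024-03-15 14:22:31 INFO api-gateway - POST /api/v2/orders "
             "completed in 142ms for user 8841 [192.168.10.45]")
match = re.search(r"completed in (\d+)ms for user (\d+)", text_line)
duration_ms = int(match.group(1))  # breaks the day the wording changes

# The structured line: one generic parse, every field addressable by name.
json_line = '{"service": "api-gateway", "duration_ms": 142, "user_id": 8841}'
entry = json.loads(json_line)
```

    The regex version also fails silently: if the wording changes, the pattern simply stops matching, and the field quietly disappears from your dashboards.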

    How JSON Structured Logging Works

    The Core Schema

    Every structured log entry should carry a baseline set of fields regardless of context. After running logs through Elasticsearch, Loki, and CloudWatch Logs Insights across several production environments, I've landed on this minimum viable schema as a solid foundation:

    {
      "timestamp": "2024-03-15T14:22:31.482Z",
      "level": "info",
      "service": "payments-api",
      "host": "sw-infrarunbook-01",
      "env": "production",
      "message": "Payment authorization succeeded",
      "trace_id": "7f3a9c1b-4e2d-4f8a-b1c3-9d8e7f6a5b4c",
      "span_id": "a3f2c1b9"
    }
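    As a rough illustration, a formatter along these lines can emit that baseline schema from day one. This is a minimal sketch using Python's standard logging module; the class name and the exact timestamp handling are my own choices, not a prescribed implementation.

```python
import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line carrying the baseline schema."""

    def __init__(self, service, host, env):
        super().__init__()
        # Static fields stamped onto every entry from this service.
        self.static = {"service": service, "host": host, "env": env}

    def format(self, record):
        entry = {
            # ISO 8601, UTC, millisecond precision, trailing Z.
            "timestamp": datetime.now(timezone.utc)
                .isoformat(timespec="milliseconds")
                .replace("+00:00", "Z"),
            "level": record.levelname.lower(),  # one lowercase convention, everywhere
            "message": record.getMessage(),
            **self.static,
        }
        return json.dumps(entry)

logger = logging.getLogger("payments-api")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter("payments-api", "sw-infrarunbook-01", "production"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("Payment authorization succeeded")
```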

    The timestamp field should always be in ISO 8601 format with UTC timezone. Don't use Unix epoch integers — they're harder to read during an incident and require conversion in most query interfaces. The level field should be lowercase and consistent: use debug, info, warn, error, and fatal. Pick one convention and enforce it across every service. Mixing WARNING with warn and WARN across your stack is a small thing that creates a large amount of pain at query time.

    Correlation IDs and Distributed Tracing

    The trace_id field is where structured logging starts earning its keep in distributed systems. When a request enters your system at the API gateway running on sw-infrarunbook-01, you generate a trace ID and propagate it through every downstream call — via HTTP headers, message queue metadata, gRPC metadata, whatever transport you're using. Every service that touches that request emits its log entries with the same trace_id.

    Now when something goes wrong at 2am and you're looking at an alert from your payments service, you can pull all logs for that trace across every service with a single query. In my experience, this single practice cuts incident investigation time by more than half. Without it, you're manually correlating timestamps across different log streams and hoping the clocks are synchronized. They often aren't.
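    One minimal way to wire this up in Python is sketched below, using contextvars so the ID follows the request through its whole scope. The X-Trace-Id header name here is an assumption for illustration; many systems use the W3C traceparent header instead.

```python
import contextvars
import uuid

# One context variable carries the trace ID through the request's scope.
trace_id_var = contextvars.ContextVar("trace_id", default=None)

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name; W3C traceparent is a common alternative

def start_trace(incoming_headers):
    """Reuse an inbound trace ID if present, otherwise mint a new one."""
    tid = incoming_headers.get(TRACE_HEADER) or str(uuid.uuid4())
    trace_id_var.set(tid)
    return tid

def outgoing_headers():
    """Headers for any downstream call, so the ID keeps propagating."""
    return {TRACE_HEADER: trace_id_var.get()}

def log_fields(**fields):
    """Every log entry picks up the current trace ID automatically."""
    return {"trace_id": trace_id_var.get(), **fields}

tid = start_trace({})  # request enters at the gateway, no inbound header
entry = log_fields(message="order received")  # carries the same trace_id
```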

    Contextual Fields and Bound Loggers

    Good structured logging libraries support the concept of a logging context — a set of fields that get attached to every log entry within a given scope. In Go, this is done with context.Context and a logger like zerolog or zap. In Python, the structlog library handles this cleanly with bound loggers. The pattern looks like this in practice:

    # structlog bound logger pattern (Python)
    log = structlog.get_logger().bind(
        service="inventory-service",
        host="sw-infrarunbook-01",
        env="production",
        request_id="req-9f3a1c2d"
    )
    
    log.info("stock_check_started", sku="WIDGET-42", warehouse_id=7)
    log.info("stock_check_completed", sku="WIDGET-42", available_units=143, duration_ms=28)

    Both log entries automatically carry service, host, env, and request_id without repeating them in every call. This is the right way to do it. The alternative — passing a dozen fields manually on every log call — leads to inconsistency and missing fields under pressure, which is exactly when you need those fields most.

    Handling Nested Objects

    One question that comes up constantly is whether to nest objects inside your JSON logs. Keep it flat where possible, and use one level of nesting only when it meaningfully groups related data. Here's the contrast:

    # Cluttered flat structure
    {
      "http_method": "GET",
      "http_path": "/api/orders",
      "http_status": 200,
      "http_duration_ms": 88,
      "db_query": "SELECT * FROM orders WHERE user_id = ?",
      "db_duration_ms": 34,
      "db_rows_returned": 12
    }
    
    # One level of grouping — cleaner and still queryable
    {
      "http": {
        "method": "GET",
        "path": "/api/orders",
        "status": 200,
        "duration_ms": 88
      },
      "db": {
        "query": "SELECT * FROM orders WHERE user_id = ?",
        "duration_ms": 34,
        "rows_returned": 12
      }
    }

    Deeper nesting than one level causes headaches. Most log query languages handle one level fine — Elasticsearch flattens it to http.status, and Loki's LogQL can filter on it with | json. But three levels deep and you're fighting every query tool you'll ever use. Keep it sane.

    Why It Matters in Production

    If you're running a single service on a single host, structured logging is a convenience. Once you cross into multiple services, multiple hosts, or any kind of container orchestration, it becomes a necessity. The difference between diagnosing an incident in ten minutes versus two hours often comes down to whether you can query your logs or whether you're grepping through files.

    Consider a real scenario: your monitoring system alerts that error rates on solvethenetwork.com's checkout flow have spiked to 8% over the past five minutes. With structured logs flowing into a centralized system, you can immediately run:

    # Loki LogQL — filter errors and extract trace context
    {service="checkout-service", env="production"}
      | json
      | level="error"
      | line_format "{{.message}} | user={{.user_id}} | trace={{.trace_id}}"

    Within seconds you see that all the errors share a payment_gateway field value of stripe-eu, suggesting a regional gateway issue rather than an application bug. You pivot to the payment service logs filtered by that same gateway value, confirm the pattern, and you're opening a vendor incident ticket — not staring at a wall of unstructured text trying to spot a trend manually.

    Indexing Strategy and Storage Cost

    There's another angle worth understanding: how your log storage and indexing system handles structured data. Elasticsearch creates inverted indexes on every field, which makes field-level queries fast but also means high-cardinality fields like user_id or trace_id consume significant index memory. This is a real operational concern at scale, not a theoretical one.

    The practical solution is being intentional about log levels. Debug-level logs often contain the high-cardinality detail you need for deep debugging — keep them at a sampling rate or suppress them in production unless you're actively investigating. Info-level logs should contain enough context to understand what happened without logging every field of every object in your system.

    Grafana Loki takes a different approach: it indexes only labels — a small set of fields you designate — and stores the rest as raw log content parsed at query time. This trades query performance for storage cost, and it works well when your label cardinality is low. Knowing which system you're targeting should influence how you structure your fields.

    Real-World Examples from Production

    Application Error Logging

    Error logs deserve special attention because they're the ones you're reading at 3am. They need to contain enough information to reproduce and understand the issue without requiring you to re-run the failing request. Here's what a well-structured error log looks like:

    {
      "timestamp": "2024-03-15T03:14:22.819Z",
      "level": "error",
      "service": "order-processor",
      "host": "sw-infrarunbook-01",
      "env": "production",
      "message": "Failed to reserve inventory for order",
      "trace_id": "b2c3d4e5-f6a7-4b8c-9d0e-1f2a3b4c5d6e",
      "span_id": "c3d4e5f6",
      "order_id": "ord-88821",
      "user_id": 44291,
      "sku": "WIDGET-42",
      "requested_qty": 3,
      "error": {
        "type": "InsufficientStockError",
        "message": "Available: 1, Requested: 3",
        "stack": "order_processor.reserve_inventory:142 | order_processor.process_order:88 | worker.handle_message:31"
      }
    }

    Notice that the stack trace lives inside an error object rather than as a raw multiline string. Multiline strings are the enemy of structured logging. They break line-by-line parsing in most log shippers — Filebeat, Fluentd, Fluent Bit — and require explicit multiline parsing configuration that's easy to get wrong. Collapsing the stack trace into a pipe-delimited single-line string or a structured object avoids that entire class of problem.

    Infrastructure and Deployment Logging

    System-level logging for infrastructure tooling follows the same principles. Here's a log entry from a deployment automation script running on sw-infrarunbook-01 under the infrarunbook-admin account:

    {
      "timestamp": "2024-03-15T10:33:07.002Z",
      "level": "info",
      "service": "deploy-agent",
      "host": "sw-infrarunbook-01",
      "operator": "infrarunbook-admin",
      "action": "deploy_started",
      "target_service": "payments-api",
      "target_version": "v2.14.1",
      "target_hosts": ["10.10.1.21", "10.10.1.22", "10.10.1.23"],
      "strategy": "rolling",
      "canary_pct": 10,
      "deploy_id": "deploy-20240315-103307"
    }

    That deploy_id field is your correlation handle for grouping all log entries tied to this specific deployment. If something breaks during the rollout, you filter by deploy_id and get the complete picture: which hosts succeeded, which failed, what the error was, and how long each step took. This is the same trace ID concept applied to operational workflows rather than request flows.

    Security and Audit Events

    Security events need their own structured schema and typically warrant a separate log stream or index. Every authentication event, authorization decision, privilege escalation, and configuration change should log the actor, the action, the target, the outcome, and the source IP:

    {
      "timestamp": "2024-03-15T09:14:55.331Z",
      "level": "warn",
      "service": "auth-service",
      "host": "sw-infrarunbook-01",
      "event_type": "authentication_failed",
      "actor": "infrarunbook-admin",
      "source_ip": "192.168.1.85",
      "target_resource": "/admin/api/users",
      "failure_reason": "invalid_mfa_token",
      "attempt_count": 3
    }

    Structured security logs integrate directly into SIEM correlation rules. When every field is discrete and queryable, alerting on behavioral patterns becomes straightforward: five authentication failures from the same source_ip within sixty seconds triggers an alert, no regex required.

    Common Misconceptions

    "JSON logs are too verbose and expensive"

    This comes up from teams that haven't looked at their actual log volume costs carefully. Yes, JSON has more bytes per log entry than a terse text line. But log shippers like Fluent Bit compress output before forwarding, and most log storage systems compress stored data significantly. In practice, JSON log storage costs maybe 15-20% more than equivalent unstructured logs — and the operational savings from faster incident resolution dwarf that cost within the first week of a real incident. The verbosity argument doesn't survive contact with a production outage.

    "We're a small team, we don't need this"

    I've heard this right before an on-call engineer spent six hours grepping through text logs trying to find why three users couldn't complete checkout on a Friday evening. Structured logging is a one-time setup cost with permanent returns. The time to implement it is before your first bad incident, not during or after it.

    "We log everything so we'll always have what we need"

    Logging everything indiscriminately is its own failure mode. It bloats storage, inflates costs, and creates noise that buries the signals you actually care about. Worse, unrestricted logging is a PII and compliance risk — you'll inevitably log something you shouldn't have, like a password field or a raw request body containing payment data. Structure your logging intentionally: define what each log level means for your system, document your core field schema, and treat sensitive data explicitly with redaction before it hits your log pipeline.
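    Redaction is worth doing in code, not by convention. A minimal processor along these lines (the key list here is illustrative; extend it to match your own compliance requirements) can run as the last step before an entry is serialized:

```python
SENSITIVE_KEYS = {"password", "card_number", "cvv", "authorization"}  # illustrative list

def redact(entry):
    """Mask sensitive fields, recursing into nested objects, before the
    entry reaches the log pipeline."""
    clean = {}
    for key, value in entry.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, dict):
            clean[key] = redact(value)
        else:
            clean[key] = value
    return clean

entry = redact({
    "user_id": 44291,
    "password": "hunter2",
    "http": {"authorization": "Bearer abc"},
})
```

    With structlog, the same idea slots in naturally as a processor in the chain, so no call site can bypass it.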

    "We'll add structure later when we scale"

    Later never comes. Retrofitting structured logging into existing services while they're in production is painful. Every service ends up with slightly different conventions, every team uses slightly different field names, and now you have to normalize them at the aggregation layer or live with inconsistency forever. Build the schema first — even if it's just a shared logging library with sensible defaults. It takes a day. The inconsistency it prevents takes months to fix.


    Defining a Field Schema Your Team Will Actually Use

    The most valuable thing you can do today is define and document a standard field schema for your organization's logs. It doesn't need to be exhaustive. Start with the fields that appear in every service and lock those down. Here's a practical starting template:

    # Mandatory fields — every log entry, every service
    timestamp       string   ISO 8601 UTC
    level           string   debug|info|warn|error|fatal
    service         string   Logical service name
    host            string   Hostname (e.g., sw-infrarunbook-01)
    env             string   production|staging|development
    
    # Strongly recommended
    message         string   Human-readable event description
    trace_id        string   Distributed trace correlation ID
    span_id         string   Current span within the trace
    
    # HTTP context (when applicable)
    http.method     string   GET, POST, PUT, DELETE, etc.
    http.path       string   URL path only — never log full URL with query params
    http.status     int      HTTP response code
    http.duration_ms int     Response time in milliseconds
    
    # Error context (when level is error or fatal)
    error.type      string   Error class or type name
    error.message   string   Error description
    error.stack     string   Collapsed single-line stack trace

    Enforce this schema through a shared internal logging library, not documentation alone. Documentation gets ignored. A library that only allows you to create compliant log entries gets used. Write it once, publish it to your internal package registry, make it the default in your service templates, and you've solved the consistency problem permanently.
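    One concrete shape for that enforcement is a serializer that refuses non-compliant entries outright. This is a minimal sketch (the field list comes from the template above; the function name and error handling are my own choices), the kind of thing a shared library would wrap:

```python
import json

MANDATORY = ("timestamp", "level", "service", "host", "env")
LEVELS = {"debug", "info", "warn", "error", "fatal"}

def emit(entry):
    """Serialize a log entry, rejecting anything that violates the shared schema."""
    missing = [field for field in MANDATORY if field not in entry]
    if missing:
        raise ValueError(f"missing mandatory fields: {missing}")
    if entry["level"] not in LEVELS:
        raise ValueError(f"invalid level: {entry['level']!r}")
    return json.dumps(entry, separators=(",", ":"))

line = emit({
    "timestamp": "2024-03-15T14:22:31.482Z",
    "level": "info",
    "service": "payments-api",
    "host": "sw-infrarunbook-01",
    "env": "production",
})
```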

    Structured logging with JSON isn't a nice-to-have. In any system with more than one moving part, it's the difference between running a professional engineering operation and flying blind. The tooling is mature, the libraries cover every major language, and the return on investment shows up the first time something breaks in production and you actually know what happened — and why.

    Frequently Asked Questions

    What is the minimum set of fields every JSON log entry should contain?

    At minimum, every log entry should include: timestamp (ISO 8601 UTC), level (debug/info/warn/error/fatal), service name, hostname, and environment. Adding trace_id and span_id is strongly recommended for any distributed system so you can correlate logs across services during incident investigation.

    Should JSON log fields be nested or flat?

    Prefer flat structures with one level of nesting for logically grouped data, such as http.method and http.status. Avoid nesting deeper than one level — most log query tools handle one level well, but deep nesting creates friction in Elasticsearch, Loki LogQL, and similar systems.

    How do I handle multiline stack traces in JSON logs?

    Never write raw multiline stack traces as log values. They break line-by-line parsing in log shippers like Filebeat and Fluent Bit. Instead, collapse the stack trace into a single-line pipe-delimited string or structure it as a nested error object with a single-line stack field.

    Is JSON logging too expensive for high-volume production systems?

    Not in practice. Log shippers compress output before forwarding, and storage backends like Elasticsearch and Loki compress stored chunks. The overhead is typically 15-20% compared to unstructured logs, which is easily offset by reduced incident investigation time and simpler alerting configuration.

    What is a trace ID and why does it belong in every log entry?

    A trace ID is a unique identifier generated when a request enters your system, propagated through every downstream service call via HTTP headers or message metadata. Including it in every log entry lets you retrieve the complete request history across all services with a single query, dramatically speeding up incident response in distributed architectures.

    How should I enforce a consistent log schema across multiple teams?

    Build a shared internal logging library that wraps your preferred structured logging tool (such as zerolog, zap, or structlog) and only exposes methods that produce schema-compliant output. Include it in your service templates and internal scaffolding tools. Relying solely on documentation doesn't work — the library has to make compliance the path of least resistance.
