Logging Standards

Logs are our black box recorder. Every request—from browser click to database write—must emit structured, correlated events so incidents can be diagnosed within minutes. If it cannot be observed, it cannot be supported.

Structured JSON
Trace context
Span → DB linkage
PII-safe

Why logging is critical

  • Customer trust: precise logs explain outages and speed recovery.
  • Compliance: audit trails show who did what and when, as regulators require.
  • Performance: slow transactions surface via span timing without attaching debuggers.
  • Security: anomalies (unexpected IP, auth failures) are detectable only with fine-grained logs.
  • On-call: clean logs reduce MTTR and prevent “cannot reproduce” at 2 a.m.
Logging is a feature, not an afterthought. A story is incomplete until its acceptance criteria (ACs) cover logging and tracing.

End-to-end request tracking

Every inbound HTTP request receives a traceId (W3C Trace Context). The UI propagates it in the traceparent header, the backend attaches a spanId, and JDBC interceptors log the same identifiers when hitting the database. This lets us replay a request chronologically:

  • UI logs button clicks + APIs called with traceId.
  • API gateway / Backend logs controllers, services, and downstream calls with traceId/spanId.
  • Persistence layer logs SQL latency + table names + traceId.
  • Observability (OTel) exports the same trace to Grafana Tempo; Kibana dashboards correlate logs via field search.
The rule: no log = it didn’t happen. CI tests automatically reject any request that carries no traceId.
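
As a rough illustration of the flow above, the sketch below parses a W3C traceparent header, creates a fresh spanId for the backend hop, and emits one structured JSON log line carrying both identifiers. It uses only the Java standard library. The class name TraceContextSketch, the helper names, and the JSON field names (level, message, traceId, spanId) are assumptions made for this example, not the project's actual schema or interceptor code. Only version 00 of the header is accepted here.

```java
import java.util.Optional;
import java.util.concurrent.ThreadLocalRandom;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: names and log fields are illustrative assumptions.
final class TraceContextSketch {

    /** Identifiers carried by one traceparent header. */
    static final class Ids {
        final String traceId;
        final String parentSpanId;

        Ids(String traceId, String parentSpanId) {
            this.traceId = traceId;
            this.parentSpanId = parentSpanId;
        }
    }

    // Layout: version-traceid-parentid-flags, all lowercase hex (version 00 only).
    private static final Pattern TRACEPARENT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}");

    /** Returns the ids, or empty when the header is missing or malformed. */
    static Optional<Ids> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = TRACEPARENT.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String traceId = m.group(1);
        String parentSpanId = m.group(2);
        // The spec treats all-zero ids as invalid.
        if (isAllZero(traceId) || isAllZero(parentSpanId)) {
            return Optional.empty();
        }
        return Optional.of(new Ids(traceId, parentSpanId));
    }

    private static boolean isAllZero(String hex) {
        return hex.chars().allMatch(c -> c == '0');
    }

    /** Generates a non-zero 8-byte span id as 16 lowercase hex characters. */
    static String newSpanId() {
        long value;
        do {
            value = ThreadLocalRandom.current().nextLong();
        } while (value == 0L);
        return String.format("%016x", value);
    }

    /** Builds one JSON log line; a real service would use its JSON encoder. */
    static String toJsonLine(String level, String message, String traceId, String spanId) {
        return "{\"level\":\"" + escape(level)
                + "\",\"message\":\"" + escape(message)
                + "\",\"traceId\":\"" + escape(traceId)
                + "\",\"spanId\":\"" + escape(spanId) + "\"}";
    }

    // Escapes quotes, backslashes, and control characters for a JSON string.
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01";
        parse(header).ifPresent(ids ->
                System.out.println(toJsonLine("INFO", "order accepted", ids.traceId, newSpanId())));
    }
}
```

Returning an empty Optional for a missing or malformed header leaves the caller to decide whether to reject the request or start a new trace. In practice the OTel SDK would normally handle propagation, so hand-rolled parsing like this is mainly useful in tests.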

Operational guardrails

  • PII is masked at source; log entries are scrubbed via Logback value filters.
  • Retention: 30 days hot (OpenSearch), 365 days cold (object storage) for audit events.
  • On-call runbooks document Kibana/Grafana queries by module and correlation ID.
  • CI automatically fails if new endpoints lack traceId propagation tests.
  • Weekly “log review” ensures critical paths emit INFO/WARN/ERROR with actionable fields.
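
To make the PII guardrail concrete, here is a minimal sketch of masking a message before it is logged. It is a plain-Java stand-in, not the Logback filter configuration mentioned above. The class name PiiMaskSketch, the two patterns (email addresses and 13 to 19 digit runs such as card numbers), and the replacement tokens are all assumptions chosen for illustration. A real rule set would depend on which fields the project classifies as PII.

```java
import java.util.regex.Pattern;

// Hypothetical masking helper: patterns and tokens are illustrative assumptions.
final class PiiMaskSketch {

    // Simplified email pattern; not a full RFC 5322 matcher.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Standalone runs of 13 to 19 digits, the usual length range of card numbers.
    private static final Pattern LONG_DIGITS = Pattern.compile("\\b\\d{13,19}\\b");

    /** Returns the message with emails and long digit runs replaced. */
    static String mask(String message) {
        String masked = EMAIL.matcher(message).replaceAll("***@***");
        return LONG_DIGITS.matcher(masked).replaceAll("****");
    }

    public static void main(String[] args) {
        System.out.println(mask("login failed for jane.doe@example.com card 4111111111111111"));
        // prints: login failed for ***@*** card ****
    }
}
```

Because the digit pattern requires word boundaries, a 32-character hex traceId is left intact, so masked entries stay correlatable.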