Logging Standards
Logs are our black box recorder. Every request—from browser click to database write—must emit structured, correlated events so incidents can be diagnosed within minutes. If it cannot be observed, it cannot be supported.
Structured JSON
Trace context
Span → DB linkage
PII-safe
Why logging is critical
- Customer trust: precise logs explain outages and speed recovery.
- Compliance: audit trails show who did what and when—required for regulators.
- Performance: slow transactions surface via span timing without attaching debuggers.
- Security: anomalies (unexpected IP, auth failures) are detectable only with fine-grained logs.
- On-call: clean logs reduce MTTR and prevent “cannot reproduce” at 2 a.m.
Logging is a feature, not an afterthought. Stories are incomplete until logging and tracing are covered in the ACs.
End-to-end request tracking
Every inbound HTTP request receives a traceId (W3C Trace Context). The UI propagates it in the traceparent header, the backend attaches a spanId, and JDBC interceptors log the same identifiers when hitting the database. This lets us replay a request chronologically:
- UI logs button clicks + APIs called with traceId.
- API gateway / Backend logs controllers, services, and downstream calls with traceId/spanId.
- Persistence layer logs SQL latency + table names + traceId.
- Observability (OTel) exports the same trace to Grafana Tempo; Kibana dashboards correlate logs via field search.
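To make the propagation chain concrete, here is a minimal sketch of how a backend component might read the incoming traceparent header and derive its own spanId while keeping the same traceId. The class name TraceParent and its methods are hypothetical and not part of our codebase. The sketch handles only version 00 of the W3C header format, and in practice an OTel SDK would normally do this work for us.

```java
import java.security.SecureRandom;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses a W3C traceparent header and derives child spans. */
final class TraceParent {
    // Version 00 layout: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
    private static final Pattern FORMAT =
        Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");
    private static final String ZERO_TRACE_ID = "0".repeat(32);
    private static final String ZERO_SPAN_ID = "0".repeat(16);
    private static final SecureRandom RANDOM = new SecureRandom();

    final String traceId;
    final String spanId;
    final String flags;

    private TraceParent(String traceId, String spanId, String flags) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.flags = flags;
    }

    /** Returns empty when the header is missing, malformed, or uses all-zero ids. */
    static Optional<TraceParent> parse(String header) {
        if (header == null) {
            return Optional.empty();
        }
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String traceId = m.group(1);
        String spanId = m.group(2);
        // The spec treats all-zero trace and parent ids as invalid.
        if (traceId.equals(ZERO_TRACE_ID) || spanId.equals(ZERO_SPAN_ID)) {
            return Optional.empty();
        }
        return Optional.of(new TraceParent(traceId, spanId, m.group(3)));
    }

    /** Same traceId, fresh random spanId: one span per unit of work. */
    TraceParent newChildSpan() {
        String child;
        do {
            child = String.format("%016x", RANDOM.nextLong());
        } while (child.equals(ZERO_SPAN_ID));
        return new TraceParent(traceId, child, flags);
    }

    /** Header value to send on downstream calls. */
    String toHeader() {
        return "00-" + traceId + "-" + spanId + "-" + flags;
    }

    /** JSON fragment to merge into a structured log event (ids are hex, so no escaping is needed). */
    String toJsonFields() {
        return "\"traceId\":\"" + traceId + "\",\"spanId\":\"" + spanId + "\"";
    }
}
```

A controller would call TraceParent.parse(...) on the incoming header, create a child span for its own work, log toJsonFields() with every event, and pass toHeader() to downstream calls so the persistence layer sees the same traceId.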
The rule: no log = it didn’t happen. Requests without traceId are automatically rejected in CI tests.
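One way such a CI check could work is to scan the log lines captured during a test run and fail if any of them lacks a traceId. The sketch below is an illustration only: the class name TraceIdLogCheck is hypothetical, it assumes one JSON object per log line, and it uses a regex instead of a real JSON parser to stay dependency-free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical CI helper: finds captured log lines that carry no traceId. */
final class TraceIdLogCheck {
    // Assumes the field is named "traceId" and holds a 32-character lowercase hex id.
    private static final Pattern TRACE_ID_FIELD =
        Pattern.compile("\"traceId\"\\s*:\\s*\"[0-9a-f]{32}\"");

    /** Returns every line without a well-formed traceId field; an empty list means the check passes. */
    static List<String> linesMissingTraceId(List<String> jsonLogLines) {
        List<String> missing = new ArrayList<>();
        for (String line : jsonLogLines) {
            if (!TRACE_ID_FIELD.matcher(line).find()) {
                missing.add(line);
            }
        }
        return missing;
    }
}
```

A test would assert that linesMissingTraceId(capturedLines) is empty and print the offending lines on failure, so the developer sees exactly which event lost its correlation ID.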
Operational guardrails
- PII is masked at source; log entries are scrubbed via Logback value filters.
- Retention: 30 days hot (OpenSearch), 365 days cold (object storage) for audit events.
- On-call runbooks document Kibana/Grafana queries by module and correlation ID.
- CI automatically fails if new endpoints lack traceId propagation tests.
- Weekly “log review” ensures critical paths emit INFO/WARN/ERROR with actionable fields.
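As an illustration of masking PII at source, the sketch below shows the kind of scrubbing logic a value filter might apply before a message is written. It does not show the Logback wiring. The class name PiiMasker, the two patterns, and the choice of emails and long digit runs as the sensitive values are all assumptions made for the example, and the real rules are likely broader.

```java
import java.util.regex.Pattern;

/** Hypothetical masker: replaces email addresses and long digit runs before logging. */
final class PiiMasker {
    // Deliberately simplified email pattern; not a full RFC 5322 matcher.
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    // 12 to 19 consecutive digits, a rough proxy for card or account numbers.
    private static final Pattern LONG_DIGITS = Pattern.compile("\\b\\d{12,19}\\b");

    /** Returns the message with matching values replaced by fixed placeholders. */
    static String mask(String message) {
        if (message == null) {
            return null;
        }
        String masked = EMAIL.matcher(message).replaceAll("***@***");
        return LONG_DIGITS.matcher(masked).replaceAll("****");
    }
}
```

Fixed placeholders are used instead of partial masking so that no fragment of the original value reaches the log store. The traceId is hex with letters and dashes in the header form, so it is not affected by the digit pattern in typical cases.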