Bamboo v0.4 Observability & Resilience Preparation
This planning note captures the design targets for the v0.4 release: a Prometheus export surface, resilience middleware, and operational hooks that keep OpenSwoole workers healthy. It establishes contracts before code lands so the middleware pipeline (introduced in v0.3) and the runtime can absorb the changes with minimal refactoring.
/metrics HTTP endpoint
- The route will respond on GET /metrics and return text/plain; version=0.0.4 (the Prometheus text exposition media type). The handler will be registered through the router like any other controller, so it benefits from request logging, authentication, or rate-limiting middleware when configured.
- Bamboo\Core\ResponseEmitter will emit metric payloads verbatim, without JSON encoding. The handler returns a PSR-7 response with the correct Content-Type header and a buffered body, so the existing emitter can stream it through OpenSwoole without changes.
- The endpoint will expose process-level counters (requests, errors, in-flight connections), timers/histograms for request latency, and gauges for worker state. Instrumentation points will be added in the HTTP kernel and middleware pipeline to increment/update metrics before the response is emitted.
Prometheus text format contract
- Responses conform to the Prometheus 0.0.4 text exposition format, including # HELP / # TYPE preamble lines, snake_case metric names with namespace prefixes (e.g., bamboo_http_requests_total), and UTF-8 encoded labels.
- Histograms will include _count, _sum, and bucket lines. Latency buckets will default to [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5] seconds, configurable through etc/metrics.php.
- To avoid per-worker duplication, samples will be aggregated across workers using a shared storage backend rather than per-request formatting.
- Error conditions must yield HTTP 503 responses with an explanatory body so Prometheus scrape failures are obvious. The collector should degrade gracefully if storage backends are unavailable (returning 503 or emitting an empty body).
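The histogram contract above can be illustrated with a small dependency-free sketch. It is not the Prometheus client's renderer; the function name renderHistogram and the metric name bamboo_http_request_duration_seconds are assumptions chosen for illustration. It shows the # HELP / # TYPE preamble, cumulative bucket lines ending with +Inf, and the _sum and _count lines.

```php
<?php
declare(strict_types=1);

/**
 * Render one histogram in the Prometheus text exposition format.
 * Hypothetical helper for illustration only.
 *
 * @param array $bounds       ascending bucket upper bounds, in seconds
 * @param array $observations raw latency observations, in seconds
 */
function renderHistogram(string $name, string $help, array $bounds, array $observations): string
{
    $lines = ["# HELP {$name} {$help}", "# TYPE {$name} histogram"];

    foreach ($bounds as $bound) {
        // Buckets are cumulative: each counts every observation <= its bound.
        $count = count(array_filter($observations, function ($v) use ($bound) {
            return $v <= $bound;
        }));
        $lines[] = sprintf('%s_bucket{le="%s"} %d', $name, $bound, $count);
    }

    // The +Inf bucket always equals the total number of observations.
    $lines[] = sprintf('%s_bucket{le="+Inf"} %d', $name, count($observations));
    $lines[] = sprintf('%s_sum %s', $name, array_sum($observations));
    $lines[] = sprintf('%s_count %d', $name, count($observations));

    return implode("\n", $lines) . "\n";
}

// Default latency buckets from this note, with four sample observations.
$body = renderHistogram(
    'bamboo_http_request_duration_seconds', // assumed metric name
    'HTTP request latency.',
    [0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
    [0.02, 0.07, 0.3, 1.2]
);
echo $body;
```

Because buckets are cumulative, the le="0.1" line counts both the 0.02 and 0.07 observations, and the 1.2 observation appears only from le="2.5" upward.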
Metrics collection plumbing
- Add the Composer dependency promphp/prometheus_client_php:^2.7 to obtain the CollectorRegistry, counters, gauges, and histograms. The project already requires APCu; the Prometheus client can use its APC adapter for local development and Redis storage in clustered deployments.
- A new service provider (or optional bamboo/metrics-prometheus module) will register the collector registry within the container, expose a middleware to observe request timing, and provide helper services for custom instrumentation.
- OpenSwoole workers need a shared collector to avoid per-worker caches. The client's Prometheus\Storage\InMemory adapter keeps samples in process memory, so on its own it cannot aggregate across workers; use the Redis adapter, or a custom storage adapter backed by Swoole\Table, to aggregate counts. The bootstrap sequence will initialize the storage so forked workers inherit the connection/table descriptor.
- The /metrics handler will pull samples from the registry and write the body using Prometheus\RenderTextFormat. Because the ResponseEmitter streams raw text, no special casing is required beyond setting the Content-Type header.
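A minimal sketch of the handler's control flow, assuming the 503 behaviour described in the text format contract. To stay self-contained it returns a plain [status, headers, body] array instead of a PSR-7 response, and the $collect callable is a hypothetical stand-in for pulling samples from the registry and rendering them.

```php
<?php
declare(strict_types=1);

/**
 * Build the /metrics response as [status, headers, body].
 * $collect is a hypothetical callable that returns the rendered metrics
 * text and throws when the storage backend is unavailable.
 */
function metricsResponse(callable $collect): array
{
    try {
        $body = (string) $collect();

        return [200, ['Content-Type' => 'text/plain; version=0.0.4'], $body];
    } catch (Throwable $e) {
        // Storage is down: answer 503 with an explanatory body so the
        // failed scrape is obvious on the Prometheus side.
        $message = 'metrics storage unavailable: ' . $e->getMessage() . "\n";

        return [503, ['Content-Type' => 'text/plain; charset=utf-8'], $message];
    }
}

// Healthy storage: the rendered text is passed through untouched.
$ok = metricsResponse(function () {
    return "bamboo_http_requests_total 3\n";
});

// Failing storage: the exception becomes a 503 response.
$down = metricsResponse(function () {
    throw new RuntimeException('redis connection refused');
});

echo $ok[0], ' ', $down[0], "\n";
```

In the real handler the same branches would populate a PSR-7 response, which the existing ResponseEmitter can stream unchanged.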
Timeout middleware strategy
- Implement a TimeoutMiddleware that wraps the downstream handler in OpenSwoole\Coroutine::withTimeout() (or uses a Swoole\Timer::after() fallback when coroutines are unavailable). When the timeout elapses, it aborts the request, increments a bamboo_http_timeouts_total counter, and returns a 504 response.
- Configuration for default/global timeouts will live in etc/middleware.php under a timeouts group so routes can opt in or override with the alias syntax defined in the v0.3 middleware document.
- The middleware will emit timing data around the downstream call so latencies are recorded even when the timeout trips. Metrics instrumentation should use the shared collector registered in the container.
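One possible shape for the timeouts group is sketched below. The key names (default, routes), the route names, and the resolveTimeout helper are all assumptions made for illustration; the actual layout will follow the v0.3 alias/group conventions, which this note does not spell out.

```php
<?php
declare(strict_types=1);

// Hypothetical "timeouts" group as it might appear in etc/middleware.php.
$config = [
    'timeouts' => [
        'default' => 5.0,              // seconds, used when a route has no override
        'routes'  => [
            'reports.export' => 30.0,  // slow route opts in to a longer budget
            'health.ready'   => 1.0,   // probes should fail fast
        ],
    ],
];

/**
 * Resolve the timeout (in seconds) for a route name.
 * Falls back to the group default, then to 0.0, which this sketch
 * treats as "no timeout configured".
 */
function resolveTimeout(array $config, string $route): float
{
    $group = $config['timeouts'] ?? [];
    $value = $group['routes'][$route] ?? $group['default'] ?? 0.0;

    return (float) $value;
}

echo resolveTimeout($config, 'reports.export'), "\n"; // per-route override
echo resolveTimeout($config, 'users.index'), "\n";    // group default
```

Keeping resolution in one pure function makes the override rules easy to test independently of the coroutine or timer mechanism that enforces the deadline.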
Circuit breaker middleware strategy
- Introduce a CircuitBreakerMiddleware that monitors upstream failures using a rolling window stored in Swoole\Table or APCu. When thresholds are exceeded (configurable via etc/middleware.php), the middleware short-circuits requests and returns 503 Service Unavailable with a Retry-After hint.
- Middleware ordering matters: circuit breakers must run before expensive work, so they will be placed early in the global stack (before authentication or route handlers) but after logging so failures are still recorded.
- State transitions (closed → open → half-open) will publish gauges/counters to Prometheus so operators can correlate breaker activity with upstream outages.
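The closed → open → half-open cycle can be sketched as a small state machine. This is a simplified illustration under stated assumptions: it counts consecutive failures in process memory rather than using a rolling window in Swoole\Table or APCu, the class and method names are hypothetical, and the clock is passed in explicitly so the behaviour is deterministic.

```php
<?php
declare(strict_types=1);

final class CircuitBreaker
{
    private string $state = 'closed';
    private int $failures = 0;
    private float $openedAt = 0.0;
    private int $threshold;
    private float $cooldown;

    public function __construct(int $threshold, float $cooldownSeconds)
    {
        $this->threshold = $threshold;
        $this->cooldown = $cooldownSeconds;
    }

    /** Decide whether a request may proceed at time $now (in seconds). */
    public function allow(float $now): bool
    {
        if ($this->state === 'open' && $now - $this->openedAt >= $this->cooldown) {
            // Cooldown elapsed: let trial traffic through.
            $this->state = 'half-open';
        }

        return $this->state !== 'open';
    }

    public function recordSuccess(): void
    {
        $this->state = 'closed';
        $this->failures = 0;
    }

    public function recordFailure(float $now): void
    {
        $this->failures++;
        // A failed trial while half-open, or reaching the threshold, opens the breaker.
        if ($this->state === 'half-open' || $this->failures >= $this->threshold) {
            $this->state = 'open';
            $this->openedAt = $now;
        }
    }

    public function state(): string
    {
        return $this->state;
    }
}

$breaker = new CircuitBreaker(2, 10.0);
$breaker->recordFailure(0.0);
$breaker->recordFailure(1.0);                // threshold reached, breaker opens
var_dump($breaker->allow(5.0));              // still cooling down
var_dump($breaker->allow(11.0));             // cooldown elapsed, half-open
$breaker->recordSuccess();                   // trial succeeded, breaker closes
echo $breaker->state(), "\n";
```

Each state change is a natural place to update the gauges and counters mentioned above, so operators can see when and how often the breaker trips.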
Graceful shutdown & health checks
- OpenSwoole exposes on('Shutdown') and on('WorkerExit') hooks. Register listeners that flush pending metrics, mark workers as unhealthy in the readiness registry, and close Redis connections. The HTTP server bootstrap (bootstrap/server.php) will wire these callbacks during application boot.
- Implement lightweight /healthz (liveness) and /readyz (readiness) endpoints. The liveness check returns 200 as long as the worker loop is running; readiness reports 200 only when critical dependencies (Redis, database, Prometheus storage) are reachable.
- A shared HealthRegistry service will track worker readiness state. It uses a Swoole\Table or cache entry updated on start/stop events. The Prometheus exporter will expose bamboo_worker_ready gauges sourced from this registry.
- Graceful shutdown will tie into the CLI command http.serve so signals sent to the managed OpenSwoole server first mark readiness as false, stop accepting new connections, wait for in-flight requests to complete (observed via the shared collector), and finally exit.
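The readiness rule can be sketched as a pure function over dependency probes. The function name, the probe names, and the choice of 503 for "not ready" are assumptions for illustration; the note only specifies that readiness reports 200 when critical dependencies are reachable and that shutdown first marks readiness as false.

```php
<?php
declare(strict_types=1);

/**
 * Evaluate readiness from named dependency probes.
 * Each probe is a callable returning true when its dependency is reachable.
 * Returns [HTTP status, per-dependency results].
 */
function readiness(array $probes, bool $shuttingDown): array
{
    $results = [];
    foreach ($probes as $name => $probe) {
        try {
            $results[$name] = (bool) $probe();
        } catch (Throwable $e) {
            // A probe that throws is treated as an unreachable dependency.
            $results[$name] = false;
        }
    }

    // Ready only when not draining and every probe passed.
    $ready = !$shuttingDown && !in_array(false, $results, true);

    return [$ready ? 200 : 503, $results];
}

$probes = [
    'redis'    => function () { return true; },
    'database' => function () { return true; },
];

[$status] = readiness($probes, false);
echo $status, "\n";

// During graceful shutdown the worker reports not-ready even if probes pass.
[$draining] = readiness($probes, true);
echo $draining, "\n";
```

Returning the per-dependency results alongside the status lets the /readyz body explain which dependency failed, and the same boolean could feed the bamboo_worker_ready gauge.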
Integration summary
- Dependencies – Composer require promphp/prometheus_client_php and enable Redis or Swoole Table storage. Optional extension: ext-apcu for local storage.
- ResponseEmitter – No functional changes; handlers must return PSR responses with the correct headers so the emitter can stream metrics text and health payloads unchanged.
- Middleware pipeline – Timeout and circuit breaker middleware will be registered via etc/middleware.php using the v0.3 alias/group conventions, ensuring deterministic ordering with other global middleware. Instrumentation hooks in each middleware feed the shared collector.
- Operational hooks – Bootstrap wiring listens for OpenSwoole worker events to maintain readiness state and flush metrics before exit, while new health endpoints give orchestrators clear liveness and readiness signals.