diff --git a/pip/pip-487.md b/pip/pip-487.md
new file mode 100644
index 0000000000000..daa7c322fe173
--- /dev/null
+++ b/pip/pip-487.md
@@ -0,0 +1,146 @@
+# PIP-487: Add event count metrics for InflightReadsLimiter acquire and release operations
+
+# Background knowledge
+
+The **InflightReadsLimiter** is a component in the managed ledger cache layer that limits the total amount of
+in-flight data read from storage (BookKeeper) or cache. It works as a semaphore over bytes: each read
+operation acquires a certain number of byte "permits" before proceeding, and releases them once the data has
+been delivered to the client. This prevents excessive memory pressure when many consumers request large
+amounts of data simultaneously.
+
+Key concepts:
+- `maxReadsInFlightSize`: the total byte capacity of the limiter.
+- `remainingBytes`: the current number of free bytes available for new read requests.
+- A single request may ask for more than `maxReadsInFlightSize` bytes; in that case it is capped at the maximum capacity.
+- When no permits are available, acquire requests are queued and fulfilled as permits become available (via
+  release operations).
+
+Existing metrics:
+- `pulsar.broker.managed_ledger.inflight.read.limit`: the configured maximum capacity (bytes).
+- `pulsar.broker.managed_ledger.inflight.read.usage`: the current used and free bytes.
+
+The code resides in `InflightReadsLimiter.java` within the `managed-ledger` module.
+
+# Motivation
+
+The existing usage metric (`pulsar.broker.managed_ledger.inflight.read.usage`) reports the instantaneous
+free/used bytes via an OTEL observable counter (callback-based). While this is useful for understanding
+current utilization, it is not sufficient for alerting on a **permits leak**, a scenario where a bug
+prevents acquired permits from ever being released, causing `remainingBytes` to stay at zero permanently.
+
+**Why the existing metrics are insufficient for alerting:**
+
+Consider a metrics scrape interval of 30 seconds. If `remainingBytes` is observed as 0 at multiple scrape
+points, this could be explained by:
+
+1. **A permits leak (bug):** permits were acquired but never released, so `remainingBytes` is truly stuck
+   at 0.
+2. **High but legitimate read pressure:** many read requests are continuously acquiring and releasing
+   permits, and by chance the scrapes always happen to catch `remainingBytes` at 0.
+
+With only instantaneous usage data, operators cannot distinguish between these two scenarios. However, with an **event count metric** that increments on every release, combined with the existing
+`remainingBytes`, an operator can accurately detect a leak:
+
+- If `remainingBytes` is 0 **and** `release.count` has not increased for an extended period, a permits
+  leak is likely: permits were acquired but are never being returned.
+
+# Goals
+
+## In Scope
+
+- Add a cumulative counter metric that increments each time `remainingBytes` decreases (i.e., permits are
+  acquired).
+- Add a cumulative counter metric that increments each time `remainingBytes` increases (i.e., permits are
+  released).
+- Enable operators to combine these event counters with the existing usage metric to accurately detect
+  permits leaks.
+
+# High Level Design
+
+Add two new OTEL `LongCounter` metrics to `InflightReadsLimiter`:
+
+| Metric | Trigger | Type |
+|--------|---------|------|
+| `acquire.count` | Incremented each time `remainingBytes` is decreased | Cumulative counter |
+| `release.count` | Incremented each time `remainingBytes` is increased | Cumulative counter |
+
+These are **event counters**: each individual acquire or release event increments the counter by 1,
+regardless of the number of bytes involved. This allows operators to compare the rate of acquire vs.
+release events (e.g., via `rate()` in Prometheus) to detect imbalances.
+
+When the limiter is disabled (`maxReadsInFlightSize <= 0`), the counters are still registered but never
+incremented, since the `acquire()` method short-circuits and `release()` becomes a no-op in the disabled
+state.
+
+## Public-facing Changes
+
+### Public API
+
+No changes.
+
+### Binary protocol
+
+No changes.
+
+### Configuration
+
+No changes.
+
+### CLI
+
+No changes.
+
+### Metrics
+
+| Full name | Description | Attributes | Unit |
+|-----------|-------------|------------|------|
+| `pulsar.broker.managed_ledger.inflight.read.acquire.count` | The number of times inflight read permits were acquired, decreasing the remaining bytes. | _(none)_ | `{event}` |
+| `pulsar.broker.managed_ledger.inflight.read.release.count` | The number of times inflight read permits were released, increasing the remaining bytes. | _(none)_ | `{event}` |
+
+# Monitoring
+
+**Alerting on a permits leak:**
+
+Operators can set up an alert that fires when:
+
+1. `pulsar.broker.managed_ledger.inflight.read.usage{state="free"} == 0` (no free capacity), **AND**
+2. `pulsar.broker.managed_ledger.inflight.read.release.count` has not increased for an extended period.
+
+A specific Prometheus alert rule example:
+
+```promql
+pulsar_broker_managed_ledger_inflight_read_usage_free == 0
+and
+rate(pulsar_broker_managed_ledger_inflight_read_release_count_total[5m]) == 0
+```
+
+This fires when free bytes are stuck at zero and no releases have occurred in the last 5 minutes.
+Adjust the time window based on expected workload and scraping interval.
+
+# Security Considerations
+
+No security implications. These are read-only metrics exposed via the existing OpenTelemetry metrics
+infrastructure, which follows the same authentication and authorization as all other broker metrics.
+
+# Backward & Forward Compatibility
+
+## Upgrade
+
+No special upgrade steps required. The new metrics will become available immediately upon broker restart
+with the new version.
+
+## Downgrade / Rollback
+
+Rolling back to a previous version is safe. The new metrics simply disappear; no metric names are changed
+or removed.
+
+## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations
+
+No geo-replication impact. Metrics are per-broker and local.
+
+# General Notes
+
+# Links
+
+* Mailing List discussion thread: https://lists.apache.org/thread/nl1ropc9zd2ttxj06f2s0oxjdcg59sqk
+* Mailing List voting thread:
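The event-counter semantics described under High Level Design, and the leak condition described under Monitoring, can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the actual `InflightReadsLimiter` code: `CountingBytesLimiter`, `tryAcquire`, and `looksLeaked` are hypothetical names, `AtomicLong` fields stand in for the proposed OTEL `LongCounter` instruments, and the sketch refuses a request that cannot be served instead of queueing it.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in; NOT the real InflightReadsLimiter.
class CountingBytesLimiter {
    private final long maxBytes;
    private long remainingBytes;
    // Assumption: AtomicLong fields stand in for the proposed OTEL LongCounter instruments.
    private final AtomicLong acquireCount = new AtomicLong();
    private final AtomicLong releaseCount = new AtomicLong();

    CountingBytesLimiter(long maxBytes) {
        this.maxBytes = maxBytes;
        this.remainingBytes = Math.max(maxBytes, 0);
    }

    /** Returns the bytes granted, or 0 if the request cannot be served right now. */
    synchronized long tryAcquire(long bytes) {
        if (maxBytes <= 0) {
            return bytes; // disabled: short-circuit, counters are never touched
        }
        long capped = Math.min(bytes, maxBytes); // oversized requests are capped at capacity
        if (capped <= 0 || remainingBytes < capped) {
            return 0; // the real limiter would queue the request; this sketch refuses it
        }
        remainingBytes -= capped;
        acquireCount.incrementAndGet(); // one event per acquire, regardless of byte size
        return capped;
    }

    synchronized void release(long bytes) {
        if (maxBytes <= 0 || bytes <= 0) {
            return; // disabled, or nothing to give back: no-op
        }
        remainingBytes = Math.min(maxBytes, remainingBytes + bytes);
        releaseCount.incrementAndGet(); // one event per release, regardless of byte size
    }

    synchronized long remainingBytes() {
        return remainingBytes;
    }

    long acquireCount() {
        return acquireCount.get();
    }

    long releaseCount() {
        return releaseCount.get();
    }

    /** Mirrors the alert condition: no free bytes and no release since the previous scrape. */
    static boolean looksLeaked(long freeBytes, long releasesAtPreviousScrape, long releasesNow) {
        return freeBytes == 0 && releasesNow == releasesAtPreviousScrape;
    }

    public static void main(String[] args) {
        CountingBytesLimiter limiter = new CountingBytesLimiter(100);
        limiter.tryAcquire(60);
        limiter.tryAcquire(40);
        long releasesAtPreviousScrape = limiter.releaseCount();
        // Nothing was released between the two observations, so the check reports a possible leak.
        System.out.println("possible leak: " + looksLeaked(
                limiter.remainingBytes(), releasesAtPreviousScrape, limiter.releaseCount()));
    }
}
```

Counting events rather than bytes keeps the signal simple: a single `release` call between two scrapes is enough to rule out a fully stuck limiter, however many bytes it returned.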