Metric bloat in `_node/stats` when a pipeline fails to start with `config.reload.automatic: true`
Summary
When a pipeline fails to start (e.g., due to an input plugin configuration error such as an invalid SSL certificate) and `config.reload.automatic: true` is enabled, Logstash continually retries starting the pipeline. During this failure loop, the pipeline's metrics are not cleared from the metric store, so they accumulate with each retry, rapidly bloating the `_node/stats` API response.
A customer observed this behavior on Logstash 8.19.5: the `_node/stats` API output grew to ~28 MB, with a single failing pipeline accumulating over 150,000 `plain` codec metric entries, and monitoring API calls took over 1 minute to execute.
This is closely related to issue #16202 (fixed in PR #16264 for Logstash 8.14.3), which addressed metric accumulation during pipeline reloads. However, as noted in a comment on that PR:

> "`JavaPipeline#clear_pipeline_metrics` is only called for `JavaPipeline#shutdown`, and does not occur when a pipeline fails to start."

This explains why the issue persists for pipelines that fail to start rather than reload successfully.
Steps to Reproduce
- Configure Logstash with `config.reload.automatic: true`.
- Create a pipeline with a configuration error that prevents it from starting (e.g., a beats input with an invalid SSL certificate path).
- Start Logstash.
- Observe the continuous failure and retry loop in the logs.
- Poll the `_node/stats` API and observe the `codecs` metrics array for the failing pipeline continually growing with new dynamically generated IDs (e.g., `plain_<uuid>`).
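A minimal pipeline configuration along these lines should trigger the failure loop. The port matches the log snippet below; the certificate and key paths are illustrative placeholders, and any input error that aborts pipeline startup should behave the same:

```
input {
  beats {
    port            => 65445
    ssl_enabled     => true
    ssl_certificate => "/etc/logstash/server.crt"
    # Deliberately point at a file that is not a valid PKCS#8 private key
    # so the input fails during startup and the reloader retries forever.
    ssl_key         => "/etc/logstash/server.p8.key"
  }
}
output {
  stdout { codec => dots }
}
```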
Expected Behavior
When a pipeline fails to start and is subsequently retried by the automatic reloader, its partially accumulated metrics (such as those for instantiated codecs) should be cleared from the metric store so they do not leak and bloat the `_node/stats` API payload.
Actual Behavior
The metrics are not cleared on startup failure. Every retry generates new metric entries, leading to unbounded growth of the `_node/stats` response size.
Environment
- Logstash Version: 8.19.5
- Operating System: Linux
- Config: `config.reload.automatic: true`
Supporting Evidence
Log snippet showing the failure loop:
```
[2026-04-15T08:34:19,239][INFO ][logstash.inputs.beats ][eicp_ds_nonprod-euwest1-65445] Starting input listener {:address=>"0.0.0.0:65445"}
[2026-04-15T08:34:19,240][ERROR][logstash.inputs.beats ][eicp_ds_nonprod-euwest1-65445] SSL configuration invalid {:exception=>Java::JavaLang::IllegalArgumentException, :message=>"File does not contain valid private key: /etc/logstash/server.p8.key", :cause=>{:exception=>Java::JavaIo::IOException, :message=>"overrun, bytes = 1195"}}
[2026-04-15T08:34:20,213][INFO ][logstash.javapipeline ][eicp_ds_nonprod-euwest1-65445] Pipeline terminated {"pipeline.id"=>"eicp_ds_nonprod-euwest1-65445"}
[2026-04-15T08:34:24,361][INFO ][logstash.inputs.beats ][eicp_ds_nonprod-euwest1-65445] Starting input listener {:address=>"0.0.0.0:65445"}
[2026-04-15T08:34:24,362][ERROR][logstash.inputs.beats ][eicp_ds_nonprod-euwest1-65445] SSL configuration invalid ...
[2026-04-15T08:34:25,338][INFO ][logstash.javapipeline ][eicp_ds_nonprod-euwest1-65445] Pipeline terminated {"pipeline.id"=>"eicp_ds_nonprod-euwest1-65445"}
```
Metrics output showing duplicated codec IDs for the single pipeline:
```json
{
  "encode": { "duration_in_millis": 0, "writes_in": 0 },
  "decode": { "duration_in_millis": 0, "out": 0, "writes_in": 0 },
  "id": "plain_29113491-507d-49b3-ace1-7a2bea39b8fe",
  "name": "plain"
}
{
  "encode": { "duration_in_millis": 0, "writes_in": 0 },
  "decode": { "duration_in_millis": 0, "out": 0, "writes_in": 0 },
  "id": "plain_27202272-c8ec-4a47-b87b-a7d940c79b25",
  "name": "plain"
}
```
(In the reported case, this array grew to over 156,000 entries.)
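To quantify the bloat, a short script can tally codec entries per pipeline in a saved `_node/stats` payload. This is a sketch that assumes the `pipelines.<id>.plugins.codecs` layout seen in the evidence above; a healthy pipeline should show a small, stable count, while an affected one grows on every retry:

```python
from collections import Counter

def codec_counts(stats: dict) -> dict:
    """Return {pipeline_id: Counter(codec_name -> occurrences)} from a
    _node/stats payload (assumes a pipelines.<id>.plugins.codecs layout)."""
    result = {}
    for pipeline_id, pipeline in stats.get("pipelines", {}).items():
        codecs = pipeline.get("plugins", {}).get("codecs", [])
        result[pipeline_id] = Counter(c.get("name", "?") for c in codecs)
    return result

# Example with a trimmed payload shaped like the evidence above:
stats = {
    "pipelines": {
        "eicp_ds_nonprod-euwest1-65445": {
            "plugins": {
                "codecs": [
                    {"id": "plain_29113491-507d-49b3-ace1-7a2bea39b8fe", "name": "plain"},
                    {"id": "plain_27202272-c8ec-4a47-b87b-a7d940c79b25", "name": "plain"},
                ]
            }
        }
    }
}
print(codec_counts(stats))
```

Running this against successive snapshots of `_node/stats` (e.g., one poll per reload cycle) makes the per-retry growth of the `plain` count directly visible.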