Metric bloat in _node/stats when a pipeline fails to start with config.reload.automatic: true #19009

@nerophon

Summary

When a pipeline fails to start (e.g., due to an input plugin configuration error like an invalid SSL certificate) and config.reload.automatic: true is enabled, Logstash continually retries starting the pipeline. During this failure loop, the pipeline's metrics are not cleared from the metric store. This causes the metrics to accumulate with each retry, rapidly bloating the _node/stats API response.

A customer observed this behavior on Logstash 8.19.5, where the _node/stats API output grew to ~28MB, with a single failing pipeline accumulating over 150,000 plain codec metric entries. This caused the monitoring API calls to take over 1 minute to execute.

This is closely related to issue #16202 (fixed in PR #16264 for Logstash 8.14.3), which addressed metric accumulation during pipeline reloads. However, as noted in a comment on that PR:

"JavaPipeline#clear_pipeline_metrics is only called for JavaPipeline#shutdown, and does not occur when a pipeline fails to start."

This explains why the issue persists for pipelines that fail to start rather than reloading successfully.

Steps to Reproduce

  1. Configure Logstash with config.reload.automatic: true.
  2. Create a pipeline with a configuration error that prevents it from starting (e.g., a beats input with an invalid SSL certificate path).
  3. Start Logstash.
  4. Observe the continuous failure and retry loop in the logs.
  5. Poll the _node/stats API and observe the codecs metrics array for the failing pipeline continually growing with new dynamically generated IDs (e.g., plain_<uuid>).
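
Step 5 can be scripted. The following is a minimal Python sketch that counts codec entries per pipeline in a parsed _node/stats response. It assumes the codec entries sit under pipelines.<id>.plugins.codecs and that the API is reachable on localhost:9600; both are assumptions to adjust for your deployment, and the function names and sample pipeline IDs are illustrative only.

```python
import json
import urllib.request


def count_codec_entries(stats):
    """Return {pipeline_id: number of codec metric entries}.

    Assumes codec metrics live at pipelines.<id>.plugins.codecs in the
    parsed _node/stats document; missing keys are counted as zero.
    """
    counts = {}
    for pipeline_id, pipeline in (stats.get("pipelines") or {}).items():
        plugins = (pipeline or {}).get("plugins") or {}
        counts[pipeline_id] = len(plugins.get("codecs") or [])
    return counts


def fetch_node_stats(base_url="http://localhost:9600"):
    """Fetch and parse _node/stats. The base URL is an assumption."""
    # Generous timeout: the report mentions calls taking over a minute.
    with urllib.request.urlopen(base_url + "/_node/stats", timeout=120) as resp:
        return json.load(resp)


# Inline sample shaped like the output in this report (hypothetical pipeline IDs).
sample = {
    "pipelines": {
        "failing": {
            "plugins": {
                "codecs": [
                    {"id": "plain_29113491-507d-49b3-ace1-7a2bea39b8fe", "name": "plain"},
                    {"id": "plain_27202272-c8ec-4a47-b87b-a7d940c79b25", "name": "plain"},
                ]
            }
        },
        "healthy": {"plugins": {"codecs": []}},
    }
}

print(count_codec_entries(sample))
```

Against a live instance, calling count_codec_entries(fetch_node_stats()) every few seconds should show the count for the failing pipeline rising while the others stay flat.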

Expected Behavior

When a pipeline fails to start and is subsequently retried by the automatic reloader, its partially accumulated metrics (such as instantiated codecs) should be cleared from the metric store so they do not leak and bloat the _node/stats API payload.
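
To make the expected behavior concrete, here is a toy Python model of the retry loop. It is not Logstash code: ToyMetricStore, try_start, and entries_after_retries are hypothetical names, and the simulated failure stands in for the input plugin error. It only illustrates the difference between clearing and not clearing a pipeline's metrics when a start attempt fails.

```python
import uuid


class ToyMetricStore:
    """Stand-in for a metric store: maps pipeline id -> codec metric ids."""

    def __init__(self):
        self.codecs = {}

    def register_codec(self, pipeline_id, name="plain"):
        # Each instantiation gets a fresh generated id, e.g. plain_<uuid>.
        codec_id = f"{name}_{uuid.uuid4()}"
        self.codecs.setdefault(pipeline_id, []).append(codec_id)
        return codec_id

    def clear_pipeline(self, pipeline_id):
        self.codecs.pop(pipeline_id, None)


def try_start(store, pipeline_id, clear_on_failure):
    """Simulate one start attempt that fails after a codec was registered."""
    store.register_codec(pipeline_id)
    try:
        raise ValueError("SSL configuration invalid")  # simulated input error
    except ValueError:
        if clear_on_failure:
            store.clear_pipeline(pipeline_id)
        return False


def entries_after_retries(retries, clear_on_failure):
    """Run the failure loop and return the leftover codec entry count."""
    store = ToyMetricStore()
    for _ in range(retries):
        try_start(store, "failing", clear_on_failure)
    return len(store.codecs.get("failing", []))


print(entries_after_retries(100, clear_on_failure=False))  # 100
print(entries_after_retries(100, clear_on_failure=True))   # 0
```

Without the cleanup, the entry count grows linearly with the number of retries; with it, the count stays at zero between attempts.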

Actual Behavior

The metrics are not cleared on startup failure. Every retry generates new metric entries, leading to unbounded growth of the _node/stats response size.

Environment

  • Logstash Version: 8.19.5
  • Operating System: Linux
  • Config: config.reload.automatic: true

Supporting Evidence

Log snippet showing the failure loop:

[2026-04-15T08:34:19,239][INFO ][logstash.inputs.beats    ][eicp_ds_nonprod-euwest1-65445] Starting input listener {:address=>"0.0.0.0:65445"}
[2026-04-15T08:34:19,240][ERROR][logstash.inputs.beats    ][eicp_ds_nonprod-euwest1-65445] SSL configuration invalid {:exception=>Java::JavaLang::IllegalArgumentException, :message=>"File does not contain valid private key: /etc/logstash/server.p8.key", :cause=>{:exception=>Java::JavaIo::IOException, :message=>"overrun, bytes = 1195"}}
[2026-04-15T08:34:20,213][INFO ][logstash.javapipeline    ][eicp_ds_nonprod-euwest1-65445] Pipeline terminated {"pipeline.id"=>"eicp_ds_nonprod-euwest1-65445"}
[2026-04-15T08:34:24,361][INFO ][logstash.inputs.beats    ][eicp_ds_nonprod-euwest1-65445] Starting input listener {:address=>"0.0.0.0:65445"}
[2026-04-15T08:34:24,362][ERROR][logstash.inputs.beats    ][eicp_ds_nonprod-euwest1-65445] SSL configuration invalid ...
[2026-04-15T08:34:25,338][INFO ][logstash.javapipeline    ][eicp_ds_nonprod-euwest1-65445] Pipeline terminated {"pipeline.id"=>"eicp_ds_nonprod-euwest1-65445"}
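
The retry cadence can be read off these timestamps. As a rough sketch, the Python snippet below parses the "Starting input listener" lines from the excerpt above and computes the gap between consecutive attempts. The regular expression and the retry_intervals helper are illustrative and assume the timestamp layout shown in this log.

```python
import re
from datetime import datetime

# Lines copied from the log excerpt in this report.
LOG = """\
[2026-04-15T08:34:19,239][INFO ][logstash.inputs.beats    ][eicp_ds_nonprod-euwest1-65445] Starting input listener {:address=>"0.0.0.0:65445"}
[2026-04-15T08:34:20,213][INFO ][logstash.javapipeline    ][eicp_ds_nonprod-euwest1-65445] Pipeline terminated {"pipeline.id"=>"eicp_ds_nonprod-euwest1-65445"}
[2026-04-15T08:34:24,361][INFO ][logstash.inputs.beats    ][eicp_ds_nonprod-euwest1-65445] Starting input listener {:address=>"0.0.0.0:65445"}
"""

# Leading "[YYYY-MM-DDTHH:MM:SS,mmm]" timestamp, as seen in the excerpt.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\]")


def retry_intervals(log_text, marker="Starting input listener"):
    """Return seconds between consecutive lines that contain `marker`."""
    times = []
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line)
        if match and marker in line:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S,%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


print(retry_intervals(LOG))
```

For this excerpt the two start attempts are about 5.1 seconds apart, so each failing pipeline adds new metric entries roughly a dozen times per minute.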

Metrics output showing repeated plain codec entries, each with a distinct generated ID, for the single failing pipeline:

{
  "encode": { "duration_in_millis": 0, "writes_in": 0 },
  "decode": { "duration_in_millis": 0, "out": 0, "writes_in": 0 },
  "id": "plain_29113491-507d-49b3-ace1-7a2bea39b8fe",
  "name": "plain"
}
{
  "encode": { "duration_in_millis": 0, "writes_in": 0 },
  "decode": { "duration_in_millis": 0, "out": 0, "writes_in": 0 },
  "id": "plain_27202272-c8ec-4a47-b87b-a7d940c79b25",
  "name": "plain"
}

(In the reported case, this array grew to over 156,000 entries.)
