[fix][broker] Release entry on GetLastMessageId when parseMessageMetadata throws#26089
SongOf wants to merge 1 commit into
lhotari left a comment
LGTM. @SongOf Just curious, did you actually run into a memory leak issue, or is this based on AI analysis of the code base?
The Maven build in branch-4.2 has a setup for detecting Netty buffer leaks in test runs; detected leaks are extracted into a separate report. It currently reports a lot of false positives, since the test code itself contains buffer leaks.
It would be useful to fix all leaks and then make the build fail if leaks are detected. The solution hasn't yet been ported from the Maven build to the Gradle build.
From the AI assistant
Motivation
When a consumer/reader issues GetLastMessageId, the broker handles it in
ServerCnx#getLargestBatchIndexWhenPossible: it reads the entry at the last
position and parses its message metadata to compute the largest batch index.
The entry was released only after the metadata was parsed.
If Commands.parseMessageMetadata() (or getNumMessagesInBatch()) throws on a
corrupt/truncated entry, entry.release() is skipped, leaking the Entry and
its backing direct ByteBuf on every such GetLastMessageId request. The
exception is then reported to the client as a generic MetadataError, which
masks the leak.
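The ordering problem described above can be sketched in isolation. This is a minimal illustration, not the broker code: FakeEntry, parseBatchSize and largestBatchIndexLeaky are hypothetical stand-ins for the broker's Entry, the metadata parsing call and the handler method, and the 4-byte size field is an assumption made only so that a 2-byte payload fails to parse.

```java
// Hypothetical stand-in for a reference-counted entry; not Pulsar's Entry or Netty's ByteBuf.
final class FakeEntry {
    private final byte[] payload;
    private int refCnt = 1; // one reference held by the reader of the entry

    FakeEntry(byte[] payload) { this.payload = payload; }

    byte[] payload() { return payload; }
    int refCnt() { return refCnt; }
    void release() { refCnt--; }
}

final class LeakSketch {
    // Stand-in for metadata parsing (assumption): reads a 4-byte field first,
    // so a truncated 2-byte payload throws java.nio.BufferUnderflowException.
    static int parseBatchSize(byte[] payload) {
        return java.nio.ByteBuffer.wrap(payload).getInt();
    }

    // Pre-fix ordering: release() comes after parsing, so a throw skips it.
    static int largestBatchIndexLeaky(FakeEntry entry) {
        int batchSize = parseBatchSize(entry.payload());
        entry.release();
        return batchSize - 1;
    }

    public static void main(String[] args) {
        FakeEntry corrupt = new FakeEntry(new byte[] {0x00, 0x01});
        try {
            largestBatchIndexLeaky(corrupt);
        } catch (RuntimeException e) {
            // The caller only sees a generic error; the entry is never released.
        }
        System.out.println("refCnt after failed parse: " + corrupt.refCnt());
    }
}
```

Running main on this sketch leaves the reference count at 1 after the failed parse, which is the leaked state the motivation describes.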
Modifications
Moved entry.release() into a finally block so the entry is released on every
path (success, metadata parse failure, or any other exception), without
changing the error reported to the client.
Added a test that arranges for asyncReadEntry to return an entry wrapping a
deliberately corrupt 2-byte buffer (which makes parseMessageMetadata throw at
readUnsignedInt), drives the request via consumer.getLastMessageId(), and
asserts that the entry's ByteBuf refCnt drops to 0. The test fails on the
unpatched code (refCnt == 1, i.e. leaked) and passes after the fix.
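The two modifications can be sketched together as a release-in-finally pattern plus a reference-count check. This is an illustration under assumptions, not the actual patch or test: RefCountedEntry, parseBatchSize and largestBatchIndex are hypothetical names, and the 4-byte size field stands in for the real metadata layout.

```java
// Hypothetical stand-in for a reference-counted entry; not Pulsar's Entry or Netty's ByteBuf.
final class RefCountedEntry {
    private final byte[] payload;
    private int refCnt = 1; // one reference held by the reader of the entry

    RefCountedEntry(byte[] payload) { this.payload = payload; }

    byte[] payload() { return payload; }
    int refCnt() { return refCnt; }
    void release() { refCnt--; }
}

final class ReleaseInFinallySketch {
    // Stand-in for metadata parsing (assumption): a 2-byte payload is too short
    // for the 4-byte read and throws java.nio.BufferUnderflowException.
    static int parseBatchSize(byte[] payload) {
        return java.nio.ByteBuffer.wrap(payload).getInt();
    }

    // Post-fix ordering: finally runs on success and on any exception,
    // and the exception still propagates unchanged to the caller.
    static int largestBatchIndex(RefCountedEntry entry) {
        try {
            return parseBatchSize(entry.payload()) - 1;
        } finally {
            entry.release();
        }
    }

    public static void main(String[] args) {
        RefCountedEntry corrupt = new RefCountedEntry(new byte[] {0x00, 0x01});
        try {
            largestBatchIndex(corrupt);
        } catch (RuntimeException e) {
            // Same error as before the fix; only the release behavior differs.
        }
        System.out.println("refCnt after failed parse: " + corrupt.refCnt());
    }
}
```

In this sketch the count reaches 0 on both the success path and the failure path, mirroring the refCnt == 0 assertion in the added test.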
Verifying this change
This change added tests and can be verified as follows:
The added test reproduces the leak by injecting a corrupt entry so that
parseMessageMetadata throws, and asserts that the entry's ByteBuf is released
(refCnt == 0). On the pre-fix code it fails because of the leak itself, and
it passes after the fix.
Does this pull request potentially affect one of the following parts:
This is an internal behavior change and does not change any public API, schema,
configuration, wire protocol, REST endpoint, CLI option, or metric.