Skip to content

STF-322: Bounded transport-failure retry in WebServiceClient#619

Open
oschwald wants to merge 4 commits intomainfrom
greg/stf-322
Open

STF-322: Bounded transport-failure retry in WebServiceClient#619
oschwald wants to merge 4 commits intomainfrom
greg/stf-322

Conversation

@oschwald
Copy link
Copy Markdown
Member

@oschwald oschwald commented May 2, 2026

Summary

  • Adds WebServiceClient.Builder.maxRetries(int) (defaults to 1) so a single stale-keep-alive Connection reset / Broken pipe / connect-failure is retried transparently.
  • Predicate is narrow: retries HttpConnectTimeoutException, ConnectException, and SocketException matching Connection reset / Broken pipe (walks the cause chain, since the JDK wraps these in a generic IOException). Never retries request-phase HttpTimeoutException, 4xx, or 5xx.
  • Restores the interrupt flag in both InterruptedException rewrap paths. Updates README and CHANGELOG; opt out via .maxRetries(0).

Fixes STF-322. Parallel PR for geoip2-java uses a byte-identical predicate.

Test plan

  • mvn checkstyle:check clean
  • mvn verify — 235/235 pass (24 existing + 8 new WebServiceClientTest cases)
  • Reviewer sanity-checks the predicate ordering (HttpConnectTimeoutException MUST be checked before its parent HttpTimeoutException)
  • Reviewer confirms README guidance points customers at jdk.httpclient.keepalive.timeout as the complementary workaround

🤖 Generated with Claude Code

oschwald and others added 3 commits May 2, 2026 02:43
When the JDK HttpClient pool reuses an idle connection that an intermediary
(load balancer, proxy, NAT) has silently closed, the next send() fails with
"Connection reset" or "Broken pipe". A single retry recovers transparently
without exposing this race to callers. The default keep-alive timeout in the
JDK is longer than many intermediaries' idle timeout, so this mismatch is the
common case.

The retry predicate is intentionally narrow: SocketException
("Connection reset" / "Broken pipe"), ConnectException, and
HttpConnectTimeoutException only. Request-phase HttpTimeoutException, 4xx,
and 5xx responses are NOT retried -- the request may have been processed and
retrying could cause unwanted side effects. The predicate walks the cause
chain because the JDK frequently wraps the underlying SocketException in a
generic IOException ("HTTP/1.1 header parser received no bytes").

Customers can opt out via .maxRetries(0). Default is 1 (one retry, two total
attempts). The interrupt flag is restored before rewrapping
InterruptedException, and a pre-set interrupt short-circuits the predicate.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Cover all 8 scenarios: connection-reset retry on score and reportTransaction
endpoints, no retry on HttpTimeoutException, retry on connect timeout
(deterministic via a closed local ServerSocket), no retry on 4xx/5xx,
.maxRetries(0) opt-out, and pre-interrupt short-circuit.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a transport-failure retry mechanism to the WebServiceClient, defaulting to one retry for specific IOExceptions such as connection resets, broken pipes, and connect timeouts. The implementation includes a new maxRetries configuration in the WebServiceClient.Builder, updated documentation in the README and CHANGELOG, and comprehensive test coverage for various retry scenarios. Feedback was provided regarding a potential integer overflow in the sendWithRetry method; if maxRetries is set to Integer.MAX_VALUE, the loop condition fails and results in a NullPointerException because lastException remains null. A refactor was suggested to use a while loop to avoid this overflow and simplify the logic.

Comment on lines +390 to +403
IOException lastException = null;
int attempts = maxRetries + 1;
for (int i = 0; i < attempts; i++) {
try {
return httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream());
} catch (IOException e) {
if (!isRetriableTransportFailure(e) || i == attempts - 1) {
throw e;
}
lastException = e;
}
}
// Unreachable: loop either returns or throws.
throw lastException;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of sendWithRetry has a potential NullPointerException due to integer overflow and contains redundant logic.

If maxRetries is set to Integer.MAX_VALUE, the expression maxRetries + 1 overflows to Integer.MIN_VALUE. Consequently, the loop condition i < attempts (i.e., 0 < -2147483648) evaluates to false, the loop is skipped entirely, and the method proceeds to throw lastException;. Since lastException is initialized to null, a NullPointerException is thrown instead of the intended transport error.

Additionally, the lastException variable is redundant because the loop is designed to either return a successful response or throw the current exception e on the final attempt or upon encountering a non-retriable failure. Refactoring to a while loop avoids the overflow risk and simplifies the logic.

    private HttpResponse<InputStream> sendWithRetry(HttpRequest request)
        throws IOException, InterruptedException {
        int attempts = 0;
        while (true) {
            try {
                return httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream());
            } catch (IOException e) {
                if (!isRetriableTransportFailure(e) || attempts >= maxRetries) {
                    throw e;
                }
                attempts++;
            }
        }
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant