Skip to content

STF-322: Bounded transport-failure retry in WebServiceClient#693

Open
oschwald wants to merge 5 commits intomainfrom
greg/stf-322
Open

STF-322: Bounded transport-failure retry in WebServiceClient#693
oschwald wants to merge 5 commits intomainfrom
greg/stf-322

Conversation

@oschwald
Copy link
Copy Markdown
Member

@oschwald oschwald commented May 2, 2026

Summary

  • Adds WebServiceClient.Builder.maxRetries(int) (defaults to 1) so a single stale-keep-alive Connection reset / Broken pipe / connect-failure is retried transparently.
  • Predicate is narrow: retries HttpConnectTimeoutException, ConnectException, and SocketException matching Connection reset / Broken pipe (walks the cause chain, since the JDK wraps these in a generic IOException). Never retries request-phase HttpTimeoutException, 4xx, or 5xx.
  • Restores the interrupt flag in the InterruptedException rewrap path. Updates README and CHANGELOG; opt out via .maxRetries(0).

Fixes STF-322. Parallel PR for minfraud-api-java uses a byte-identical predicate (maxmind/minfraud-api-java#619).

Test plan

  • mvn checkstyle:check clean
  • mvn verify -Dgpg.skip=true — 102/102 pass (28 existing + 9 new WebServiceClientTest cases)
  • Reviewer sanity-checks the predicate ordering (HttpConnectTimeoutException MUST be checked before its parent HttpTimeoutException)
  • Reviewer confirms README guidance points customers at jdk.httpclient.keepalive.timeout as the complementary workaround

🤖 Generated with Claude Code

oschwald and others added 3 commits May 2, 2026 02:43
When the JDK HttpClient pool reuses an idle connection that an intermediary
(load balancer, proxy, NAT) has silently closed, the next send() fails with
"Connection reset" or "Broken pipe". A single retry recovers transparently
without exposing this race to callers. The default keep-alive timeout in the
JDK is longer than many intermediaries' idle timeout, so this mismatch is the
common case.

The retry predicate is intentionally narrow: SocketException
("Connection reset" / "Broken pipe"), ConnectException, and
HttpConnectTimeoutException only. Request-phase HttpTimeoutException, 4xx,
and 5xx responses are NOT retried -- the request may have been processed and
retrying could cause unwanted side effects. The predicate walks the cause
chain because the JDK frequently wraps the underlying SocketException in a
generic IOException ("HTTP/1.1 header parser received no bytes").

Customers can opt out via .maxRetries(0). Default is 1 (one retry, two total
attempts). The interrupt flag is restored before rewrapping
InterruptedException, and a pre-set interrupt short-circuits the predicate.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Cover all 9 scenarios: connection-reset retry on country, city, and insights
endpoints, no retry on HttpTimeoutException, retry on connect timeout
(deterministic via a closed local ServerSocket), no retry on 4xx/5xx,
.maxRetries(0) opt-out, and pre-interrupt short-circuit.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a configurable retry mechanism for transport-level failures in the WebServiceClient, defaulting to one retry for issues like connection resets or broken pipes. The implementation includes updates to the Builder for maxRetries configuration, detailed documentation in the README, and comprehensive test coverage for various failure scenarios. Feedback suggests refactoring the retry loop to prevent potential integer overflow and adopting case-insensitive string matching for exception messages to improve robustness across different environments.

Comment on lines +425 to +441
private HttpResponse<InputStream> sendWithRetry(HttpRequest request)
throws IOException, InterruptedException {
IOException lastException = null;
int attempts = maxRetries + 1;
for (int i = 0; i < attempts; i++) {
try {
return httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream());
} catch (IOException e) {
if (!isRetriableTransportFailure(e) || i == attempts - 1) {
throw e;
}
lastException = e;
}
}
// Unreachable: loop either returns or throws.
throw lastException;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of sendWithRetry is susceptible to an integer overflow if maxRetries is set to Integer.MAX_VALUE. In such a case, attempts would become Integer.MIN_VALUE, the loop condition i < attempts would be immediately false, and the method would throw a NullPointerException when reaching throw lastException because lastException remains null. Additionally, the lastException variable and the final throw statement are redundant as the loop is designed to either return a response or throw the last encountered exception. Refactoring the loop to check the retry count directly within the catch block improves robustness and clarity.

    private HttpResponse<InputStream> sendWithRetry(HttpRequest request)
        throws IOException, InterruptedException {
        for (int i = 0; ; i++) {
            try {
                return httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream());
            } catch (IOException e) {
                if (i >= maxRetries || !isRetriableTransportFailure(e)) {
                    throw e;
                }
            }
        }
    }

if (t instanceof SocketException) {
String msg = t.getMessage();
return msg != null
&& (msg.contains("Connection reset") || msg.contains("Broken pipe"));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The message check for SocketException is case-sensitive. While "Connection reset" and "Broken pipe" are standard messages, they can vary in casing or include additional text depending on the operating system and JDK version (e.g., "connection reset by peer"). Using a case-insensitive check would make the retry logic more robust across different environments.

Suggested change
&& (msg.contains("Connection reset") || msg.contains("Broken pipe"));
&& (msg.toLowerCase(java.util.Locale.ROOT).contains("connection reset") || msg.toLowerCase(java.util.Locale.ROOT).contains("broken pipe"));

oschwald and others added 2 commits May 2, 2026 02:52
Mirrors minfraud-api-java's mise.toml and mise.lock so Java and Maven
versions are pinned and auto-installed on directory entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant