STF-322: Bounded transport-failure retry in WebServiceClient#693
STF-322: Bounded transport-failure retry in WebServiceClient#693
Conversation
When the JDK HttpClient pool reuses an idle connection that an intermediary
(load balancer, proxy, NAT) has silently closed, the next send() fails with
"Connection reset" or "Broken pipe". A single retry recovers transparently
without exposing this race to callers. The default keep-alive timeout in the
JDK is longer than many intermediaries' idle timeout, so this mismatch is the
common case.
The retry predicate is intentionally narrow: SocketException
("Connection reset" / "Broken pipe"), ConnectException, and
HttpConnectTimeoutException only. Request-phase HttpTimeoutException, 4xx,
and 5xx responses are NOT retried -- the request may have been processed and
retrying could cause unwanted side effects. The predicate walks the cause
chain because the JDK frequently wraps the underlying SocketException in a
generic IOException ("HTTP/1.1 header parser received no bytes").
Customers can opt out via .maxRetries(0). Default is 1 (one retry, two total
attempts). The interrupt flag is restored before rewrapping
InterruptedException, and a pre-set interrupt short-circuits the predicate.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Cover all 9 scenarios: connection-reset retry on country, city, and insights endpoints, no retry on HttpTimeoutException, retry on connect timeout (deterministic via a closed local ServerSocket), no retry on 4xx/5xx, .maxRetries(0) opt-out, and pre-interrupt short-circuit. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
There was a problem hiding this comment.
Code Review
This pull request introduces a configurable retry mechanism for transport-level failures in the WebServiceClient, defaulting to one retry for issues like connection resets or broken pipes. The implementation includes updates to the Builder for maxRetries configuration, detailed documentation in the README, and comprehensive test coverage for various failure scenarios. Feedback suggests refactoring the retry loop to prevent potential integer overflow and adopting case-insensitive string matching for exception messages to improve robustness across different environments.
| private HttpResponse<InputStream> sendWithRetry(HttpRequest request) | ||
| throws IOException, InterruptedException { | ||
| IOException lastException = null; | ||
| int attempts = maxRetries + 1; | ||
| for (int i = 0; i < attempts; i++) { | ||
| try { | ||
| return httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream()); | ||
| } catch (IOException e) { | ||
| if (!isRetriableTransportFailure(e) || i == attempts - 1) { | ||
| throw e; | ||
| } | ||
| lastException = e; | ||
| } | ||
| } | ||
| // Unreachable: loop either returns or throws. | ||
| throw lastException; | ||
| } |
There was a problem hiding this comment.
The current implementation of sendWithRetry is susceptible to an integer overflow if maxRetries is set to Integer.MAX_VALUE. In such a case, attempts would become Integer.MIN_VALUE, the loop condition i < attempts would be immediately false, and the method would throw a NullPointerException when reaching throw lastException because lastException remains null. Additionally, the lastException variable and the final throw statement are redundant as the loop is designed to either return a response or throw the last encountered exception. Refactoring the loop to check the retry count directly within the catch block improves robustness and clarity.
private HttpResponse<InputStream> sendWithRetry(HttpRequest request)
throws IOException, InterruptedException {
for (int i = 0; ; i++) {
try {
return httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream());
} catch (IOException e) {
if (i >= maxRetries || !isRetriableTransportFailure(e)) {
throw e;
}
}
}
}| if (t instanceof SocketException) { | ||
| String msg = t.getMessage(); | ||
| return msg != null | ||
| && (msg.contains("Connection reset") || msg.contains("Broken pipe")); |
There was a problem hiding this comment.
The message check for SocketException is case-sensitive. While "Connection reset" and "Broken pipe" are standard messages, they can vary in casing or include additional text depending on the operating system and JDK version (e.g., "connection reset by peer"). Using a case-insensitive check would make the retry logic more robust across different environments.
| && (msg.contains("Connection reset") || msg.contains("Broken pipe")); | |
| && (msg.toLowerCase(java.util.Locale.ROOT).contains("connection reset") || msg.toLowerCase(java.util.Locale.ROOT).contains("broken pipe")); |
Mirrors minfraud-api-java's mise.toml and mise.lock so Java and Maven versions are pinned and auto-installed on directory entry. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Summary
WebServiceClient.Builder.maxRetries(int)(defaults to 1) so a single stale-keep-aliveConnection reset/Broken pipe/ connect-failure is retried transparently.HttpConnectTimeoutException,ConnectException, andSocketExceptionmatchingConnection reset/Broken pipe(walks the cause chain, since the JDK wraps these in a genericIOException). Never retries request-phaseHttpTimeoutException, 4xx, or 5xx.InterruptedExceptionrewrap path. Updates README and CHANGELOG; opt out via.maxRetries(0).Fixes STF-322. Parallel PR for
minfraud-api-javauses a byte-identical predicate (maxmind/minfraud-api-java#619).Test plan
mvn checkstyle:checkcleanmvn verify -Dgpg.skip=true— 102/102 pass (28 existing + 9 newWebServiceClientTestcases)HttpConnectTimeoutExceptionMUST be checked before its parentHttpTimeoutException)jdk.httpclient.keepalive.timeoutas the complementary workaround🤖 Generated with Claude Code