STF-322: Bounded transport-failure retry in WebServiceClient#619
STF-322: Bounded transport-failure retry in WebServiceClient#619
Conversation
When the JDK HttpClient pool reuses an idle connection that an intermediary
(load balancer, proxy, NAT) has silently closed, the next send() fails with
"Connection reset" or "Broken pipe". A single retry recovers transparently
without exposing this race to callers. The default keep-alive timeout in the
JDK is longer than many intermediaries' idle timeout, so this mismatch is the
common case.
The retry predicate is intentionally narrow: SocketException
("Connection reset" / "Broken pipe"), ConnectException, and
HttpConnectTimeoutException only. Request-phase HttpTimeoutException, 4xx,
and 5xx responses are NOT retried -- the request may have been processed and
retrying could cause unwanted side effects. The predicate walks the cause
chain because the JDK frequently wraps the underlying SocketException in a
generic IOException ("HTTP/1.1 header parser received no bytes").
Customers can opt out via .maxRetries(0). Default is 1 (one retry, two total
attempts). The interrupt flag is restored before rewrapping
InterruptedException, and a pre-set interrupt short-circuits the predicate.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Cover all 8 scenarios: connection-reset retry on score and reportTransaction endpoints, no retry on HttpTimeoutException, retry on connect timeout (deterministic via a closed local ServerSocket), no retry on 4xx/5xx, .maxRetries(0) opt-out, and pre-interrupt short-circuit. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
There was a problem hiding this comment.
Code Review
This pull request introduces a transport-failure retry mechanism to the WebServiceClient, defaulting to one retry for specific IOExceptions such as connection resets, broken pipes, and connect timeouts. The implementation includes a new maxRetries configuration in the WebServiceClient.Builder, updated documentation in the README and CHANGELOG, and comprehensive test coverage for various retry scenarios. Feedback was provided regarding a potential integer overflow in the sendWithRetry method; if maxRetries is set to Integer.MAX_VALUE, the loop condition fails and results in a NullPointerException because lastException remains null. A refactor was suggested to use a while loop to avoid this overflow and simplify the logic.
| IOException lastException = null; | ||
| int attempts = maxRetries + 1; | ||
| for (int i = 0; i < attempts; i++) { | ||
| try { | ||
| return httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream()); | ||
| } catch (IOException e) { | ||
| if (!isRetriableTransportFailure(e) || i == attempts - 1) { | ||
| throw e; | ||
| } | ||
| lastException = e; | ||
| } | ||
| } | ||
| // Unreachable: loop either returns or throws. | ||
| throw lastException; |
There was a problem hiding this comment.
The current implementation of sendWithRetry has a potential NullPointerException due to integer overflow and contains redundant logic.
If maxRetries is set to Integer.MAX_VALUE, the expression maxRetries + 1 overflows to Integer.MIN_VALUE. Consequently, the loop condition i < attempts (i.e., 0 < -2147483648) evaluates to false, the loop is skipped entirely, and the method proceeds to throw lastException;. Since lastException is initialized to null, a NullPointerException is thrown instead of the intended transport error.
Additionally, the lastException variable is redundant because the loop is designed to either return a successful response or throw the current exception e on the final attempt or upon encountering a non-retriable failure. Refactoring to a while loop avoids the overflow risk and simplifies the logic.
private HttpResponse<InputStream> sendWithRetry(HttpRequest request)
throws IOException, InterruptedException {
int attempts = 0;
while (true) {
try {
return httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream());
} catch (IOException e) {
if (!isRetriableTransportFailure(e) || attempts >= maxRetries) {
throw e;
}
attempts++;
}
}
}
Summary
WebServiceClient.Builder.maxRetries(int)(defaults to 1) so a single stale-keep-aliveConnection reset/Broken pipe/ connect-failure is retried transparently.HttpConnectTimeoutException,ConnectException, andSocketExceptionmatchingConnection reset/Broken pipe(walks the cause chain, since the JDK wraps these in a genericIOException). Never retries request-phaseHttpTimeoutException, 4xx, or 5xx.InterruptedExceptionrewrap paths. Updates README and CHANGELOG; opt out via.maxRetries(0).Fixes STF-322. Parallel PR for
geoip2-javauses a byte-identical predicate.Test plan
mvn checkstyle:checkcleanmvn verify— 235/235 pass (24 existing + 8 newWebServiceClientTestcases)HttpConnectTimeoutExceptionMUST be checked before its parentHttpTimeoutException)jdk.httpclient.keepalive.timeoutas the complementary workaround🤖 Generated with Claude Code