
Add configurable queue size offset to adaptive routing hybrid score#18288

Open
timothy-e wants to merge 1 commit into apache:master from timothy-e:timothy-e-mse-formula

Conversation

timothy-e (Contributor) commented Apr 22, 2026

Adaptive routing uses the following formula to score each server:

```
@JsonProperty("hybridScore")
public double computeHybridScore() {
  double estimatedQSize = _numInFlightRequests + _inFlighRequestsEMA.getAverage();
  return Math.pow(estimatedQSize, _hybridScoreExponent) * _latencyMsEMA.getAverage();
}
```

EMA = exponentially weighted moving average.
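For context, a minimal sketch of what such an EMA accumulator looks like. The class shape and smoothing factor here are illustrative assumptions, not Pinot's actual implementation:

```java
// Illustrative EMA accumulator; `alpha` and the class shape are
// assumptions, not Pinot's actual moving-average class.
class Ema {
    private final double alpha;   // smoothing factor in (0, 1]
    private double avg;
    private boolean initialized = false;

    Ema(double alpha) { this.alpha = alpha; }

    void add(double value) {
        // The first sample seeds the average; later samples are blended in,
        // so recent observations dominate as alpha grows.
        avg = initialized ? alpha * value + (1 - alpha) * avg : value;
        initialized = true;
    }

    double getAverage() { return avg; }
}
```

Higher `alpha` weights recent samples more heavily; `getAverage()` is the value the scoring formula reads.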

This term is pretty unstable when the number of in-flight requests is low (e.g. Stripe's test cluster often has 0 in-flight requests, even at 80 RPS): the whole score collapses to 0 regardless of latency. The formula might be better as `(1 + A + B)^3 * C`, where `A` is the number of in-flight requests, `B` is the in-flight EMA, and `C` is the latency EMA, so the server latency is always accounted for in the score.

Looking at the [paper](https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-suresh.pdf) that was the original source for the formula, its "queue-size estimation" included a `+1` term, so omitting it appears to have been a bug in the original adaptive routing implementation. The `+1` term was also mentioned in the original feature design doc.

To avoid causing any behavioural regressions for existing users of adaptive routing, instead of hard-coding a `+1` into the formula, this change adds a configurable offset parameter that defaults to 0.

On high-concurrency clusters this should have marginal impact (if `num_inflight_requests` ≈ 100, the estimated queue size becomes 101, roughly a 3% change once cubed). On low-concurrency clusters the scores become more meaningful: multiple idle servers would each have a score ≈ their latency, instead of all having a score of 0.
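A standalone sketch of the proposed change and its effect. The `offset` parameter and method signature here are hypothetical, written only to illustrate the behaviour; the actual patch wires the offset through configuration with a default of 0:

```java
public class HybridScoreSketch {
    // Hypothetical standalone version of computeHybridScore with the
    // configurable queue-size offset folded in.
    static double hybridScore(double offset, int numInFlight,
                              double inFlightEma, double latencyEmaMs,
                              double exponent) {
        double estimatedQSize = offset + numInFlight + inFlightEma;
        return Math.pow(estimatedQSize, exponent) * latencyEmaMs;
    }

    public static void main(String[] args) {
        // Idle server, offset 0: score is 0 no matter how slow the server is.
        System.out.println(hybridScore(0.0, 0, 0.0, 50.0, 3.0));
        // Idle server, offset 1: score equals its latency EMA (1^3 * 50).
        System.out.println(hybridScore(1.0, 0, 0.0, 50.0, 3.0));
        // Busy server: 100 -> 101 in-flight is only a ~3% change when cubed.
        System.out.println(hybridScore(0.0, 100, 0.0, 50.0, 3.0));
        System.out.println(hybridScore(1.0, 100, 0.0, 50.0, 3.0));
    }
}
```

With the offset at 1, two idle servers with different latency EMAs now get different scores, which is exactly the low-concurrency behaviour the change is after.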

We tested on an internal cluster and saw the adaptive routing scores stabilize on our low concurrency test cluster.

codecov-commenter commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.64%. Comparing base (9cd44c4) to head (2864494).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18288      +/-   ##
============================================
+ Coverage     63.59%   63.64%   +0.05%     
  Complexity     1659     1659              
============================================
  Files          3244     3244              
  Lines        197390   197392       +2     
  Branches      30555    30555              
============================================
+ Hits         125528   125631     +103     
+ Misses        61826    61718     -108     
- Partials      10036    10043       +7     
| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.61% <100.00%> (+0.05%) ⬆️ |
| java-21 | 63.56% <100.00%> (-0.01%) ⬇️ |
| temurin | 63.64% <100.00%> (+0.05%) ⬆️ |
| unittests | 63.64% <100.00%> (+0.05%) ⬆️ |
| unittests1 | 55.57% <100.00%> (+<0.01%) ⬆️ |
| unittests2 | 35.09% <100.00%> (+0.05%) ⬆️ |

Flags with carried forward coverage won't be shown.


timothy-e (Contributor, Author) commented:

@Jackie-Jiang @yashmayya can you take a look at this?
