Skip to content

Add POST /query/sql/validateSyntax broker endpoint#18281

Open
deeppatel710 wants to merge 1 commit intoapache:masterfrom
deeppatel710:feature/broker-validate-syntax
Open

Add POST /query/sql/validateSyntax broker endpoint#18281
deeppatel710 wants to merge 1 commit intoapache:masterfrom
deeppatel710:feature/broker-validate-syntax

Conversation

@deeppatel710
Copy link
Copy Markdown
Contributor

Summary

Adds a lightweight, parse-only syntax validation endpoint to the broker, as proposed in #17615.

The new POST /query/sql/validateSyntax endpoint runs the query through Pinot's Calcite-based SQL parser (RequestUtils.parseQuery) without consulting table metadata, performing semantic validation, or executing the query. It supports both single-stage and multi-stage queries and all PinotSqlType values (DQL/DML/DDL/DCL).

Motivation

Per #17615: Calcite upgrades occasionally introduce new reserved keywords that cause parse failures against previously-valid queries (see #16295 for a prior incident). This endpoint lets operators replay a corpus of queries against a canary broker to detect parser regressions before rolling a new Calcite version into production, without needing any table setup, schema, or query execution.

There are related endpoints, but none solve this use case:

  • POST /query/sql/queryFingerprint (broker) — parses but also generates a fingerprint, and only accepts DQL.
  • POST /validateMultiStageQuery (controller) — does more than parsing and is multi-stage-only.

Contract

Request:

POST /query/sql/validateSyntax
{"sql": "SELECT * FROM t WHERE id > 10"}

Response (valid):

200 OK
{"valid": true, "sqlType": "DQL"}

Response (invalid syntax):

200 OK
{"valid": false, "errorMessage": "..."}

Response (missing sql field):

400 Bad Request
{"error": "Payload is missing the query string field 'sql'"}

Design note: both valid and invalid parse outcomes return HTTP 200 with the result in the body. This keeps clients from having to distinguish transport errors from parse errors when replaying query corpuses at scale. 4xx/5xx are reserved for malformed requests and uncaught server errors.

Changes

  • pinot-broker/.../PinotClientRequest.java — new endpoint method and private validateSqlSyntax(ObjectNode) helper.
  • pinot-broker/.../SqlSyntaxValidationResponse.java — new response POJO with valid / sqlType / errorMessage fields.
  • pinot-broker/.../PinotClientRequestTest.java — tests for single-stage valid, multi-stage valid, invalid syntax, and missing sql field.

Test plan

  • Unit tests for single-stage valid query (returns DQL, no error)
  • Unit tests for multi-stage valid query (SET useMultistageEngine=true; ...)
  • Unit tests for invalid SQL (SELECT FROM WHERE) — 200 OK with valid=false and error message
  • Unit tests for missing sql field — 400 Bad Request
  • Manual smoke test against a running broker (reviewer discretion)

Closes #17615

Adds a lightweight, parse-only syntax validation endpoint to the broker.
It runs the query through Pinot's Calcite-based SQL parser
(RequestUtils.parseQuery) without consulting table metadata, performing
semantic validation, or executing the query. Supports both single-stage
and multi-stage queries and all PinotSqlType values (DQL/DML/DDL/DCL).

The endpoint returns HTTP 200 with a SqlSyntaxValidationResponse body
for both valid and invalid queries; clients inspect the `valid` flag and
`errorMessage` to distinguish. HTTP 400 is returned only when the
request payload is missing the `sql` field.

The motivation (per issue apache#17615) is to let operators detect parser
regressions early in Calcite upgrade cycles -- similar to the one seen
in apache#16295 -- by replaying a corpus of queries against a canary broker
without needing any table setup or query execution.
@deeppatel710
Copy link
Copy Markdown
Contributor Author

@Jackie-Jiang would you have a chance to take a look when you get a moment? This adds the parse-only POST /query/sql/validateSyntax endpoint proposed in #17615 so operators can replay query corpuses against a canary broker to catch Calcite-upgrade parser regressions (like #16295) before rollout. Happy to iterate on the response shape, endpoint path, or the 200-for-invalid-parse choice if you'd prefer a different contract.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 70.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.66%. Comparing base (5d48eeb) to head (2d63949).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...pinot/broker/api/resources/PinotClientRequest.java 55.00% 9 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18281      +/-   ##
============================================
+ Coverage     63.59%   63.66%   +0.07%     
  Complexity     1658     1658              
============================================
  Files          3243     3244       +1     
  Lines        197343   197373      +30     
  Branches      30547    30548       +1     
============================================
+ Hits         125494   125658     +164     
+ Misses        61814    61673     -141     
- Partials      10035    10042       +7     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.63% <70.00%> (+0.06%) ⬆️
java-21 63.57% <70.00%> (+<0.01%) ⬆️
temurin 63.66% <70.00%> (+0.07%) ⬆️
unittests 63.66% <70.00%> (+0.07%) ⬆️
unittests1 55.59% <ø> (+0.03%) ⬆️
unittests2 35.10% <70.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add broker endpoint to validate parsing of queries

2 participants