feat: AQE DPP for native Parquet scans with broadcast reuse #4112
Draft
mbutrovich wants to merge 31 commits into apache:main
…t exchangeReuseEnabled and onlyInBroadcast, create aggregate SubqueryExec for case 3.
# Conflicts: # spark/src/main/spark-4.1/org/apache/comet/shims/ShimSubqueryBroadcast.scala
Which issue does this PR close?
Partially addresses #3510. Closes #4045 (V1 Parquet AQE DPP). Related PRs: #4011 (non-AQE DPP), #4053 (scalar subquery pushdown + CometReuseSubquery), #4037 (non-AQE DPP edge case tests), #4033 (AQE DPP for Iceberg, draft).

Rationale for this change
Under AQE (the default), Spark creates SubqueryAdaptiveBroadcastExec (SAB) for DPP. Spark's PlanAdaptiveDynamicPruningFilters converts these by finding BroadcastHashJoinExec in the plan. After Comet replaces it with CometBroadcastHashJoinExec, Spark's rule can't find a match and replaces DPP with Literal.TrueLiteral, disabling partition pruning. Previously, the isAqeDynamicPruningFilter rejection caused the scan to fall back to Spark entirely, losing native acceleration for all DPP queries under AQE.

What changes are included in this PR?
Two-phase SAB conversion
Spark's PlanAdaptiveDynamicPruningFilters runs before custom queryStageOptimizerRules and converts SABs to TrueLiteral. We work around this in two phases:
1. Wrap SABs in CometSubqueryAdaptiveBroadcastExec so Spark's pattern match doesn't recognize them. Only wraps in CometNativeScanExec nodes (non-Comet scans keep the original SAB for Spark to handle).
2. Convert the wrapped SABs, following Spark's PlanAdaptiveDynamicPruningFilters decision tree:
   - exchangeReuseEnabled + matching broadcast join: CometSubqueryBroadcastExec wired to BroadcastQueryStageExec for broadcast reuse
   - onlyInBroadcast=true: Literal.TrueLiteral (DPP disabled)
   - onlyInBroadcast=false: aggregate SubqueryExec (DPP via separate execution, matching Spark's PlanAdaptiveDynamicPruningFilters lines 68-79)

Cross-stage broadcast search
Spark's PlanAdaptiveDynamicPruningFilters is constructed with rootPlan = this (the current ASPE), giving each ASPE its own rule instance. Custom queryStageOptimizerRules registered via injectQueryStageOptimizerRule are shared across all ASPEs without a per-ASPE rootPlan.
We approximate this with two searches:
- … (plan arg to apply()): same-stage joins and scalar subqueries where scan and join are under one exchange
When the broadcast is not yet materialized (cross-stage case), we follow Spark's PlanAdaptiveDynamicPruningFilters pattern (lines 44-64): construct a new broadcast exchange, wrap it in a new ASPE, and let AQE's stageCache canonicalization ensure the broadcast runs once.

Subquery deduplication via shared cache
Our rule runs after Spark's ReuseAdaptiveSubquery (which can't see our subqueries because they don't exist yet). We register DPP subqueries directly in AdaptiveExecutionContext.subqueryCache, matching ReuseAdaptiveSubquery's behavior for cross-plan reuse (e.g., main query and scalar subquery with identical DPP).

Dual-filter resolution
CometNativeScanExec.partitionFilters and CometScanExec.partitionFilters contain separate InSubqueryExec instances. CometExecRule only wraps the outer filters (the inner CometScanExec is @transient). CometPlanAdaptiveDynamicPruningFilters converts both: CometSubqueryAdaptiveBroadcastExec (wrapped, outer) and SubqueryAdaptiveBroadcastExec (unwrapped, inner).

Spark 3.4 fallback
injectQueryStageOptimizerRule is unavailable on 3.4. SAB wrapping is gated on isSpark35Plus. On 3.4, AQE DPP scans fall back to Spark so Spark's rule handles them natively.

Broadcast fallback cases
- … BroadcastHashJoinExec, creates SubqueryBroadcastExec via shim.
- … Literal.TrueLiteral or aggregate SubqueryExec depending on onlyInBroadcast.
- BroadcastQueryStageExec.plan may be ReusedExchangeExec when AQE reuses exchanges across plans. The rule unwraps it to verify the underlying exchange type.

Other changes
- CometBroadcastExchangeExec: handles non-Comet children (e.g., LocalTableScan after AQE re-optimization of empty broadcasts) by wrapping in CometSparkToColumnarExec.
- CometNativeScanExec.doCanonicalize: strips DPP filters from originalPlan to prevent stale SABs from blocking exchange reuse.
- CometShuffleExchangeExec.doCanonicalize: excludes originalPlan from canonical form (matches CometBroadcastExchangeExec).
- CometScanUtils.filterUnusedDynamicPruningExpressions: strips unconverted SABs in addition to TrueLiteral, matching Spark's FileSourceScanExec.filterUnusedDynamicPruningExpressions.
- ShimPrepareExecutedPlan: new shim for QueryExecution.prepareExecutedPlan (3-arg on 3.x/4.0, 2-arg on 4.1+).
- … (CometDppFallbackRepro3949Suite, CometShuffleFallbackStickinessSuite) updated to disable native scan to preserve the stageContainsDPPScan stickiness code path.
- … IgnoreComet(#4045) tags from Spark's DynamicPartitionPruningSuite diffs (SPARK-32509, SPARK-34637). Tests ported to CometExecSuite.

How are these changes tested?
16 new AQE DPP tests in CometExecSuite covering BHJ/SMJ/empty broadcast/dual filters/exchange reuse/non-atomic types/cross-stage search/scalar subquery deduplication/SPARK-32509/SPARK-34637/SPARK-39447. All tests have version-specific assertions (3.5+ native path vs 3.4 fallback). Existing non-AQE DPP tests renamed to a consistent "[non-AQE|AQE] DPP: <scenario>" format.
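
For reviewers, the three-way conversion described under "Two-phase SAB conversion" can be sketched as a small standalone model. This is an illustration only, not the code in this PR: the object, the Outcome cases, and the Inputs fields are hypothetical names, and no Spark or Comet classes are used. It also assumes the broadcast-reuse case is checked first, before onlyInBroadcast is consulted.

```scala
// Standalone model of the three-way decision; hypothetical names throughout.
// None of these types are Spark or Comet classes.
object DppDecisionSketch {

  // Stand-ins for the three outcomes named in the description.
  sealed trait Outcome
  case object ReuseBroadcast extends Outcome    // CometSubqueryBroadcastExec wired to BroadcastQueryStageExec
  case object DisableDpp extends Outcome        // Literal.TrueLiteral
  case object AggregateSubquery extends Outcome // aggregate SubqueryExec

  // Hypothetical inputs: the two flags from the description plus whether
  // a matching broadcast join was found by the search.
  final case class Inputs(
      exchangeReuseEnabled: Boolean,
      hasMatchingBroadcastJoin: Boolean,
      onlyInBroadcast: Boolean)

  // Assumed ordering: broadcast reuse wins when it is possible; otherwise
  // onlyInBroadcast decides between disabling DPP and a separate subquery.
  def decide(in: Inputs): Outcome =
    if (in.exchangeReuseEnabled && in.hasMatchingBroadcastJoin) ReuseBroadcast
    else if (in.onlyInBroadcast) DisableDpp
    else AggregateSubquery
}
```

Under these assumptions, a filter with onlyInBroadcast=true and no reusable broadcast resolves to DisableDpp, which corresponds to the Literal.TrueLiteral case above.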