Skip to content

fix(datafusion): keep hot runs in one session#833

Open
kumarUjjawal wants to merge 1 commit intoClickHouse:mainfrom
kumarUjjawal:fix/datafusion_hot_run
Open

fix(datafusion): keep hot runs in one session#833
kumarUjjawal wants to merge 1 commit intoClickHouse:mainfrom
kumarUjjawal:fix/datafusion_hot_run

Conversation

@kumarUjjawal
Copy link
Copy Markdown

Summary

This fixes the partitioned DataFusion runner so hot runs are really hot. See apache/datafusion#21696

Before this change, the script started a new datafusion-cli process for every try. That made each try use a cold process, so tries 2 and 3 did not keep the warmed metadata cache.

What changed

  • drop OS cache once before each query
  • build one SQL file with create.sql and the same query 3 times
  • run that file in one datafusion-cli session
  • skip the setup Elapsed lines from create.sql
  • keep failed later tries as null instead of mixing in setup timings

This follows the same main idea used by the DuckDB partitioned runner: one process per query, with repeated runs inside that process.

Thank you for review 🙏

Thank You for Your Contribution!

We appreciate your effort and contribution to the project. To ensure that your Pull Request (PR) adheres to our guidelines, please ensure to review the rules mentioned in our contribution guidelines:

ClickHouse/ClickBench Contribution Rules

Thank you for your attention to these details and for helping us maintain the quality and integrity of the project.

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 20, 2026

CLA assistant check
All committers have signed the CLA.

@kumarUjjawal
Copy link
Copy Markdown
Author

@qoega If I could get your eyes on this?

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 27, 2026

Hi @kumarUjjawal -- thank you. How did you test this change?

Maybe it would be a good exercise to try and get DataFusion 53 numbers with this script? That way we both double check the script works and update the results for DF 53

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 27, 2026

I can try and help do this (run for DF 53) if needed

@kumarUjjawal
Copy link
Copy Markdown
Author

kumarUjjawal commented Apr 27, 2026

Hi @kumarUjjawal -- thank you. How did you test this change?

Maybe it would be a good exercise to try and get DataFusion 53 numbers with this script? That way we both double check the script works and update the results for DF 53

I asked Codex to build a small bash script for local test.

It used a fake datafusion-cli output:

  • one datafusion-cli session per query
  • setup timings from create.sql are skipped
  • if the shared session fails after try 1, the output is [t1, null, null]

If you could help run the DF 53 bench that would be great. 🙏

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 27, 2026

Will try later this week -- basically we need to run it on aws.

@kumarUjjawal
Copy link
Copy Markdown
Author

Thank you!

@alamb
Copy link
Copy Markdown
Contributor

alamb commented Apr 30, 2026

I am somewhat overwhelmed this week with other things. I'll see what I can do later.

Maybe @pmcgleenon has some time to review / rerun the numbers (he used to be our star number runner 🦾 )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants