[spark] Fix empty projection causing Invalid metadata length for COUNT(*)/COUNT(1)#3227

Open
Kaixuan-Duan wants to merge 1 commit into apache:main from Kaixuan-Duan:issue-2724-empty-projection

@Kaixuan-Duan Kaixuan-Duan commented Apr 28, 2026

Purpose

Linked issue: close #2724

When Spark pushes down an empty column projection for COUNT(*)/COUNT(1) queries, the Fluss server fails with IllegalStateException("Invalid metadata length") in FileLogProjection.project(), causing the client to retry indefinitely and the query to hang.

This PR fixes the issue from two sides:

  • Server side: reject an empty projection early with a clear InvalidColumnProjectionException instead of crashing with an internal error.

  • Spark connector side: fall back to projecting the first column when Spark pushes down an empty projection, so the row count is preserved without fetching unnecessary data.
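
The connector-side fallback can be illustrated with a minimal sketch. The actual change is in the Scala classes FlussBatch and FlussMicroBatchStream; the Java class and method names below (ProjectionFallback, effectiveProjection) are hypothetical and only show the idea of substituting column 0 for an empty projection.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Fluss; illustrates the fallback only. */
final class ProjectionFallback {

    /**
     * Returns the projection to send to the server. An empty pushed-down
     * projection (as produced by COUNT(*) / COUNT(1)) is replaced by the
     * first column so that every row is still returned and can be counted.
     */
    static int[] effectiveProjection(int[] pushedDown) {
        if (pushedDown.length == 0) {
            return new int[] {0}; // fall back to the first column
        }
        return pushedDown.clone(); // non-empty projections pass through unchanged
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(effectiveProjection(new int[0])));       // [0]
        System.out.println(Arrays.toString(effectiveProjection(new int[] {2, 5}))); // [2, 5]
    }
}
```

Reading one column instead of none keeps the row count intact; it does not try to pick the cheapest column to read.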

Brief change log

  • FileLogProjection#setCurrentProjection: add a guard that throws InvalidColumnProjectionException when selectedFieldPositions is empty.

  • FileLogProjectionTest: add testEmptyProjectionRejectsWithClearError to verify the server-side guard.

  • FlussBatch#projection / FlussMicroBatchStream#projection: when readSchema yields an empty projection, fall back to Array(0) (first column).

  • SparkLogTableReadTest: add COUNT(*) and COUNT(1) end-to-end tests for log tables.

  • SparkPrimaryKeyTableReadTest: add COUNT(*) end-to-end test for primary key tables.
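
The server-side guard described in the first change-log item might look roughly like the sketch below. This is an assumption-laden illustration, not the code in the PR: the exception class is a local stand-in for Fluss's InvalidColumnProjectionException so that the example compiles on its own, and the method name checkProjection and the error message are hypothetical.

```java
/** Hypothetical stand-alone sketch of an early empty-projection check. */
final class EmptyProjectionGuard {

    /** Local stand-in for Fluss's InvalidColumnProjectionException (assumption). */
    static final class InvalidColumnProjectionException extends RuntimeException {
        InvalidColumnProjectionException(String message) {
            super(message);
        }
    }

    /** Fails fast with a descriptive error when no columns are selected. */
    static void checkProjection(int[] selectedFieldPositions) {
        if (selectedFieldPositions.length == 0) {
            // Hypothetical message text; the real wording may differ.
            throw new InvalidColumnProjectionException(
                    "Empty column projection is not supported.");
        }
    }

    public static void main(String[] args) {
        checkProjection(new int[] {0}); // accepted, nothing is thrown
        try {
            checkProjection(new int[0]);
        } catch (InvalidColumnProjectionException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Rejecting the request up front turns an internal IllegalStateException into an error the client can report.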

Tests

./mvnw -pl fluss-common -DskipTests=false -Dtest=FileLogProjectionTest#testEmptyProjectionRejectsWithClearError test

./mvnw -pl fluss-spark/fluss-spark-ut -am install -DskipTests
./mvnw -pl fluss-spark/fluss-spark-ut -Dsuites='org.apache.fluss.spark.SparkLogTableReadTest' test
./mvnw -pl fluss-spark/fluss-spark-ut -Dsuites='org.apache.fluss.spark.SparkPrimaryKeyTableReadTest' test

API and Format

Documentation

SchemaGetter schemaGetter,
ArrowCompressionInfo compressionInfo,
int[] selectedFieldPositions) {
// Empty projection (selectedFieldPositions.length == 0) is currently not supported on the

@Yohahaha Yohahaha Apr 30, 2026

It would be good to also verify this behavior and the fix in the Flink connector if needed.

luoyuxia commented Apr 30, 2026

@Kaixuan-Duan Hi, this seems to be the same as #2725. cc @beryllw

}
}

test("Spark Read: COUNT(*) without filter") {

What happens when there is a filter?

Development

Successfully merging this pull request may close these issues.

[spark] Empty projection in batch read throws Invalid metadata length with COUNT(*)/COUNT(1)

3 participants