Skip to content

Fix UB in ColumnStringBlock::AppendUnsafe when appending empty string_view#489

Open
iliaal wants to merge 1 commit intoClickHouse:masterfrom
iliaal:fix-empty-stringview-memcpy-ub
Open

Fix UB in ColumnStringBlock::AppendUnsafe when appending empty string_view#489
iliaal wants to merge 1 commit intoClickHouse:masterfrom
iliaal:fix-empty-stringview-memcpy-ub

Conversation

@iliaal
Copy link
Copy Markdown

@iliaal iliaal commented Apr 25, 2026

Summary

ColumnStringBlock::AppendUnsafe calls memcpy(pos, str.data(), str.size()) without short-circuiting on str.size() == 0. When the input string_view was built from an empty std::string, str.data() can be nullptr, and glibc declares memcpy's source pointer with __attribute__((nonnull)) regardless of the size argument. UBSan flags every empty-string append:

runtime error: null pointer passed as argument 2, which is declared to never be null
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior clickhouse/columns/string.cpp:139:21

The bug is benign in practice because libc no-ops memcpy(_, NULL, 0), but the false positive fires on any workload that round-trips empty values through String or LowCardinality(String) columns. That makes -fsanitize=undefined builds noisy and hides real findings.

Patch

One-line guard around the memcpy. Behavior is unchanged for non-empty inputs, and the return value matches the previous shape: a string_view pointing at pos with length 0 for empty inputs.

Repro

I hit this in a downstream PHP extension that wraps the library (iliaal/php_clickhouse). Inserting an empty value into a LowCardinality(String) column under -fsanitize=address,undefined reproduces it on every read path. Happy to add a unit test in ut/ if you'd prefer it gated by make test rather than relying on existing coverage.

memcpy's source pointer is declared nonnull regardless of the size
argument. When AppendUnsafe is called with an empty string_view that
was constructed from an empty std::string, str.data() can be null
(the standard allows it), so UBSan flags every empty-string append:

  runtime error: null pointer passed as argument 2,
    which is declared to never be null
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
    clickhouse/columns/string.cpp:139:21

Guard the memcpy with a zero-size short-circuit. Behavior is
unchanged for non-empty inputs and the returned string_view still
points at pos with length 0 for empty inputs.
@iliaal iliaal requested review from mzitnik and slabko as code owners April 25, 2026 21:06
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 25, 2026

CLA assistant check
All committers have signed the CLA.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes UBSan-reported undefined behavior in ColumnString::Block::AppendUnsafe when appending an empty std::string_view whose .data() may be nullptr, avoiding memcpy being called with a null source pointer even when the size is 0.

Changes:

  • Add a guard to skip memcpy/size update when str.size() == 0.
  • Document why the guard is needed (UBSan + memcpy nonnull attribute behavior).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 137 to +141
const auto pos = &data_[size];

memcpy(pos, str.data(), str.size());
size += str.size();
// memcpy's source pointer is declared nonnull regardless of the
// size argument, so an empty string_view backed by std::string()
// (where data() may be null) trips UBSan on every empty append.
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const auto pos = &data_[size]; can be undefined behavior when size == capacity (e.g., appending an empty string when the block is full, or when capacity==0), because data_[size] conceptually dereferences one-past-the-end. Prefer computing the pointer via data_.get() + size (and consider applying the same change in the other Block helpers that use &data_[size]), so zero-length operations don’t rely on out-of-bounds operator[] evaluation.

Copilot uses AI. Check for mistakes.
Comment on lines +140 to +141
// size argument, so an empty string_view backed by std::string()
// (where data() may be null) trips UBSan on every empty append.
Copy link

Copilot AI Apr 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment suggests an empty std::string may have data() == nullptr, but standard std::string::data() is generally non-null even when empty. The nullptr case is definitely possible for an empty std::string_view (e.g., default-constructed std::string_view{} or a view built from a nullable C string), so it would be clearer/safer to describe the issue as “empty string_view can have null .data()” rather than tying it specifically to std::string().

Suggested change
// size argument, so an empty string_view backed by std::string()
// (where data() may be null) trips UBSan on every empty append.
// size argument, so an empty string_view with a null data() pointer
// trips UBSan on every empty append.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants