[python] Implement partial-update merge engine in pypaimon#7745
TheR1sing3un wants to merge 4 commits into apache:master from
Conversation
``MergeEngine.PARTIAL_UPDATE`` is exposed in ``core_options.py`` and
accepts ``merge-engine: partial-update`` as a table option, but the
read path never reads that option — ``sort_merge_reader.py`` hardcodes
``DeduplicateMergeFunction()``. So a user who creates a PK table with
``merge-engine: partial-update`` and writes overlapping rows whose
non-null columns differ gets silently deduplicated results instead of
the expected per-field merge: their data is wrong, with no error or
warning. The same is true for ``aggregation`` and ``first-row`` —
both are silently degraded to dedupe today.
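The failure mode is easy to state in plain Python (an illustrative comparison only, not pypaimon code):

```python
# Two overlapping writes to the same primary key (id = 1).
rows = [
    {"id": 1, "a": "A", "b": None},
    {"id": 1, "a": None, "b": "B"},
]

# What the broken path does today: plain dedupe, last row wins wholesale.
deduped = rows[-1]

# What partial-update should do: per field, the last non-null value wins.
merged = {}
for row in rows:
    for field, value in row.items():
        if value is not None:
            merged[field] = value
        else:
            merged.setdefault(field, None)
```

With dedupe the earlier non-null `a = 'A'` is lost; with the per-field merge both columns survive.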
This change ports the core ``PartialUpdateMergeFunction`` semantics
from Java
(paimon-core/.../mergetree/compact/PartialUpdateMergeFunction.java) and
wires the Python read path to dispatch on ``merge-engine``:
* New ``pypaimon/read/reader/partial_update_merge_function.py``: on
each ``add(kv)`` copy non-null fields of ``kv.value`` into an
accumulator; ``get_result()`` returns a fresh KeyValue with the
merged row. Result is built into a brand-new tuple so the merge
output is decoupled from upstream's reused KeyValue instances.
* ``SortMergeReaderWithMinHeap.__init__`` gains an optional
``merge_function`` kwarg; default still ``DeduplicateMergeFunction()``
so any direct callers (none in-tree) are unchanged.
* ``MergeFileSplitRead.section_reader_supplier`` selects the merge
function based on ``self.table.options.merge_engine()``:
DEDUPLICATE -> DeduplicateMergeFunction (unchanged)
PARTIAL_UPDATE -> PartialUpdateMergeFunction
AGGREGATE / FIRST_ROW -> NotImplementedError (was silent dedupe)
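A minimal sketch of that dispatch, assuming a `MergeEngine` enum like the one in `core_options.py` (the factory-callable shape here is illustrative, not the PR's actual signature):

```python
from enum import Enum


class MergeEngine(Enum):
    # Values follow the table-option strings used in the PR description.
    DEDUPLICATE = "deduplicate"
    PARTIAL_UPDATE = "partial-update"
    AGGREGATE = "aggregation"
    FIRST_ROW = "first-row"


def build_merge_function(engine, deduplicate_factory, partial_update_factory):
    """Pick a merge function for the configured merge-engine.

    Hypothetical simplification of MergeFileSplitRead's dispatch:
    factories are passed in so the sketch stays self-contained.
    """
    if engine is MergeEngine.DEDUPLICATE:
        return deduplicate_factory()  # unchanged default path
    if engine is MergeEngine.PARTIAL_UPDATE:
        return partial_update_factory()
    # AGGREGATE / FIRST_ROW: fail loud instead of silently degrading to dedupe.
    raise NotImplementedError(
        f"merge-engine '{engine.value}' is not implemented in pypaimon")
```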
Out of scope, intentionally:
* Per-field aggregator overrides (``fields.<name>.aggregate-function``)
* Sequence-group support (``fields.<name>.sequence-group``)
* ``ignore-delete`` / ``partial-update.remove-record-on-*`` options
* AGGREGATE / FIRST_ROW merge engine implementations
DELETE / UPDATE_BEFORE rows raise ``NotImplementedError`` at ``add()``
time so we can't silently corrupt data with a half-implemented contract.
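The accumulator semantics plus the DELETE / UPDATE_BEFORE refusal can be sketched as a stand-alone model (`KeyValue` and `RowKind` here are minimal stand-ins for pypaimon's types; `get_result` returns a plain tuple rather than a real KeyValue):

```python
from collections import namedtuple
from enum import Enum

# Minimal stand-ins for pypaimon's KeyValue / RowKind (hypothetical shapes).
KeyValue = namedtuple("KeyValue", "key value value_kind")


class RowKind(Enum):
    INSERT = "+I"
    UPDATE_AFTER = "+U"
    UPDATE_BEFORE = "-U"
    DELETE = "-D"


class PartialUpdateMergeFunction:
    """Illustrative last-non-null merge: reset per key, add rows, get result."""

    def reset(self):
        self.key = None
        self.fields = None  # accumulator for value fields

    def add(self, kv):
        if kv.value_kind in (RowKind.DELETE, RowKind.UPDATE_BEFORE):
            # Retractions need sequence-group / remove-record semantics,
            # which are out of scope -- fail loud rather than corrupt data.
            raise NotImplementedError(
                f"partial-update does not support {kv.value_kind} rows yet")
        if self.fields is None:
            self.fields = list(kv.value)
        else:
            for i, v in enumerate(kv.value):
                if v is not None:  # later non-null wins; null never clobbers
                    self.fields[i] = v
        self.key = kv.key

    def get_result(self):
        if self.fields is None:
            return None
        # Build a brand-new tuple so the result does not alias the
        # (possibly reused) upstream KeyValue's row.
        return (self.key, tuple(self.fields))
```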
Tests:
* ``test_partial_update_merge_function.py`` — 11 unit cases covering single insert, two-way overlapping merges, three-way merges, later-null-does-not-clobber, reset between keys, get_result-before-any-add, UPDATE_AFTER acceptance, DELETE / UPDATE_BEFORE refusal, and result decoupling from input kv (proves we're not aliasing upstream's reused KeyValue).
* ``test_partial_update_e2e.py`` — 8 cases: two-write merge, three-write merge, disjoint keys unaffected, later-non-null wins, later-null preserves earlier value, deduplicate engine unchanged (regression), and aggregation / first-row raise NotImplementedError.
Verified by checking out ``origin/master``'s ``sort_merge_reader.py`` /
``split_read.py`` and rerunning ``test_partial_update_e2e.py``: master
fails the 4 partial-update merge cases (silent dedupe) and the 2
aggregation / first-row "raises" cases (silent dedupe instead of
raising); fix passes all 8.
…f-scope options

Address review on r3168491328: previously `_build_merge_function()` dispatched on `merge-engine: partial-update` alone, so a table that ALSO configured sequence-group / per-field aggregator / ignore-delete / partial-update.remove-record-on-* would fall into the simple PartialUpdateMergeFunction and silently drop those semantics -- exactly the same silent-corruption pattern this PR exists to close, just reshaped from "silent dedupe" to "silent half-partial-update".

Now the PARTIAL_UPDATE branch first scans the table options for any of the unsupported keys:

* fields.<name>.sequence-group
* fields.<name>.aggregate-function
* fields.default-aggregate-function
* ignore-delete (and the partial-update./first-row./deduplicate. prefixed aliases) when truthy
* partial-update.remove-record-on-delete when truthy
* partial-update.remove-record-on-sequence-group when truthy

If any are set, raise NotImplementedError naming every offending key so the user can either drop them or escalate. Same shape as the existing AGGREGATE / FIRST_ROW raise.

Tests: 7 new e2e cases in test_partial_update_e2e.py, one per option plus a regression case asserting `ignore-delete: false` (explicitly disabled) still passes through to the merge function.
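The option scan could look roughly like this (a hedged sketch: the key list follows the commit message, while the function name and the truthiness parsing are assumptions):

```python
import re

# Keys that must be truthy to count as "configured" (per the commit message).
_UNSUPPORTED_WHEN_TRUTHY = frozenset({
    "ignore-delete",
    "partial-update.ignore-delete",
    "first-row.ignore-delete",
    "deduplicate.ignore-delete",
    "partial-update.remove-record-on-delete",
    "partial-update.remove-record-on-sequence-group",
})

# Keys that are unsupported whenever they are set at all.
_UNSUPPORTED_PATTERNS = (
    re.compile(r"^fields\..+\.sequence-group$"),
    re.compile(r"^fields\..+\.aggregate-function$"),
    re.compile(r"^fields\.default-aggregate-function$"),
)


def check_partial_update_options(options):
    """Raise NotImplementedError naming every out-of-scope option.

    Hypothetical helper; truthiness parsing ("true"/True) is simplified.
    """
    offending = []
    for key, value in options.items():
        if any(p.match(key) for p in _UNSUPPORTED_PATTERNS):
            offending.append(key)
        elif key in _UNSUPPORTED_WHEN_TRUTHY and str(value).lower() == "true":
            offending.append(key)
    if offending:
        raise NotImplementedError(
            "merge-engine: partial-update does not yet support: "
            + ", ".join(sorted(offending)))
```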
…eNonNullFields

Java PartialUpdateMergeFunction.updateNonNullFields (lines 177-188) raises IllegalArgumentException when an input field is null and the schema marks that field NOT NULL. The Python port previously absorbed such inputs silently, letting writes whose first value was null on a NOT NULL field land null in the accumulator.

Changes:

* PartialUpdateMergeFunction.__init__ takes an optional `nullables` list parallel to value indices. When given, every add() checks each null input against `nullables[i]` and raises ValueError on a NOT NULL field, matching Java semantics on every row (not just the first). When omitted, behaviour is unchanged (back-compat for direct callers).
* MergeFileSplitRead snapshots the raw value-side schema as `value_fields` before _create_key_value_fields wraps it, then hands `[f.type.nullable for f in self.value_fields]` to the merge function.
* Five new unit cases in test_partial_update_merge_function.py: first row null on NOT NULL raises, subsequent row null on NOT NULL raises, null on nullable field is absorbed, length-mismatch nullables raises, omitting nullables preserves the previous lenient behaviour.

Result: with the existing guard in _build_merge_function (which refuses out-of-scope options) and the NOT NULL enforcement here, the simple last-non-null path is now feature-equivalent to Java's updateNonNullFields + getResult on the supported subset.
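A sketch of the `nullables` check (simplified; the class and parameter names are stand-ins, but `ValueError` on a NOT NULL violation follows the commit message):

```python
class PartialUpdateAccumulator:
    """Illustrative last-non-null accumulator with optional NOT NULL checks."""

    def __init__(self, arity, nullables=None):
        # nullables is parallel to the value fields; omit it for the
        # lenient back-compat behaviour.
        if nullables is not None and len(nullables) != arity:
            raise ValueError("nullables must be parallel to the value fields")
        self.nullables = nullables
        self.fields = [None] * arity

    def add(self, row):
        for i, v in enumerate(row):
            if v is None:
                # Mirror Java updateNonNullFields: a null input on a
                # NOT NULL field raises on every row, not just the first.
                if self.nullables is not None and not self.nullables[i]:
                    raise ValueError(f"field {i} is NOT NULL but got null")
            else:
                self.fields[i] = v
```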
Could you add coverage for partial-update rows that land in the same data file, e.g. two
Nit: thanks for adding the coverage. I wonder if you can make it a bit more focused though — some unit and |
Thank you for your suggestion. I will ping you again after it's done
…tedFailure

Reviewer asked to cover rows that land in the same data file -- multiple write_arrow() calls before a single prepare_commit(). Adding the cases revealed the writer-side / read-side gap upstream of this PR: KeyValueDataWriter._merge_data only does concat+sort (no merge function applied), so the flushed file holds duplicate primary keys; on read, _build_split_from_pack treats any single-file group as raw_convertible and routes through the fast path, skipping SortMergeReader and the merge-engine dispatch this PR adds.

Fixing it requires either a merge buffer in KeyValueDataWriter (mirroring Java SortBufferWriteBuffer / MergeTreeWriter) or a tighter raw_convertible check that proves intra-file PK uniqueness -- both are write-path / scan-path restructuring outside this read-side merge-engine port.

The two new cases are kept as unittest.expectedFailure so the gap stays visible and converts to passing regressions when the writer-side fix lands.
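The `expectedFailure` pattern described above, in miniature (the test name and data here are illustrative, not the real e2e cases):

```python
import unittest


class SameFileMergeTest(unittest.TestCase):
    """Shape of the expectedFailure cases kept in test_partial_update_e2e.py."""

    @unittest.expectedFailure
    def test_two_writes_same_file_merge(self):
        # Until the writer merges (or raw_convertible proves intra-file PK
        # uniqueness), a flushed file keeps duplicate PKs and the fast path
        # skips SortMergeReader, so this assertion fails today.
        actual = {"id": 1, "a": None, "b": "B"}   # silent-dedupe result
        expected = {"id": 1, "a": "A", "b": "B"}  # partial-update result
        self.assertEqual(actual, expected)


# Once the writer-side fix lands, the decorated test reports an
# "unexpected success", flagging that the decorator should be removed.
```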
Verified -- and the gap is independent of this PR. I added two coverage cases per your suggestion (commit 0d58859):

To rule out anything this PR introduced, I ran the same workload on
The flushed data file contains 2 rows for the same PK; on read,
Root cause split across the write and read paths:
Fixing it requires either (a) an in-memory merge buffer in
The expectedFailure cases here will turn into passing regressions
@XiaoHongbo-Hope Hi, thank you for your point. However, this issue is not strictly related to this PR; it is a general problem. I would rather propose a separate PR to solve it. The two PRs are independent and can be advanced in parallel.
Purpose
pypaimon exposes `MergeEngine.PARTIAL_UPDATE` in `core_options.py` and accepts `merge-engine: partial-update` as a table option, but the read path never reads that option — `pypaimon/read/reader/sort_merge_reader.py` hardcodes `DeduplicateMergeFunction()`. So a user who:

* creates a PK table with `merge-engine: partial-update`,
* writes `[{id: 1, a: 'A', b: null}]`, then another with `[{id: 1, a: null, b: 'B'}]`,

gets `[{id: 1, a: null, b: 'B'}]` (silently deduplicated to the latest row) instead of the expected `[{id: 1, a: 'A', b: 'B'}]` (per-field merge of non-null values). No error, no warning, just wrong data. The same is true today for `merge-engine: aggregation` and `merge-engine: first-row` — both are silently degraded to dedupe.

This PR ports the core `PartialUpdateMergeFunction` semantics from Java (paimon-core/.../mergetree/compact/PartialUpdateMergeFunction.java) and wires the Python read path to dispatch on `merge-engine`. End state:

* `merge-engine: partial-update` works correctly for the common case (no DELETE rows, no sequence-group config, no per-field aggregator overrides).
* `merge-engine: deduplicate` is unchanged.
* `merge-engine: aggregation` / `first-row` raise an explicit `NotImplementedError` instead of silently behaving as dedupe — fail loud now that we have an obvious place for users to escalate from.

Changes
New: `pypaimon/read/reader/partial_update_merge_function.py`

The `reset`/`add`/`get_result` protocol matches `DeduplicateMergeFunction` exactly so `SortMergeReader` doesn't change.

Modified: `pypaimon/read/reader/sort_merge_reader.py`

`SortMergeReaderWithMinHeap.__init__` gains an optional `merge_function` kwarg (defaults to `DeduplicateMergeFunction()` so any direct callers are unchanged).

Modified: `pypaimon/read/split_read.py`

`MergeFileSplitRead.section_reader_supplier` now picks the merge function based on `self.table.options.merge_engine()`:

* DEDUPLICATE -> `DeduplicateMergeFunction` (unchanged)
* PARTIAL_UPDATE -> `PartialUpdateMergeFunction`
* AGGREGATE / FIRST_ROW -> raise `NotImplementedError` (was silent dedupe)

Out of scope (deliberate, called out in code comments)

* Per-field aggregator overrides (`fields.<name>.aggregate-function=...`) — needs a `FieldAggregator` framework.
* Sequence-group support (`fields.<name>.sequence-group=...`) — needs a `UserDefinedSeqComparator`-equivalent.
* `ignore-delete`, `partial-update.remove-record-on-delete`, `partial-update.remove-record-on-sequence-group` — depend on the above.
* DELETE / UPDATE_BEFORE rows raise `NotImplementedError` so we never silently corrupt data with a half-implemented contract.

Linked issue
N/A — surfaced when verifying that `merge-engine: partial-update` actually does what the option name implies in pypaimon.

Tests
* `pypaimon/tests/test_partial_update_merge_function.py` — 11 unit cases driving the merge function with synthetic `KeyValue` instances. Covers: single insert, two-way overlapping merge (overwrite vs fill-null), three-way merge composition, later-null-does-not-clobber, reset between keys, `get_result` before any add, UPDATE_AFTER acceptance, DELETE / UPDATE_BEFORE refusal, and result decoupling from input kv (proves the result is built into a fresh tuple so upstream's reused-KeyValue pattern doesn't corrupt us).
* `pypaimon/tests/test_partial_update_e2e.py` — 8 end-to-end cases on real PK tables. Covers: two-write merge (A,_ + _,B -> A,B), three-write left-to-right composition, disjoint keys unaffected, later-non-null wins over earlier non-null, later-null preserves earlier value, deduplicate-engine-unchanged (regression), aggregation / first-row raise `NotImplementedError`.
* Master-vs-fix verification — checked out `origin/master`'s `sort_merge_reader.py` / `split_read.py` over the new tests: master fails the 4 partial-update merge cases and the 2 aggregation / first-row "raises" cases (silent dedupe); the fix passes all 8.
* Regression: `pytest pypaimon/tests/{reader_primary_key_test,reader_split_generator_test,reader_append_only_test,test_partial_update_*}.py` → 50 passed, 7 failed. The 7 failures are all pre-existing lance / vortex environment issues unrelated to this PR; the dedupe path is unchanged. `flake8 --config=dev/cfg.ini` clean.

API and format
Public API additions:
* `pypaimon.read.reader.partial_update_merge_function.PartialUpdateMergeFunction` — new class.
* `SortMergeReaderWithMinHeap.__init__` gets an optional `merge_function` kwarg (back-compat default).

No public API removals or signature breaks. No file format change.
Behaviour change: tables with `merge-engine: partial-update` now produce per-field merged results (was: silently deduplicated). Tables with `merge-engine: aggregation` or `first-row` now raise `NotImplementedError` (was: silently deduplicated). Both are correctness fixes — the previous behaviour was producing wrong data with no signal to the user.

Documentation
The new module carries a docstring covering the algorithm, the out-of-scope list, and the link back to the Java reference. The dispatch in `MergeFileSplitRead._build_merge_function` carries an inline comment explaining why we now raise on AGGREGATE / FIRST_ROW.

Generative AI disclosure
Drafted with assistance from an AI coding tool; the algorithm follows `org.apache.paimon.mergetree.compact.PartialUpdateMergeFunction` and the soundness contract is exercised end-to-end by the regression tests above (which fail on master and pass post-fix).