Skip to content

[python] Fix FileSystemBranchManager from-tag and fast-forward path computation#7756

Merged
JingsongLi merged 3 commits intoapache:masterfrom
TheR1sing3un:py-fix-branch-manager-snapshot-path
May 3, 2026
Merged

[python] Fix FileSystemBranchManager from-tag and fast-forward path computation#7756
JingsongLi merged 3 commits intoapache:masterfrom
TheR1sing3un:py-fix-branch-manager-snapshot-path

Conversation

@TheR1sing3un
Copy link
Copy Markdown
Member

@TheR1sing3un TheR1sing3un commented May 1, 2026

Purpose

Fix a pre-existing bug in FileSystemBranchManager._copy_with_branch that made create_branch(tag_name=...) and fast_forward raise SameFileError: the SnapshotManager it produced still pointed at the main-branch snapshot directory, so copy_file(src, dst) collapsed to src == dst.

The fix mirrors Java SnapshotManager.copyWithBranch (paimon-core/.../utils/SnapshotManager.java:89-95): the Python SnapshotManager now carries an explicit branch field and a copy_with_branch(branch_name) factory, so its snapshot_dir / get_snapshot_path(...) resolve to {table_path}/branch/branch-{name}/snapshot/... for non-main branches. FileSystemBranchManager._copy_with_branch then dispatches to the per-manager factories (SnapshotManager.copy_with_branch / SchemaManager.copy_with_branch) instead of blindly reconstructing a main-branch SnapshotManager.

SnapshotLoader is rebranched in lockstep so REST-path catalog loads target the requested branch rather than falling back to the main-branch identifier (mirrors Java SnapshotLoaderImpl.copyWithBranch).

In scope

  • pypaimon/snapshot/snapshot_manager.py: constructor accepts an optional branch argument (defaults to inheriting table.current_branch()); new branch field; branch-aware snapshot_dir / latest_file; new copy_with_branch(branch_name) factory that also rebranches the snapshot loader.
  • pypaimon/snapshot/snapshot_loader.py: new copy_with_branch(branch) returning a loader whose Identifier carries the new branch (mirrors Java SnapshotLoaderImpl.copyWithBranch).
  • pypaimon/branch/filesystem_branch_manager.py: _copy_with_branch dispatches to manager.copy_with_branch(branch) for SnapshotManager / SchemaManager (the latter already had the factory). TagManager is still rebuilt directly because the Python TagManager has no copy_with_branch factory yet.
  • pypaimon/tests/branch_manager_test.py: two new test classes -- SnapshotManagerBranchAwarenessTest pins down the branch-aware path computation and the loader rebranching, and FileSystemBranchManagerEndToEndTest regresses both the from-tag happy path and the fast-forward happy path that previously raised SameFileError.
  • pypaimon/tests/snapshot_manager_test.py: existing Mock() tables now explicitly set current_branch.return_value = "main", since the constructor now calls table.current_branch() for the default branch.

Out of scope

Tests

From paimon-python/:

pytest pypaimon/tests/branch_manager_test.py -v             # 19 passed (14 existing + 5 new)
pytest pypaimon/tests/snapshot_manager_test.py -v           # 3 passed (mock fixture fix)
pytest pypaimon/tests/branch/ pypaimon/tests/rest/rest_branch_test.py \
       pypaimon/tests/api/test_branch_dto_serde.py \
       pypaimon/tests/filesystem_catalog_tag_test.py -q     # 87 passed
flake8 --config=dev/cfg.ini <touched files>                  # clean

The two new end-to-end tests in FileSystemBranchManagerEndToEndTest (test_create_branch_from_tag_lands_files_under_branch_dir and test_fast_forward_after_create_branch_from_tag) deterministically reproduce the SameFileError on master and pass on this branch. test_copy_with_branch_rebranches_snapshot_loader covers the new loader rebranching.

Anti-divergence checklist

  • SnapshotManager.copy_with_branch matches Java SnapshotManager.copyWithBranch semantics: a fresh manager whose path accessors flow through BranchManager.branch_path(table_root, branch), plus a snapshot loader rebranched via SnapshotLoader.copy_with_branch(branch) when one is present.
  • SnapshotLoader.copy_with_branch matches Java SnapshotLoaderImpl.copyWithBranch: same catalog_loader, new Identifier(database, object, branch).
  • branch == "main" keeps existing paths bit-identical (branch_path(p, "main") == p), so all current SnapshotManager(table) call sites stay zero-regression.
  • FileStoreTable.snapshot_manager() is unchanged; the new optional branch parameter defaults to table.current_branch() which is already the production behavior.

Generative AI disclosure

Drafted with assistance from a generative AI tool. All code, tests, and Java alignment were reviewed and validated by the contributor.

…omputation

FileSystemBranchManager._copy_with_branch returned a SnapshotManager
that still pointed at the main-branch snapshot directory, so
create_branch(tag_name=...) and fast_forward hit copy_file(src, dst)
with src == dst and raised SameFileError.

Mirror Java SnapshotManager.copyWithBranch (utils/SnapshotManager.java)
on the Python side: SnapshotManager now carries an explicit branch
field and a copy_with_branch factory, so its snapshot_dir resolves to
{table_path}/branch/branch-{name}/snapshot for non-main branches. The
dispatch in _copy_with_branch then delegates to the per-manager
copy_with_branch factories.

Adds SnapshotManagerBranchAwarenessTest for the path computation and
FileSystemBranchManagerEndToEndTest for the from-tag and fast-forward
happy paths that previously raised SameFileError.
Mirror Java SnapshotLoaderImpl.copyWithBranch (tag/SnapshotLoaderImpl.java):
when SnapshotManager.copy_with_branch produces a branch-aware manager,
swap its loader for one whose Identifier carries the new branch, so
catalog-backed snapshot loads target the requested branch instead of
falling back to the main-branch identifier.

On the FileSystemCatalog path the loader is None and this is inert.
Use get_database_name() / get_table_name() instead of reaching into
the .database / .object fields, to match Java SnapshotLoaderImpl's
identifier.getDatabaseName() / identifier.getTableName() calls.
Copy link
Copy Markdown
Contributor

@JingsongLi JingsongLi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@JingsongLi JingsongLi merged commit 718f292 into apache:master May 3, 2026
6 checks passed
TheR1sing3un added a commit to TheR1sing3un/incubator-paimon that referenced this pull request May 3, 2026
Switch SnapshotManager.__init__ from (table) to the Java-aligned
(file_io, table_path, branch=None, snapshot_loader=None) so the class
no longer depends on FileStoreTable. The new manager carries its own
file_io / table_path / branch / snapshot_loader fields, mirroring
paimon-core/.../utils/SnapshotManager.java.

FileStoreTable.snapshot_manager() remains the canonical factory and
now wires those four basics. All raw SnapshotManager(table) call sites
across production and tests are migrated to table.snapshot_manager().
copy_with_branch is rewritten to construct the rebranched manager
directly via the new constructor (no field-swap).

Mock-style tests that patched the SnapshotManager class to intercept
its instances now wire the mock through table.snapshot_manager.return_value,
which matches how production code obtains its instance.

Follow-up to apache#7756.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants