
Fix cloud-wal-restore on compressed WAL + sibling backup label#1190

Closed
adam8157 wants to merge 2 commits into EnterpriseDB:master from adam8157:fix-cloud-wal-restore-backup-match

adam8157 (Contributor) commented May 22, 2026

  • CloudWalDownloader._get_wals_to_download matched the requested WAL by path.startswith(requested_wal_path), which also matched the sibling <wal>.<offset>.backup label file. Combined with the early break introduced in 5625881, recovery against a compressed backup would exit the iteration on the <wal>.<offset>.backup.gz entry (which sorts lexicographically before <wal>.gz) and never reach the real WAL, so Postgres failed with FATAL: could not locate required checkpoint record.
  • Match by basename equality after stripping any compression suffix; accept both <wal> and <wal>.partial. Sibling .backup files are now harmlessly skipped via _validate_wal_path. The "requested WAL missing → return []" invariant from 5625881 is preserved via a found_requested flag enforced after the loop.
  • Adds a regression test covering the compressed-WAL + sibling-backup-label case.
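To make the failure mode concrete, here is a minimal, self-contained Python sketch of the pre-fix behavior. It is not Barman's code: `old_get_wals_to_download`, `is_valid_wal`, the `max_wals` prefetch limit and the bucket paths are hypothetical, and `is_valid_wal` is a simplified stand-in for `_validate_wal_path` that only rejects backup labels.

```python
import posixpath


def is_valid_wal(path):
    # Hypothetical, simplified stand-in for _validate_wal_path:
    # it only rejects <wal>.<offset>.backup label files.
    return ".backup" not in posixpath.basename(path)


def old_get_wals_to_download(paths, requested_wal_path, max_wals=3):
    """Sketch of the pre-fix loop: prefix match plus the early break from 5625881."""
    wals = []
    for path in sorted(paths):
        if path < requested_wal_path:
            continue  # object sorts before the requested WAL
        # Prefix match: true for <wal>.gz but also for <wal>.<offset>.backup.gz
        is_requested_wal = path.startswith(requested_wal_path)
        if not is_valid_wal(path):
            if is_requested_wal:
                break  # fail fast: the "requested" entry is not a usable WAL
            continue
        wals.append(path)
        if len(wals) >= max_wals:
            break
    return wals


prefix = "wals/0000000100000000/"
requested = prefix + "000000010000000000000003"
bucket = [
    prefix + "000000010000000000000003.00000028.backup.gz",
    prefix + "000000010000000000000003.gz",
    prefix + "000000010000000000000004.gz",
]
print(old_get_wals_to_download(bucket, requested))      # the label ends the loop
print(old_get_wals_to_download(bucket[1:], requested))  # label absent: both WALs
```

With the label present the first call returns an empty list, because the sorted iteration meets `<wal>.<offset>.backup.gz` first and breaks; dropping the label from the listing makes the same call return both compressed WALs.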

References: BAR-1305.

adam8157 requested a review from a team as a code owner May 22, 2026 04:47
adam8157 added 2 commits May 22, 2026 13:09
Commit 5625881 tightened CloudWalDownloader._get_wals_to_download so
that hitting an invalid ``is_requested_wal`` entry exits the iteration
immediately, in order to fail fast when a WAL was requested but only
its ``<wal>.<offset>.backup`` label exists in the bucket.

That logic used ``path.startswith(requested_wal_path)`` to decide
``is_requested_wal``, which also matches the related ``.backup`` label
file. With compression enabled, both files live in the bucket as
``<wal>.gz`` and ``<wal>.<offset>.backup.gz``, and the label sorts
before the WAL in lexicographic order (``.`` 0x2e + ``0`` 0x30 vs ``.``
0x2e + ``g`` 0x67). So when recovery requests the start-checkpoint WAL
of a compressed backup, the iteration hits the ``.backup.gz`` file
first, ``_validate_wal_path`` rejects it as a backup file, and the new
``break`` exits before ever reaching the real ``.gz`` WAL — leaving
Postgres unable to fetch the start-checkpoint WAL and recovery to abort
with ``FATAL: could not locate required checkpoint record``.
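The ordering argument can be checked directly with plain Python string comparison, which orders by code point and therefore agrees with byte order for these ASCII names. The WAL name and the ``00000028`` offset below are made-up examples, and ``first_difference`` is a hypothetical helper.

```python
WAL = "000000010000000000000003"
LABEL = WAL + ".00000028.backup"  # <wal>.<offset>.backup


def first_difference(a, b):
    """Return (index, char_a, char_b) at the first differing position, or None
    when one string is a prefix of the other."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    return None


for suffix in ("", ".gz"):
    wal, label = WAL + suffix, LABEL + suffix
    print(f"suffix {suffix!r}: sorts first -> {sorted([wal, label])[0]}")
    diff = first_difference(wal, label)
    if diff is None:
        print("  the WAL name is a prefix of the label, so the WAL sorts first")
    else:
        i, x, y = diff
        print(f"  index {i}: {x!r} 0x{ord(x):02x} (WAL) vs {y!r} 0x{ord(y):02x} (label)")
```

Without a suffix the bare WAL name is a prefix of the label and sorts first, which is why uncompressed backups were not affected. With ``.gz`` the first difference is at index 25, ``g`` (0x67) versus ``0`` (0x30). Assuming the offset is written in uppercase hexadecimal, as in PostgreSQL's backup history file names, every possible leading offset digit (0x30 to 0x46) sorts before any suffix that starts with a lowercase letter.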

Match the requested WAL by comparing the basename (after stripping any
compression suffix) against ``wal_name``, so a sibling ``.backup`` file
no longer trips the ``is_requested_wal`` branch and is harmlessly
skipped via ``_validate_wal_path``.
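A minimal sketch of the new predicate, next to the old ``startswith`` test for contrast. The helper names ``is_requested_wal`` and ``strip_compression_suffix`` are hypothetical, and the tuple of compression suffixes is an assumption for illustration; the suffixes Barman actually recognizes may differ.

```python
import posixpath

# Assumed compression suffixes for this sketch; not taken from Barman.
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".snappy", ".xz", ".lz4", ".zst")


def strip_compression_suffix(name):
    """Remove one known compression suffix from a file name, if present."""
    for suffix in COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def is_requested_wal(path, wal_name):
    """Hypothetical predicate: exact basename match on <wal> or <wal>.partial."""
    base = strip_compression_suffix(posixpath.basename(path))
    return base in (wal_name, wal_name + ".partial")


wal_name = "000000010000000000000003"
prefix = "wals/0000000100000000/"
for name in (wal_name + ".gz", wal_name + ".partial", wal_name + ".00000028.backup.gz"):
    path = prefix + name
    print(f"{name}: startswith={path.startswith(prefix + wal_name)} "
          f"basename_match={is_requested_wal(path, wal_name)}")
```

All three names pass the old prefix test, but only the WAL and its ``.partial`` variant pass the basename comparison, so the sibling label falls through to the ordinary validation path instead of the ``is_requested_wal`` branch.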

The invariant from 5625881 — "when only the ``.backup`` label exists
and the actual WAL is missing, return an empty list rather than
unrelated subsequent WALs" — is still preserved, but enforced after the
iteration: a ``found_requested`` flag tracks whether the requested WAL
was actually located, and the method returns ``[]`` if not. This keeps
the prefetch path simple while still failing fast for the caller.
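The post-loop check can be sketched as follows, under stated assumptions: ``get_wals_to_download``, ``base_name``, ``WAL_RE`` and the ``max_wals`` prefetch limit are hypothetical, the compression suffixes are assumed, and the regex is a simplified stand-in for ``_validate_wal_path`` (it ignores, for example, timeline history files). It is not the code from this PR.

```python
import posixpath
import re

# Assumed compression suffixes for this sketch; not taken from Barman.
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".snappy", ".xz", ".lz4", ".zst")
# Simplified WAL segment pattern: 24 uppercase hex digits, optionally ".partial".
WAL_RE = re.compile(r"[0-9A-F]{24}(\.partial)?")


def base_name(path):
    """Basename with one known compression suffix removed."""
    name = posixpath.basename(path)
    for suffix in COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def get_wals_to_download(paths, wal_name, max_wals=3):
    """Hypothetical sketch of the fixed loop with a found_requested flag."""
    wals = []
    found_requested = False
    for path in sorted(paths):
        base = base_name(path)
        if base < wal_name:
            continue  # segment sorts before the requested WAL
        if WAL_RE.fullmatch(base) is None:
            continue  # stand-in for _validate_wal_path: labels are skipped, no break
        if base in (wal_name, wal_name + ".partial"):
            found_requested = True
        wals.append(path)
        if len(wals) >= max_wals:
            break
    # Invariant from 5625881, enforced after the loop: no requested WAL, no prefetch.
    return wals if found_requested else []


prefix = "wals/0000000100000000/"
wal = "000000010000000000000003"
bucket = [
    prefix + "000000010000000000000002.gz",
    prefix + wal + ".00000028.backup.gz",
    prefix + wal + ".gz",
    prefix + "000000010000000000000004.gz",
]
print(get_wals_to_download(bucket, wal))  # compressed WAL plus the next segment
print(get_wals_to_download([p for p in bucket if p != prefix + wal + ".gz"], wal))
```

The two calls mirror the scenarios from the commit messages: with both the label and the compressed WAL present, the label is skipped and the real WAL is returned; with only the label present, the loop may still collect later segments, but the ``found_requested`` check discards them and returns an empty list. That extra iteration is the price of keeping the loop free of special-case breaks.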

References: BAR-1305.
Signed-off-by: Adam Lee <adam8157@gmail.com>

Cover the regression scenario fixed by the previous commit: when the
bucket holds both ``<wal>.gz`` and ``<wal>.<offset>.backup.gz`` for the
requested WAL, the sorted iteration must walk past the backup-label
entry and return the real compressed WAL — not exit early on a
``startswith`` match against the label file.

The existing ``test_get_wals_to_download_exits_early_when_requested_wal_is_invalid``
test continues to cover the complementary case: when only the ``.backup``
label exists and the actual WAL is missing, the method still returns an
empty list.

References: BAR-1305.
Signed-off-by: Adam Lee <adam8157@gmail.com>
adam8157 force-pushed the fix-cloud-wal-restore-backup-match branch from f80b5b9 to 6da99b2 May 22, 2026 05:19
adam8157 changed the title from "Fix cloud-wal-restore missing compressed WAL on backup-label collision" to "Fix cloud-wal-restore on compressed WAL + sibling backup label" May 22, 2026
adam8157 (Contributor, Author) commented Jun 9, 2026

Somehow these two commits have already been merged, so I'm closing this.

adam8157 closed this Jun 9, 2026
adam8157 deleted the fix-cloud-wal-restore-backup-match branch June 9, 2026 01:29