Skip to content

[wip] mmap-backed cache backends and enterprise hardening#69015

Open
dwoz wants to merge 13 commits intosaltstack:3008.xfrom
dwoz:mmapcache
Open

[wip] mmap-backed cache backends and enterprise hardening#69015
dwoz wants to merge 13 commits intosaltstack:3008.xfrom
dwoz:mmapcache

Conversation

@dwoz
Copy link
Copy Markdown
Contributor

@dwoz dwoz commented Apr 30, 2026

No description provided.

@dwoz dwoz requested a review from a team as a code owner April 30, 2026 09:25
@dwoz dwoz added the test:full Run the full test suite label Apr 30, 2026
@dwoz dwoz changed the title [wip] Add mmap-backed cache backends and enterprise hardening [wip] mmap-backed cache backends and enterprise hardening Apr 30, 2026
dwoz added 13 commits May 3, 2026 23:33
Introduce MmapCache index+heap architecture as a production-grade
alternative to the localfs cache backend, and a dedicated mmap_key
backend for PKI key storage.

salt/utils/mmap_cache.py:
- Index+heap split: fixed-size slot index with append-only heap file
- Roster file (.roster) for O(occupied) list operations
- Header slot tracking occupied_count, deleted_count, high_water_mark
- Thread safety via threading.RLock + fcntl.flock (cross-process)
- Per-entry XXH3-64 checksums (verify_checksums=True, default on)
- xxhash.xxh3_64 replaces zlib.adler32/crc32: unified hash for both
  slot distribution and heap integrity, aligned with a future
  distributed token-ring partition scheme
- _roster_recover(): open-time self-healing of roster/index divergence
  after crash between index flush and roster append
- atomic_rebuild(): snapshot-style full table rebuild via temp files
- Persistent file descriptors for lock, roster write, heap read mmap

salt/cache/mmap_cache.py:
- Drop-in localfs replacement using MmapCache as storage
- Direct msgpack.packb/unpackb (not salt.payload) for faster
  deserialization on typical cache payloads

salt/cache/mmap_key.py:
- New mmap-native PKI key backend (keys.cache_driver: mmap_key)
- Stores minion ID, key state, and public key in heap; no filesystem
  fallback, no dual code path
- Heap layout: STATE_BYTE(1) + pub_key_bytes for the keys bank
- rebuild_from_localfs() for one-time migration from legacy pki layout

salt/cache/localfs_key.py:
- Remove embedded MmapCache index (_get_index, rebuild_index,
  pki_index_enabled guards) superseded by mmap_key backend
- Restore as a clean filesystem shim with no dual code paths

salt/key.py:
- Remove pki_index_enabled rebuild_index call sites

requirements:
- Add xxhash>=3.0.0 to base.txt; regenerate all static CI/packaging
  requirements files across all platforms and Python versions
- Allow xxhash as ungated 3rd-party import in .pylintrc

Tests (138 total, all passing):
- tests/pytests/unit/utils/test_mmap_cache.py: core API coverage
- tests/pytests/unit/utils/test_mmap_cache_errors.py: error paths
- tests/pytests/unit/utils/test_mmap_cache_enterprise.py: thread
  safety, xxhash integrity, roster recovery (22 tests)
- tests/pytests/unit/cache/test_mmap_cache.py: salt.cache adapter
- tests/pytests/unit/cache/test_mmap_cache_errors.py: error paths
- tests/pytests/unit/cache/test_mmap_key.py: mmap_key backend (27)
- tests/pytests/perf/test_cache_benchmarks.py: mmap vs localfs bench

Benchmark summary (mmap vs localfs):
  store_small:          117k vs  44k ops/s  (+2.7x)
  fetch_warm:           160k vs  54k ops/s  (+3.0x)
  updated:              519k vs 142k ops/s  (+3.7x)
  sequential_append:     97k vs  39k ops/s  (+2.5x)
  flush_key:            195k vs 107k ops/s  (+1.8x)
  bulk_store_and_scan:  1.6k vs 0.8k ops/s  (+2.0x)
salt/runners/pki.py:
- Rewrite runner to remove calls to localfs_key.get_index_stats and
  localfs_key.rebuild_index, both of which were removed when the
  embedded mmap index was stripped from localfs_key.
- rename rebuild_index -> migrate_to_mmap; dry_run mode scans the
  filesystem and reports counts without writing anything; live mode
  calls mmap_key.rebuild_from_localfs.
- status() remains a dry-run alias.

tests/pytests/unit/runners/test_pki.py:
- Replace the 4 failing tests (which asserted on the removed
  get_index_stats API) with 5 tests covering the new runner:
  empty pki_dir, per-state key counts, dry-run no-writes, status/
  migrate_to_mmap parity, dotfile exclusion.

salt/cache/localfs.py:
- Fix pre-existing bug in list_(): item.rstrip(item[-2:]) treated its
  argument as a character set, so "keep.p".rstrip(".p") produced "kee"
  instead of "keep". Changed to item[:-2].

tests/pytests/unit/cache/test_cache_backends.py:
- New parameterized test module: 31 contract tests run against both
  localfs and mmap_cache backends. The test_list_does_not_include_
  flushed_keys case caught the localfs.list_ bug above.
Regenerate the four cloud static requirements files to include both
upstream's yamlloader==1.6.0 addition and our xxhash==3.7.0 addition.
Use pytest.importorskip at module level so the entire perf module
is skipped gracefully in Salt's CI environment which does not have
pytest-benchmark installed. When pytest-benchmark is present the
benchmarks run normally.
Add pytest-benchmark to requirements/pytest.txt so the perf benchmarks
in tests/pytests/perf/ are runnable in CI without needing a separate
install step. Regenerate all static CI requirements files to include
pytest-benchmark and its dependency (py-cpuinfo).

Also revert the pytest.importorskip guard added in the previous commit
since pytest-benchmark is now a declared test dependency.
salt/runners/cache.py:
- Add migrate() function that copies all entries from one cache
  backend driver to another, defaulting to localfs → mmap_cache.
- Walks the source cache recursively using contains() to distinguish
  keys from sub-bank directories, so nested bank trees are handled
  correctly regardless of driver.
- Supports bank= to restrict migration to a subtree, dry_run=True to
  count entries without writing, and cachedir= override.
- Errors on individual keys are counted and reported without aborting
  the rest of the migration; details go to the log.
- Idempotent: safe to run multiple times.

CLI examples:
  salt-run cache.migrate dry_run=True
  salt-run cache.migrate
  salt-run cache.migrate bank=minions
  salt-run cache.migrate src_driver=localfs dst_driver=mmap_cache

tests/pytests/unit/runners/test_cache_migrate.py:
- 13 tests covering: full migration, dry-run, single-bank restriction,
  empty cache, idempotency, value types, cachedir fallback, singular
  grammar, missing Banks section, fetch error path, list() exception
  in _walk, contains() exception in _walk, top-level list() exception.
…ter fix

- salt/cluster/ring.py: consistent hash ring with VNode support (xxhash
  XXH3-64, 150 vnodes/node by default). Includes owns()/is_clustered so
  standalone masters (no Raft) never shunt job returns from an empty ring.
  rebuild() is the MembershipStateMachine.on_change hook; the ring stays
  empty until the first Raft CONFIG entry commits.

- salt/utils/mmap_cache.py: heap segmentation. The 8-byte OFFSET field
  encodes (segment_id << 48 | within-segment offset), giving up to 65535
  segments of 281 TB each. Segment 0 is the original .heap file; rolls are
  transparent to callers. atomic_rebuild handles multi-segment output and
  removes stale extra segments. Fully backward-compatible with existing
  on-disk data.

- salt/runners/cache.py: fix mmap_cache→localfs migration direction. When
  src_cache.list("") returns empty (mmap_cache has no root index), fall back
  to scanning the cachedir on disk to discover top-level bank directories.

- tests/pytests/unit/cluster/test_ring.py: 69 unit tests covering empty
  ring, single-node fast-path, determinism, VNode distribution, add/remove,
  rebuild, get_replicas, wrap-around, monotonicity, is_clustered/owns
  standalone vs clustered routing guard, thread safety, and constants.

- tests/pytests/unit/utils/test_mmap_cache_segments.py: 48 unit tests for
  heap segmentation — offset encoding, segment paths, single-segment
  baseline, rolling, cross-segment reads/overwrites, discovery, atomic_rebuild,
  checksum verification, lazy mmap refresh, thread safety, delete+reput.

- tests/pytests/functional/utils/test_mmap_cache.py: 20 functional tests
  exercising real on-disk persistence, concurrent forked writers, crash-safety
  (truncated/missing/extra roster), atomic_rebuild atomicity, large datasets.

- tests/pytests/functional/cache/test_mmap_cache_driver.py: 24 functional
  tests for the full salt.cache.Cache API backed by mmap_cache with real opts.

- tests/pytests/functional/runners/test_cache_migrate.py: 11 functional
  tests for cache.migrate with real localfs↔mmap_cache I/O.

- tests/pytests/functional/cluster/test_ring.py: 14 functional tests for
  ring lifecycle, Phase A routing gate, concurrent rebuild+routing, and
  replication set behaviour.
When a new master joins a running cluster, the initial TCP push from
an existing peer to the joiner's cluster pool port can fail with a
StreamClosedError.  Previously this error was only logged as a warning
and the pub_sock was left in a broken state — all subsequent pushes to
that peer reused the dead stream and also silently failed.  This meant
the peer-AES key exchange never completed and every event forwarded
through the joiner's publish path was queued indefinitely in
auth_errors, so test.ping (and any other command) issued via the
late-joining master would never return a result.

Fix in salt/channel/server.py:
- On StreamClosedError reset pub_sock to None so the next push attempt
  opens a fresh TCP connection instead of reusing a dead stream.
- Schedule send_aes_key_event() 2 s later so the AES key is
  re-announced to all peers once the connection recovers, restarting
  the full peer-key handshake.

Fix in test_fourth_master_joins_existing_cluster:
- Retry test.ping via cluster_master_4 with a 30 s per-attempt timeout
  and a 300 s hard deadline.  The first attempt may arrive before the
  peer-key exchange completes (triggering the reconnect+resend cycle);
  subsequent attempts succeed once peer_keys are established.
Register flush_ as flush so salt.cache.Cache can remove keys. Enforce
valid_id on keys and denied_keys operations. Remove numbered heap
segments when wiping a bank. Add functional tests for the key cache API
and unit tests for minion ID validation.
Use inode, mtime, and size in _get_cache_id so concurrent writers force a
remap instead of reusing a stale index mmap.  Initialize tmp_seg_id before
atomic_rebuild for correct temp-file cleanup on early failure.  Update error
tests for heap state kept in _seg_mms.  Raise setuptools and pip pins for the
lint-salt and lint-tests hooks so the isolated pre-commit env works on
Python 3.12 without stdlib distutils.
Track whether the index mmap was mapped ACCESS_WRITE.  If a caller opens
read-only (get/get_mtime) and later writes with an unchanged cache id,
replace the mmap instead of reusing the read-only view.  Fixes CI failures
from struct.pack_into and slot updates on readonly maps.  Add a unit test
for the read-then-write sequence.
Ensure heap data is synced before index/roster updates become durable so
multiprocess writers cannot observe OCCUPIED slots whose heap records are
not yet visible. Use the backing index fd for atomic_rebuild where mmap
has no fileno().
Keep the index file object open until after the index mmap is closed so flush and fsync use a valid descriptor. Prefer that fd when syncing after flush.

Add tests/pytests/mmapcache-smoke-tests.txt and run-mmapcache-smoke-tests.sh for CI regressions around mmap cache.

Force configure_loader_modules on cache perf benchmarks so mmap opts stay consistent when tests run in an integration session.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

test:full Run the full test suite

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant