[wip] mmap-backed cache backends and enterprise hardening#69015
Open
dwoz wants to merge 13 commits intosaltstack:3008.xfrom
Open
[wip] mmap-backed cache backends and enterprise hardening#69015dwoz wants to merge 13 commits intosaltstack:3008.xfrom
dwoz wants to merge 13 commits intosaltstack:3008.xfrom
Conversation
Introduce MmapCache index+heap architecture as a production-grade alternative to the localfs cache backend, and a dedicated mmap_key backend for PKI key storage. salt/utils/mmap_cache.py: - Index+heap split: fixed-size slot index with append-only heap file - Roster file (.roster) for O(occupied) list operations - Header slot tracking occupied_count, deleted_count, high_water_mark - Thread safety via threading.RLock + fcntl.flock (cross-process) - Per-entry XXH3-64 checksums (verify_checksums=True, default on) - xxhash.xxh3_64 replaces zlib.adler32/crc32: unified hash for both slot distribution and heap integrity, aligned with a future distributed token-ring partition scheme - _roster_recover(): open-time self-healing of roster/index divergence after crash between index flush and roster append - atomic_rebuild(): snapshot-style full table rebuild via temp files - Persistent file descriptors for lock, roster write, heap read mmap salt/cache/mmap_cache.py: - Drop-in localfs replacement using MmapCache as storage - Direct msgpack.packb/unpackb (not salt.payload) for faster deserialization on typical cache payloads salt/cache/mmap_key.py: - New mmap-native PKI key backend (keys.cache_driver: mmap_key) - Stores minion ID, key state, and public key in heap; no filesystem fallback, no dual code path - Heap layout: STATE_BYTE(1) + pub_key_bytes for the keys bank - rebuild_from_localfs() for one-time migration from legacy pki layout salt/cache/localfs_key.py: - Remove embedded MmapCache index (_get_index, rebuild_index, pki_index_enabled guards) superseded by mmap_key backend - Restore as a clean filesystem shim with no dual code paths salt/key.py: - Remove pki_index_enabled rebuild_index call sites requirements: - Add xxhash>=3.0.0 to base.txt; regenerate all static CI/packaging requirements files across all platforms and Python versions - Allow xxhash as ungated 3rd-party import in .pylintrc Tests (138 total, all passing): - tests/pytests/unit/utils/test_mmap_cache.py: core API coverage - tests/pytests/unit/utils/test_mmap_cache_errors.py: error paths - tests/pytests/unit/utils/test_mmap_cache_enterprise.py: thread safety, xxhash integrity, roster recovery (22 tests) - tests/pytests/unit/cache/test_mmap_cache.py: salt.cache adapter - tests/pytests/unit/cache/test_mmap_cache_errors.py: error paths - tests/pytests/unit/cache/test_mmap_key.py: mmap_key backend (27) - tests/pytests/perf/test_cache_benchmarks.py: mmap vs localfs bench Benchmark summary (mmap vs localfs): store_small: 117k vs 44k ops/s (+2.7x) fetch_warm: 160k vs 54k ops/s (+3.0x) updated: 519k vs 142k ops/s (+3.7x) sequential_append: 97k vs 39k ops/s (+2.5x) flush_key: 195k vs 107k ops/s (+1.8x) bulk_store_and_scan: 1.6k vs 0.8k ops/s (+2.0x)
salt/runners/pki.py:
- Rewrite runner to remove calls to localfs_key.get_index_stats and
localfs_key.rebuild_index, both of which were removed when the
embedded mmap index was stripped from localfs_key.
- rename rebuild_index -> migrate_to_mmap; dry_run mode scans the
filesystem and reports counts without writing anything; live mode
calls mmap_key.rebuild_from_localfs.
- status() remains a dry-run alias.
tests/pytests/unit/runners/test_pki.py:
- Replace the 4 failing tests (which asserted on the removed
get_index_stats API) with 5 tests covering the new runner:
empty pki_dir, per-state key counts, dry-run no-writes, status/
migrate_to_mmap parity, dotfile exclusion.
salt/cache/localfs.py:
- Fix pre-existing bug in list_(): item.rstrip(item[-2:]) treated its
argument as a character set, so "keep.p".rstrip(".p") produced "kee"
instead of "keep". Changed to item[:-2].
tests/pytests/unit/cache/test_cache_backends.py:
- New parameterized test module: 31 contract tests run against both
localfs and mmap_cache backends. The test_list_does_not_include_
flushed_keys case caught the localfs.list_ bug above.
Regenerate the four cloud static requirements files to include both upstream's yamlloader==1.6.0 addition and our xxhash==3.7.0 addition.
Use pytest.importorskip at module level so the entire perf module is skipped gracefully in Salt's CI environment which does not have pytest-benchmark installed. When pytest-benchmark is present the benchmarks run normally.
Add pytest-benchmark to requirements/pytest.txt so the perf benchmarks in tests/pytests/perf/ are runnable in CI without needing a separate install step. Regenerate all static CI requirements files to include pytest-benchmark and its dependency (py-cpuinfo). Also revert the pytest.importorskip guard added in the previous commit since pytest-benchmark is now a declared test dependency.
salt/runners/cache.py: - Add migrate() function that copies all entries from one cache backend driver to another, defaulting to localfs → mmap_cache. - Walks the source cache recursively using contains() to distinguish keys from sub-bank directories, so nested bank trees are handled correctly regardless of driver. - Supports bank= to restrict migration to a subtree, dry_run=True to count entries without writing, and cachedir= override. - Errors on individual keys are counted and reported without aborting the rest of the migration; details go to the log. - Idempotent: safe to run multiple times. CLI examples: salt-run cache.migrate dry_run=True salt-run cache.migrate salt-run cache.migrate bank=minions salt-run cache.migrate src_driver=localfs dst_driver=mmap_cache tests/pytests/unit/runners/test_cache_migrate.py: - 13 tests covering: full migration, dry-run, single-bank restriction, empty cache, idempotency, value types, cachedir fallback, singular grammar, missing Banks section, fetch error path, list() exception in _walk, contains() exception in _walk, top-level list() exception.
…ter fix
- salt/cluster/ring.py: consistent hash ring with VNode support (xxhash
XXH3-64, 150 vnodes/node by default). Includes owns()/is_clustered so
standalone masters (no Raft) never shunt job returns from an empty ring.
rebuild() is the MembershipStateMachine.on_change hook; the ring stays
empty until the first Raft CONFIG entry commits.
- salt/utils/mmap_cache.py: heap segmentation. The 8-byte OFFSET field
encodes (segment_id << 48 | within-segment offset), giving up to 65535
segments of 281 TB each. Segment 0 is the original .heap file; rolls are
transparent to callers. atomic_rebuild handles multi-segment output and
removes stale extra segments. Fully backward-compatible with existing
on-disk data.
- salt/runners/cache.py: fix mmap_cache→localfs migration direction. When
src_cache.list("") returns empty (mmap_cache has no root index), fall back
to scanning the cachedir on disk to discover top-level bank directories.
- tests/pytests/unit/cluster/test_ring.py: 69 unit tests covering empty
ring, single-node fast-path, determinism, VNode distribution, add/remove,
rebuild, get_replicas, wrap-around, monotonicity, is_clustered/owns
standalone vs clustered routing guard, thread safety, and constants.
- tests/pytests/unit/utils/test_mmap_cache_segments.py: 48 unit tests for
heap segmentation — offset encoding, segment paths, single-segment
baseline, rolling, cross-segment reads/overwrites, discovery, atomic_rebuild,
checksum verification, lazy mmap refresh, thread safety, delete+reput.
- tests/pytests/functional/utils/test_mmap_cache.py: 20 functional tests
exercising real on-disk persistence, concurrent forked writers, crash-safety
(truncated/missing/extra roster), atomic_rebuild atomicity, large datasets.
- tests/pytests/functional/cache/test_mmap_cache_driver.py: 24 functional
tests for the full salt.cache.Cache API backed by mmap_cache with real opts.
- tests/pytests/functional/runners/test_cache_migrate.py: 11 functional
tests for cache.migrate with real localfs↔mmap_cache I/O.
- tests/pytests/functional/cluster/test_ring.py: 14 functional tests for
ring lifecycle, Phase A routing gate, concurrent rebuild+routing, and
replication set behaviour.
When a new master joins a running cluster, the initial TCP push from an existing peer to the joiner's cluster pool port can fail with a StreamClosedError. Previously this error was only logged as a warning and the pub_sock was left in a broken state — all subsequent pushes to that peer reused the dead stream and also silently failed. This meant the peer-AES key exchange never completed and every event forwarded through the joiner's publish path was queued indefinitely in auth_errors, so test.ping (and any other command) issued via the late-joining master would never return a result. Fix in salt/channel/server.py: - On StreamClosedError reset pub_sock to None so the next push attempt opens a fresh TCP connection instead of reusing a dead stream. - Schedule send_aes_key_event() 2 s later so the AES key is re-announced to all peers once the connection recovers, restarting the full peer-key handshake. Fix in test_fourth_master_joins_existing_cluster: - Retry test.ping via cluster_master_4 with a 30 s per-attempt timeout and a 300 s hard deadline. The first attempt may arrive before the peer-key exchange completes (triggering the reconnect+resend cycle); subsequent attempts succeed once peer_keys are established.
Register flush_ as flush so salt.cache.Cache can remove keys. Enforce valid_id on keys and denied_keys operations. Remove numbered heap segments when wiping a bank. Add functional tests for the key cache API and unit tests for minion ID validation.
Use inode, mtime, and size in _get_cache_id so concurrent writers force a remap instead of reusing a stale index mmap. Initialize tmp_seg_id before atomic_rebuild for correct temp-file cleanup on early failure. Update error tests for heap state kept in _seg_mms. Raise setuptools and pip pins for the lint-salt and lint-tests hooks so the isolated pre-commit env works on Python 3.12 without stdlib distutils.
Track whether the index mmap was mapped ACCESS_WRITE. If a caller opens read-only (get/get_mtime) and later writes with an unchanged cache id, replace the mmap instead of reusing the read-only view. Fixes CI failures from struct.pack_into and slot updates on readonly maps. Add a unit test for the read-then-write sequence.
Ensure heap data is synced before index/roster updates become durable so multiprocess writers cannot observe OCCUPIED slots whose heap records are not yet visible. Use the backing index fd for atomic_rebuild where mmap has no fileno().
Keep the index file object open until after the index mmap is closed so flush and fsync use a valid descriptor. Prefer that fd when syncing after flush. Add tests/pytests/mmapcache-smoke-tests.txt and run-mmapcache-smoke-tests.sh for CI regressions around mmap cache. Force configure_loader_modules on cache perf benchmarks so mmap opts stay consistent when tests run in an integration session.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.