
feat: image-backed microVM boot#11

Merged
markovejnovic merged 87 commits into main from feat/wire-linux on Jun 23, 2026

@markovejnovic markovejnovic commented Jun 22, 2026


No description provided.

Extract attach_args() as a pure function to assemble losetup arguments
without filesystem access. Replace clap-based test (which invoked the
ok_backing_file validator and required /srv/hyper/test.img to exist)
with hermetic unit tests that assert on attach_args() directly:
- attach_args(false, path) includes --read-only
- attach_args(true, path) omits --read-only

Validates the real --read-only-omission logic without requiring any
backing files to exist on the system.
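
A minimal sketch of what such a pure argument builder could look like. The parameter name `writable` and the `--find --show` flags are assumptions made for illustration; only `attach_args` and the `--read-only` behaviour come from the commit message.

```rust
use std::ffi::OsString;
use std::path::Path;

/// Assemble the `losetup` argument vector without touching the filesystem.
/// `writable` and the leading `--find --show` flags are assumptions of this
/// sketch, not taken from the helper's source.
fn attach_args(writable: bool, backing: &Path) -> Vec<OsString> {
    let mut args: Vec<OsString> = vec!["--find".into(), "--show".into()];
    if !writable {
        // A read-only attach is the only case that adds the flag.
        args.push("--read-only".into());
    }
    args.push(backing.as_os_str().to_os_string());
    args
}
```

Because the function never stats or opens `backing`, a test can pass a path that does not exist and assert on the returned vector alone.
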
…link traversal

CRITICAL: Replace mknod --major/--minor with --device <BlockDev>. The helper
now opens the device with O_PATH|O_NOFOLLOW, fstats it, and uses the kernel's
own st_rdev for mknodat — callers can no longer name arbitrary major:minor.

HIGH (x2): Introduce open_parent_nofollow in safe_dev.rs: walks every parent
component of a JailPath under JAIL_BASE with openat(O_NOFOLLOW), so a symlinked
component causes ELOOP → SymlinkComponent before any write occurs. mknodat,
linkat, and fchownat(AT_SYMLINK_NOFOLLOW) are all relative to the verified
parent fd. Replaces plain chown/hard_link/mknod calls in mknod.rs and stage.rs.

Add pure unit tests for check_owner (uid/gid 0 and <1000 rejected),
jail_relative_parts (strips JAIL_BASE, splits parent+name), and existing
JailPath lexical tests. 18 tests pass; cargo build --release clean.
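
The ownership rule tested above (uid/gid 0 and anything below 1000 rejected) can be sketched as a pure function. The error type, its variants, and the constant name are hypothetical; only the name `check_owner` and the thresholds come from the commit message.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OwnerError {
    PrivilegedUid(u32),
    PrivilegedGid(u32),
}

/// IDs below this value are treated as system accounts (assumed constant name).
const MIN_UNPRIVILEGED_ID: u32 = 1000;

/// Reject root (0) and any other id below 1000 for both uid and gid.
fn check_owner(uid: u32, gid: u32) -> Result<(), OwnerError> {
    if uid < MIN_UNPRIVILEGED_ID {
        return Err(OwnerError::PrivilegedUid(uid));
    }
    if gid < MIN_UNPRIVILEGED_ID {
        return Err(OwnerError::PrivilegedGid(gid));
    }
    Ok(())
}
```
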
… Writable.release/1 holder

Add comments exposing the silent coupling between Rust setuid helper constants
and Elixir config. JAIL_BASE and HYPER_BASE must match config :hyper, work_dir
and its derived paths; changing one without rebuilding the helper breaks device
staging with opaque errors. Also document Writable.release/1 holder semantics.
@markovejnovic markovejnovic changed the title from "feat: image-backed microVM boot (Route B)" to "feat: image-backed microVM boot" on Jun 22, 2026
…ce work_dir/helper base match at node startup
…e-adoption

A controller crash now discards the daemon (no orphaned VM) and the fresh
controller cold-boots. terminate always stops the daemon; MuonTrap kills the
firecracker OS process when its port closes, so none survive teardown/BEAM death.
Removes Daemon.ensure adopt branch and the AwaitingApi Running/Paused shortcut.
Daemon is now a static :permanent child of a plain Core supervisor (:one_for_all);
a firecracker crash exits it and Core restarts daemon+controller together for a
clean cold boot. Daemon.start_link resets the stale jail (chroot + cgroup, via the
new reset-jail helper) before each launch so relaunch succeeds. Removes the
controller's daemon monitor, the :booting/:crashed states, re-adoption, and
DynamicSupervisor; State.init goes straight to :awaiting_api.
Per-axis Check traits over six markers (absoluteness, components,
existence, file type, owner, mode); Any turns an axis off. One trait per
axis so each type-parameter slot only accepts its own markers. Validation
runs every enforced axis sharing a single symlink_metadata call, and
reports the first failure via one ValidationError (no per-combination
error type - Rust can't synthesise one). Not yet wired into the tools.
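
A reduced sketch of the one-trait-per-axis pattern, limited to two name-only axes for brevity. The trait names, marker names other than `Any`, and the error variants are assumptions. The six-axis version in the commit would share one `symlink_metadata` call across its metadata axes, which this sketch does not model.

```rust
use std::convert::TryFrom;
use std::marker::PhantomData;
use std::path::{Component, Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ValidationError {
    NotAbsolute,
    BadComponent,
}

// One trait per axis: a marker for one axis cannot fill another axis's slot.
trait AbsolutenessCheck {
    fn check(path: &Path) -> Result<(), ValidationError>;
}
trait ComponentsCheck {
    fn check(path: &Path) -> Result<(), ValidationError>;
}

struct Any; // turns an axis off
struct IsAbsolute;
struct StrictComponents;

impl AbsolutenessCheck for Any {
    fn check(_: &Path) -> Result<(), ValidationError> { Ok(()) }
}
impl ComponentsCheck for Any {
    fn check(_: &Path) -> Result<(), ValidationError> { Ok(()) }
}
impl AbsolutenessCheck for IsAbsolute {
    fn check(path: &Path) -> Result<(), ValidationError> {
        if path.is_absolute() { Ok(()) } else { Err(ValidationError::NotAbsolute) }
    }
}
impl ComponentsCheck for StrictComponents {
    fn check(path: &Path) -> Result<(), ValidationError> {
        // Reject `..` and a leading `.`; neither is a plain name.
        for c in path.components() {
            if matches!(c, Component::CurDir | Component::ParentDir) {
                return Err(ValidationError::BadComponent);
            }
        }
        Ok(())
    }
}

struct SafePath<A, C> {
    path: PathBuf,
    _axes: PhantomData<(A, C)>,
}

impl<A, C> SafePath<A, C> {
    fn path(&self) -> &Path { &self.path }
}

impl<A: AbsolutenessCheck, C: ComponentsCheck> TryFrom<PathBuf> for SafePath<A, C> {
    type Error = ValidationError;
    fn try_from(path: PathBuf) -> Result<Self, ValidationError> {
        // Every enforced axis runs; the first failure is reported.
        A::check(&path)?;
        C::check(&path)?;
        Ok(SafePath { path, _axes: PhantomData })
    }
}
```

With this shape, `SafePath::<StrictComponents, IsAbsolute>` fails to compile, since each slot is bounded by its own trait and only `Any` implements both.
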
Seventh axis. Unlike the six type-only markers, confinement carries a
runtime base value (a &Path can't be a type parameter), so LivesUnder<'a>
holds the base and is supplied to a dedicated under(path, base)
constructor. TryFrom stays the entry for the unconfined (Any) case.
Drop the lifetime: SafePath<...,LivesUnder> no longer borrows the base, so
a confined path is self-contained. under() takes an owned PathBuf (e.g.
Config::get().jail_base() directly).
Remove the metadata axes (MustExist/IsRegularFile/RootOwner/OnlyRootWritable)
and their by-name stat: checked by path they are TOCTOU-racy, so calling them
"safe" was a footgun. SafePath now reasons about the name only (absoluteness,
components, confinement-prefix); existence/type/owner/mode move to fd-based
verification (future safe_file).
Lexical starts_with confinement is defeated by a symlinked component, so it
was the same false-safety footgun as the metadata checks. SafePath is now
purely absoluteness + components; real confinement is the O_NOFOLLOW walk
(fd-side).
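
The gap described above can be reproduced with the standard library alone. This sketch builds a throwaway tree in which `jail/link` points outside the jail, then compares a lexical `starts_with` with the resolved location. All function names and the directory layout are hypothetical. `canonicalize` appears only to expose the gap; it is itself a by-name lookup and is not the fix the commit describes.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::{Path, PathBuf};

/// Build `<root>/jail/link -> <root>/outside` and return
/// (jail, path that goes through the symlink).
fn build_demo_tree(root: &Path) -> io::Result<(PathBuf, PathBuf)> {
    let jail = root.join("jail");
    let outside = root.join("outside");
    fs::create_dir_all(&jail)?;
    fs::create_dir_all(&outside)?;
    fs::write(outside.join("secret"), b"x")?;
    let link = jail.join("link");
    symlink(&outside, &link)?;
    Ok((jail, link.join("secret")))
}

/// Purely lexical prefix check: never looks at the filesystem.
fn lexically_confined(base: &Path, candidate: &Path) -> bool {
    candidate.starts_with(base)
}

/// Where the path actually lands once symlinks are resolved.
fn resolves_inside(base: &Path, candidate: &Path) -> io::Result<bool> {
    Ok(fs::canonicalize(candidate)?.starts_with(fs::canonicalize(base)?))
}
```

For the path through `jail/link`, the lexical check passes while the resolved target lies outside the jail, which is why the prefix test alone gives no confinement guarantee.
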
Wraps an open fd (backed by std OwnedFd) and closes it exactly once, on
drop - never before. The fd half of the path-safety story: resolve a name
to a descriptor once, then verify (fstat) and operate (*at) through the
held fd, immune to the by-name TOCTOU races. Replaces the scattered manual
close calls (which leak on early return / risk double-close).
SafeFile<T,R,O> proves in its type which fstat-checked properties the held
fd has: file type / ownership / mode, the same axes pulled out of SafePath
(by-name they were TOCTOU footguns; on the fd they are sound). Verification
runs once in TryFrom<OwnedFd> sharing one fstat; Any turns an axis off.
Existence needs no axis - holding an fd proves the file exists.
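
A std-only sketch of the idea, reduced to two axes (file type and mode) instead of the three named above. The trait names, the `NotGroupOrWorldWritable` marker, and the error variants are assumptions. The key point it illustrates is that all checks read one `fstat` result taken from the held descriptor rather than from a name.

```rust
use std::convert::TryFrom;
use std::fs::{File, Metadata};
use std::io;
use std::marker::PhantomData;
use std::os::fd::OwnedFd;
use std::os::unix::fs::MetadataExt;

#[derive(Debug)]
enum FileError {
    Stat(io::Error),
    WrongType,
    WritableByOthers,
}

trait TypeCheck {
    fn check(meta: &Metadata) -> Result<(), FileError>;
}
trait ModeCheck {
    fn check(meta: &Metadata) -> Result<(), FileError>;
}

struct Any; // turns an axis off
struct IsRegularFile;
struct NotGroupOrWorldWritable; // hypothetical stand-in for the mode axis

impl TypeCheck for Any {
    fn check(_: &Metadata) -> Result<(), FileError> { Ok(()) }
}
impl ModeCheck for Any {
    fn check(_: &Metadata) -> Result<(), FileError> { Ok(()) }
}
impl TypeCheck for IsRegularFile {
    fn check(meta: &Metadata) -> Result<(), FileError> {
        if meta.is_file() { Ok(()) } else { Err(FileError::WrongType) }
    }
}
impl ModeCheck for NotGroupOrWorldWritable {
    fn check(meta: &Metadata) -> Result<(), FileError> {
        if meta.mode() & 0o022 == 0 { Ok(()) } else { Err(FileError::WritableByOthers) }
    }
}

struct SafeFile<T, M> {
    fd: OwnedFd,
    _axes: PhantomData<(T, M)>,
}

impl<T, M> SafeFile<T, M> {
    /// Hand the verified descriptor back as a `File` for reading.
    fn into_file(self) -> File { File::from(self.fd) }
}

impl<T: TypeCheck, M: ModeCheck> TryFrom<OwnedFd> for SafeFile<T, M> {
    type Error = FileError;
    fn try_from(fd: OwnedFd) -> Result<Self, FileError> {
        // File::metadata stats the descriptor, so both axes share one
        // fd-based lookup. On failure `file` drops and closes the fd once.
        let file = File::from(fd);
        let meta = file.metadata().map_err(FileError::Stat)?;
        T::check(&meta)?;
        M::check(&meta)?;
        Ok(SafeFile { fd: OwnedFd::from(file), _axes: PhantomData })
    }
}
```
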
open(path, flags) takes a lexically-validated SafePath, opens it with
O_NOFOLLOW|O_CLOEXEC always forced, and runs the fstat axes - so one call
proves existence (the open succeeded) plus type/owner/mode, all in the
returned type. Pass O_PATH to verify only, or O_RDONLY to also read. This guards
only the final component; confined trees still need the fd-by-fd parent walk.
Replace the bespoke O_NOFOLLOW open + metadata() owner/mode/type checks in
Config::safe_load with the pipeline: lexical SafePath gate, then
SafeFile::<IsRegularFile,RootOwner,OnlyRootWritable>::open(O_RDONLY) which
proves existence + the fstat axes on the held fd, then read through that fd.
First real consumer of the new utilities; security logic lives in one place.
Drop the from_validation mapping function. LoadingError now wraps the
underlying errors via #[from] (Path/File variants), so safe_load uses ? and
the precise messages surface directly. Both ValidationErrors are now Copy
(payloads already are) to keep LoadingError Copy.
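
The effect of wrapping via `#[from]` can be sketched with hand-written `From` impls, which is roughly what such a derive attribute generates. The inner error types, their variants, and the two check functions below are hypothetical placeholders; only `LoadingError`, its `Path`/`File` variants, and `safe_load` are named in the commit message.

```rust
// Hypothetical stand-ins for the two underlying validation errors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PathError {
    NotAbsolute,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileError {
    NotRegularFile,
}

/// Stays `Copy` because both payloads are `Copy`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LoadingError {
    Path(PathError),
    File(FileError),
}

impl From<PathError> for LoadingError {
    fn from(e: PathError) -> Self { LoadingError::Path(e) }
}
impl From<FileError> for LoadingError {
    fn from(e: FileError) -> Self { LoadingError::File(e) }
}

fn check_path(path: &str) -> Result<(), PathError> {
    if path.starts_with('/') { Ok(()) } else { Err(PathError::NotAbsolute) }
}

fn check_file(is_regular: bool) -> Result<(), FileError> {
    if is_regular { Ok(()) } else { Err(FileError::NotRegularFile) }
}

/// `?` converts each underlying error through `From`; no mapping function.
fn safe_load(path: &str, is_regular: bool) -> Result<(), LoadingError> {
    check_path(path)?;
    check_file(is_regular)?;
    Ok(())
}
```
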
SafeDir owns a directory fd and is both the walk primitive (openat_dir
descends one component O_NOFOLLOW|O_DIRECTORY, relative to the pinned fd) and
the home for fd-relative removal: unlink/rmdir via unlinkat, and a recursive
remove_dir_all that descends with fresh openat'd fds and never re-resolves a
path by name (vs std::fs::remove_dir_all, a by-name TOCTOU footgun). Symlinked
entries are unlinked, never followed; DT_UNKNOWN falls back to a confined open
probe. Adds the nix "dir" feature.
…kDevice

Primitives for the walk migration:
- SafePath::relative_to(base) -> (parents, leaf), gated on StrictComponents.
- SafeDir: descend (walk), openat_file, create_file, mknod_block, link_from,
  chown, try_clone.
- SafeFile: IsBlockDevice tag + type-gated rdev() accessor.
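
A sketch of how a `relative_to`-style split might work, written as a free function over plain paths. The signature and the `Option` return are assumptions; the real method lives on `SafePath` and is gated on `StrictComponents`. The `(Vec<PathBuf>, PathBuf)` shape follows the later path-typing commit.

```rust
use std::path::{Component, Path, PathBuf};

/// Split `path` into (parent components, leaf) relative to `base`.
/// Returns `None` if `path` is not under `base`, has no leaf, or contains
/// anything other than plain name components after the base.
fn relative_to(path: &Path, base: &Path) -> Option<(Vec<PathBuf>, PathBuf)> {
    let rest = path.strip_prefix(base).ok()?;
    let mut parts = Vec::new();
    for component in rest.components() {
        match component {
            Component::Normal(name) => parts.push(PathBuf::from(name)),
            _ => return None, // `..`, `.`, or a root: refuse to walk it
        }
    }
    let leaf = parts.pop()?;
    Some((parts, leaf))
}
```

The parent list is what an fd-relative walk would descend one component at a time, and the leaf is the name handed to the final `*at` call.
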
…ete JailPath

Retire the bespoke path/fd security code in favour of the typed utilities:

- prepare: walk the chroot from JAIL_BASE via SafeDir (O_NOFOLLOW, confinement
  proven), then stage kernel + mknod rootfs relative to that dir fd.
- stage: stage_into(parent, name, src, ...) - link_from / create_file+copy /
  chown via SafeDir; canonicalize+confine the source. No manual close dance.
- mknod: device fd is SafeFile<IsBlockDevice>; rdev() is type-gated; node via
  SafeDir::mknod_block.
- remove: fd-relative deletion via SafeDir.remove_dir_all / rmdir after an
  O_NOFOLLOW walk, replacing std::fs::remove_dir_all (by-name TOCTOU).
- safe_dev: deleted JailPath, jail_relative_parts, open_parent_nofollow and the
  Jail/SymlinkComponent/DeviceStat error variants; it is now device-name
  newtypes only.
- trimmed dead code (safe_path::Any/as_path, safe_dir::openat_file).

Confinement is now proven by the walk, every fd is RAII, and the whole
O_NOFOLLOW/fstat/openat/close surface lives in util/{safe_path,safe_file,safe_dir}.
…calize

canonicalize + confine-under-HYPER_BASE stays (losetup needs the resolved
backing file as a path), but the manual open + fstat + S_IFREG check becomes
SafeFile::<IsRegularFile,..>::open(O_PATH) - the regular-file proof rides the
held fd. SafeFile's fd is O_CLOEXEC, so dup an inheritable copy for the child
losetup to reopen via /proc/self/fd; the SafeFile closes on drop. Drops the
bespoke OpenBacking-open/NotRegularFile logic.
One type per file: snapshot.rs (SnapshotTable), thin_pool.rs (ThinPoolTable),
thin.rs (ThinTable), table.rs (DmTable), message.rs (ThinMessage). mod.rs keeps
the shared Error and the Dmsetup tool. No behaviour change.
util/chroot_jail.rs: a lazy, declarative builder for a VM chroot's contents.
Kernel and rootfs slots start Unset and carry their value once set
(with_kernel/with_rootfs); build() exists only on ChrootJail<Kernel,Rootfs>,
so a jail missing either artifact is a compile error. build() does the
confined O_NOFOLLOW walk then realizes each artifact relative to the chroot
dir fd - the staging (hardlink/copy/chown) and mknod logic folded in here.

prepare.rs collapses to a three-line declaration. Deleted tools/stage.rs and
tools/mknod.rs (logic now lives once, in the builder).
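
The compile-time guarantee described above is a typestate builder. A minimal sketch, assuming a two-slot generic struct: `build()` here only returns a plan instead of performing the O_NOFOLLOW walk, staging, and mknod, and the `Plan` type and field names are hypothetical.

```rust
use std::path::PathBuf;

struct Unset;
struct Kernel(PathBuf);
struct Rootfs(PathBuf);

struct ChrootJail<K, R> {
    chroot: PathBuf,
    kernel: K,
    rootfs: R,
}

/// Hypothetical result type; the real build() realizes the artifacts.
#[derive(Debug, PartialEq, Eq)]
struct Plan {
    chroot: PathBuf,
    kernel: PathBuf,
    rootfs_device: PathBuf,
}

impl ChrootJail<Unset, Unset> {
    fn new(chroot: impl Into<PathBuf>) -> Self {
        ChrootJail { chroot: chroot.into(), kernel: Unset, rootfs: Unset }
    }
}

impl<R> ChrootJail<Unset, R> {
    fn with_kernel(self, kernel: impl Into<PathBuf>) -> ChrootJail<Kernel, R> {
        ChrootJail { chroot: self.chroot, kernel: Kernel(kernel.into()), rootfs: self.rootfs }
    }
}

impl<K> ChrootJail<K, Unset> {
    fn with_rootfs(self, device: impl Into<PathBuf>) -> ChrootJail<K, Rootfs> {
        ChrootJail { chroot: self.chroot, kernel: self.kernel, rootfs: Rootfs(device.into()) }
    }
}

// build() exists only once both slots carry a value.
impl ChrootJail<Kernel, Rootfs> {
    fn build(self) -> Plan {
        Plan { chroot: self.chroot, kernel: self.kernel.0, rootfs_device: self.rootfs.0 }
    }
}
```

Calling `ChrootJail::new(..).with_kernel(..).build()` does not compile, because no `build` method is defined for `ChrootJail<Kernel, Unset>`. The setters can be called in either order.
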
git mv src/{safe_bin,safe_dev}.rs -> src/util/; declared in util/mod.rs;
repointed crate::safe_{bin,dev} -> crate::util::safe_{bin,dev}. No behaviour
change.
Across the helper, filesystem paths and path components are now Path/PathBuf:
- SafeDir: name params &str -> &Path, descend &[String] -> &[PathBuf], errors
  hold PathBuf; entries read filenames as PathBuf via OsStr (no UTF-8 coupling,
  drops the BadName variant).
- SafePath::relative_to -> (Vec<PathBuf>, PathBuf), dropping NonUtf8.
- ChrootJail: chroot/kernel slots PathBuf; new/with_kernel take Into<PathBuf>.
- prepare/remove args -> PathBuf; remove helpers take &Path.
- losetup backing-file value parser returns PathBuf.
- output device fields (losetup/dmsetup) and sys-test hyper_base -> PathBuf
  (serde still serialises them as JSON strings, so the wire is unchanged).

Left as-is (not paths): FromStr(&str) parse interfaces, DmName/SafeBin
validated-name newtypes, and the path-literal consts.
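
The "no UTF-8 coupling" point can be illustrated with a directory entry name that is not valid UTF-8. The helper name and the sample bytes are made up for this sketch; it only shows that a `PathBuf` built through `OsStr` carries the raw bytes, where a `String`-based API would have needed an error variant.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::{Path, PathBuf};

/// Join a raw directory-entry name onto `dir` without decoding it.
fn join_entry(dir: &Path, raw_name: &[u8]) -> PathBuf {
    dir.join(OsStr::from_bytes(raw_name))
}

/// Recover the raw bytes of the final component, if there is one.
fn leaf_bytes(path: &Path) -> Option<&[u8]> {
    path.file_name().map(|name| name.as_bytes())
}
```
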
# Conflicts:
#	lib/hyper.ex
#	lib/hyper/node.ex
#	lib/hyper/node/fire_vmm/state.ex
#	native/suidhelper/src/tools/dmsetup/mod.rs
#	native/suidhelper/src/tools/losetup.rs
Pass the newly-added CI's Rust gates: rustfmt the tree, and match the EXDEV
errno in the pattern instead of a guard (clippy::redundant_guard).
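
The clippy fix amounts to moving the errno comparison from a guard into the pattern. A sketch under stated assumptions: the `EXDEV` constant is defined locally with its Linux value (real code would normally use `libc::EXDEV`), and the `StageAction` type and the fall-back-to-copy reading of the hardlink/copy staging are illustrative guesses.

```rust
use std::io;

/// Linux value of EXDEV ("cross-device link"); assumed here so the sketch
/// needs no external crate.
const EXDEV: i32 = 18;

#[derive(Debug, PartialEq, Eq)]
enum StageAction {
    Linked,
    CopyInstead,
}

/// Decide what to do after attempting a hard link.
fn on_link_result(result: io::Result<()>) -> io::Result<StageAction> {
    match result {
        Ok(()) => Ok(StageAction::Linked),
        Err(err) => match err.raw_os_error() {
            // Constant in the pattern, instead of `Some(e) if e == EXDEV`.
            Some(EXDEV) => Ok(StageAction::CopyInstead),
            _ => Err(err),
        },
    }
}
```
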
- Img.create_mutable / Daemon.start_link: narrow the supervisor/MuonTrap
  start result via case so the return matches the {:ok,pid}|{:error,_} spec
  (was returning the wider on_start_child type directly -> missing_range).
- Img.Mutable.State: add the @type t referenced by drop/2's spec
  (was unknown_type).
@markovejnovic markovejnovic marked this pull request as ready for review June 23, 2026 06:16
@markovejnovic markovejnovic merged commit 6de4008 into main Jun 23, 2026
2 checks passed