A flexible parallel test runner written in Rust with pluggable execution providers. By Imbue.
- Parallel execution across multiple sandboxes (local processes or remote environments)
- Pluggable providers: local, default (custom shell commands), and Modal
- Multiple test frameworks: pytest, cargo nextest, vitest, or any custom runner
- Automatic retry with flaky test detection
- JUnit XML reporting
- LPT (longest-processing-time-first) scheduling when historical timing data is available, with round-robin fallback
- Group-level filtering to split tests into groups with different filters and retry policies
- Environment variable expansion in config values (`${VAR}` and `${VAR:-default}`)
- Bundled script references using `@filename.ext` syntax in commands
Speedups measured on Imbue projects using Offload with the Modal provider. All local baselines were run on a MacBook Pro with an Apple M4 (10 cores: 4P + 6E) and 16 GB RAM.
| Run Kind | Time (%) | Speedup |
|---|---|---|
| pytest with xdist, n=3 (baseline) | 100.0% | 1.00x |
| pytest with xdist, n=8 | 61.6% | 1.62x |
| Offload (Modal, max 200) | 16.5% | 6.04x |
Notes: 345 Playwright integration tests (browser-based, each launching a full Sculptor instance). Individual tests are heavyweight (Chromium plus a backend server per worker), so the default xdist cap is n=3. Offload bypasses xdist entirely, fanning out across up to 200 isolated Modal sandboxes, each running a single test against its own Sculptor instance. The high per-test cost makes Offload's per-sandbox overhead negligible, yielding a 6.04x speedup.
| Run Kind | Time (%) | Speedup |
|---|---|---|
| pytest with xdist, n=4 (baseline) | 100.0% | 1.00x |
| pytest with xdist, n=8 | 77.2% | 1.30x |
| Offload (Modal, max 200) | 53.8% | 1.86x |
Notes: 5,275 tests collected (unit + integration + acceptance, excluding release). Individual tests are lightweight and fast-running, so the default xdist cap is n=4. Offload bypasses xdist entirely, fanning out across up to 200 isolated Modal sandboxes. The low per-test cost makes Offload's per-sandbox overhead proportionally larger, yielding a more modest 1.86x speedup compared with the 6.04x on Sculptor above.
From crates.io:

```sh
cargo install offload
```

From source:

```sh
cargo install --path .
```

Core:
- Rust toolchain (`cargo`) to install Offload
For Modal providers (`type = "modal"` or `type = "default"` with `@modal_sandbox.py`):
- uv — the bundled `modal_sandbox.py` is invoked via `uv run`, which auto-installs its dependencies (`modal`, `click`)
- A Modal account — authenticate with `modal token new`
For the pytest framework (local test discovery):
- Python and pytest installed locally — Offload runs `pytest --collect-only` on the local machine to discover tests
- The configured `command` (e.g. `uv run pytest`, `python -m pytest`) must be on PATH
For the nextest framework:
- cargo-nextest — Offload runs `cargo nextest list` for test discovery. Install with `cargo install cargo-nextest`
For the vitest framework:
- Node.js and npm (or an equivalent package manager) — Offload runs `npx vitest --reporter=json` for test discovery
For the default framework:
- Whatever tools your `discover_command` and `run_command` invoke
Offload relies on a stable relationship between test discovery, execution, and result reporting. Understanding these expectations is essential when using the default framework or debugging test ID mismatches.
Each group triggers its own discovery call. The discovered test IDs become the canonical identifiers for the entire run.
- pytest: Runs `{command} --collect-only -q` locally and parses one test ID per line from stdout. Output format: `path/to/test.py::TestClass::test_method`. Group `filters` are appended as extra pytest args (e.g. `-m 'not slow'`).
- nextest (`type = "nextest"`): Runs `cargo nextest list --message-format json` locally and parses test IDs from the JSON output. Test IDs are formatted as `{binary_id} {test_name}`. Group `filters` are appended as extra nextest args.
- default: Runs `discover_command` through `sh -c` and reads one test ID per line from stdout. The `{filters}` placeholder is replaced with the group's filter string (or an empty string). Lines starting with `#` are ignored.
- vitest (`type = "vitest"`): Runs `{command} --reporter=json` locally and parses test IDs from the JSON output. Group `filters` are appended as extra vitest args.
Offload matches discovered test IDs to JUnit XML results using a `test_id_format` string that controls how the JUnit XML `name` and `classname` attributes are combined into a test ID. For example, `"{name}"` uses just the name attribute; `"{classname} {name}"` joins them with a space. This is the most common source of "Not Run" errors.
- The JUnit attributes produced by the test runner must match the test ID from discovery after applying `test_id_format`. If they don't match, Offload reports the test as "Not Run".
- pytest: The format defaults to `"{name}"`. The `_set_junit_test_id` conftest fixture writes the full nodeid into the JUnit `name` attribute so it matches the `pytest --collect-only` output. Configurable via `test_id_format`.
- nextest: The format defaults to `"{classname} {name}"`, where classname is the binary ID and name is the test function. Configurable via `test_id_format`.
- vitest: The format defaults to `"{classname} > {name}"`, configurable via `test_id_format`.
- default: The `test_id_format` field is a required configuration option. Set it to match how your test runner populates the JUnit XML `name` and `classname` attributes.
After execution, Offload collects results via one of two mechanisms:
- JUnit XML (recommended): The test command writes a JUnit XML file. For the `default` framework, configure `result_file` with the path and use `{result_file}` in `run_command` (see the sketch after this list). For pytest and cargo, Offload generates the `--junitxml` / nextest JUnit flags automatically.
- Exit code fallback (default framework only): If no `result_file` is configured, Offload infers pass/fail from the command's exit code. This loses per-test granularity — all tests are reported under a synthetic `all_tests` ID, and flaky test detection will not work.
- Tests are retried up to `retry_count` times (configured per group).
- Retries run in parallel across available sandboxes.
- If any retry attempt passes, the test is reported as passed.
- A test that passes after a failure is marked as flaky (exit code 2).
- Without JUnit XML result files, retries cannot identify individual test failures and may behave incorrectly.
| Code | Meaning |
|---|---|
| 0 | All tests passed |
| 1 | One or more tests failed, or tests were not run |
| 2 | All tests passed, but some were flaky (passed only on retry) |
After a test run, Offload writes per-batch log files to `{output_dir}/logs/`:

| File | Meaning |
|---|---|
| `batch-{N}.stdout.{outcome}` | Standard output from batch N |
| `batch-{N}.stderr.{outcome}` | Standard error from batch N |

Where `{outcome}` is one of: `success` (all tests passed), `failure` (one or more tests failed), `error` (infrastructure error), or `cancelled` (batch cancelled before completion).

The `{output_dir}` defaults to `test-results` and is configurable via `[report] output_dir`.
1. Initialize a configuration file:

   ```sh
   offload init --provider local --framework pytest
   ```

2. Edit `offload.toml` as needed for your project.

3. Run tests:

   ```sh
   offload run
   ```

Global flags:

| Flag | Description |
|---|---|
| `-c, --config PATH` | Configuration file path (default: `offload.toml`) |
| `-v, --verbose` | Enable verbose output |
`offload run`: run tests in parallel.
| Flag | Description |
|---|---|
| `--parallel N` | Override maximum parallel sandboxes |
| `--collect-only` | Discover tests without running them |
| `--copy-dir LOCAL:REMOTE` | Copy a directory into each sandbox (repeatable) |
| `--env KEY=VALUE` | Set an environment variable in sandboxes (repeatable) |
| `--no-cache` | Skip cached image lookup during prepare (forces a fresh build) |
| `--trace` | Emit a Perfetto trace to `{output_dir}/trace.json` |
| `--fail-fast` | Stop on first test failure. Passes a framework-level stop flag (`-x` for pytest, `--fail-fast` for nextest, `--bail` for vitest) and cancels remaining batches at the orchestrator level |
| `--show-estimated-cost` | Show estimated sandbox cost after the run (client-side estimate; may not reflect actual billing) |
| `--record-history` | Record test results to the history file after the run. Requires a `[history]` section in config |
`offload prepare`: build the sandbox image without running tests. Prepares the provider image (resolving the cache, building if needed) and writes the image ID to git notes. The image ID is printed to stdout on success.
| Flag | Description |
|---|---|
| `--no-cache` | Skip cached image lookup (forces a fresh build) |
Discover tests without running them.
| Flag | Description |
|---|---|
| `-f, --format text\|json` | Output format (default: `text`) |
Validate the configuration file and print a summary of settings.
`offload init`: generate a new `offload.toml` configuration file.
| Flag | Description |
|---|---|
| `-p, --provider TYPE` | Provider type: `local`, `default` (default: `local`) |
| `-f, --framework TYPE` | Framework type: `pytest`, `nextest`, `vitest`, `default` (default: `pytest`) |
`offload logs`: view per-test results from the most recent run. Reads the JUnit XML report at `{output_dir}/{junit_file}` (default: `test-results/junit.xml`).
| Flag | Description |
|---|---|
| `--failures` | Show only failed tests |
| `--errors` | Show only errored tests |
| `--test ID` | Show only the test with this exact ID (repeatable) |
| `--test-regex PATTERN` | Show only tests whose ID matches this regex (substring match) |
All flags compose with AND logic. For example, `offload logs --failures --test-regex "test_math"` shows only failed tests whose ID contains `test_math`.
With no flags, all test results are printed. Each test is separated by a banner:
```
=== tests/test_math.py::test_add [PASSED] ===

=== tests/test_math.py::test_div [FAILED] ===
AssertionError: expected 2 got 3
tests/test_math.py:10: in test_div
    assert 1 / 0 == 2
E   AssertionError: expected 2 got 3
```
Git merge driver for history files. Used automatically by git when configured:

```sh
offload history merge <base> <ours> <theirs>
```
Configure the git merge driver for `offload-history.jsonl`. Updates `.gitattributes` and `.git/config` so history files merge automatically during git operations:

```sh
offload history setup-merge-driver
```

Exit codes:

| Code | Meaning |
|---|---|
| 0 | All tests passed |
| 1 | Test failures or tests not run |
| 2 | Flaky tests only (passed on retry) |
Configuration is stored in a TOML file (default: `offload.toml`).
| Field | Type | Default | Description |
|---|---|---|---|
| `max_parallel` | integer | `10` | Maximum number of parallel sandboxes |
| `test_timeout_secs` | integer | `900` | Timeout per test batch in seconds |
| `working_dir` | string | (cwd) | Working directory for test execution |
| `sandbox_repo_root` | string | (none) | Path to the repository root inside the sandbox (e.g. `/app`). Used for thin-diff patches and as the default test working directory (`OFFLOAD_ROOT`) |
| `sandbox_project_root` | string | (none) | Working directory for test execution, if different from `sandbox_repo_root`. Only needed in monorepo setups where tests run from a subdirectory (e.g. `/app/mypackage`) |
| `sandbox_init_cmd` | string | (none) | Optional command to run during image build, after cwd/copy-dirs are applied |
| `post_patch_cmd` | string | (none) | Optional command to run after the thin-diff patch is applied, before image materialization. Runs as an image layer. The `OFFLOAD_PATCH_FILE` env var is set to the patch path when a diff exists |
Set `sandbox_repo_root` to tell Offload where the codebase lives in the sandbox. In monorepo setups where tests run from a subdirectory, also set `sandbox_project_root` to that subdirectory.
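For instance, using the example paths from the table above:

```toml
[offload]
sandbox_repo_root = "/app"               # repository root inside the sandbox
sandbox_project_root = "/app/mypackage"  # tests run from this subdirectory
```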
The `type` field selects the provider. One of: `local`, `default`, `modal`.
Run tests as local child processes.
| Field | Type | Default | Description |
|---|---|---|---|
| `working_dir` | string | (cwd) | Working directory for spawned processes |
| `env` | table | `{}` | Environment variables for test processes |
| `shell` | string | `/bin/sh` | Shell used to execute commands |
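A minimal sketch of a local provider section (the environment variable is illustrative):

```toml
[provider]
type = "local"
working_dir = "."    # run tests from the repo root
shell = "/bin/bash"  # override the /bin/sh default

# Environment variables passed to every test process
[provider.env]
RUST_LOG = "debug"
```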
Custom shell commands for sandbox lifecycle management. Commands use placeholder variables that are replaced via simple string substitution at runtime.
| Field | Type | Default | Description |
|---|---|---|---|
| `prepare_command` | string | (none) | Runs once before sandbox creation. Must print an image ID as its last line of stdout (e.g. `im-rlXozWoN3Q9TWD8I6fnxm5`) |
| `create_command` | string | required | Creates a sandbox. Must print a sandbox ID to stdout (e.g. `sb-xyz123`). `{image_id}` is replaced with the output of `prepare_command` |
| `exec_command` | string | required | Runs a command inside a sandbox. `{sandbox_id}` is replaced with the sandbox ID from `create_command`. `{command}` is replaced with the full shell-escaped command string (program + args + env vars as a single quoted argument) |
| `destroy_command` | string | required | Destroys a sandbox. `{sandbox_id}` is replaced with the sandbox ID |
| `download_command` | string | (none) | Downloads files from a sandbox. `{sandbox_id}` is replaced with the sandbox ID. `{paths}` is replaced with space-separated `'remote':'local'` pairs |
| `working_dir` | string | (cwd) | Working directory for lifecycle commands |
| `timeout_secs` | integer | `3600` | Timeout for remote commands in seconds |
| `copy_dirs` | list | `[]` | Directories to copy into the image (`"local:remote"` format) |
| `env` | table | `{}` | Environment variables for test processes |
| `cpu_cores` | float | `1.0` | CPU cores per sandbox |
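As an illustration of the lifecycle commands, here is a sketch of a hypothetical Docker-backed provider (assuming the standard Docker CLI; this is not a bundled or officially supported configuration):

```toml
[provider]
type = "default"
# `docker build -q` prints the image ID as its last line of stdout.
prepare_command = "docker build -q ."
# `docker run -d` prints the new container ID; `sleep infinity` keeps it alive.
create_command = "docker run -d {image_id} sleep infinity"
# {command} arrives as a single shell-escaped argument, so hand it to `sh -c`.
exec_command = "docker exec {sandbox_id} sh -c {command}"
destroy_command = "docker rm -f {sandbox_id}"
timeout_secs = 600
```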
Simplified Modal sandbox provider. Internally generates the appropriate Modal CLI commands.
| Field | Type | Default | Description |
|---|---|---|---|
| `dockerfile` | string | (none) | Path to a Dockerfile for building the sandbox image |
| `include_cwd` | boolean | `false` | Copy the current working directory into the image |
| `copy_dirs` | list | `[]` | Directories to copy into the image (`"local:remote"` format) |
| `env` | table | `{}` | Environment variables for test processes |
| `cpu_cores` | float | `0.125` | CPU cores per sandbox |
| `memory_gb` | float | (none) | Memory per sandbox in GiB |
| `experimental_options` | table | `{}` | Experimental options passed as JSON to `Sandbox.create()` (e.g. `enable_docker = true`) |
Use `experimental_options` to pass feature flags to Modal's `Sandbox.create()`, as shown below. These options may change on Modal's side without notice.
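For example, using the option named in the table above:

```toml
[provider]
type = "modal"

# Passed as JSON to Modal's Sandbox.create()
[provider.experimental_options]
enable_docker = true
```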
The `type` field selects the framework. One of: `pytest`, `nextest`, `vitest`, `default`.
| Field | Type | Default | Description |
|---|---|---|---|
| `paths` | list | (none) | Optional directories to search for tests. When omitted, pytest uses its own default discovery |
| `command` | string | `"python -m pytest"` | Full command prefix for pytest invocation (e.g. `"uv run pytest"`) |
| `run_args` | string | (none) | Extra arguments for test execution only (not discovery) |
| `test_id_format` | string | `"{name}"` | Format for matching test IDs from JUnit XML (`{name}`, `{classname}`) |
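A short sketch (the `paths` entry and the arguments in `run_args` are illustrative):

```toml
[framework]
type = "pytest"
command = "uv run pytest"
paths = ["tests"]
# Applied when running tests, but not during `--collect-only` discovery.
run_args = "-q --maxfail=50"
```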
Requires `cargo-nextest`.

| Field | Type | Default | Description |
|---|---|---|---|
| `package` | string | (none) | Package to test in a Cargo workspace (`cargo test -p <package>`) |
| `features` | list | `[]` | Cargo features to enable during testing |
| `bin` | string | (none) | Specific binary to test (`cargo test --bin <name>`) |
| `include_ignored` | boolean | `false` | Include tests marked with `#[ignore]` |
| `test_id_format` | string | `"{classname} {name}"` | Format for matching test IDs from JUnit XML (`{name}`, `{classname}`) |
Custom shell commands for test discovery and execution.
| Field | Type | Default | Description |
|---|---|---|---|
| `discover_command` | string | required | Command that outputs one test ID per line to stdout. Must contain the `{filters}` placeholder |
| `run_command` | string | required | Command template; `{tests}` is replaced with space-separated test IDs. `{result_file}` is replaced with the result file path if configured |
| `result_file` | string | (none) | Path to the JUnit XML result file produced by the test runner |
| `working_dir` | string | (cwd) | Working directory for test commands |
| `test_id_format` | string | required | Format for test IDs from JUnit XML (`{name}`, `{classname}`) |
| Field | Type | Default | Description |
|---|---|---|---|
| `command` | string | `"npx vitest"` | Full command prefix for vitest invocation |
| `run_args` | string | (none) | Extra arguments for test execution only (not discovery) |
| `test_id_format` | string | `"{classname} > {name}"` | Format for matching test IDs from JUnit XML (`{name}`, `{classname}`) |
At least one group is required. Each group runs its own test discovery with its filters.
| Field | Type | Default | Description |
|---|---|---|---|
| `retry_count` | integer | `0` | Number of times to retry failed tests |
| `filters` | string | `""` | Filter string passed to the framework during discovery. For pytest: pytest args (e.g. `-m 'not slow'`). For cargo: nextest list args. For default: substituted into the `{filters}` placeholder in `discover_command` |
| `schedule_individual` | boolean | `false` | When true, each test in this group is scheduled in its own batch (batch size 1). Use for heavyweight tests that should not share a sandbox with other tests |
Failed tests that pass on retry are marked as "flaky" (exit code 2).
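For example, a pair of groups with different retry policies (the pytest marker expressions are illustrative):

```toml
[groups.fast]
retry_count = 0
filters = "-m 'not slow'"

[groups.heavy]
retry_count = 2
filters = "-m 'slow'"
schedule_individual = true  # one heavyweight test per sandbox
```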
| Field | Type | Default | Description |
|---|---|---|---|
| `output_dir` | string | `"test-results"` | Directory for report files |
| `junit` | boolean | `true` | Enable JUnit XML output |
| `junit_file` | string | `"junit.xml"` | Filename for JUnit XML output |
| `download_globs` | string[] | `[]` | Glob patterns for files to download from sandboxes after each batch |
| `download_globs_failure_only` | boolean | `false` | When true, only download `download_globs` artifacts for batches that had test failures or errors |
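A sketch that also downloads artifacts, but only from failing batches (the glob patterns are illustrative):

```toml
[report]
output_dir = "test-results"
junit = true
junit_file = "junit.xml"
download_globs = ["artifacts/**/*.png", "logs/**/*.log"]
download_globs_failure_only = true
```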
When present, enables history-based LPT scheduling. Offload loads historical test durations on every run to optimize batch assignment. Recording results to the history file is controlled by `record_history`.
| Field | Type | Default | Description |
|---|---|---|---|
| `record_history` | string | `"flag"` | When to record results: `"always"` (every run) or `"flag"` (only with `--record-history`) |
| `path` | string | `"offload-history.jsonl"` | Path to the JSONL history file. Can be checked into source control |
| `reservoir_size` | integer | `20` | Maximum samples per outcome (pass/fail) per test. Larger values improve statistical estimates but increase file size |
| `default_duration_secs` | float | `1.0` | Fallback duration estimate (seconds) when no historical data is available |
Example:

```toml
[history]
record_history = "flag"
path = "offload-history.jsonl"
reservoir_size = 20
default_duration_secs = 1.0
```

Run `offload history setup-merge-driver` to enable automatic conflict-free merging of the history file during git operations.
Pass `--trace` to `offload run` to generate a Chrome Trace Event JSON file:

```sh
offload run --trace
```

After the run completes, the trace is written to `{output_dir}/trace.json` (default: `test-results/trace.json`). Open it in Perfetto UI to visualize the execution timeline.
The trace includes:
- Local phases: config loading, test discovery, image preparation, sandbox pool creation
- Orchestrator: scheduling, result aggregation, sandbox cleanup
- Per-sandbox: batch execution, JUnit XML download, result parsing
When `--trace` is not passed, tracing is completely disabled with zero overhead.
Example configuration files are included in the repository root.
Local provider with nextest:

```toml
[offload]
max_parallel = 4
test_timeout_secs = 300
sandbox_repo_root = "."

[provider]
type = "local"
working_dir = "."

[framework]
type = "nextest"

[groups.all]
retry_count = 0

[report]
output_dir = "test-results"
```

Modal via the default provider, with pytest and multiple groups:

```toml
[offload]
max_parallel = 4
test_timeout_secs = 600
sandbox_repo_root = "/app"

[provider]
type = "default"
prepare_command = "uv run @modal_sandbox.py prepare --include-cwd examples/Dockerfile"
create_command = "uv run @modal_sandbox.py create {image_id}"
exec_command = "uv run @modal_sandbox.py exec {sandbox_id} {command}"
destroy_command = "uv run @modal_sandbox.py destroy {sandbox_id}"
download_command = "uv run @modal_sandbox.py download {sandbox_id} {paths}"
timeout_secs = 600

[framework]
type = "pytest"
paths = ["examples/tests"]
command = "uv run pytest"

[groups.unit]
retry_count = 2
filters = "-m 'not slow' -k 'not test_flaky'"

[groups.slow]
retry_count = 3
filters = "-m 'slow'"
schedule_individual = true

[groups.flaky]
retry_count = 5
filters = "-k test_flaky"

[report]
output_dir = "test-results"
```

Modal provider with nextest:

```toml
[offload]
max_parallel = 4
test_timeout_secs = 600
sandbox_repo_root = "/app"

[provider]
type = "modal"
dockerfile = ".devcontainer/Dockerfile"
include_cwd = true

[framework]
type = "nextest"

[groups.all]
retry_count = 1

[report]
output_dir = "test-results"
```

Monorepo setup with `sandbox_init_cmd`:

```toml
[offload]
max_parallel = 40
test_timeout_secs = 60
sandbox_repo_root = "/code/mng"
sandbox_init_cmd = "git apply /offload-upload/patch --allow-empty && uv sync --all-packages"

[provider]
type = "default"
prepare_command = "uv run @modal_sandbox.py prepare --include-cwd libs/mng/imbue/mng/resources/Dockerfile"
create_command = "uv run @modal_sandbox.py create {image_id}"
exec_command = "uv run @modal_sandbox.py exec {sandbox_id} {command}"
destroy_command = "uv run @modal_sandbox.py destroy {sandbox_id}"
download_command = "uv run @modal_sandbox.py download {sandbox_id} {paths}"
timeout_secs = 600

[framework]
type = "pytest"
paths = ["libs/mng/tests"]
command = "uv run pytest"

[groups.all]
retry_count = 0
filters = "-m 'not acceptance and not release'"

[report]
output_dir = "test-results"
junit = true
junit_file = "junit.xml"
```

This demonstrates using `sandbox_init_cmd` to run setup commands during image build. The `sandbox_init_cmd` applies a patch and syncs packages after the working directory is copied into the image, enabling use of the native pytest framework instead of the default framework with inline setup commands.
Use `post_patch_cmd` for derived artifacts that must be regenerated when source changes — generated API clients, frontend bundles, or compiled assets. Unlike `sandbox_init_cmd` (which runs only during base image builds), `post_patch_cmd` runs after every thin-diff patch, ensuring derived artifacts stay in sync with patched source code.
```toml
[offload]
max_parallel = 40
test_timeout_secs = 60
sandbox_repo_root = "/code/myproject"
sandbox_init_cmd = "uv sync --all-packages"
post_patch_cmd = "make generate-client"
```

Commands in configuration can reference bundled scripts using `@filename.ext` syntax. For example, `uv run @modal_sandbox.py create {image_id}` references the bundled `modal_sandbox.py` script. Scripts are extracted to a cache directory on first use.
Offload caches image IDs in git notes (`refs/notes/offload-images`). Notes are fetched from and pushed to the remote automatically. Pass `--no-cache` to `offload run` to skip cached image lookup and force a fresh build.
Configuration values support environment variable expansion:
- `${VAR}` — required; fails if `VAR` is not set
- `${VAR:-default}` — uses `default` if `VAR` is not set
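For example (the variable names are illustrative):

```toml
[provider.env]
# Required: fails if MODAL_TOKEN is not set.
API_TOKEN = "${MODAL_TOKEN}"
# Falls back to "info" if LOG_LEVEL is not set.
LOG_LEVEL = "${LOG_LEVEL:-info}"
```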
Offload can run its own test suite on Modal:
```sh
cargo run -- -c offload-pytest-default.toml run
```

This requires a valid Modal API key.
All Rights Reserved. See LICENSE for details.