18 commits
f4cbceb
refactor(server): introduce Authenticator trait and Principal enum
TaylorMutch May 15, 2026
74e704c
feat(server): gateway-minted sandbox JWTs and IssueSandboxToken RPC
TaylorMutch May 15, 2026
68769e6
feat(server)!: per-sandbox JWT identity over Bearer (wire break)
TaylorMutch May 15, 2026
358dbfb
fix(sandbox): strip supervisor-only credentials from entrypoint env
TaylorMutch May 15, 2026
2185ad1
fix(server): per-handler sandbox_id equality check (closes #1354)
TaylorMutch May 15, 2026
21f7d6c
feat(server): RefreshSandboxToken RPC + sandbox refresh loop
TaylorMutch May 15, 2026
beb6f17
feat(server): make K8s ServiceAccount bootstrap token TTL configurable
TaylorMutch May 15, 2026
c597411
feat(server): JWKS-based K8s ServiceAccount token validation
TaylorMutch May 15, 2026
fb3ba08
fix(server): three sandbox-identity issues found during helm exercise
TaylorMutch May 15, 2026
669531f
feat(sandbox): openshell-sandbox debug-rpc subcommand for end-to-end …
TaylorMutch May 15, 2026
ae809f5
fix(helm): mount sandbox JWT keys without TLS
TaylorMutch May 15, 2026
8a9d538
test(e2e): configure sandbox JWT keys in harnesses
TaylorMutch May 15, 2026
56a28c1
refactor(auth): remove sandbox token revocation
TaylorMutch May 18, 2026
0c7eb38
test(server): fix rebased test fixtures
TaylorMutch May 18, 2026
8ed1427
docs(helm): update chart values reference
TaylorMutch May 18, 2026
e5c0260
chore(markdown): ignore local architecture plans
TaylorMutch May 19, 2026
8e88775
fix(server): restrict sandbox principal RPC access
TaylorMutch May 19, 2026
e910610
fix(server): validate k8s serviceaccount tokens with tokenreview
TaylorMutch May 19, 2026
9 changes: 8 additions & 1 deletion .agents/skills/debug-openshell-cluster/SKILL.md
@@ -116,9 +116,16 @@ Check required Helm deployment secrets:
kubectl -n openshell get secret \
openshell-server-tls \
openshell-server-client-ca \
openshell-client-tls
openshell-client-tls \
openshell-jwt-keys
```

If the gateway exits with `failed to read sandbox JWT signing key from
/etc/openshell-jwt/signing.pem`, verify that `openshell-jwt-keys` contains
`signing.pem`, `public.pem`, and `kid`, and that the StatefulSet mounts the
`sandbox-jwt` secret at `/etc/openshell-jwt`. The sandbox JWT mount is required
even when local Helm values disable TLS.

Check the image references currently used by the gateway deployment:

```bash
7 changes: 5 additions & 2 deletions .agents/skills/helm-dev-environment/SKILL.md
@@ -26,8 +26,9 @@ mise run helm:k3s:create
```

Creates a k3d cluster and merges its kubeconfig into the worktree-local `kubeconfig` file.
Also applies base manifests (`deploy/kube/manifests/agent-sandbox.yaml`). Traefik is
disabled at cluster creation time.
Also applies base manifests (`deploy/kube/manifests/agent-sandbox.yaml`) and preloads the
default community sandbox image into k3d so the first sandbox create does not wait on a
large registry pull. Traefik is disabled at cluster creation time.

**Multi-worktree support:** the cluster name is derived from the last component of the
current git branch (e.g. branch `kube-support/local-dev/tmutch` → cluster
@@ -43,6 +44,8 @@ Port mappings created at cluster time (cannot be changed without recreating):

Override with env vars before running `helm:k3s:create`:
- `HELM_K3S_LB_HOST_PORT` (default: `8080`)
- `HELM_K3S_PRELOAD_SANDBOX_IMAGE` (default:
`ghcr.io/nvidia/openshell-community/sandboxes/base:latest`; set to an empty value to skip)
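
The override rule for `HELM_K3S_PRELOAD_SANDBOX_IMAGE` has three cases: unset uses the default, an empty value skips the preload, and anything else is used as-is. The sketch below spells that out in Rust. It is only an illustration of the documented behavior; the actual `mise` task is not shown in this diff, and `resolve_preload_image` is a hypothetical name.

```rust
/// Default image named in the skill documentation.
pub const DEFAULT_PRELOAD_IMAGE: &str =
    "ghcr.io/nvidia/openshell-community/sandboxes/base:latest";

/// Hypothetical helper: map the raw value of `HELM_K3S_PRELOAD_SANDBOX_IMAGE`
/// to the image that should be preloaded, or `None` to skip preloading.
pub fn resolve_preload_image(raw: Option<&str>) -> Option<String> {
    match raw {
        // Variable not set at all: fall back to the documented default.
        None => Some(DEFAULT_PRELOAD_IMAGE.to_string()),
        // Variable set to an empty value: the documented way to skip.
        Some(value) if value.is_empty() => None,
        // Any other value is taken as the image reference.
        Some(value) => Some(value.to_string()),
    }
}
```

A caller could feed it `std::env::var("HELM_K3S_PRELOAD_SANDBOX_IMAGE").ok().as_deref()`. Note that "unset" and "empty" are distinct cases here, which matters if the real task is written in shell.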

### 2. Deploy OpenShell

1 change: 1 addition & 0 deletions .markdownlint-cli2.jsonc
@@ -16,6 +16,7 @@
".claude/**",
".opencode/**",
".github/**",
"architecture/plans/**",
"**/node_modules/**",
"target/**",
".pytest_cache/**",
3 changes: 3 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

18 changes: 18 additions & 0 deletions architecture/gateway.md
@@ -46,6 +46,24 @@ Sandbox supervisor RPCs authenticate with either mTLS material or a sandbox
secret depending on the runtime and deployment mode. User-facing mutations are
authorized by role policy when OIDC or edge identity is enabled.

Sandbox secrets are gateway-signed JWTs bound to a single sandbox ID. Docker,
Podman, and VM drivers deliver the initial token through supervisor-only
runtime material; Kubernetes supervisors exchange a projected ServiceAccount
token through `IssueSandboxToken`. The gateway validates that projected token
with Kubernetes `TokenReview`, checks the returned pod binding against the live
pod UID, and reads the pod's sandbox annotation before minting the gateway JWT.
Supervisors renew gateway JWTs in memory before expiry. Older tokens are not
server-revoked; deployments bound replay exposure with short
`gateway_jwt.ttl_secs` lifetimes.
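
The Kubernetes exchange described above can be read as three checks in sequence: the `TokenReview` must authenticate the token, the pod binding it returns must match the live pod's UID, and the pod must carry a sandbox annotation. A minimal sketch of that ordering follows. All type and function names are hypothetical and simplified; the real gateway types and the Kubernetes client calls are not part of this excerpt.

```rust
/// Hypothetical, simplified result of a Kubernetes TokenReview for a
/// pod-bound projected ServiceAccount token.
pub struct ReviewedToken {
    pub authenticated: bool,
    pub pod_name: String,
    pub pod_uid: String,
}

/// Hypothetical view of the live pod as currently stored in the cluster.
pub struct LivePod {
    pub name: String,
    pub uid: String,
    /// Value of the pod's sandbox annotation, if present.
    pub sandbox_id: Option<String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ExchangeError {
    Unauthenticated,
    PodMismatch,
    MissingSandboxAnnotation,
}

/// Return the sandbox ID a gateway JWT may be minted for, or an error.
pub fn sandbox_id_for_exchange(
    review: &ReviewedToken,
    pod: &LivePod,
) -> Result<String, ExchangeError> {
    // 1. The projected token must have passed TokenReview.
    if !review.authenticated {
        return Err(ExchangeError::Unauthenticated);
    }
    // 2. The binding must refer to the pod that exists right now; a
    //    recreated pod with the same name has a different UID.
    if review.pod_name != pod.name || review.pod_uid != pod.uid {
        return Err(ExchangeError::PodMismatch);
    }
    // 3. The sandbox ID comes from the pod annotation, not from the caller.
    pod.sandbox_id
        .clone()
        .filter(|id| !id.is_empty())
        .ok_or(ExchangeError::MissingSandboxAnnotation)
}
```

Taking the sandbox ID from the live pod rather than from the request means a supervisor cannot ask for a token bound to another sandbox.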

Sandbox JWTs are not user credentials. The gRPC router accepts
`Principal::Sandbox` only on the supervisor-to-gateway RPC allowlist
(`ConnectSupervisor`, `RelayStream`, token renewal, config sync, policy status,
log push, and policy-analysis callbacks). Handlers then compare the
authenticated sandbox ID with any sandbox ID or name resolved from the request.
Supervisor control and relay streams require a matching sandbox principal before
the gateway registers the session or bridges relay bytes.
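
The two-stage check in this paragraph (router allowlist first, then per-handler sandbox ID equality) could look roughly like the sketch below. The shape of `Principal`, the `User` variant, and the `authorize` function are assumptions for illustration. Only `ConnectSupervisor`, `RelayStream`, and `RefreshSandboxToken` are method names that appear in this PR; the remaining allowlisted RPCs are left out rather than guessed.

```rust
/// Hypothetical shape of the authenticated caller. Only `Principal::Sandbox`
/// is named in the architecture text; `User` is a stand-in.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Principal {
    User { subject: String },
    Sandbox { sandbox_id: String },
}

/// Partial allowlist: the method names this PR mentions explicitly.
pub const SANDBOX_ALLOWLIST: &[&str] =
    &["ConnectSupervisor", "RelayStream", "RefreshSandboxToken"];

#[derive(Debug, PartialEq, Eq)]
pub enum Denied {
    MethodNotAllowed,
    SandboxMismatch,
}

/// Stage 1: reject sandbox principals on any RPC outside the allowlist.
/// Stage 2: if the request names a sandbox, it must be the caller's own.
pub fn authorize(
    principal: &Principal,
    method: &str,
    requested_sandbox_id: Option<&str>,
) -> Result<(), Denied> {
    match principal {
        // Role policy for users is out of scope for this sketch.
        Principal::User { .. } => Ok(()),
        Principal::Sandbox { sandbox_id } => {
            if !SANDBOX_ALLOWLIST.contains(&method) {
                return Err(Denied::MethodNotAllowed);
            }
            match requested_sandbox_id {
                Some(id) if id != sandbox_id.as_str() => Err(Denied::SandboxMismatch),
                _ => Ok(()),
            }
        }
    }
}
```

Keeping the allowlist at the router and the equality check in the handler means a newly added RPC is denied to sandbox tokens by default.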

## API Surface

The gateway API is organized around platform objects and operational streams:
1 change: 1 addition & 0 deletions crates/openshell-bootstrap/Cargo.toml
@@ -16,6 +16,7 @@ bytes = { workspace = true }
futures = { workspace = true }
miette = { workspace = true }
rcgen = { workspace = true }
sha2 = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
tar = "0.4"
112 changes: 112 additions & 0 deletions crates/openshell-bootstrap/src/jwt.rs
@@ -0,0 +1,112 @@
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0

//! Gateway-minted JWT signing-key generation.
//!
//! The gateway mints per-sandbox identity tokens (see PR 2 of the
//! per-sandbox identity series, issue #1354) signed with an Ed25519
//! keypair generated once at gateway init and persisted alongside the
//! existing PKI bundle. The signing key never leaves the gateway; the
//! public key plus a stable `kid` are consumed by the gateway's own
//! validator and any future external verifiers.

use miette::{IntoDiagnostic, Result, WrapErr};
use rcgen::{KeyPair, PKCS_ED25519};
use sha2::{Digest, Sha256};

/// All PEM-encoded material needed to mint and validate sandbox JWTs.
///
/// The signing key stays in the gateway process. The public key is shared
/// across gateway replicas (so any replica can validate a JWT minted by
/// any other replica). The `kid` is published in every minted JWT's
/// header so the validator can pick the right key after a future rotation.
pub struct JwtKeyMaterial {
    /// PKCS#8 PEM-encoded Ed25519 private key.
    pub signing_key_pem: String,
    /// `SubjectPublicKeyInfo` PEM-encoded Ed25519 public key.
    pub public_key_pem: String,
    /// Stable identifier derived from the public key (SHA-256 hex prefix).
    /// Embedded in every minted JWT's `kid` header so future rotation can
    /// be performed in-place by adding a second key without breaking
    /// in-flight tokens.
    pub kid: String,
}

/// Generate a fresh Ed25519 JWT signing key.
///
/// Output PEM is in the formats `jsonwebtoken` consumes via
/// `EncodingKey::from_ed_pem` (signing) and `DecodingKey::from_ed_pem`
/// (validation), so the gateway can round-trip its own tokens with no
/// further conversion.
pub fn generate_jwt_key() -> Result<JwtKeyMaterial> {
    let keypair = KeyPair::generate_for(&PKCS_ED25519)
        .into_diagnostic()
        .wrap_err("failed to generate Ed25519 JWT signing key")?;
    let signing_key_pem = keypair.serialize_pem();
    let public_key_pem = keypair.public_key_pem();
    let kid = kid_from_public_key_der(&keypair.public_key_der());
    Ok(JwtKeyMaterial {
        signing_key_pem,
        public_key_pem,
        kid,
    })
}

/// Stable `kid` derived from the SHA-256 of the public-key DER.
///
/// First 16 bytes hex-encoded: collision-resistant for the small N of
/// signing keys a single deployment ever has, while staying short enough
/// to keep JWT headers compact.
fn kid_from_public_key_der(public_key_der: &[u8]) -> String {
    let digest = Sha256::digest(public_key_der);
    hex_encode_prefix(&digest, 16)
}

fn hex_encode_prefix(bytes: &[u8], n: usize) -> String {
    use std::fmt::Write as _;
    let mut out = String::with_capacity(n * 2);
    for byte in bytes.iter().take(n) {
        let _ = write!(out, "{byte:02x}");
    }
    out
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn generate_jwt_key_produces_parseable_pem() {
        let material = generate_jwt_key().expect("generate_jwt_key");
        assert!(material.signing_key_pem.contains("BEGIN PRIVATE KEY"));
        assert!(material.public_key_pem.contains("BEGIN PUBLIC KEY"));
        assert_eq!(material.kid.len(), 32, "kid is 16 bytes hex-encoded");
        assert!(material.kid.chars().all(|c| c.is_ascii_hexdigit()));
    }

    #[test]
    fn kid_is_stable_for_identical_public_keys() {
        // Same input -> same kid. Hash of a fixed byte string.
        let kid_a = kid_from_public_key_der(b"abc");
        let kid_b = kid_from_public_key_der(b"abc");
        assert_eq!(kid_a, kid_b);
    }

    #[test]
    fn kid_differs_for_different_public_keys() {
        let kid_a = kid_from_public_key_der(b"first");
        let kid_b = kid_from_public_key_der(b"second");
        assert_ne!(kid_a, kid_b);
    }

    #[test]
    fn generated_keys_are_unique() {
        let a = generate_jwt_key().expect("generate_jwt_key");
        let b = generate_jwt_key().expect("generate_jwt_key");
        assert_ne!(
            a.kid, b.kid,
            "fresh keypairs must produce distinct public keys"
        );
        assert_ne!(a.signing_key_pem, b.signing_key_pem);
    }
}
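
The doc comments above say the `kid` header lets a validator pick the right key after a future rotation. One way that lookup might work is sketched below, under the assumption that the validator keeps a map from `kid` to public-key PEM. `KeySet` and its methods are hypothetical and not part of this PR; signature verification itself is omitted.

```rust
use std::collections::HashMap;

/// Hypothetical validator-side key set: `kid` -> SPKI PEM public key.
#[derive(Default)]
pub struct KeySet {
    keys: HashMap<String, String>,
}

impl KeySet {
    /// Register a public key under its `kid`. During a rotation the old
    /// and the new key are both present for a while.
    pub fn insert(&mut self, kid: &str, public_key_pem: &str) {
        self.keys.insert(kid.to_string(), public_key_pem.to_string());
    }

    /// Look up the key named by a token's `kid` header. An unknown `kid`
    /// yields `None`, which a validator would treat as a rejection.
    pub fn select(&self, header_kid: &str) -> Option<&str> {
        self.keys.get(header_kid).map(String::as_str)
    }
}
```

With both keys registered, tokens signed before the rotation keep validating until they expire, which matches the "without breaking in-flight tokens" goal stated in the struct documentation.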
1 change: 1 addition & 0 deletions crates/openshell-bootstrap/src/lib.rs
@@ -3,6 +3,7 @@

pub mod build;
pub mod edge_token;
pub mod jwt;
pub mod oidc_token;

mod metadata;
20 changes: 20 additions & 0 deletions crates/openshell-bootstrap/src/pki.rs
@@ -1,6 +1,7 @@
// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0

use crate::jwt::{JwtKeyMaterial, generate_jwt_key};
use miette::{IntoDiagnostic, Result, WrapErr};
use rcgen::{BasicConstraints, CertificateParams, DnType, Ia5String, IsCa, KeyPair, SanType};
use std::net::IpAddr;
@@ -15,6 +16,12 @@ pub struct PkiBundle {
    pub server_key_pem: String,
    pub client_cert_pem: String,
    pub client_key_pem: String,
    /// PKCS#8 PEM Ed25519 private key for minting per-sandbox JWTs.
    pub jwt_signing_key_pem: String,
    /// SPKI PEM Ed25519 public key, paired with `jwt_signing_key_pem`.
    pub jwt_public_key_pem: String,
    /// Stable identifier embedded in the `kid` header of every minted JWT.
    pub jwt_key_id: String,
}

/// Default SANs always included on the server certificate. Covers the host
@@ -99,13 +106,23 @@ pub fn generate_pki(extra_sans: &[String]) -> Result<PkiBundle> {
        .into_diagnostic()
        .wrap_err("failed to sign client certificate")?;

    // --- JWT signing key (Ed25519, used to mint per-sandbox identity tokens) ---
    let JwtKeyMaterial {
        signing_key_pem: jwt_signing_key_pem,
        public_key_pem: jwt_public_key_pem,
        kid: jwt_key_id,
    } = generate_jwt_key().wrap_err("failed to generate JWT signing key")?;

    Ok(PkiBundle {
        ca_cert_pem: ca_cert.pem(),
        ca_key_pem: ca_key.serialize_pem(),
        server_cert_pem: server_cert.pem(),
        server_key_pem: server_key.serialize_pem(),
        client_cert_pem: client_cert.pem(),
        client_key_pem: client_key.serialize_pem(),
        jwt_signing_key_pem,
        jwt_public_key_pem,
        jwt_key_id,
    })
}

@@ -148,6 +165,9 @@ mod tests {
        assert!(bundle.server_key_pem.contains("BEGIN PRIVATE KEY"));
        assert!(bundle.client_cert_pem.contains("BEGIN CERTIFICATE"));
        assert!(bundle.client_key_pem.contains("BEGIN PRIVATE KEY"));
        assert!(bundle.jwt_signing_key_pem.contains("BEGIN PRIVATE KEY"));
        assert!(bundle.jwt_public_key_pem.contains("BEGIN PUBLIC KEY"));
        assert_eq!(bundle.jwt_key_id.len(), 32, "kid is 16 bytes hex-encoded");
    }

    #[test]
5 changes: 5 additions & 0 deletions crates/openshell-cli/src/run.rs
@@ -743,6 +743,11 @@ fn import_local_package_mtls_bundle(name: &str) -> Result<Option<PathBuf>> {
            client_key_pem: std::fs::read_to_string(&key)
                .into_diagnostic()
                .wrap_err_with(|| format!("failed to read {}", key.display()))?,
            // CLI never holds the gateway's JWT signing material; only the
            // gateway needs it. Fill the JWT fields with placeholders.
            jwt_signing_key_pem: String::new(),
            jwt_public_key_pem: String::new(),
            jwt_key_id: String::new(),
        };
        openshell_bootstrap::mtls::store_pki_bundle(name, &bundle)
            .wrap_err_with(|| format!("failed to store mTLS bundle for gateway '{name}'"))?;
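
Because the CLI fills the JWT fields with empty strings, any code that later reads a stored bundle has to treat an empty value as "no JWT material" rather than as a key. A minimal sketch of such a guard follows. `JwtFields` mirrors only the three new `PkiBundle` fields so the example compiles on its own, and `has_jwt_material` is a hypothetical helper, not something this PR adds.

```rust
/// Stand-in for the three JWT-related fields added to `PkiBundle`.
pub struct JwtFields {
    pub jwt_signing_key_pem: String,
    pub jwt_public_key_pem: String,
    pub jwt_key_id: String,
}

/// Hypothetical guard: true only when every JWT field is populated.
/// A CLI-imported bundle, which uses empty placeholders, returns false.
pub fn has_jwt_material(fields: &JwtFields) -> bool {
    !fields.jwt_signing_key_pem.is_empty()
        && !fields.jwt_public_key_pem.is_empty()
        && !fields.jwt_key_id.is_empty()
}
```

An `Option<String>` per field would encode the same distinction in the type system; the empty-string form shown in the diff keeps the struct shape unchanged for existing callers.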
14 changes: 14 additions & 0 deletions crates/openshell-cli/tests/ensure_providers_integration.rs
@@ -535,6 +535,20 @@ impl OpenShell for TestOpenShell {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn issue_sandbox_token(
        &self,
        _request: tonic::Request<openshell_core::proto::IssueSandboxTokenRequest>,
    ) -> Result<Response<openshell_core::proto::IssueSandboxTokenResponse>, Status> {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn refresh_sandbox_token(
        &self,
        _request: tonic::Request<openshell_core::proto::RefreshSandboxTokenRequest>,
    ) -> Result<Response<openshell_core::proto::RefreshSandboxTokenResponse>, Status> {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn connect_supervisor(
        &self,
        _request: tonic::Request<tonic::Streaming<SupervisorMessage>>,
14 changes: 14 additions & 0 deletions crates/openshell-cli/tests/mtls_integration.rs
@@ -424,6 +424,20 @@ impl OpenShell for TestOpenShell {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn issue_sandbox_token(
        &self,
        _request: tonic::Request<openshell_core::proto::IssueSandboxTokenRequest>,
    ) -> Result<Response<openshell_core::proto::IssueSandboxTokenResponse>, Status> {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn refresh_sandbox_token(
        &self,
        _request: tonic::Request<openshell_core::proto::RefreshSandboxTokenRequest>,
    ) -> Result<Response<openshell_core::proto::RefreshSandboxTokenResponse>, Status> {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn connect_supervisor(
        &self,
        _request: tonic::Request<tonic::Streaming<openshell_core::proto::SupervisorMessage>>,
14 changes: 14 additions & 0 deletions crates/openshell-cli/tests/provider_commands_integration.rs
@@ -800,6 +800,20 @@ impl OpenShell for TestOpenShell {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn issue_sandbox_token(
        &self,
        _request: tonic::Request<openshell_core::proto::IssueSandboxTokenRequest>,
    ) -> Result<Response<openshell_core::proto::IssueSandboxTokenResponse>, Status> {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn refresh_sandbox_token(
        &self,
        _request: tonic::Request<openshell_core::proto::RefreshSandboxTokenRequest>,
    ) -> Result<Response<openshell_core::proto::RefreshSandboxTokenResponse>, Status> {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn connect_supervisor(
        &self,
        _request: tonic::Request<tonic::Streaming<SupervisorMessage>>,
@@ -604,6 +604,20 @@ impl OpenShell for TestOpenShell {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn issue_sandbox_token(
        &self,
        _request: tonic::Request<openshell_core::proto::IssueSandboxTokenRequest>,
    ) -> Result<Response<openshell_core::proto::IssueSandboxTokenResponse>, Status> {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn refresh_sandbox_token(
        &self,
        _request: tonic::Request<openshell_core::proto::RefreshSandboxTokenRequest>,
    ) -> Result<Response<openshell_core::proto::RefreshSandboxTokenResponse>, Status> {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn connect_supervisor(
        &self,
        _request: tonic::Request<tonic::Streaming<SupervisorMessage>>,
14 changes: 14 additions & 0 deletions crates/openshell-cli/tests/sandbox_name_fallback_integration.rs
@@ -437,6 +437,20 @@ impl OpenShell for TestOpenShell {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn issue_sandbox_token(
        &self,
        _request: tonic::Request<openshell_core::proto::IssueSandboxTokenRequest>,
    ) -> Result<Response<openshell_core::proto::IssueSandboxTokenResponse>, Status> {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn refresh_sandbox_token(
        &self,
        _request: tonic::Request<openshell_core::proto::RefreshSandboxTokenRequest>,
    ) -> Result<Response<openshell_core::proto::RefreshSandboxTokenResponse>, Status> {
        Err(Status::unimplemented("not implemented in test"))
    }

    async fn connect_supervisor(
        &self,
        _request: tonic::Request<tonic::Streaming<SupervisorMessage>>,