diff --git a/docs/docs/architecture/keycloak-runtime.md b/docs/docs/architecture/keycloak-runtime.md
index 52955d3..c7ac3be 100644
--- a/docs/docs/architecture/keycloak-runtime.md
+++ b/docs/docs/architecture/keycloak-runtime.md
@@ -1,6 +1,6 @@
 ---
 title: Keycloak Runtime
-description: Keycloak deployment, configuration, backups, recovery, and break-glass paths.
+description: Keycloak deployment, first-boot configuration, recovery, and break-glass paths.
 ---
 # Keycloak Runtime
@@ -15,10 +15,12 @@ Keycloak runs on one dedicated EC2 instance in the `lab` account. The runtime contract is:
-- instance: `t4g.small`, Amazon Linux 2023 on ARM, in the `172.16.0.0/16` VPC
-- runtime: Docker Compose
+- instance: `t4g.small`, Flatcar Container Linux on ARM, in the
+  `172.16.0.0/16` VPC
+- runtime: Ignition-managed systemd units that run Docker containers
 - services: upstream Keycloak plus upstream Postgres, both pinned in `infra`
-- database: colocated Postgres with data on the instance EBS root volume
+- database: colocated Postgres with data on a dedicated encrypted gp3 data
+  volume mounted at `/var/lib/keycloak`
 - access name: `id.glab.lol`
 - TLS: Traefik ACME DNS-01 through Route 53 using the instance IAM role
 - reverse proxy: Traefik on the host, terminating TLS and proxying to Keycloak
@@ -41,42 +43,48 @@ must be tuned for the 2 GB memory budget:
 The instance IAM role grants only what the runtime needs:
 - Route 53 writes scoped to `_acme-challenge.id.glab.lol`
-- S3 write access to the Keycloak backup bucket prefix
-- SSM Parameter Store reads for bootstrap-time values such as reconciliation
-  credentials
+- permission to invoke the GitHub token broker Lambda for short-lived
+  `GilmanLab/secrets` access
+- KMS decrypt only for SOPS material with `Repo=GilmanLab/secrets` and
+  `Scope=keycloak`
+- SSM managed-instance access for operator inspection and repair
 Cluster access, secret decryption, and tailnet access follow the AWS bootstrap and secrets contracts.
 Keycloak does not receive broad AWS administrative permissions.
-## Realm And Federation
+## Realm And Local Admin
 One realm named `lab` holds lab users and OIDC/SAML clients.
-GitHub is the only upstream identity provider and is federated through OIDC.
-There is no standing local username-password fallback for normal users. The
-bootstrap admin user exists only during initial realm creation and is disabled
-after `keycloak-config-cli` reconciles the realm from git.
+The first implemented human admin path is local Keycloak authentication, not a
+GitHub identity provider. The `lab` realm has one local admin account,
+`admin@glab.lol`, with a generated password and a required WebAuthn enrollment
+for YubiKey-backed touch authentication. The temporary master bootstrap admin
+exists only for initial realm creation and should be disabled after the browser
+login path is proven.
-The first expected clients are:
+Future clients are expected to include:
 - Kubernetes API OIDC for Talos clusters
 - Argo CD web UI and CLI
 - Grafana
-The authoritative client list lives in the realm repository.
+The authoritative client list is intentionally deferred until each integration
+is implemented.
 ## Configuration As Code
-The realm repository is the source of truth for Keycloak's declarative surface:
+For the current slice, the Keycloak host imports the minimal `lab` realm from an
+`infra/aws/keycloak` template during first boot. That declarative surface
+includes:
 - realms
-- clients
-- client scopes
 - roles and role mappings
-- identity-provider configuration
 - authentication flows and required actions
 - realm-level settings
+- the local admin user shell, with its password supplied from SOPS at import
+  time
 Runtime state is intentionally out of git:
@@ -87,10 +95,12 @@ Runtime state is intentionally out of git:
 - audit and event logs
 - ephemeral tokens and one-time codes
-`keycloak-config-cli` reconciles from the realm repository on a short schedule
-from the Keycloak host.
It authenticates with a service account whose secret is
-stored in SSM Parameter Store. Keycloak version upgrades are driven by bumping
-the pinned runtime version in `infra` and reconciling forward.
+`keycloak-config-cli` runs once after Keycloak is healthy. It fetches the local
+admin dotenv payload from `GilmanLab/secrets` through the existing
+broker-backed `labctl` path, imports the realm, and writes
+`/var/lib/keycloak/config/lab-realm-imported` on success so it does not rerun on
+reboot. Scheduled reconciliation, service clients, and GitHub OIDC are not part
+of this slice.
 ## Backups
@@ -124,10 +134,12 @@ the lab's bootstrap secrets, and the S3 bucket also uses SSE-KMS.
 The primary path is rebuild-first:
 1. Provision a fresh EC2 instance from `infra`.
-2. Start Keycloak and fresh Postgres through Docker Compose.
-3. Run `keycloak-config-cli` against the new instance using the realm repo.
-4. Sign in through GitHub.
-5. Re-enroll WebAuthn or TOTP.
+2. Let Flatcar mount the persistent Keycloak data volume and start the
+   systemd-managed Postgres, Keycloak, and Traefik containers.
+3. Let the first-boot `keycloak-config-cli` unit import the minimal `lab` realm
+   if the import marker is absent.
+4. Sign in as the local `lab` realm admin.
+5. Enroll or re-enroll the YubiKey WebAuthn credential when prompted.
 Target RTO for the single-user lab is 15 minutes. This path requires AWS access and git; it does not require a backup store.
@@ -138,9 +150,9 @@ The restore fallback is:
 2. Pull a selected point-in-time backup from S3 or the NAS.
 3. Restore the Postgres dump.
 4. Place the TLS cert bundle and config files.
-5. Start Docker Compose.
-6. Let `keycloak-config-cli` reconcile the restored runtime forward to git
-   `HEAD`.
+5. Start the Flatcar systemd units.
+6. Let the first-boot import run only if the restored data does not already
+   include the import marker and realm state.
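The run-once behavior introduced in the hunks above (import the realm, then write `/var/lib/keycloak/config/lab-realm-imported` so a reboot does not repeat it) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the unit shipped in `infra`: the function name `run_import_once` and the injected `import_realm` callable are hypothetical, and only the marker path comes from the diff.

```python
from pathlib import Path
from typing import Callable

# Marker path taken from the diff; everything else here is a hypothetical sketch.
IMPORT_MARKER = Path("/var/lib/keycloak/config/lab-realm-imported")


def run_import_once(marker: Path, import_realm: Callable[[], None]) -> bool:
    """Run import_realm only when marker is absent; return True if it ran."""
    if marker.exists():
        # A previous boot (or a restored volume) already carries the marker.
        return False
    # If the import raises, the marker is never written, so the next boot retries.
    import_realm()
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

Writing the marker only after the import returns means a failed first boot is retried rather than silently skipped. The sketch checks the marker alone; the restore step in the diff also conditions on existing realm state, which is left out here.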
 Restores are for cases where preserving exact runtime state matters, such as federated identity linkages, event history, and active user state.
@@ -170,6 +182,6 @@ authentication paths.
 | Vault | Unseal keys and root/recovery keys | Stored outside any Keycloak-dependent path. |
 | AWS | IAM Identity Center local user with hardware key | AWS does not federate to Keycloak. |
 | Grafana | Local admin account | Kept active alongside OIDC. |
-| GitHub | Personal GitHub account with hardware-key MFA | GitHub is upstream of Keycloak. |
+| GitHub | Personal GitHub account with hardware-key MFA | GitHub is not a Keycloak upstream IdP. The token broker remains a machine bootstrap path for secrets access. |
 These anchors live outside Keycloak-dependent storage.
diff --git a/docs/docs/architecture/secrets-identity-pki.md b/docs/docs/architecture/secrets-identity-pki.md
index c2c1213..99f41ed 100644
--- a/docs/docs/architecture/secrets-identity-pki.md
+++ b/docs/docs/architecture/secrets-identity-pki.md
@@ -324,12 +324,19 @@ Remaining implementation threads:
 Keycloak is the central human-facing identity system for lab services, but it is not a bootstrap dependency for the raw recovery path.
-Keycloak runs on a dedicated EC2 instance in the `lab` account, colocated with
-Postgres and managed with Docker Compose. It is reached at `id.glab.lol`.
-GitHub is the upstream identity provider through OIDC.
-
-Keycloak configuration should be reconciled from git. Runtime state such as
-sessions, user credentials, and TOTP enrollment is backed up separately.
+Keycloak runs on a dedicated Flatcar EC2 instance in the `lab` account, with
+Postgres colocated on a dedicated encrypted data volume. It is reached at
+`id.glab.lol`.
+
+The first implemented human admin path is local Keycloak authentication:
+username and password plus WebAuthn/YubiKey enrollment for the single `lab`
+realm admin account. GitHub is not a human SSO dependency for Keycloak.
The
+GitHub token broker remains a machine bootstrap path for short-lived access to
+encrypted secrets.
+
+Keycloak configuration starts with a first-boot import from `infra`. Runtime
+state such as sessions, user credentials, and WebAuthn enrollment is backed up
+separately.
 See [Keycloak Runtime](./keycloak-runtime.md) for the EC2 runtime shape, backup contract, rebuild/restore paths, hostname constraint, and break-glass