Merged
72 changes: 42 additions & 30 deletions docs/docs/architecture/keycloak-runtime.md
@@ -1,6 +1,6 @@
---
title: Keycloak Runtime
description: Keycloak deployment, configuration, backups, recovery, and break-glass paths.
description: Keycloak deployment, first-boot configuration, recovery, and break-glass paths.
---

# Keycloak Runtime
@@ -15,10 +15,12 @@ Keycloak runs on one dedicated EC2 instance in the `lab` account.

The runtime contract is:

- instance: `t4g.small`, Amazon Linux 2023 on ARM, in the `172.16.0.0/16` VPC
- runtime: Docker Compose
- instance: `t4g.small`, Flatcar Container Linux on ARM, in the
`172.16.0.0/16` VPC
- runtime: Ignition-managed systemd units that run Docker containers
- services: upstream Keycloak plus upstream Postgres, both pinned in `infra`
- database: colocated Postgres with data on the instance EBS root volume
- database: colocated Postgres with data on a dedicated encrypted gp3 data
volume mounted at `/var/lib/keycloak`
- access name: `id.glab.lol`
- TLS: Traefik ACME DNS-01 through Route 53 using the instance IAM role
- reverse proxy: Traefik on the host, terminating TLS and proxying to Keycloak
@@ -41,42 +43,48 @@ must be tuned for the 2 GB memory budget:
The instance IAM role grants only what the runtime needs:

- Route 53 writes scoped to `_acme-challenge.id.glab.lol`
- S3 write access to the Keycloak backup bucket prefix
- SSM Parameter Store reads for bootstrap-time values such as reconciliation
credentials
- permission to invoke the GitHub token broker Lambda for short-lived
`GilmanLab/secrets` access
- KMS decrypt only for SOPS material with `Repo=GilmanLab/secrets` and
`Scope=keycloak`
- SSM managed-instance access for operator inspection and repair

Cluster access, secret decryption, and tailnet access follow the AWS bootstrap
and secrets contracts. Keycloak does not receive broad AWS administrative
permissions.
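
As an illustration of the first bullet, here is a minimal sketch of how a Route 53 grant could be scoped to the single ACME challenge record. It is not the policy from `infra`: the hosted zone ID, the statement `Sid`, and the function name are hypothetical. It relies on the Route 53 IAM condition keys for normalized record names and record types. A DNS-01 client typically also needs a few read-only Route 53 calls (for example `route53:GetChange`), which are left out here.

```python
import json

# Hypothetical hosted zone ID, for illustration only; the real value lives in `infra`.
HOSTED_ZONE_ID = "Z0000000EXAMPLE"
CHALLENGE_NAME = "_acme-challenge.id.glab.lol"


def acme_dns01_policy(zone_id: str, record_name: str) -> dict:
    """Build an IAM policy document that only allows TXT changes to one record name."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AcmeChallengeTxtOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": "route53:ChangeResourceRecordSets",
                "Resource": f"arn:aws:route53:::hostedzone/{zone_id}",
                "Condition": {
                    # Every record in the change batch must match both constraints.
                    "ForAllValues:StringEquals": {
                        "route53:ChangeResourceRecordSetsNormalizedRecordNames": [
                            record_name
                        ],
                        "route53:ChangeResourceRecordSetsRecordTypes": ["TXT"],
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(acme_dns01_policy(HOSTED_ZONE_ID, CHALLENGE_NAME), indent=2))
```

Scoping by record name and type means a compromised instance role could tamper with the challenge record but not with other records in the zone.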

## Realm And Federation
## Realm And Local Admin

One realm named `lab` holds lab users and OIDC/SAML clients.

GitHub is the only upstream identity provider and is federated through OIDC.
There is no standing local username-password fallback for normal users. The
bootstrap admin user exists only during initial realm creation and is disabled
after `keycloak-config-cli` reconciles the realm from git.
The first implemented human admin path is local Keycloak authentication, not a
GitHub identity provider. The `lab` realm has one local admin account,
`admin@glab.lol`, with a generated password and a required WebAuthn enrollment
for YubiKey-backed touch authentication. The temporary master bootstrap admin
exists only for initial realm creation and should be disabled after the browser
login path is proven.

The first expected clients are:
Future clients are expected to include:

- Kubernetes API OIDC for Talos clusters
- Argo CD web UI and CLI
- Grafana

The authoritative client list lives in the realm repository.
The authoritative client list is intentionally deferred until each integration
is implemented.

## Configuration As Code

The realm repository is the source of truth for Keycloak's declarative surface:
For the current slice, the Keycloak host imports the minimal `lab` realm from an
`infra/aws/keycloak` template during first boot. That declarative surface
includes:

- realms
- clients
- client scopes
- roles and role mappings
- identity-provider configuration
- authentication flows and required actions
- realm-level settings
- the local admin user shell, with its password supplied from SOPS at import
time

Runtime state is intentionally out of git:

@@ -87,10 +95,12 @@ Runtime state is intentionally out of git:
- audit and event logs
- ephemeral tokens and one-time codes

`keycloak-config-cli` reconciles from the realm repository on a short schedule
from the Keycloak host. It authenticates with a service account whose secret is
stored in SSM Parameter Store. Keycloak version upgrades are driven by bumping
the pinned runtime version in `infra` and reconciling forward.
`keycloak-config-cli` runs once after Keycloak is healthy. It fetches the local
admin dotenv payload from `GilmanLab/secrets` through the existing
broker-backed `labctl` path, imports the realm, and writes
`/var/lib/keycloak/config/lab-realm-imported` on success so it does not rerun on
reboot. Scheduled reconciliation, service clients, and GitHub OIDC are not part
of this slice.
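
The run-once behavior described above follows a common marker-file pattern. The sketch below shows that pattern only; it is not the actual unit or script in `infra`. The `run_once` helper and the stand-in import action are hypothetical, and the demo uses a temporary directory rather than `/var/lib/keycloak/config`.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_once(marker: Path, action: Callable[[], None]) -> bool:
    """Run `action` unless `marker` exists; create the marker only on success.

    Returns True if the action ran, False if it was skipped.
    """
    if marker.exists():
        return False
    action()  # an exception here propagates and leaves no marker behind
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for /var/lib/keycloak/config/lab-realm-imported.
        marker = Path(tmp) / "config" / "lab-realm-imported"

        def fake_import() -> None:
            # Placeholder for the real realm import step.
            print("importing realm")

        print(run_once(marker, fake_import))  # first boot: the import runs
        print(run_once(marker, fake_import))  # reboot: skipped, marker exists
```

Writing the marker only after the action returns means a failed import is retried on the next boot instead of being silently skipped.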

## Backups

@@ -124,10 +134,12 @@ the lab's bootstrap secrets, and the S3 bucket also uses SSE-KMS.
The primary path is rebuild-first:

1. Provision a fresh EC2 instance from `infra`.
2. Start Keycloak and fresh Postgres through Docker Compose.
3. Run `keycloak-config-cli` against the new instance using the realm repo.
4. Sign in through GitHub.
5. Re-enroll WebAuthn or TOTP.
2. Let Flatcar mount the persistent Keycloak data volume and start the
systemd-managed Postgres, Keycloak, and Traefik containers.
3. Let the first-boot `keycloak-config-cli` unit import the minimal `lab` realm
if the import marker is absent.
4. Sign in as the local `lab` realm admin.
5. Enroll or re-enroll the YubiKey WebAuthn credential when prompted.

Target RTO for the single-user lab is 15 minutes. This path requires AWS access
and git; it does not require a backup store.
@@ -138,9 +150,9 @@ The restore fallback is:
2. Pull a selected point-in-time backup from S3 or the NAS.
3. Restore the Postgres dump.
4. Place the TLS cert bundle and config files.
5. Start Docker Compose.
6. Let `keycloak-config-cli` reconcile the restored runtime forward to git
`HEAD`.
5. Start the Flatcar systemd units.
6. Let the first-boot import run only if the restored data does not already
include the import marker and realm state.
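
Step 6 leaves one case open: restored data that contains the marker but not the realm, or the reverse. The sketch below encodes one possible reading, in which the import is skipped when both are present, runs when both are absent, and a mixed state is handed to the operator. That mixed-state handling is an assumption and is not stated in the document; the `Decision` enum and function name are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    IMPORT = "import"    # fresh data: let the first-boot import run
    SKIP = "skip"        # restored data already carries the realm
    INSPECT = "inspect"  # inconsistent state: operator should look first


def post_restore_decision(marker_present: bool, realm_present: bool) -> Decision:
    """Decide what the first-boot import should do after a restore.

    Assumption: a marker without realm state (or realm state without a
    marker) is treated as inconsistent rather than resolved automatically.
    """
    if marker_present and realm_present:
        return Decision.SKIP
    if not marker_present and not realm_present:
        return Decision.IMPORT
    return Decision.INSPECT


if __name__ == "__main__":
    for marker in (False, True):
        for realm in (False, True):
            print(marker, realm, post_restore_decision(marker, realm).value)
```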

Restores are for cases where preserving exact runtime state matters, such as
federated identity linkages, event history, and active user state.
@@ -170,6 +182,6 @@ authentication paths.
| Vault | Unseal keys and root/recovery keys | Stored outside any Keycloak-dependent path. |
| AWS | IAM Identity Center local user with hardware key | AWS does not federate to Keycloak. |
| Grafana | Local admin account | Kept active alongside OIDC. |
| GitHub | Personal GitHub account with hardware-key MFA | GitHub is upstream of Keycloak. |
| GitHub | Personal GitHub account with hardware-key MFA | GitHub is not a Keycloak upstream IdP. The token broker remains a machine bootstrap path for secrets access. |

These anchors live outside Keycloak-dependent storage.
19 changes: 13 additions & 6 deletions docs/docs/architecture/secrets-identity-pki.md
@@ -324,12 +324,19 @@ Remaining implementation threads:
Keycloak is the central human-facing identity system for lab services, but it
is not a bootstrap dependency for the raw recovery path.

Keycloak runs on a dedicated EC2 instance in the `lab` account, colocated with
Postgres and managed with Docker Compose. It is reached at `id.glab.lol`.
GitHub is the upstream identity provider through OIDC.

Keycloak configuration should be reconciled from git. Runtime state such as
sessions, user credentials, and TOTP enrollment is backed up separately.
Keycloak runs on a dedicated Flatcar EC2 instance in the `lab` account, with
Postgres colocated on a dedicated encrypted data volume. It is reached at
`id.glab.lol`.

The first implemented human admin path is local Keycloak authentication:
username and password plus WebAuthn/YubiKey enrollment for the single `lab`
realm admin account. GitHub is not a human SSO dependency for Keycloak. The
GitHub token broker remains a machine bootstrap path for short-lived access to
encrypted secrets.

Keycloak configuration starts with a first-boot import from `infra`. Runtime
state such as sessions, user credentials, and WebAuthn enrollment is backed up
separately.

See [Keycloak Runtime](./keycloak-runtime.md) for the EC2 runtime shape,
backup contract, rebuild/restore paths, hostname constraint, and break-glass