Domains#1

Closed
siddhsuresh wants to merge 31 commits into main from domains

@siddhsuresh siddhsuresh commented May 17, 2026

Greptile Summary

This PR introduces opt-in Ravion-managed domain/cert provisioning for ECS clusters and services, adds a new ravion_domain Terraform resource type backed by a custom provider, and refactors load_balancer_attachment from a typed Terraform variable to a JSON-decoded string to support control-plane ARN templating.

  • ecs_cluster: new ravion_domains.tf allocates a wildcard ravion_domain for the cluster, binds the cert to the public ALB HTTPS listener as an SNI extra, and exposes domain outputs; module.yml adds UI inputs, dashboard metrics, and output declarations.
  • ecs_service: new ravion_domains.tf implements Mode A (auto-FQDN under the cluster wildcard, with a two-phase cutover via ravion_auto_domain_status) and Mode B (per-service cert covering customer FQDNs); load_balancer_attachment variable replaced by load_balancer_attachment_json string with local decoding; module.yml added from scratch.
  • Broad try(..., true) change: every variable validation in autoscaling, ecs_cluster, ecs_service, alb, nlb, and vpc is wrapped to suppress throws from complex-object attribute access.

Confidence Score: 2/5

Multiple blocking gaps exist between the module.yml control-plane definitions and the actual Terraform code — mismatched output names, mismatched input variable names, and conflicting provider version pins — that will cause apply failures or silent no-ops on every stack that uses the new Ravion domain wiring.

The module.yml files declare Ravion output names (ravion_managed_domains_enabled, ravion_public_alb_default_*) that simply do not exist in outputs.tf, and input names (ravion_parent_app_domain_id, ravion_auto_domain_listener_arn, etc.) that don't match the TF variables the resources actually read. Any downstream module referencing these outputs will receive nothing, and any upstream value the control plane passes using the module.yml keys will be silently ignored. Separately, the Ravion provider is pinned at = 0.4.3 in ecs_cluster and = 0.4.5 in ecs_service, which Terraform cannot satisfy in a shared workspace. Finally, when certificate_arns = [] (the new default set in module.yml), the ALB HTTPS listener is created with certificate_arn = null, which AWS rejects. Together these issues mean the primary new feature cannot be exercised in production without code fixes.

compute/ecs_cluster/module.yml and compute/ecs_service/module.yml (output/input name mismatches), compute/ecs_cluster/versions.tf and compute/ecs_service/versions.tf (conflicting provider version pins), networking/alb/locals.tf (null certificate_arn on HTTPS listener)

Important Files Changed

| Filename | Overview |
| --- | --- |
| compute/ecs_cluster/versions.tf | Ravion provider pinned to = 0.4.3, conflicting with ecs_service pin of = 0.4.5; breaks any workspace using both modules |
| compute/ecs_service/versions.tf | Ravion provider pinned to = 0.4.5, conflicting with ecs_cluster pin of = 0.4.3 |
| compute/ecs_cluster/module.yml | Declared Ravion output names don't match actual TF outputs; also declares private ALB Ravion outputs that have no corresponding TF resources |
| compute/ecs_service/module.yml | New file; UI input names (ravion_parent_app_domain_id, ravion_auto_domain_listener_arn, etc.) don't match TF variables; output names also diverge from outputs.tf |
| networking/alb/locals.tf | https_default_cert_arn can be null when certificate_arns is empty (now the default), causing HTTPS listener creation to fail at the AWS API level |
| networking/alb/listeners.tf | certificate_arn now references local.https_default_cert_arn (null-safe); SNI extras guard improved with length check; depends on ALB locals fix |
| compute/ecs_cluster/ravion_domains.tf | New file: allocates ravion_domain.cluster (wildcard cert) and binds it as SNI extra on public ALB HTTPS listener; preconditions guard required inputs |
| compute/ecs_service/ravion_domains.tf | New file implementing Mode A (auto-FQDN under cluster wildcard) and Mode B (custom cert) with a two-phase cutover mechanism via ravion_auto_domain_status data source |
| compute/ecs_service/locals.tf | Switches load_balancer_attachment from typed variable to JSON-decoded local; applies defaults post-decode; enable_auto_scaling guard simplified with try() |
| compute/ecs_service/variables.tf | load_balancer_attachment replaced with load_balancer_attachment_json string; wait_for_steady_state default changed from true to false; Ravion variables added |
| compute/autoscaling/variables.tf | All variable validations wrapped in try(..., true), silently swallowing errors for complex object inputs |
| compute/ecs_cluster/outputs.tf | Adds seven Ravion domain outputs (cluster domain id/fqdn/url, cert arn/status, aws account/region); names diverge from module.yml declarations |

Sequence Diagram

sequenceDiagram
    participant CP as Control Plane
    participant Cluster as ecs_cluster module
    participant RavionAPI as Ravion Provider
    participant AWS as AWS (ACM + ALB)
    participant Service as ecs_service module

    CP->>Cluster: "use_ravion_managed_domains=true, ravion_aws_account_id"
    Cluster->>RavionAPI: "ravion_domain.cluster (wildcard cert, target=public ALB)"
    RavionAPI-->>AWS: "Issue ACM cert (*.cluster-fqdn + cluster-fqdn)"
    RavionAPI-->>Cluster: cert_arn, fqdn, id
    Cluster->>AWS: aws_lb_listener_certificate (SNI extra on HTTPS listener)
    Cluster-->>CP: ravion_cluster_domain_id, ravion_cluster_cert_arn, ...

    CP->>Service: cluster_parent_domain_id, cluster_https_listener_arn
    Service->>RavionAPI: data.ravion_auto_domain_status (check retirement)
    RavionAPI-->>Service: "retired=false"
    Service->>RavionAPI: ravion_domain.auto (child under cluster domain, Mode A)
    RavionAPI-->>Service: auto fqdn (svc-hash.cluster-fqdn)
    Service->>AWS: aws_lb_listener_rule.ravion (host_header to target group)

    Note over Service,AWS: Mode B (custom domains)
    CP->>Service: "domains=[api.example.com]"
    Service->>RavionAPI: ravion_domain.custom (cert covering customer FQDNs)
    RavionAPI-->>AWS: Attach cert to ALB listener via resource_arn
    Note over RavionAPI,AWS: Customer adds DNS CNAME records
    RavionAPI-->>Service: "retired=true (after DNS resolves)"
    Service->>RavionAPI: ravion_domain.auto count to 0 (destroy)

Comments Outside Diff (1)

  1. compute/autoscaling/variables.tf, line 9-10 (link)

    P2 try(..., true) silences all validation errors

    Wrapping every validation condition in try(..., true) means any expression that throws (e.g., a type mismatch or an unexpected attribute access on an incorrectly-shaped value) silently passes as valid instead of surfacing an error message. For the simple scalar checks like var.desired_capacity == null || var.desired_capacity >= 0, the null-guard short-circuit already prevented the throw without try(), so the wrapper is no-op but harmless there. The concern is the object-typed variables (mixed_instances_policy, instance_refresh, warm_pool, scaling_policies) where a malformed input object that causes an attribute-access error will now be accepted with no user feedback. This same pattern is applied throughout compute/ecs_cluster/variables.tf, compute/ecs_service/variables.tf, networking/alb/variables.tf, and networking/nlb/variables.tf.
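
One way to keep the null guard without swallowing other errors is a conditional expression in place of `try(..., true)`. The following is a minimal sketch, assuming `desired_capacity` is a nullable number as the check quoted above suggests; the error message text is illustrative and not taken from the module.

```hcl
variable "desired_capacity" {
  type    = number
  default = null

  validation {
    # Null is accepted outright; the comparison is only reported on when the
    # value is non-null, so a genuine failure still reaches the user.
    condition     = var.desired_capacity == null ? true : var.desired_capacity >= 0
    error_message = "desired_capacity must be null or a non-negative number."
  }
}
```

With this shape a null input passes and a negative number fails with the message above, rather than every evaluation error being treated as valid.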

Prompt To Fix All With AI
Fix the following 5 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 5
compute/ecs_cluster/versions.tf:17-20
**Conflicting provider version pins across modules**

`ecs_cluster` pins the Ravion provider at `= 0.4.3` while `ecs_service/versions.tf` pins it at `= 0.4.5`. Terraform/OpenTofu resolves a single provider version per workspace. Any stack that instantiates both modules (the typical pattern shown in the README) will immediately fail with `Error: Failed to query available provider packages — conflicting provider version constraints`. Both modules must use the same pinned version (or a shared `>= x.y` range).

### Issue 2 of 5
compute/ecs_cluster/module.yml:622-629
**module.yml output declarations don't match actual Terraform outputs**

The seven outputs declared here (`ravion_managed_domains_enabled`, `ravion_public_alb_default_url`, `ravion_public_alb_default_fqdn`, `ravion_public_alb_default_cert_arn`, `ravion_private_alb_default_url`, `ravion_private_alb_default_fqdn`, `ravion_private_alb_default_cert_arn`) do not exist in `compute/ecs_cluster/outputs.tf`. The actual TF outputs are `ravion_cluster_domain_id`, `ravion_cluster_domain_fqdn`, `ravion_cluster_domain_url`, `ravion_cluster_cert_arn`, `ravion_cluster_cert_status`, `ravion_aws_account_id`, and `ravion_aws_region`. Any control-plane reference to these declared-but-absent outputs will resolve to nothing, silently breaking downstream wiring. There are also no private-ALB Ravion domain resources in the TF code, so the three `ravion_private_alb_*` entries can never be populated.

### Issue 3 of 5
compute/ecs_service/module.yml:101-127
**module.yml input names don't match Terraform variable names**

Several UI inputs defined here map to non-existent TF variables:

- `ravion_listener_arn` → no matching TF variable (the TF variable is `cluster_https_listener_arn`, which serves both Mode A and Mode B).
- `ravion_parent_app_domain_id` → TF variable is `cluster_parent_domain_id`.
- `ravion_auto_domain_listener_arn` → TF variable is `cluster_https_listener_arn`.
- `ravion_auto_domain_alb_dns_name` → TF variable is `cluster_alb_dns_name`.
- `ravion_auto_domain_alb_zone_id` → TF variable is `cluster_alb_zone_id`.

When the control plane passes these UI input values to Terraform, the misnamed keys will be silently ignored, leaving the actual TF variables unset (defaulting to `null`) and breaking the Ravion domain wiring at plan/apply time. The output section has the same issue — `ravion_domains`, `ravion_cert_id`, `ravion_cert_arn`, `ravion_cert_status`, `ravion_auto_domain_enabled`, `ravion_auto_domain_fqdn`, `ravion_auto_domain_url` are declared but the actual TF outputs are `ravion_domain_id`, `ravion_domain_fqdn`, `ravion_domain_url`, `ravion_custom_domain_id`, and `ravion_custom_domain_cert_arn`.

### Issue 4 of 5
networking/alb/locals.tf:32
**HTTPS listener created with `null` certificate_arn when `certificate_arns = []`**

`https_default_cert_arn` is `null` when `certificate_arns` is empty. `compute/ecs_cluster/module.yml` now sets `default: []` for `public_alb_certificate_arns`, so a user who sets `use_ravion_managed_domains = true` and leaves `public_alb_certificate_arns` at its default will trigger this path. The `aws_lb_listener.https` resource then passes `certificate_arn = null` to AWS, which requires a certificate for HTTPS listeners and will reject the call with a validation error. The Ravion wildcard cert is wired only as an SNI extra via `aws_lb_listener_certificate` (after the listener exists), so it cannot fill the listener's mandatory default-cert slot. The HTTPS listener needs at least one cert at creation time.

### Issue 5 of 5
compute/autoscaling/variables.tf:9-10
**`try(..., true)` silences all validation errors**

Wrapping every validation condition in `try(..., true)` means any expression that throws (e.g., a type mismatch or an unexpected attribute access on an incorrectly-shaped value) silently passes as valid instead of surfacing an error message. For the simple scalar checks like `var.desired_capacity == null || var.desired_capacity >= 0`, the null-guard short-circuit already prevented the throw without `try()`, so the wrapper is no-op but harmless there. The concern is the object-typed variables (`mixed_instances_policy`, `instance_refresh`, `warm_pool`, `scaling_policies`) where a malformed input object that causes an attribute-access error will now be accepted with no user feedback. This same pattern is applied throughout `compute/ecs_cluster/variables.tf`, `compute/ecs_service/variables.tf`, `networking/alb/variables.tf`, and `networking/nlb/variables.tf`.

Reviews (1): Last reviewed commit: "ecs_service: bring auto-FQDN back when u..."

Greptile also left 4 inline comments on this PR.

siddhsuresh and others added 16 commits May 16, 2026 13:10
…ster + compute/ecs_service

networking/alb
- New use_ravion_managed_domains toggle (default false, non-breaking).
- When on, declares one domains_alb_attachment that allocates an FQDN, issues
  the cluster wildcard ACM cert, writes the A-ALIAS, and exposes default_cert_arn.
- HTTPS listener is force-created and its default cert comes from the
  attachment; var.certificate_arns[0] is ignored, additional ARNs still
  attach as SNI. New outputs: ravion_default_url / _fqdn / _cert_arn / _alb_attachment_id.

compute/ecs_cluster
- Single use_ravion_managed_domains toggle that forwards down to both the
  public and private alb child modules (distinct slots so they coexist).
- Forces HTTPS + HTTP->HTTPS redirect when on. Per-ALB cert ARN inputs are
  ignored at index 0 when on (additional ARNs still attach as SNI).
- module.yml gets a new "Ravion-Managed Domains" section; cert-ARN inputs are
  hidden when the toggle is on. New outputs cover both ALBs.

compute/ecs_service
- New domains list(string) input. When non-empty, declares one
  domains_module_certificate that hands the FQDN list + cluster's HTTPS
  listener ARN to Ravion's API. The API requests the ACM cert, persists
  validation CNAMEs (visible in Domains tab), polls until ISSUED, and
  attaches it as SNI out-of-band, so terraform apply never blocks on
  customer DNS. On destroy the API detaches + deletes the cert.
- New outputs surface cert id / arn / status for the UI.

The Ravion provider source (ravion.com/ravion/domains) only resolves when
the runner's terraform mirror is configured (Ravion pipeline runners inject
this via ~/.terraformrc). Required-provider declarations are added so that
any caller using these modules without Ravion-managed mode still works,
either against a local mirror or by skipping the providers in tooling.
When the cluster module has Ravion-managed domains on, it exposes
`ravion_public_alb_default_app_domain_id` (the app_domain_id from the
cluster's alb_attachment). A service module wires that as
`ravion_parent_app_domain_id`, plus the cluster ALB's HTTPS listener ARN
and DNS name. The new resources in compute/ecs_service/ravion_domains.tf
then activate iff:

  - var.domains is empty (no custom-domain takeover yet), AND
  - ravion_parent_app_domain_id is non-empty

and install:

  1. domains_app_domain.auto with parent_id = cluster's app_domain.
     Allocates `<svc>-<hash>.<cluster-fqdn>` server-side (allocator now
     supports a ParentApex). The cluster wildcard cert covers it — zero
     per-service ACM work, zero per-service validation records.
  2. domains_dns_record.auto_alias writes the A-ALIAS in our Route53.
  3. aws_lb_listener_rule.ravion_auto_domain installs a host_header rule
     on the cluster HTTPS listener, forwarding to the service's TG.

When the user adds a custom domain, count flips to 0 → terraform destroys
all three → server-side cascade removes the ManagedDomain row, the
Route53 record, and AWS deletes the listener rule. The user's
domains_module_certificate takes over.

networking/alb gains a `ravion_default_app_domain_id` output exposing
the alb_attachment's app_domain_id (was already in the underlying
provider resource, just not surfaced).

ecs_cluster forwards the public + private flavors.
- networking/vpc/module.yml + compute/ecs_cluster/module.yml: add readme,
  ui.metrics (CloudWatch via Ravion AWS account), and ui.links.
- compute/ecs_service/module.yml: new file mirroring the input schema
  (incl. the new ravion-managed domain inputs) with metrics + readme.
- networking/vpc/variables.tf: wrap validation conditions in try() so a
  null default doesn't blow up the iteration/length checks. Terraform 1.10
  does not short-circuit `||` over `for`/`length` in variable validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Same TF 1.10 quirk as the variables.tf fix in the previous commit: `||`
does not short-circuit length() over a null value inside a precondition.
Wrap each length-check in try() so the null default for *_subnet_cidrs
passes the precondition without erroring.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Same TF 1.10 quirk hitting child modules pulled in by ecs_cluster:
- compute/autoscaling/variables.tf: instance_maintenance_policy attribute
  access fails when the policy itself is null. Wrap min/max condition in
  try() so the null default doesn't blow up the validation.
- networking/nlb/variables.tf: contains() barfs on null. Wrap both
  dns_record_client_routing_policy and enforce_security_group_inbound_rules
  conditions in try().

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Bulk-applied transform via /tmp/wrap_null_conditions.py over all the
modules ecs_cluster + ecs_service pull in. TF 1.10's variable validation
does not short-circuit `||` when the right side touches a null value, so
the existing `var.X == null || EXPR(var.X)` guards blow up at plan time.

try(..., true) preserves the original semantics for non-null inputs (the
expression evaluates and returns its bool result) and silently passes
when the expression errors on null — same intent the `== null ||` guards
had originally.

Touches: autoscaling, ecs_cluster, ecs_service, alb, nlb, vpc.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… is on

When use_ravion_managed_domains = true, the listener default cert is
supplied via the domains_alb_attachment resource (Ravion provisions and
attaches the cluster wildcard cert). var.certificate_arns is intentionally
empty in that case — the precondition was rejecting the valid config.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…s is on

Without this gate the per-service auto-domain modules can't reference
public_alb_https_listener_arn even though the HTTPS listener is created
by the Ravion-managed-domains code path on the cluster ALB.
The control plane stores load_balancer_attachment_json as a string in the
module input so it can template-substitute listener/ALB ARNs from
upstream cluster outputs. The Terraform variable now matches that shape
(string) and the module decodes it once in locals.tf, merging the
decoded value with the optional() defaults the typed variable used to
provide. All downstream resources read from local.load_balancer_attachment
unchanged.
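
The decode-then-merge step described above could look roughly like the sketch below. The `"null"` default, the `lb_attachment_*` local names, and the default keys (`container_port`, `health_check_path`) are hypothetical stand-ins; the actual attribute set of the old typed variable is not shown in this PR.

```hcl
variable "load_balancer_attachment_json" {
  description = "JSON-encoded load balancer attachment; the string \"null\" disables it."
  type        = string
  default     = "null" # assumed default, not confirmed by the PR
}

locals {
  # Decode once; jsondecode("null") yields null.
  lb_attachment_decoded = jsondecode(var.load_balancer_attachment_json)

  # Hypothetical defaults standing in for the optional() defaults of the old
  # typed variable. Keys present in the decoded object override them.
  lb_attachment_defaults = {
    container_port    = 80
    health_check_path = "/"
  }

  load_balancer_attachment = (
    local.lb_attachment_decoded == null
    ? null
    : merge(local.lb_attachment_defaults, local.lb_attachment_decoded)
  )
}
```

One caveat of this approach: a key that is present in the JSON with an explicit null value overrides the default with null, which differs from how optional() defaults behave on a typed variable.
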
TF 1.10 doesn't short-circuit `var.X != null && var.X.attr` when X is
null — it still evaluates the right side and errors on attribute access.
Same pattern fix as the validation cleanup commits.
Server-side DNS record validation rejects ALIAS values whose zone_id is
empty. The ecs_service module was hardcoding zone_id="" — now takes
ravion_auto_domain_alb_zone_id (mapped from the cluster's
public_alb_zone_id output) and writes it into the ALIAS value.
Locked to ravion.com/ravion/domains 0.1.99, a local-dev throwaway version
whose checksum no longer matches the published binary. TF regenerates a
fresh lock on the next init under the ~> 0.1 constraint.
Initial TF apply runs against the placeholder hello-world task definition
(no real app image yet) which never stabilises and trips ECS's deployment
circuit breaker. The deploy workflow that pushes the actual image is the
one that should wait for steady state, not the infra apply.
Replaces ravion_app_domain + dns_record + module_certificate + the
auto/custom split with a single `ravion_domain` resource that nests
under a cluster-level parent (customer's top-level TF). The parent owns
the wildcard cert; the service rides it via SNI — no per-service ACM.

Drops `use_ravion_managed_domains` and all related Ravion plumbing from
ecs_cluster and networking/alb: the cluster module is now pure-AWS, and
the customer wires `ravion_domain` themselves at the top level.

Provider pinned to ~> 0.3.

Pin ravion provider to 0.1.99 for local-mirror testing

The local mirror at port 8095 has 0.1.99 republished with the latest
provider code (Mode A/B, certificate.domains). Higher versions exist in
the mirror but aren't in sync with the runner's lock file. Exact-pin
lets the runner pick the binary we just published without touching
.terraform.lock.hcl.

Bump ravion provider constraint to ~> 0.4

Domain handler rewrite (Mode A/B split, `domains` field replaces
`custom_domains`) ships in 0.4.0.

Per-service custom domains: Mode A vs Mode B

Custom domains move OUT of the cluster module and onto each service:
add `domains = ["api.example.com"]` to `ecs_service` to opt into Mode B.

- ecs_cluster: drop ravion_custom_domains var (cluster cert covers only
  the wildcard pair); add ravion_aws_account_id + ravion_aws_region as
  passthrough outputs so services can issue their own certs in the same
  account.
- ecs_service: new `domains` input (max 10). New cluster_alb_dns_name +
  cluster_alb_zone_id + ravion_aws_account_id + ravion_aws_region
  inputs (Mode B only — pipe from cluster outputs). ravion_domain.this
  switches between Mode A (rides wildcard) and Mode B (own cert
  covering only customer FQDNs) based on whether `domains` is set.
- ecs_service: aws_lb_listener_certificate.ravion added (Mode B only,
  attaches the per-service cert as SNI extra). aws_lb_listener_rule
  host_header values switch between auto-FQDN (Mode A) and custom
  domains (Mode B).

Move cluster ravion_domain into ecs_cluster module

The cluster-level `ravion_domain` (wildcard cert + listener binding) now
lives inside `compute/ecs_cluster` rather than being hand-rolled in the
customer's top-level TF. Opt in with `use_ravion_managed_domains = true`
and pipe `module.cluster.ravion_cluster_domain_id` +
`module.cluster.public_alb_https_listener_arn` into each `ecs_service`.

- ecs_cluster: new ravion_domains.tf allocates cluster ravion_domain
  (wildcard) and binds the cert as SNI extra on the public ALB HTTPS
  listener (does NOT replace the listener's default cert)
- ecs_cluster: new vars `use_ravion_managed_domains`,
  `ravion_cluster_name`, `ravion_aws_account_id`, `ravion_aws_region`,
  `ravion_custom_domains`
- ecs_cluster: new outputs `ravion_cluster_domain_id`,
  `ravion_cluster_domain_fqdn`, `ravion_cluster_domain_url`,
  `ravion_cluster_cert_arn`, `ravion_cluster_cert_status`
- ecs_cluster: require ravion provider ~> 0.3
- ecs_cluster: README section documenting the wildcard pattern + the
  `ravion_custom_domains` DNS-01 validation flow
- ecs_service: update comments to reference cluster module outputs

Adopt domains provider 0.2.0 (domains_module_certificate → domains_certificate)

The provider dropped the platform-internal naming (no more "module" or
"cluster" in resource types). domains_module_certificate is now just
domains_certificate; domains_cluster_certificate folded into the same
resource since the only difference was whether the caller passes
explicit domains.

Bump version constraint to ~> 0.2.

Wire per-service custom-domain certs via aws_lb_listener_certificate

domains_module_certificate no longer accepts listener_arn (provider 0.1.x).
Add explicit aws_lb_listener_certificate.ravion_custom to bind the cert as
SNI on the cluster's HTTPS listener. Refresh doc strings across both
modules to match the new ownership model.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
siddhsuresh and others added 8 commits May 18, 2026 21:32
0.4.3 in the local mirror has the new ravion_domain resource (and
certificate.domains field). 0.1.99 was an old binary that still
exposed domains_module_certificate / domains_app_domain etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The alb module knows nothing about Ravion-managed domains anymore —
the ecs_cluster module attaches the Ravion wildcard cert as an SNI
extra on the listener after the alb is created. The precondition was
referencing a variable that was never declared in alb's variables.tf.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Mode B previously replaced the auto-FQDN with the customer domains in
the listener rule's host_header, dropping all traffic to the auto-FQDN
the moment `domains` was set. Now Mode A and Mode B are two separate
`ravion_domain` resources living side-by-side; the listener rule
matches BOTH the auto-FQDN and every customer FQDN, giving the
customer a no-downtime window to flip their DNS over. The Ravion
control plane retires the auto-FQDN's Route53 record once at least
one customer routing record is MATCHED.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Reads cutover state from the new data source so the auto resource's
count flips to 0 once Ravion retires the slot. Next plan after
retirement shows a clean destroy of ravion_domain.auto[0] and the
listener rule narrows to only customer FQDNs. Provider bumped to 0.4.4
for the new data source.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Following the Mode A/B split, outputs.tf still referenced the dead
`ravion_domain.this` symbol. Repoint to `ravion_domain.auto` and add
two new outputs for the Mode B cert (custom domain id + cert_arn).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Drops aws_lb_listener_certificate.ravion. The new ravion_domain.custom
takes `resource_arn = var.cluster_https_listener_arn` and Ravion
attaches the cert to the listener server-side after ACM validates it.

TF apply no longer fails with `UnsupportedCertificate` during the window
in which ACM has not yet moved the cert to ISSUED. The whole post-issue +
pre-delete cert→listener lifecycle lives in Ravion now.

Provider pinned to 0.4.5 for the new resource_arn attribute.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two changes both targeting the same edge case (Mode B → Mode A):

1. `ravion_auto_retired` is only honoured while `ravion_mode_b` is
   true. When the user removes every entry from `var.domains`, the
   auto-FQDN must come back — otherwise the listener rule has zero
   host_header values and AWS rejects the apply.

2. Defensive: `aws_lb_listener_rule.ravion` count guards on
   `length(local.ravion_host_header_values) > 0`, so any future
   transient zero-headers state declines to declare the rule instead
   of failing AWS schema validation at plan time.
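
The two guards above can be sketched with plain locals. This is an illustration only: `auto_fqdn` and `auto_retired` are hypothetical variables standing in for values the module reads from the Ravion resource and data source, and `ravion_listener_rule_count` is an invented name for the value the rule's `count` would use.

```hcl
variable "domains" {
  type    = list(string)
  default = []
}

# Hypothetical stand-ins for values the module reads from Ravion.
variable "auto_fqdn" {
  type    = string
  default = "svc-abc123.cluster.example.com"
}

variable "auto_retired" {
  type    = bool
  default = false
}

locals {
  ravion_mode_b = length(var.domains) > 0

  # Retirement is only honoured while custom domains exist (Mode B), so
  # emptying var.domains brings the auto-FQDN back.
  ravion_auto_retired = local.ravion_mode_b && var.auto_retired

  ravion_host_header_values = concat(
    local.ravion_auto_retired ? [] : [var.auto_fqdn],
    var.domains,
  )

  # 1 when there is at least one host to match, 0 otherwise.
  ravion_listener_rule_count = length(local.ravion_host_header_values) > 0 ? 1 : 0
}
```

With `domains = []` and `auto_retired = true`, the host header list still contains the auto-FQDN, so the rule keeps a non-empty condition.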

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@siddhsuresh siddhsuresh marked this pull request as ready for review May 18, 2026 21:15
Comment thread compute/ecs_cluster/versions.tf
Comment thread compute/ecs_cluster/module.yml Outdated
Comment thread compute/ecs_service/module.yml Outdated
Comment thread networking/alb/locals.tf
siddhsuresh and others added 3 commits May 19, 2026 16:39
…r to 0.5.0

Multiple services share one cluster HTTPS listener, and AWS rejects
duplicate listener-rule priorities. Default ravion_listener_rule_priority
was 50000 (collision-prone if anyone overrode to a fixed number); now
defaults to 0 = auto-derive a stable priority from sha256(var.name),
giving every service a unique [1000, 49999] slot without hand-picking.
Explicit values (1-50000) still win.
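
A possible shape for the derivation, assuming the hash is reduced with a modulo over the 49000 slots in [1000, 49999]; the local names and the choice of the first eight hex characters are illustrative, not taken from the module.

```hcl
variable "name" {
  type = string
}

variable "ravion_listener_rule_priority" {
  type    = number
  default = 0 # 0 = derive from the service name
}

locals {
  # First 8 hex characters of the digest, read as a 32-bit integer.
  name_hash = parseint(substr(sha256(var.name), 0, 8), 16)

  # 49000 slots: 1000 + (0..48999) lands in [1000, 49999].
  derived_priority = 1000 + local.name_hash % 49000

  # An explicit non-zero value wins over the derived one.
  listener_rule_priority = (
    var.ravion_listener_rule_priority == 0
    ? local.derived_priority
    : var.ravion_listener_rule_priority
  )
}
```

The result is stable for a given name, but a hash reduced to 49000 slots cannot strictly guarantee uniqueness; if two service names ever land on the same slot, an explicit priority on one of them resolves the collision.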

Bumps both compute/ecs_service and compute/ecs_cluster provider pin
from 0.4.x to 0.5.0, which is the rotation-aware terraform-provider-domains
build: in-place SAN edits no longer destroy+create the cert (the prior
flow cascade-deleted ManagedDomain rows and left the Domains tab empty
while the new cert sat in PENDING_VALIDATION). 0.5.0 lands the PATCH
/domains/{id} path and tower-go's RotateManagedCertWorkflow.
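
For reference, an aligned pin could look like the block below, repeated identically in both `compute/ecs_cluster/versions.tf` and `compute/ecs_service/versions.tf`. The source address is the one quoted earlier in this PR; the local name `ravion` is an assumption inferred from the `ravion_domain` resource prefix.

```hcl
terraform {
  required_providers {
    # Local name "ravion" is assumed from the ravion_domain resource type.
    ravion = {
      source  = "ravion.com/ravion/domains"
      version = "0.5.0" # same exact pin in both modules
    }
  }
}
```

Keeping the constraint textually identical in both modules avoids the situation where two exact pins on different versions cannot be satisfied in one workspace.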

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
When cluster_parent_domain_id is set, aws_lb_listener_rule.ravion owns
ALB routing scoped by host_header to this service's FQDNs. The caller-
supplied listener_rules from the control plane became redundant — and
actively harmful: priorities collide across services on the shared
listener, and any path-only rule (path-pattern /*) catches traffic
destined for sibling services. Short-circuit them entirely in that
mode.

Also add `ignore_changes = [action]` to the ravion rule so blue/green
deploy controllers can flip target groups without TF undoing the swap.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
0.5.1 ships the schema fix that drops UseStateForUnknown on the
rotation pending_* attributes — without it, in-place SAN rotations
trigger "Provider produced inconsistent result after apply" because
the planner expected the prior (empty) state values but the server
returns the new rotation target.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
siddhsuresh and others added 4 commits May 19, 2026 21:47
…ssary try() calls for improved readability and performance. Update ECS cluster and service modules to support Ravion-managed domains, including new inputs for cluster domain ID and HTTPS listener ARN. Adjust ALB and NLB configurations to handle Ravion-specific requirements, ensuring proper certificate management and listener settings. Enhance documentation in module.yml files to clarify usage of Ravion-managed domains and associated configurations.
Restores the validation conditions in compute/{autoscaling,ecs_cluster,ecs_service}
and networking/{alb,nlb} variables.tf to their pre-03f87c3 state. The other
changes in 03f87c3 (Ravion module.yml + listener wiring fixes) remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
`public_alb_certificate_arns` is a pre-Ravion input. ARNs in it are
frequently stale once the customer flips to Ravion mode, and silently
trying to SNI-attach them on every apply turns dead certs into
recurring "CertificateNotFound" failures. The Ravion wildcard already
serves as the listener default, so the SNI-extras resource was strictly
additive and never load-bearing.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The alb sub-module's security_group wires HTTPS ingress (443) only when
it owns the HTTPS listener (gated on local.create_https_listener). My
prior refactor flipped that off so Ravion could own the listener with
the wildcard cert as default — but I forgot to re-add the SG rule.

Symptom: dig resolves, port 443 TCP connect times out (SG drop), demo
HTTPS broken end-to-end.

Mirrors the rules the alb module would have emitted: one per IPv4 cidr
in var.public_alb_ingress_cidr_blocks, one per IPv6 cidr (hardcoded
::/0 to match the alb module's default since the cluster module doesn't
yet expose a public_alb_ingress_ipv6_cidr_blocks pass-through).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>