This Infrastructure-as-Code (IaC) project provisions a highly available Rancher Kubernetes Engine 2 (RKE2) cluster on AWS using Terraform for infrastructure and Ansible for cluster bootstrap and configuration.
The goal of this project is to demonstrate a production-style Kubernetes deployment with fully automated provisioning and configuration management, suitable for Proof-of-Concept (PoC) and learning environments.
Key features include:
- Multi-node high availability control plane with embedded etcd
- NGINX TCP load balancing for Kubernetes API and RKE2 supervisor traffic
- Ansible roles with Jinja2 templating, idempotent tasks, and handlers
- Dynamic inventory — Terraform outputs feed directly into Ansible, no manual IP management
- Bastion-host access model for secure, private-subnet cluster operations
- End-to-end cluster validation after deployment
The cluster is deployed within a dedicated AWS VPC with public and private subnets.
- **NGINX Load Balancer / Bastion** (`t3.micro`, public subnet)
  - SSH jump host for operator and Ansible access
  - TCP load balancer for Kubernetes API (port 6443) and RKE2 supervisor (port 9345)
- **Control Plane Nodes × 3** (`t3.medium`, private subnet)
  - Run Kubernetes control plane components
  - Form an embedded etcd quorum for high availability
- **Worker Nodes** (`t3.medium`, private subnet)
  - Run application workloads (kubelet + kube-proxy)
- **Networking**
  - VPC `10.0.0.0/16` with public (`10.0.1.0/24`) and private (`10.0.2.0/24`) subnets
  - NAT Gateway for outbound internet access from private nodes
  - No direct public access to control plane or worker nodes
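To picture the load-balancing layer, here is a minimal NGINX `stream` sketch. This is not the repo's Jinja2 template (which renders the upstream IPs from inventory groups); the private IPs below are placeholders:

```nginx
# Illustrative stream config -- placeholder IPs, not the rendered template
stream {
    upstream rke2_api {
        server 10.0.2.10:6443;
        server 10.0.2.11:6443;
        server 10.0.2.12:6443;
    }
    upstream rke2_supervisor {
        server 10.0.2.10:9345;
        server 10.0.2.11:9345;
        server 10.0.2.12:9345;
    }
    server { listen 6443; proxy_pass rke2_api; }
    server { listen 9345; proxy_pass rke2_supervisor; }
}
```

Because this is plain TCP passthrough, TLS terminates on the control plane nodes themselves, which is why the server certificates need the LB IP in their SANs.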
| Layer | Tool | Purpose |
|---|---|---|
| Infrastructure | Terraform | VPC, subnets, EC2, security groups, SSH key, NAT |
| Inventory | Python (dynamic) | Translates Terraform output → Ansible groups + hostvars |
| Configuration | Ansible | Node prep, NGINX, RKE2 install, kubeconfig setup |
| Templates | Jinja2 | NGINX config, RKE2 config.yaml files |
```text
.
├── terraform/                 # AWS infrastructure
│   ├── vpc.tf                 # VPC, subnets, IGW, NAT, route tables
│   ├── compute.tf             # EC2 instances (bastion, control planes, workers)
│   ├── security.tf            # Security groups
│   ├── ssh.tf                 # Auto-generated RSA key pair
│   ├── inventory.tf           # Generates inventory/inventory.json
│   ├── outputs.tf
│   └── variables.tf
│
├── inventory/
│   └── inventory.json         # Generated by Terraform — do not edit manually
│
├── ansible/
│   ├── ansible.cfg            # Ansible settings (inventory, SSH options)
│   ├── inventory.py           # Dynamic inventory: reads inventory.json,
│   │                          #   outputs Ansible JSON with ProxyCommand per host
│   ├── site.yml               # Master playbook — runs all roles in order
│   ├── group_vars/
│   │   └── all.yml            # Shared variables (timeouts, retry counts)
│   └── roles/
│       ├── common/            # All k8s nodes: disable swap, install deps
│       ├── nginx_lb/          # Bastion: install NGINX, deploy stream config
│       ├── rke2_init/         # CP-1: install RKE2, cluster-init, start etcd
│       ├── rke2_cp/           # CP-2/3: fetch token, join cluster (serial: 1)
│       ├── rke2_worker/       # Workers: install rke2-agent, register node
│       └── kubectl_access/    # Bastion: install kubectl, configure kubeconfig
│
├── install.sh                 # One-command: terraform apply → ansible-playbook
├── start.sh                   # Bootstrap only (assumes infra exists)
└── shutdown.sh                # terraform destroy
```
`ansible/site.yml` runs 7 plays in sequence:
| Play | Hosts | Role | What it does |
|---|---|---|---|
| 1 | `k8s_nodes` | `common` | Disable swap, install curl/jq |
| 2 | `bastion` | `nginx_lb` | Install NGINX, deploy Jinja2 stream config, restart via handler |
| 3 | `control_plane_init` | `rke2_init` | Install RKE2, write `cluster-init: true` config, start service, wait for port 9345 |
| 4 | `control_plane_join` | `rke2_cp` | Fetch token (`delegate_to` CP1), install RKE2, join cluster (`serial: 1`) |
| 5 | `workers` | `rke2_worker` | Fetch token, install `rke2-agent`, register with cluster |
| 6 | `bastion` | `kubectl_access` | Install kubectl, slurp kubeconfig from CP1, patch server URL to `127.0.0.1` |
| 7 | `bastion` | inline tasks | Wait for all nodes Ready, label workers, verify system pods |
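For orientation, the RKE2 configuration these plays write lives at RKE2's default path, `/etc/rancher/rke2/config.yaml`. A rough sketch (the IP is a placeholder and this is not copied from the repo's templates) of what the first control plane and a joining node each end up with:

```yaml
# CP-1 (rke2_init) -- illustrative
cluster-init: true
tls-san:
  - 10.0.1.10        # LB/bastion private IP, so API access via the LB passes cert checks
```

```yaml
# CP-2/3 and workers (rke2_cp / rke2_worker) -- illustrative
server: https://10.0.1.10:9345   # RKE2 supervisor, reached through the NGINX LB
token: <node-token fetched from CP-1>
```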
`ansible/inventory.py` reads `inventory/inventory.json` (generated by Terraform) and outputs Ansible inventory JSON with:

- Groups: `bastion`, `control_plane_init`, `control_plane_join`, `control_plane`, `workers`, `k8s_nodes`
- Per-host vars: `ansible_ssh_private_key_file`, `ansible_ssh_common_args` with a ProxyCommand for private nodes
- Shared vars: `lb_private_ip` and `rke2_version` injected into every host

This means no manual inventory editing: destroy and recreate the infrastructure, and Ansible automatically picks up the new IPs.
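A condensed sketch of what such a translator can look like. The `inventory.json` field names here (`control_plane_ips`, `worker_ips`, etc.) are assumptions for illustration; the real schema is whatever `terraform/inventory.tf` emits:

```python
#!/usr/bin/env python3
"""Sketch of a Terraform -> Ansible dynamic inventory translator.
Assumes a simplified inventory.json schema; field names are illustrative."""
import json


def build_inventory(tf):
    """Map Terraform outputs to Ansible dynamic-inventory JSON."""
    key = tf["ssh_key_path"]
    bastion_ip = tf["bastion_public_ip"]
    # Private nodes are only reachable by tunneling SSH through the bastion.
    proxy = f'-o ProxyCommand="ssh -i {key} -W %h:%p ubuntu@{bastion_ip}"'

    hostvars = {"bastion": {"ansible_host": bastion_ip,
                            "ansible_ssh_private_key_file": key}}

    def private(name, ip):
        hostvars[name] = {"ansible_host": ip,
                          "ansible_ssh_private_key_file": key,
                          "ansible_ssh_common_args": proxy}

    cps = [f"cp{i + 1}" for i in range(len(tf["control_plane_ips"]))]
    workers = [f"worker{i + 1}" for i in range(len(tf["worker_ips"]))]
    for name, ip in zip(cps, tf["control_plane_ips"]):
        private(name, ip)
    for name, ip in zip(workers, tf["worker_ips"]):
        private(name, ip)

    return {
        "bastion": {"hosts": ["bastion"]},
        "control_plane_init": {"hosts": cps[:1]},   # CP-1 bootstraps the cluster
        "control_plane_join": {"hosts": cps[1:]},   # remaining CPs join it
        "control_plane": {"children": ["control_plane_init", "control_plane_join"]},
        "workers": {"hosts": workers},
        "k8s_nodes": {"children": ["control_plane", "workers"]},
        "all": {"vars": {"lb_private_ip": tf["bastion_private_ip"],
                         "rke2_version": tf["rke2_version"]}},
        "_meta": {"hostvars": hostvars},
    }


if __name__ == "__main__":
    # Ansible calls this script with --list and consumes the JSON on stdout.
    with open("inventory/inventory.json") as f:
        print(json.dumps(build_inventory(json.load(f)), indent=2))
```

The `_meta.hostvars` block is what lets Ansible resolve per-host SSH options without a second script invocation per host.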
- Embedded etcd on control planes — no external etcd to manage or back up separately
- Jinja2 NGINX template — control plane IPs rendered at runtime from inventory groups
- `serial: 1` for CP joins — nodes join the etcd cluster one at a time to preserve quorum
- `delegate_to` for token fetch — Ansible retrieves the RKE2 join token from CP1 directly, no intermediate files
- `wait_for` after service start — explicit readiness checks on port 9345 before proceeding to the next play
- TLS SANs include the LB IP — `kubectl` connects via the load balancer without certificate errors
- Bastion ProxyCommand — all private-node SSH traffic tunnels through the bastion, configured per-host in the dynamic inventory
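The quorum-preserving join pattern can be sketched as a play fragment. Task names are illustrative and the real tasks live in the `rke2_cp` role; the node-token path is RKE2's default:

```yaml
# Sketch: join control planes one at a time, fetching the token from CP-1
- hosts: control_plane_join
  become: true
  serial: 1                       # one join at a time preserves etcd quorum
  tasks:
    - name: Fetch join token from the first control plane
      slurp:
        src: /var/lib/rancher/rke2/server/node-token
      delegate_to: "{{ groups['control_plane_init'][0] }}"
      register: rke2_token        # token stays in memory, no intermediate file

    - name: Wait for the RKE2 supervisor before the next node joins
      wait_for:
        port: 9345
        timeout: 300
```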
**Local tools**

- Linux or macOS with Bash
- Terraform v1.5+
- Ansible (`pip install ansible`)
- Python 3
- AWS CLI configured with valid credentials (`aws configure`)
**AWS account permissions**
- Create VPCs, subnets, and route tables
- Launch EC2 instances and manage security groups
- Provision NAT Gateways and Elastic IPs
Verify everything is in place before running:
```bash
terraform version && ansible --version && python3 --version && aws sts get-caller-identity
```

Quick install (one command):

```bash
curl -fsSL https://raw.githubusercontent.com/Abhiram-Rakesh/RKE2-Kubernetes-HA-AWS/main/install.sh | bash
```

If the repo is not already present locally, the script clones it to `~/RKE2-Kubernetes-HA-AWS` automatically before proceeding.
`install.sh` will:

- Check all local prerequisites and AWS credentials
- Run `terraform init` and `terraform apply` to provision the VPC, subnets, EC2 instances, and security groups
- Generate `inventory/inventory.json` from Terraform outputs
- Run `ansible-playbook ansible/site.yml` to bootstrap the full cluster end-to-end
Use this if you want full control over each phase, or if you need to re-run a specific step.
**Step 1 — Clone the repository**

```bash
git clone https://github.com/Abhiram-Rakesh/RKE2-Kubernetes-HA-AWS.git
cd RKE2-Kubernetes-HA-AWS
```

**Step 2 — Install Ansible**

```bash
pip install ansible
```

**Step 3 — Provision AWS infrastructure**

```bash
cd terraform
terraform init
terraform apply
cd ..
```

Terraform will create the VPC, subnets, IGW, NAT Gateway, security groups, and EC2 instances, and write `inventory/inventory.json`.

**Step 4 — Make the dynamic inventory executable**

```bash
chmod +x ansible/inventory.py
```

**Step 5 — Bootstrap the cluster**

```bash
cd ansible
ansible-playbook site.yml
```

This runs all 7 plays in sequence: node prep → NGINX LB → CP init → CP join → workers → kubeconfig → verify.
The bastion public IP is printed by Terraform at the end of `terraform apply`. SSH in using the auto-generated key:

```bash
# curl install
ssh -i ~/RKE2-Kubernetes-HA-AWS/terraform/ssh_key.pem ubuntu@<BASTION_PUBLIC_IP>

# manual install (from repo root)
ssh -i terraform/ssh_key.pem ubuntu@<BASTION_PUBLIC_IP>
```

Then verify the cluster:

```bash
kubectl get nodes
kubectl get pods -A
```

All nodes should be in the `Ready` state.
If the AWS infrastructure is already provisioned and you only want to re-run Ansible:
```bash
# curl install
bash ~/RKE2-Kubernetes-HA-AWS/start.sh

# manual install (from repo root)
bash start.sh
```

To tear everything down with one command:

```bash
curl -fsSL https://raw.githubusercontent.com/Abhiram-Rakesh/RKE2-Kubernetes-HA-AWS/main/shutdown.sh | bash
```

The script resolves the installation at `~/RKE2-Kubernetes-HA-AWS` (where the quick install placed it) and runs `terraform destroy -auto-approve` against the existing state.
**Step 1 — Destroy all AWS infrastructure**

```bash
cd terraform
terraform destroy
```

Terraform will prompt for confirmation before destroying. Type `yes` to proceed.

**Step 2 — Clean up local state (optional)**

```bash
cd ..
rm -f inventory/inventory.json terraform/ssh_key.pem terraform/ssh_key.pem.pub
```

**Warning:** `terraform destroy` is irreversible. All EC2 instances, networking components, and cluster data will be permanently deleted.
- Verify AWS credentials: `aws sts get-caller-identity`
- Check you have the required IAM permissions
- Verify the region has sufficient EC2 capacity for `t3.medium` instances
- Confirm the bastion is reachable: `ssh -i terraform/ssh_key.pem ubuntu@<BASTION_IP>`
- Ensure `terraform/ssh_key.pem` has correct permissions: `chmod 600 terraform/ssh_key.pem`
- Check that `inventory/inventory.json` exists and contains valid IPs
- Verify NGINX is running on the bastion: `sudo systemctl status nginx`
- Check that the control plane security group allows ports 6443 and 9345 from the bastion
- Review RKE2 server logs: `sudo journalctl -u rke2-server -f`
- Check kubelet/agent logs on the affected node: `sudo journalctl -u rke2-agent -f` (workers) or `sudo journalctl -u rke2-server -f` (control planes)
- Ensure the node can reach the LB on port 9345
This project demonstrates a production-aligned Kubernetes deployment using two of the most widely used DevOps tools:
- Terraform manages all cloud infrastructure declaratively
- Ansible handles configuration management with roles, Jinja2 templates, idempotent tasks, and proper handlers
The combination of a Terraform-generated dynamic inventory and Ansible ProxyCommand SSH tunneling reflects real-world patterns used in enterprise Kubernetes deployments.