AWS Terraform Case Studies: Practical Implementations


Key Takeaways

This guide distills practical AWS Terraform case studies into replicable patterns engineers can apply. Each case emphasizes secure, multi-account delivery with reproducible code workflows and real-world blueprints you can copy, adapt, and ship with confidence.

  • Anchor to LZA/Control Tower guardrails: Reuse SSM parameters for KMS, VPC, and shared services, and enforce HCP Terraform Sentinel policies in CI/CD.
  • Standardize RAG on Amazon Bedrock: Compose Agents, Knowledge Bases, and OpenSearch with Terraform modules for repeatable, auditable AI deployments across accounts.
  • Automate data transfers with DataSync: Orchestrate EFS to S3 and cross-account migrations using Terraform modules, minimizing manual steps and configuration drift.
  • Codify end-to-end Terraform delivery: Implement plans and applies via GitHub Actions to streamline deployments and wire in governance hooks.
  • Harden identities and encryption paths: Use cross-account IAM role assumptions and AWS KMS encryption at rest to secure multi-account infrastructure workflows.
  • Stabilize state and detect drift early: Store remote state in S3 with the native lockfile, version modules, and surface inventory visibility continuously. (No DynamoDB table required on Terraform 1.10+.)

Next, we unpack each case with architectures, code patterns, and pipeline choreography. Use these blueprints to accelerate secure, scalable, and auditable deployments.

Introduction

Need blueprints you can promote to production – not theory? This article distills AWS Terraform case studies into repeatable patterns engineers can apply across accounts. We anchor designs to Control Tower and the Landing Zone Accelerator, reusing SSM parameters and shared services to keep delivery secure, consistent, and auditable.

You will see how to standardize RAG on Amazon Bedrock by composing Agents, Knowledge Bases, and OpenSearch modules, and how to enforce HCP Terraform Sentinel policies directly in CI/CD. We map the guardrails, variables, and module boundaries that keep changes traceable while enabling velocity across teams.

We also automate data movement with AWS DataSync for EFS to S3 and cross-account migrations, codify plan and apply via GitHub Actions, and harden identities with cross-account IAM and KMS encryption. Remote state in S3 using the native lockfile reduces drift and surprises – no DynamoDB table needed on Terraform 1.10+. Let’s explore the architectures, code patterns, and pipeline choreography.

Landing Zone Accelerator Terraform guardrails in practice

Most tutorials stop at a single account and a sunny-day plan and apply. Real enterprises already have an AWS Control Tower or Landing Zone Accelerator footprint that defines the baseline, and your Terraform needs to plug into those guardrails instead of reinventing them. The most reliable production pattern we have seen is simple: read org-defined parameters from SSM, use those IDs directly in modules, and let policy-as-code catch anything off-pattern in CI. That way, every workload inherits encryption, networking, and logging without your team copy-pasting values across repos. If you are just getting started, the fundamentals in AWS Terraform Integration Basics: A Beginner’s Guide will set a solid foundation before you scale.

Reusing SSM parameters for KMS and VPC

With Landing Zone Accelerator on AWS (LZA) or Control Tower, shared primitives like VPC IDs, subnet lists, KMS keys, and log buckets are often published to AWS Systems Manager Parameter Store. The pattern is to define a tiny wrapper module or direct data sources that read those values. You then pass them to your workload modules, enforcing consistency and zero guessing. Compared to embedding IDs in locals, this approach reacts to platform updates automatically and avoids drift across dozens of repos. AWS has a helpful primer on LZA integration patterns in Using Terraform with Landing Zone Accelerator on AWS.

Here is a minimal example that pulls the standard KMS key and VPC from SSM. It supports both cross-account and per-account parameters if your platform team scopes them that way. In many AWS Terraform case studies, this simple lookup pattern is the difference between fast, repeatable rollouts and brittle copy-paste configs.


data "aws_ssm_parameter" "kms_key_arn" {
  name = "/org/security/kms/workload_key_arn" # published by LZA or platform
}

data "aws_ssm_parameter" "vpc_id" {
  name = "/org/network/vpc/main/id"
}

data "aws_ssm_parameter" "private_subnet_ids" {
  name = "/org/network/vpc/main/private_subnet_ids" # comma-separated
}

locals {
  private_subnets = split(",", data.aws_ssm_parameter.private_subnet_ids.value)
}

module "app_bucket" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "~> 5.6"

  bucket            = "app-logs-${var.env}"
  force_destroy = false

  # Enforce SSE-KMS using platform key
  server_side_encryption_configuration = {
    rule = {
      apply_server_side_encryption_by_default = {
        sse_algorithm           = "aws:kms"
        kms_master_key_id = data.aws_ssm_parameter.kms_key_arn.value
      }
    }
  }
}

module "app_vpc_security" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "~> 5.3"

  name         = "app-sg"
  description = "App security group"
  vpc_id        = data.aws_ssm_parameter.vpc_id.value

  ingress_rules = ["https-443-tcp"]
  egress_rules  = ["all-all"]
}

A lightweight helper Lambda can periodically synchronize LZA outputs to SSM if your platform prefers CloudFormation exports. Many teams also publish SSM parameters under a path like /ct/shared/ to separate from environment-specific values. Whichever pattern you choose, document the parameter names in a common README and treat them as the API for your platform team. If you want an independent check against AWS Well-Architected guardrails, review our non-promotional AWS & DevOps re:Align guidance for alignment tips you can apply immediately.
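
If you prefer to stay in Terraform instead of a helper Lambda, the same sync can be expressed with a CloudFormation export lookup and an SSM parameter resource. This is a minimal sketch – the export name and parameter path are placeholders you would align with your platform team.


# Read a CloudFormation export published by LZA (export name is a placeholder)
data "aws_cloudformation_export" "workload_kms_key_arn" {
  name = "LZA-WorkloadKmsKeyArn"
}

# Republish it under the agreed SSM path so workload modules keep a single lookup API
resource "aws_ssm_parameter" "workload_kms_key_arn" {
  name  = "/org/security/kms/workload_key_arn"
  type  = "String"
  value = data.aws_cloudformation_export.workload_kms_key_arn.value
}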

Mapping practical cases of AWS Terraform implementations

Let’s connect the above guardrails to practical cases of AWS Terraform implementations you will actually ship. These fall into a few predictable buckets: Bedrock RAG stacks for internal search, scheduled DataSync jobs for migrations, multi-account CI pipelines that assume roles, and shared encryption controls across services. Each case uses the same foundation: SSM parameters for VPC, subnets, and KMS keys; cross-account providers; and Sentinel to stop misconfigurations early. If you standardize those basics, the rest becomes reusable modules instead of bespoke one-offs. That is the throughline you will notice across AWS Terraform case studies in the wild.

Here is a quick mapping you can reuse. Start small, cut complexity, and expand once the first deployment is boring and reliable. Treat these as scaffolds you can refine over time.

  • AI stack – Amazon Bedrock Agents with Knowledge Bases, OpenSearch Serverless for vector search, and an S3 corpus. All subnets, security groups, and KMS are pulled from SSM.
  • Data transfers – AWS DataSync tasks from EFS to S3 with CloudWatch Events schedule and KMS encryption enforced through data source lookups.
  • Pipelines – GitHub Actions with OIDC. Plans run in non-prod first, then Sentinel policies gate merges and applies in prod.
  • Governance – HCP Terraform workspaces with Sentinel policy sets bound to folders or tags. State always encrypted at rest with a platform KMS key.

For a refresher on why this pairing accelerates delivery, explore the benefits of combining Terraform with AWS. These building blocks keep variance low while giving teams room to move fast. If you want ongoing reading material for similar patterns and lessons learned, browse our blog for deep dives and practical walkthroughs.

HCP Terraform Sentinel policies in pipelines

Policy-as-code becomes practical when it is not optional. HCP Terraform lets you attach Sentinel policy sets to workspaces based on tags or projects, and you can run enforcement on plan or apply. Common production controls include allowed regions, mandatory KMS encryption for S3, minimum TLS on ALBs, and log delivery enforcement for CloudTrail and CloudWatch. You do not need 50 policies on day one – start with three that block the most expensive mistakes, then expand as findings show up. This is a recurring thread across AWS Terraform case studies because preventing risky plans is far cheaper than remediating them.

Here is a tiny Sentinel example that blocks unencrypted S3 buckets unless a KMS key ARN is present. Bind it to your workspace set that manages data stores.


import "tfplan/v2" as tfplan

# Helper: is this change a create or update?
is_change = func(actions) {
  any actions as a { a in ["create", "update"] }
}

# Helper: does the bucket resource itself (legacy) specify KMS?
has_inline_kms = func(rc) {
  changed = rc.change.after
  changed is not null and
  changed.server_side_encryption_configuration is not null and
  changed.server_side_encryption_configuration.rule.apply_server_side_encryption_by_default.kms_master_key_id is not null
}

# Helper: is an explicit exception tag present? (string "true")
has_exception_tag = func(rc) {
  changed = rc.change.after
  changed is not null and
  changed.tags is not null and
  (lower(changed.tags.encryption_exception) == "true")
}

# Helper: does the plan contain any SSE config resource with a KMS key?
# (Note: this does not strictly match the same bucket ID, but enforces that
# at least one KMS-backed SSE config is created/updated in this run.)
plan_has_kms_sse_resource = func() {
  any tfplan.resource_changes as sse_rc {
    sse_rc.type == "aws_s3_bucket_server_side_encryption_configuration" and
    is_change(sse_rc.change.actions) and
    sse_rc.change.after is not null and
    sse_rc.change.after.rule is not null and
    length(sse_rc.change.after.rule) > 0 and
    sse_rc.change.after.rule[0].apply_server_side_encryption_by_default.kms_master_key_id is not null
  }
}

main = rule {
  all tfplan.resource_changes as rc {
    # Only care about S3 bucket creates/updates
    rc.type != "aws_s3_bucket" or
    not is_change(rc.change.actions) or
    has_exception_tag(rc) or
    has_inline_kms(rc) or
    plan_has_kms_sse_resource()
  }
}

To integrate this in CI, run plans in HCP Terraform via the API or the GitHub Actions integration. Plans that violate policies will fail with clear messages that developers can fix before approval. For broader governance patterns, the APN post Scale Your AWS Environment Securely with HashiCorp Terraform and Sentinel Policy as Code shows how teams scale policy at pace. HashiCorp maintains up-to-date docs for Sentinel and HCP Terraform integrations at developer.hashicorp.com.

Standardized RAG on Amazon Bedrock – Terraform modules

Retrieval augmented generation patterns on AWS look complicated until you standardize the components. You need a corpus in S3, document embeddings, a vector index, and an agent to orchestrate the flow. Terraform can stitch these together with a few opinionated modules and then plug the deployment into your landing zone guardrails. The result is a repeatable RAG scaffold that teams can deploy in a new account in under an hour without custom glue. AWS’s engineering teams have published end-to-end examples of Terraform-driven Bedrock stacks, such as the reference in Build an AI-powered automated summarization system with Amazon Bedrock.

Agents, Knowledge Bases, and OpenSearch composition

Amazon Bedrock supports Knowledge Bases to connect your documents to models, and Agents to handle orchestration and tool use. For vector storage, OpenSearch Serverless is a convenient option because you can enable vector search collections and keep encryption aligned with your KMS key. Many teams also add an SQS or EventBridge pipeline to handle document updates, but the first milestone is getting ingestion and query working reliably and securely. Across many AWS Terraform case studies, this component split – corpus, embeddings, vectors, agent – keeps stacks simple to reason about.

Terraform-wise, you can split this into modules: a knowledge base module, an OpenSearch Serverless module configured for vector embeddings, and an agent module that references the knowledge base. The S3 bucket and KMS are provided by SSM parameters to keep everything inside the guardrails. Here is a sketch using the registry where possible and custom resources where the provider is still maturing.


data "aws_ssm_parameter" "kms_arn" {
  name = "/org/security/kms/workload_key_arn"
}

data "aws_ssm_parameter" "vpc_id" {
  name = "/org/network/vpc/main/id"
}

data "aws_ssm_parameter" "subnets" {
  name = "/org/network/vpc/main/private_subnet_ids"
}

locals {
  subnets = split(",", data.aws_ssm_parameter.subnets.value)
}

# S3 corpus bucket (terraform-aws-modules/s3-bucket v5)
module "rag_corpus_bucket" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "~> 5.0"

  bucket        = "rag-corpus-${var.env}"
  force_destroy = false

  # Enforce SSE-KMS using the platform key from SSM
  server_side_encryption_configuration = {
    rule = {
      apply_server_side_encryption_by_default = {
        sse_algorithm     = "aws:kms"
        kms_master_key_id = data.aws_ssm_parameter.kms_arn.value
      }
    }
  }
}

# OpenSearch Serverless collection for vectors
# (the encryption policy must exist before the collection can be created)
locals {
  vectors_collection_name = "kb-vectors-${var.env}"
}

resource "aws_opensearchserverless_collection" "vectors" {
  name = local.vectors_collection_name
  type = "VECTORSEARCH"

  depends_on = [aws_opensearchserverless_security_policy.encryption]
}

# Encryption policy for the collection (uses customer-managed KMS)
resource "aws_opensearchserverless_security_policy" "encryption" {
  name   = "kb-vectors-kms-${var.env}"
  type   = "encryption"
  policy = jsonencode({
    Rules = [{
      ResourceType = "collection",
      Resource     = ["collection/${local.vectors_collection_name}"]
    }],
    AWSOwnedKey = false,
    KmsKeyArn   = data.aws_ssm_parameter.kms_arn.value
  })
}

# Private access via VPC endpoint (recommended for production)
resource "aws_opensearchserverless_vpc_endpoint" "main" {
  name               = "aoss-${var.env}"
  vpc_id             = data.aws_ssm_parameter.vpc_id.value
  subnet_ids         = local.subnets
  security_group_ids = [module.app_vpc_security.security_group_id]
}

# Network policy: disable public access, allow via the VPC endpoint only
resource "aws_opensearchserverless_security_policy" "network" {
  name   = "kb-vectors-network-${var.env}"
  type   = "network"
  policy = jsonencode([
    {
      Rules = [
        {
          ResourceType = "collection",
          Resource     = ["collection/${local.vectors_collection_name}"]
        },
        {
          ResourceType = "dashboard",
          Resource     = ["collection/${local.vectors_collection_name}"]
        }
      ],
      AllowFromPublic = false,
      SourceVPCEs     = [aws_opensearchserverless_vpc_endpoint.main.id]
    }
  ])
}

# Data access policy: grant a writer role least-privileged access
# (Bind this to a CI/CD or app role ARN)
variable "aoss_writer_role_arn" {
  type        = string
  description = "Role allowed to write/read vectors"
}

resource "aws_opensearchserverless_access_policy" "data" {
  name   = "kb-vectors-access-${var.env}"
  type   = "data"
  policy = jsonencode([{
    Description = "Least-privileged access for vector collection",
    Rules = [
      {
        ResourceType = "collection",
        Resource     = ["collection/${local.vectors_collection_name}"],
        Permission   = ["aoss:DescribeCollectionItems"]
      },
      {
        ResourceType = "index",
        Resource     = ["index/${local.vectors_collection_name}/*"],
        Permission   = [
          "aoss:ReadDocument",
          "aoss:WriteDocument",
          "aoss:CreateIndex",
          "aoss:DescribeIndex"
        ]
      }
    ],
    Principal = [var.aoss_writer_role_arn]
  }])
}

# Knowledge Base (Agents for Amazon Bedrock)
resource "aws_bedrockagent_knowledge_base" "kb" {
  name     = "kb-${var.env}"
  role_arn = aws_iam_role.bedrock_kb_role.arn

  knowledge_base_configuration {
    type = "VECTOR"
    vector_knowledge_base_configuration {
      embedding_model_arn = var.embedding_model_arn
    }
  }

  storage_configuration {
    type = "OPENSEARCH_SERVERLESS"
    opensearch_serverless_configuration {
      collection_arn    = aws_opensearchserverless_collection.vectors.arn
      vector_index_name = "rag-index"
      field_mapping {
        vector_field   = "vector"
        text_field     = "text"
        metadata_field = "metadata"
      }
    }
  }
}

# S3 corpus as a data source (a separate resource in the AWS provider)
resource "aws_bedrockagent_data_source" "corpus" {
  name              = "corpus"
  knowledge_base_id = aws_bedrockagent_knowledge_base.kb.id

  data_source_configuration {
    type = "S3"
    s3_configuration {
      bucket_arn         = module.rag_corpus_bucket.s3_bucket_arn
      inclusion_prefixes = ["docs/"]
    }
  }
}

# Bedrock Agent
resource "aws_bedrockagent_agent" "rag_agent" {
  agent_name              = "rag-agent-${var.env}"
  foundation_model        = var.foundation_model_id
  instruction             = "Answer using the knowledge base. Cite sources."
  agent_resource_role_arn = aws_iam_role.bedrock_agent_role.arn
}

# Associate KB with the Agent
resource "aws_bedrockagent_agent_knowledge_base_association" "kb_assoc" {
  agent_id             = aws_bedrockagent_agent.rag_agent.agent_id
  description          = "Primary RAG knowledge base"
  knowledge_base_id    = aws_bedrockagent_knowledge_base.kb.id
  knowledge_base_state = "ENABLED"
}

Provider support for Bedrock resources continues to improve. If a given resource is not yet available, wrap the AWS CLI in a small module using null_resource and local-exec with clear idempotency checks. Keep that shim isolated so you can swap it out when the native resource becomes available in the provider. You can also reference the AWS Prescriptive Guidance pattern Deploy a RAG use case on AWS by using Terraform and Amazon Bedrock for a production-ready baseline.
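
As one way to structure that shim, here is a sketch of a null_resource wrapper. The script path and environment variable names are hypothetical placeholders – the point is that re-runs are pinned to trigger values so the local-exec call stays idempotent and the whole thing can be swapped for a native resource later.


# Hypothetical CLI shim (requires the null provider); keep it in its own module
# so it can be replaced by a native resource when the provider catches up.
resource "null_resource" "bedrock_cli_shim" {
  # Re-run only when the inputs that matter change (idempotency guard)
  triggers = {
    knowledge_base_name = "kb-${var.env}"
    collection_arn      = aws_opensearchserverless_collection.vectors.arn
    script_hash         = filesha256("${path.module}/scripts/create_or_update_kb.sh") # placeholder script
  }

  provisioner "local-exec" {
    command = "${path.module}/scripts/create_or_update_kb.sh"
    environment = {
      KB_NAME        = "kb-${var.env}"
      COLLECTION_ARN = aws_opensearchserverless_collection.vectors.arn
    }
  }
}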

AWS Terraform case studies – RAG architecture

Let’s ground this in a repeatable architecture you can adapt for internal document search with citations. A straightforward layout uses an S3 bucket for corpus, a Lambda preprocessor to chunk documents, an OpenSearch Serverless collection for vectors, and a Bedrock Agent to serve answers. An API Gateway – Lambda bridge adds authentication and request shaping while keeping networking within your landing zone subnets. AWS Terraform case studies often reuse centralized KMS and VPC parameters from SSM to keep encryption and connectivity consistent out of the box.


module "lambda_auth" {
  source  = "terraform-aws-modules/lambda/aws"
  version = "~> 8.0"

  function_name     = "rag-gateway-${var.env}"
  handler                = "app.handler"
  runtime                = "python3.12"
  source_path            = "src/gateway"

  vpc_subnet_ids              = local.subnets
  vpc_security_group_ids = [module.app_vpc_security.security_group_id]

  environment_variables = {
    AGENT_ID = aws_bedrockagent_agent.rag_agent.agent_id
  }

  # Encrypt env vars with your platform CMK from SSM
  kms_key_arn = data.aws_ssm_parameter.kms_arn.value
}

module "apigw" {
  source  = "terraform-aws-modules/apigateway-v2/aws"
  version = "~> 5.3"

  name             = "rag-api-${var.env}"
  protocol_type = "HTTP"

  # Integrate HTTP API with the Lambda function
  target = module.lambda_auth.lambda_function_arn

  cors_configuration = {
    allow_origins    = ["https://app.internal.example.com"]
    allow_methods = ["POST"]
  }
}

The team experience here is consistent across organizations: keep IAM trust simple for the agent role and Lambda execution role, reference shared logging buckets from SSM, and parameterize everything you can. That way, a dev-to-prod promotion becomes a matter of switching workspace variables. These are the boring, repeatable bits you see again and again in AWS Terraform case studies, and that is a good thing.

When you’re ready to stand up the corpus bucket, vector collection, and agent behind API Gateway in a fresh account, AWS & DevOps re:Build walks through the pipeline, promotion steps, and guardrails to turn this pattern into a production service.

Example pipeline with Sentinel checks in CI

Bedrock workloads tend to produce many resources, so pre-flight checks matter. A common setup is to run terraform fmt, validate, and plan in a feature branch, then use a pull request to trigger Sentinel in HCP Terraform before you merge to main. If the resource graph includes any S3 bucket without KMS, or any public subnet without an intentional flag, the pipeline blocks. That protects your security posture even if someone forgets an attribute or copies an outdated example. Pipelines like this show up in many AWS Terraform case studies because they strike the right balance between speed and safety.


name: rag-terraform

on:
  pull_request:
    branches: [ main ]

# Prevent overlapping runs on the same PR/branch
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

# Minimal token scope for this workflow
permissions:
  contents: read

jobs:
  plan:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v5

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.13.0

      - name: Terraform format
        run: terraform fmt -check -recursive

      # Local validation catches syntax/provider issues fast.
      # (HCP Terraform will do its own init/plan.)
      - name: Terraform init
        run: terraform init -backend=false

      - name: Terraform validate
        run: terraform validate

      # Run plan in HCP Terraform so Sentinel policy checks apply
      - name: HCP Terraform plan
        id: plan
        uses: hashicorp/hcp-terraform-actions@v1
        with:
          command: plan
          workspace: rag-${{ github.head_ref }}

      # Fail the job if Sentinel policy checks did not pass
      - name: Enforce Sentinel policy
        run: test "${{ steps.plan.outputs.policy_checks_passed }}" = "true"

On merge to main, an apply job runs in a controlled environment and assumes the deploy role for the target account. Pair this with branch protections and required PR approvals, and you get auditable changes plus a decent developer experience. The same pattern works for DataSync and analytics stacks, not just AI. You will see this pattern echoed across AWS Terraform case studies that favor consistency over one-off scripts.

Automating AWS DataSync with Terraform modules

Data migrations are not glamorous, but Terraform can make them boringly reliable. AWS DataSync supports EFS, S3, FSx, SMB, and NFS sources and destinations, and you can schedule tasks to run once or continuously. The trick is to set up the task roles and endpoint agents with the right IAM boundaries and KMS keys, then treat task runs as code-triggered events. Below are patterns you can reuse whether you are doing a one-time migration or recurring syncs between accounts. For step-by-step examples, see the AWS Storage Blog guide Automate data transfers and migrations with AWS DataSync and Terraform.

EFS to S3 tasks and scheduling

An EFS to S3 transfer is a classic example. You define the locations, the task, a schedule, and the CloudWatch log group. You also enforce encryption on the destination bucket and make sure the DataSync service role has permission to use the KMS key. Publishing those ARNs in SSM means the migration module can run safely in any account without reconfiguration. That is a practical move you will find in many AWS Terraform case studies, especially when multiple teams share the same paved road.


data "aws_ssm_parameter" "kms_arn" {
  name = "/org/security/kms/workload_key_arn"
}

module "dst_bucket" {
  source  = "terraform-aws-modules/s3-bucket/aws"
  version = "~> 5.6"

  bucket = "efs-archive-${var.env}"

  # Enforce SSE-KMS with platform key from SSM
  server_side_encryption_configuration = {
    rule = {
      apply_server_side_encryption_by_default = {
        sse_algorithm           = "aws:kms"
        kms_master_key_id = data.aws_ssm_parameter.kms_arn.value
      }
    }
  }
}

data "aws_caller_identity" "current" {}

resource "aws_datasync_location_efs" "src" {
  ec2_config {
    security_group_arns = [module.app_vpc_security.security_group_arn]
    subnet_arn                = "arn:aws:ec2:${var.region}:${data.aws_caller_identity.current.account_id}:subnet/${local.subnets[0]}"
  }
  efs_file_system_arn = aws_efs_file_system.app.arn
}

resource "aws_datasync_location_s3" "dst" {
  s3_bucket_arn = module.dst_bucket.s3_bucket_arn
  subdirectory     = "/archive"
  s3_config {
    bucket_access_role_arn = aws_iam_role.datasync_role.arn
  }
}

resource "aws_datasync_task" "efs_to_s3" {
  name                               = "efs-to-s3-${var.env}"
  source_location_arn        = aws_datasync_location_efs.src.arn
  destination_location_arn = aws_datasync_location_s3.dst.arn

  options {
    verify_mode            = "POINT_IN_TIME_CONSISTENT"
    atime                       = "BEST_EFFORT"
    bytes_per_second  = -1
    posix_permissions  = "PRESERVE"
  }

  includes {
    filter_type = "SIMPLE_PATTERN"
    value        = "/prod/*"
  }

  schedule {
    schedule_expression = "cron(0 2 * * ? *)" # 2am UTC daily
  }

  cloudwatch_log_group_arn = aws_cloudwatch_log_group.datasync.arn
}

For recurring schedules, publish success and error metrics to CloudWatch and add simple alarms. That way you are not guessing if the job ran last night. The AWS DataSync Terraform resources are stable, and the Terraform Registry offers helper modules if you prefer an opinionated wrapper. This pattern shows up frequently in AWS Terraform case studies because it trades manual console tweaks for predictable, reviewable code.
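
One low-effort way to get that visibility is a metric filter on the DataSync log group plus an alarm. This is a sketch – the filter pattern and the alerts SNS topic variable are assumptions you would adapt to the error strings your tasks actually emit.


# Surface transfer errors from the DataSync log group
resource "aws_cloudwatch_log_metric_filter" "datasync_errors" {
  name           = "datasync-errors-${var.env}"
  log_group_name = aws_cloudwatch_log_group.datasync.name
  pattern        = "?ERROR ?error" # assumed pattern; tune to your task logs

  metric_transformation {
    name      = "DataSyncErrorCount"
    namespace = "Custom/DataSync"
    value     = "1"
  }
}

resource "aws_cloudwatch_metric_alarm" "datasync_errors" {
  alarm_name          = "datasync-errors-${var.env}"
  namespace           = "Custom/DataSync"
  metric_name         = "DataSyncErrorCount"
  statistic           = "Sum"
  period              = 3600
  evaluation_periods  = 1
  threshold           = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  treat_missing_data  = "notBreaching"
  alarm_actions       = [var.alerts_sns_topic_arn] # assumed SNS topic for notifications
}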

Cross-account transfers and role assumptions

Cross-account transfers mean you need to assume a role in the destination account and grant DataSync the ability to use its KMS key. The provider configuration uses assume_role with an external_id so the source account can deploy the task definitions safely. Your organization can also require that only a central pipeline account deploys cross-account tasks to keep an audit trail clean. These elements are common in AWS Terraform case studies that touch regulated data or multiple business units.


provider "aws" {
  region = var.region
}

provider "aws" {
  alias    = "dst"
  region = var.region

  assume_role {
    role_arn           = "arn:aws:iam::${var.dst_account_id}:role/TerraformDeployRole"
    external_id       = var.org_external_id
    session_name = "tf-datasync"
  }
}

# Destination account KMS key from SSM
data "aws_ssm_parameter" "dst_kms_arn" {
  provider = aws.dst
  name     = "/org/security/kms/workload_key_arn"
}

# Destination account DataSync role
resource "aws_iam_role" "datasync_role_dst" {
  provider                    = aws.dst
  name                        = "DataSyncS3DstRole"
  assume_role_policy = data.aws_iam_policy_document.datasync_assume_role.json
}

# Attach a policy statement to grant DataSync role usage of the KMS key
resource "aws_kms_key_policy" "dst" {
  provider  = aws.dst
  key_id    = data.aws_ssm_parameter.dst_kms_arn.value
  policy     = data.aws_iam_policy_document.kms_policy.json
}

Once the identity plumbing is solid, the rest of the task definition looks like the single-account case. The most common mistake here is forgetting the KMS key policy in the destination account, even if the S3 bucket policy allows the role. Build a Sentinel policy to block any DataSync task that writes to an S3 bucket without a matching KMS key reference, and you will save hours of debugging. That practical safeguard pops up again and again in AWS Terraform case studies because it prevents the most expensive class of failure.

Minimizing configuration drift and visibility

Data pipelines suffer from silent drift – someone updates a task by hand in the console because of a production incident, then the next plan fails or, worse, silently skips a resource. To reduce this pain, tighten IAM boundaries for human roles so updates go through code, and add a scheduled plan job that runs daily with -detailed-exitcode. If the exit code signals drift, post to Slack and create a ticket automatically. HCP Terraform has drift detection as well, which you can enable on key workspaces.

If you need a visual inventory, third-party tools like ControlMonkey and env0 offer drift monitoring and resource catalogs. Use these as complements to your CI plan checks, not replacements. The goal is a short feedback loop when any resource diverges from code so you can either reconcile or adopt the change properly. For a deeper look at inventory and governance at scale, the APN story Using ControlMonkey’s Terraform Platform to Govern Large-scale AWS Environments is a useful companion read.

CI/CD for Terraform on AWS examples

CI/CD for Terraform on AWS follows a predictable flow: format, validate, plan, policy check, and apply with approvals. Your two main branching approaches are GitOps with environment folders or workspace-per-env with variables from a central source. Both work fine; the important part is mapping permissions to accounts and enforcing policy gates consistently. Let’s walk through GitHub Actions patterns that teams actually run in production. You will recognize many of these from AWS Terraform case studies focused on repeatable delivery.

GitHub Actions with OIDC and approvals

GitHub Actions with OIDC removes long-lived AWS keys from secrets and is widely adopted. You configure an IAM role in each target account that trusts the GitHub OIDC provider with repository-bound conditions. The workflow then requests a token and assumes the role, runs Terraform, and posts plans back to the PR. Combine this with required reviews, and you have a decent approval model that developers can live with. This flow is common in AWS Terraform case studies because it scales across many repositories without key sprawl.


name: terraform

on:
  pull_request:
    branches: [ main ]
  push:
    branches: [ main ]

# Prevent overlapping runs on the same branch/PR
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  plan:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # needed for OIDC
      contents: read    # checkout
    steps:
      - name: Checkout
        uses: actions/checkout@v5

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.13.0

      - name: Configure AWS via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/TerraformPlanRole
          aws-region: us-east-1

      - name: Terraform format
        run: terraform fmt -check -recursive

      - name: Terraform init
        run: terraform init

      - name: Terraform validate
        run: terraform validate

      - name: Terraform plan
        run: terraform plan -no-color

  apply:
    if: github.ref == 'refs/heads/main'
    needs: [plan]
    runs-on: ubuntu-latest
    environment: production
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v5

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.13.0

      - name: Configure AWS via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/TerraformApplyRole
          aws-region: us-east-1

      - name: Terraform init
        run: terraform init

      - name: Terraform apply
        run: terraform apply -auto-approve -no-color

Tie production applies to a protected environment in GitHub so approvals are required before the job runs. If you work multi-account, add a matrix to apply in sequence based on account order and dependencies. Store Terraform variables in GitHub Environments or pull them from SSM at runtime to keep your code repo clean. Patterns like these show up across AWS Terraform case studies because they reduce friction without relaxing controls.

Integrating HCP Terraform and Sentinel gates

The cleanest pattern is to keep execution in HCP Terraform and treat GitHub Actions or CodeBuild as a trigger. That way, state, run history, plan outputs, and policy decisions live in one place. You can tag workspaces by environment or application and attach Sentinel policy sets to those tags so new workspaces get guardrails automatically. For private networking, use HCP Terraform agents hosted in your pipeline account to run plans inside your VPC. You will find this separation of concerns highlighted in AWS Terraform case studies that prioritize auditability.

Here is a sketch of a GitHub job that defers plan to HCP Terraform. It passes the VCS commit and branch for traceability, and Sentinel evaluates policies without extra CLI steps.


- name: HCP Terraform plan
  id: plan
  uses: hashicorp/hcp-terraform-actions@v1
  with:
    command: plan
    workspace: app-${{ github.ref_name }}
    variables: |
      env=${{ github.ref_name }}
      deployer=${{ github.actor }}

For organizations that prefer an approval outside GitHub, require a policy override in HCP Terraform for high-risk changes like Internet-facing load balancers or public S3 ACLs. Those overrides are auditable and enforce the conversation you want before production impact. This is a small habit with a large payoff in AWS Terraform case studies that must meet strict compliance outcomes.

Cross-account IAM and AWS KMS encryption

Cross-account deployments are where great Terraform patterns either shine or crack. The formula is simple but unforgiving: least-privilege roles per environment, tight trust boundaries with external IDs, and consistent KMS policies that never surprise you during an apply. Instead of inventing ad hoc patterns per team, publish a reference implementation that modules can depend on and make the roles discoverable through SSM. This is table stakes in most AWS Terraform case studies, especially at scale.

Role assumption patterns and trust boundaries

Use a deploy role per account per permission boundary. A plan role might have read-only and describe permissions, while an apply role has full CRUD on resources managed by Terraform. Trust policy conditions should bind to your CI OIDC provider or a centralized pipeline account, not to human users. Pair that with STS session tags so you can attribute who initiated the change from a Git commit. For a broader security backdrop, see this concise checklist of production fundamentals in 26 AWS security best practices to adopt in production.


data "aws_iam_policy_document" "github_oidc_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type          = "Federated"
      identifiers = ["arn:aws:iam::${var.account_id}:oidc-provider/token.actions.githubusercontent.com"]
    }

    # Ensure token audience is sts.amazonaws.com
    condition {
      test         = "StringEquals"
      variable  = "token.actions.githubusercontent.com:aud"
      values    = ["sts.amazonaws.com"]
    }

    # Restrict to a specific repo and branch
    condition {
      test         = "StringEquals"
      variable  = "token.actions.githubusercontent.com:sub"
      values    = ["repo:example/infrastructure:ref:refs/heads/main"]
    }
  }
}

resource "aws_iam_role" "terraform_apply_role" {
  name                       = "TerraformApplyRole"
  assume_role_policy = data.aws_iam_policy_document.github_oidc_trust.json
}

For account-to-account assumptions, require an external_id managed by your platform team. Publish that value to SSM so modules reference it and do not hardcode secrets in code. When you rotate the external_id, pipelines pick it up on the next run without code changes. These little guardrails add up in AWS Terraform case studies where dozens of teams push changes daily.
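
A minimal sketch of that lookup, assuming the platform team publishes the external ID as a SecureString under an agreed path (the parameter path, account variable, and alias here are illustrative):


# Platform-published external ID (path is an assumption; align it with your platform team)
data "aws_ssm_parameter" "org_external_id" {
  name            = "/org/security/iam/terraform_external_id"
  with_decryption = true
}

provider "aws" {
  alias  = "workload"
  region = var.region

  assume_role {
    role_arn     = "arn:aws:iam::${var.workload_account_id}:role/TerraformDeployRole"
    external_id  = data.aws_ssm_parameter.org_external_id.value
    session_name = "tf-${var.env}"
  }
}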

Encrypt at rest for state and data

State is the crown jewels. Use an S3 backend with a platform KMS key and the native lockfile for state locking (Terraform 1.10+), avoiding a DynamoDB lock table. Keep the bucket private, block public access, and deny unencrypted transport. For data resources like S3 buckets, OpenSearch collections, EBS volumes, and RDS, always reference a KMS key from SSM instead of creating a new one per workload. Centralized keys simplify key rotation and access reviews. This is one of those evergreen lessons you see in nearly all AWS Terraform case studies for good reason.


terraform {
  backend "s3" {
    bucket       = "org-tf-state"
    key          = "apps/myapp/terraform.tfstate" # set per workspace/env via -backend-config
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # S3-native locking replaces the DynamoDB table
    # kms_key_id = "arn:aws:kms:us-east-1:123456789012:key/xxxx" # optional
  }
}

In data plane modules, enforce KMS usage with variables that default to the SSM-provided ARN. Your Sentinel policies will act as a last line of defense to catch any explicit opt-out. That keeps the conversation focused on exceptions rather than the default behavior. It is a small discipline, but it compounds quickly across AWS Terraform case studies that span many accounts and regions.
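
A small sketch of that default-with-override pattern (the variable name is illustrative):


# Allow an explicit override, but default to the platform key from SSM
variable "kms_key_arn_override" {
  type        = string
  description = "Set only with an approved exception; otherwise the SSM-published key is used"
  default     = null
}

locals {
  effective_kms_key_arn = coalesce(
    var.kms_key_arn_override,
    data.aws_ssm_parameter.kms_arn.value
  )
}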

Terraform configurations for AWS KMS usage

Here are recurring snippets for KMS across services. These are straightforward but easy to forget under time pressure, so add them to your module templates. If a team needs to override the key, let them, but require a reason in the PR and a tagged exception that Sentinel can honor conditionally. Leaning on snippets like these is a consistent theme in AWS Terraform case studies because they reduce surprises.


resource "aws_s3_bucket" "data" {
  bucket = "app-data-${var.env}"
}

resource "aws_s3_bucket_server_side_encryption_configuration" "data" {
  bucket = aws_s3_bucket.data.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm           = "aws:kms"
      kms_master_key_id = data.aws_ssm_parameter.kms_arn.value
    }
  }
}

# EBS volume encryption via Launch Template
resource "aws_launch_template" "app" {
  name_prefix = "app-"

  block_device_mappings {
    device_name = "/dev/xvda"
    ebs {
      volume_size                 = 50
      delete_on_termination = true
      encrypted                     = true
      kms_key_id                  = data.aws_ssm_parameter.kms_arn.value
    }
  }
}

# OpenSearch Serverless encryption policy
resource "aws_opensearchserverless_security_policy" "encryption" {
  name  = "collection-kms-${var.env}"
  type    = "encryption"
  policy = jsonencode({
    AWSOwnedKey = false,
    KmsKeyArn        = data.aws_ssm_parameter.kms_arn.value,
    Rules = [{
      ResourceType = "collection",
      Resource         = ["collection/${aws_opensearchserverless_collection.main.name}"]
    }]
  })
}

Aligning on a few such snippets reduces variance and surprises. You will notice the theme – most of the heavy lifting is handled by consistent references to platform-published parameters. That is the hidden lever that many tutorials skip and the reason your production applies stay predictable.

Real-world Terraform AWS state and drift patterns

Terraform’s success in an enterprise has less to do with clever modules and more to do with disciplined state and drift practices. You want state to be boring, locked, and encrypted, you want versioned modules to avoid accidental breakage, and you want drift detection to page you before auditors do. The following patterns summarize what works across teams using real-world Terraform AWS workflows. These habits sit at the heart of many AWS Terraform case studies because they keep day-2 operations calm.

S3 backend and state locking (Terraform 1.10+)

The gold standard backend on AWS is S3; on Terraform 1.10+ you should enable the native S3 lockfile for state locking, removing the need for a DynamoDB lock table. If you still run older Terraform versions, a DynamoDB lock table remains a valid legacy pattern until you upgrade. Create a state bucket per organization with separate prefixes per app, enforce server-side encryption with a KMS key, enable versioning, and disallow public access. If you go with DynamoDB, the lock table needs a simple hash key named LockID. You can bootstrap the backend with a one-time script, then every repo uses the same backend settings through a shared partial backend file. This is a staple across AWS Terraform case studies and a low-effort win for stability.


resource "aws_s3_bucket" "tf_state" {
  bucket            = "org-tf-state"
  force_destroy = false
}

# Enforce bucket-owner semantics (ACLs disabled by default on new buckets)
resource "aws_s3_bucket_ownership_controls" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    object_ownership = "BucketOwnerEnforced"
  }
}

# Encrypt state with CMK
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm          = "aws:kms"
      kms_master_key_id = data.aws_ssm_parameter.kms_arn.value
    }
  }
}

# Versioning is critical for state recovery
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

# Require TLS for all requests to the state bucket
data "aws_iam_policy_document" "tf_state_tls_only" {
  statement {
    sid        = "DenyInsecureTransport"
    effect    = "Deny"
    actions = ["s3:*"]
    principals {
      type          = "*"
      identifiers = ["*"]
    }
    resources = [
      aws_s3_bucket.tf_state.arn,
      "${aws_s3_bucket.tf_state.arn}/*"
    ]
    condition {
      test        = "Bool"
      variable = "aws:SecureTransport"
      values   = ["false"]
    }
  }
}

resource "aws_s3_bucket_policy" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id
  policy  = data.aws_iam_policy_document.tf_state_tls_only.json
}

# Legacy: DynamoDB lock table, only needed on Terraform versions before 1.10
resource "aws_dynamodb_table" "tf_lock" {
  name         = "org-tf-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

Developers sometimes ask why they cannot just use local state for a quick test. The answer is conflict and loss. Local state breaks as soon as two people touch the same stack, and recovery is painful. Shared, locked, encrypted state is the foundation for collaboration, and it costs almost nothing to operate.

Terraform 1.10+: the S3 backend now supports native state locking via an S3 lockfile, so you no longer need a DynamoDB table for locking. Enable it with use_lockfile = true; DynamoDB-based locking is deprecated and scheduled for removal in a future minor release.
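
A minimal sketch of the shared partial backend file mentioned above – the bucket name and state key are examples:


# backend.hcl - shared partial backend settings (values are examples)
bucket       = "org-tf-state"
region       = "us-east-1"
encrypt      = true
use_lockfile = true

# In each repo, the backend block stays minimal:
#   terraform {
#     backend "s3" {}
#   }
#
# And init supplies the shared file plus a per-app state key:
#   terraform init -backend-config=backend.hcl -backend-config="key=apps/myapp/terraform.tfstate"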

Module versioning and naming conventions

Breaking changes in modules are inevitable. Pin versions explicitly and adopt semantic versioning so upgrades are conscious choices. Name resources predictably with a standard prefix that includes app, env, and a short purpose like logs or data. You will thank yourself later when you try to filter costs or apply IAM conditions by resource name. These naming and versioning habits show up again and again in AWS Terraform case studies because they lower cognitive load.


module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 6.0"

  name = "app-${var.env}"
  cidr = "10.20.0.0/16"

  azs                     = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.20.1.0/24","10.20.2.0/24","10.20.3.0/24"]
  public_subnets  = ["10.20.11.0/24","10.20.12.0/24","10.20.13.0/24"]

  # Highly-available NAT for private subnets
  enable_nat_gateway         = true
  one_nat_gateway_per_az = true
}
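
For the naming side, a minimal sketch of the prefix convention described above (the app and env variables are illustrative):


# Minimal naming convention helper
variable "app" {
  type = string
}

variable "env" {
  type = string
}

locals {
  name_prefix = "${var.app}-${var.env}"
}

# Resources pick up the prefix plus a short purpose suffix
resource "aws_s3_bucket" "logs" {
  bucket = "${local.name_prefix}-logs"
}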

Drift detection patterns that stick

Drift happens. Maybe a hotfix during an incident, maybe a console experiment that lingered. Build a drift budget into your process by scheduling a daily terraform plan in non-prod and weekly in prod. If the plan returns exit code 2 with -detailed-exitcode, treat it as a finding and triage. You can integrate this with HCP Terraform drift detection to surface resource-level drift even when no code changes land.


name: drift-check

on:
  schedule:
    - cron: "0 7 * * 1-5"  # weekdays 07:00 UTC
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  plan-drift:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # needed for OIDC to AWS
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v5

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.13.0

      - name: Configure AWS via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/TerraformPlanRole
          aws-region: us-east-1

      - name: Terraform init
        run: terraform init

      - name: Terraform plan (detect drift)
        id: plan
        run: |
          set +e
          terraform plan -detailed-exitcode -no-color
          code=$?
          echo "exitcode=$code" >> "$GITHUB_OUTPUT"
          exit 0

      - name: Notify on drift
        if: ${{ steps.plan.outputs.exitcode == '2' }}
        run: ./scripts/notify_slack.sh

For higher signal, scope drift checks to critical workspaces like networking, identity, and shared data stores. The more people touch them, the more likely drift occurs. Add a Sentinel policy that hard fails if a resource moved across regions or if an S3 ACL becomes public without an accompanying exception tag. These simple checks catch the majority of risky changes before they explode into tickets.

For day-two rigor, including drift alerts and steady-state KPIs, take a look at our AWS & DevOps re:Maintain guidance and turn these checks into repeatable runbooks.

Everything ties back to those enterprise guardrails. When you reuse LZA or Control Tower SSM parameters for KMS, VPC, and shared services, you make stacks portable and safe by default. When Sentinel sits in your CI, developers get fast feedback instead of surprises during a midnight incident. These practical cases of AWS Terraform implementations are less about fancy code and more about predictable plumbing – and that is what production needs.

To round things off, keep a living runbook that shows Terraform on AWS examples for your common stacks: Bedrock RAG, DataSync pipelines, a basic VPC plus ALB service, and an analytics bucket with Glue catalog. Reference the modules and the policy sets, not just screenshots. Teams can then replicate real-world Terraform AWS deployments consistently across accounts and regions, without relearning the same lessons each quarter. And yes, keep the jokes in the comments – you will need them during plan review days.

For documentation, lean on the official AWS service guides and Terraform docs you referenced earlier in this article. Keep those links in your team runbook so they are easy to find. As services evolve, refresh your modules and policies to match current provider capabilities.

Conclusion

Real production work starts by wiring Terraform into the guardrails you already own. Treat LZA or Control Tower as the baseline, read shared IDs from SSM, and pass them through your modules so encryption, networking, and logging are inherited by default. Pair cross-account roles and consistent KMS policies with Sentinel checks in HCP Terraform to stop risky plans early. With that foundation, AWS Terraform case studies become repeatable building blocks you can roll out across teams with less ceremony.

Contact us if you want a second set of eyes on your first implementation, and we will help you ship with confidence. The best time to standardize was yesterday; the next best time is your very next pull request.

About the Author

Petar is the visionary behind Cloud Solutions. He’s passionate about building scalable AWS Cloud architectures and automating workflows that help startups move faster, stay secure, and scale with confidence.
