TL;DR
Terraform is declarative but not DRY. The moment you manage a second AWS account, you’re copy-pasting backend.tf and provider.tf into every environment directory. Terragrunt solves this with a single root.hcl that every module inherits, a built-in dependency graph, per-module state isolation, and run --all fan-out. The result is a live repo where the filesystem layout is your blast-radius policy — no workspaces, no count = var.env == "prod" ? 1 : 0.
Three accounts, two regions, and a shared backend.tf
The project had three AWS accounts — management, dev, production — spread across eu-west-1 and us-east-1. That’s six distinct backend configurations before we’d written a single module. The first engineer to set it up copy-pasted. The second engineer updated one copy. Three months later, the prod state bucket key had a typo nobody noticed until a plan came back empty.
I’ve seen this exact failure mode on at least four separate teams. It’s not a discipline problem. It’s a structural one: Terraform gives you the declarative model but leaves the scaffolding repetition to you.
That’s the gap Terragrunt plugs.

The four things Terragrunt actually adds
Terragrunt is a thin wrapper around Terraform (and OpenTofu) that adds exactly four capabilities:
1. DRY configuration inheritance via root.hcl
Define your remote state and provider once. Every child terragrunt.hcl inherits it via include:
# root.hcl — lives at the repo root, inherited by every module
locals {
  region  = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  account = read_terragrunt_config(find_in_parent_folders("account.hcl"))
}
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket  = "demo-terraform-state"
    key     = "${path_relative_to_include()}/terraform.tfstate"
    region  = "eu-west-1"
    encrypt = true
  }
}
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region              = "${local.region.locals.aws_region}"
  allowed_account_ids = ["${local.account.locals.account_id}"]
  assume_role {
    role_arn = "arn:aws:iam::${local.account.locals.account_id}:role/TerraformAutomationRole"
  }
}
EOF
}
The allowed_account_ids line is worth calling out: if the provider somehow targets the wrong account, Terraform aborts before touching anything. One line of config, significant blast-radius protection.
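On the child side, the include is only a few lines. A minimal sketch of what a module's terragrunt.hcl might look like, assuming the root file is named root.hcl as above. The module source URL, version tag, and input values are hypothetical placeholders:

```hcl
# dev/eu-west-1/dev/vpc/terragrunt.hcl

# Pull in remote_state and the generated provider from the repo-root root.hcl.
include "root" {
  path = find_in_parent_folders("root.hcl")
}

# Hypothetical module source; substitute your own repo, path, and tag.
terraform {
  source = "git::https://example.com/org/tf-modules.git//vpc?ref=v1.0.0"
}

# Hypothetical inputs; the names depend on the module's variables.
inputs = {
  name = "dev-vpc"
  cidr = "10.0.0.0/16"
}
```

Nothing about the backend or provider appears here, which is the point: the child only states which module it runs and with which inputs.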
2. Per-module state isolation
path_relative_to_include() returns the path of the current directory relative to the root config. For dev/eu-west-1/dev/vpc/, that’s dev/eu-west-1/dev/vpc — a unique S3 key. No module ever shares state with another. You can run terragrunt plan on the VPC without loading the EKS state.
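To make the key derivation concrete, this is roughly what the generated backend.tf would contain for that VPC module, given the remote_state block in root.hcl. It is an approximation: Terragrunt also writes its own header comment, and attribute order may differ.

```hcl
# backend.tf as generated in dev/eu-west-1/dev/vpc/ (approximate rendering)
terraform {
  backend "s3" {
    bucket  = "demo-terraform-state"
    key     = "dev/eu-west-1/dev/vpc/terraform.tfstate"
    region  = "eu-west-1"
    encrypt = true
  }
}
```

A sibling module at dev/eu-west-1/dev/eks/ would get the key dev/eu-west-1/dev/eks/terraform.tfstate, so the two never share a state file.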
3. Explicit dependency graph
# dev/eu-west-1/dev/eks/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"
  # Mock outputs let plan run even before vpc exists
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
  mock_outputs = {
    vpc_id             = "vpc-00000000"
    private_subnet_ids = ["subnet-00000000", "subnet-11111111"]
  }
}
inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.private_subnet_ids
}
Terragrunt resolves the DAG and runs modules in the correct order. The mock_outputs block means you can plan EKS on a greenfield environment before the VPC exists — useful for previewing what a full environment will look like.
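For dependency.vpc.outputs to resolve at apply time, the VPC module has to export outputs with matching names. A minimal sketch of the module side, assuming hypothetical resource names aws_vpc.this and aws_subnet.private and placeholder CIDR ranges (the real module may be structured quite differently):

```hcl
# Inside the VPC module (resource names and CIDRs are assumptions)
resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "private" {
  count      = 2
  vpc_id     = aws_vpc.this.id
  cidr_block = cidrsubnet(aws_vpc.this.cidr_block, 8, count.index)
}

# Read downstream as dependency.vpc.outputs.vpc_id
output "vpc_id" {
  value = aws_vpc.this.id
}

# Read downstream as dependency.vpc.outputs.private_subnet_ids
output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}
```

The mock values only need the same names and types as these outputs. They are stand-ins for plan and validate, and never reach an apply.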
4. run --all fan-out
# Plan every module in the dev/eu-west-1/dev subtree, in dependency order
terragrunt run --all plan --working-dir dev/eu-west-1/dev
One command fans out across all modules in parallel (respecting deps), aggregates output, and reports. Apply works the same way. This is how you apply an entire environment without a 400-line pipeline file.
ℹ️ CLI syntax changed with the Terragrunt CLI redesign
The run-all subcommand was replaced by run --all as part of the CLI redesign. Older posts and docs may show terragrunt run-all plan. While the redesign was still experimental, the new syntax required --experiment cli-redesign; it later became the default. This series uses the current syntax throughout. Check the Terragrunt release notes for the first version where run --all works without the experiment flag, and pin at least that version in your CI image.

The built-in navigation functions
Terragrunt’s power comes partly from its filesystem functions. The five you’ll use constantly:
| Function | What it does | Typical use |
|---|---|---|
| find_in_parent_folders("region.hcl") | Walks up the directory tree until it finds the file and returns its path | Loading per-level config in root.hcl |
| path_relative_to_include() | Returns the current directory's path relative to the included root config | Generating unique S3 state keys |
| get_terragrunt_dir() | Returns the absolute path of the directory containing the current Terragrunt config file | Deriving account_alias = basename(get_terragrunt_dir()) |
| get_repo_root() | Returns the absolute path of the git repo root | Referencing shared scripts from any depth |
| read_terragrunt_config(path) | Parses another HCL file and returns its contents as a map (locals, inputs, and so on) | Assembling the full config from parent files |
These functions are what make the account/region/env hierarchy feel seamless rather than like a folder convention you have to manually maintain. The filesystem is the config.
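As an illustration of how these compose, an account-level file can derive values from its own location. A sketch of a hypothetical dev/account.hcl, assuming it is loaded through read_terragrunt_config as in root.hcl. The account ID is a placeholder and scripts_dir is an invented name. One behavior worth verifying on your version: inside a file loaded this way, get_terragrunt_dir() is expected to resolve to that file's own directory, while get_original_terragrunt_dir() points at the module being run.

```hcl
# dev/account.hcl (hypothetical contents)
locals {
  # Name of the directory holding this file; expected to be "dev" here.
  account_alias = basename(get_terragrunt_dir())

  # Placeholder ID; root.hcl reads this as local.account.locals.account_id.
  account_id = "111111111111"

  # Shared helper scripts addressed from the repo root, whatever the depth.
  scripts_dir = "${get_repo_root()}/scripts"
}
```

A region.hcl one level down would follow the same shape with an aws_region local.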
Why not workspaces?
Workspaces work for a single account with two structurally identical environments. They become friction as soon as you have different provider configs per account (different role ARNs, different allowed_account_ids), environments that drift in resource composition, or a need to plan one environment without loading the others. You end up writing count = terraform.workspace == "prod" ? 1 : 0 everywhere, and the provider block becomes a lookup table.
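For contrast, this is the kind of Terraform the workspace approach tends to produce. A sketch with placeholder account IDs and an arbitrary example resource, not code from the project:

```hcl
locals {
  # Per-workspace lookup table (placeholder account IDs).
  account_ids = {
    dev  = "111111111111"
    prod = "222222222222"
  }
}

provider "aws" {
  region              = "eu-west-1"
  allowed_account_ids = [local.account_ids[terraform.workspace]]

  assume_role {
    role_arn = "arn:aws:iam::${local.account_ids[terraform.workspace]}:role/TerraformAutomationRole"
  }
}

# Resource that should exist only in prod.
resource "aws_cloudwatch_log_group" "audit" {
  count             = terraform.workspace == "prod" ? 1 : 0
  name              = "/audit/prod"
  retention_in_days = 365
}
```

Every environment-specific decision ends up as a conditional inside shared code, and selecting the wrong workspace silently changes which branch runs.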
The Terragrunt directory-per-environment model makes each environment a first-class citizen. The CI pipeline (covered in Parts 4 and 5) can inspect which .hcl files changed and derive exactly what to plan or apply from the directory path — no external state, no manifest file.
What it costs
Terragrunt isn’t free. It adds a binary, a layer of HCL abstraction, and a debugging surface. When root.hcl has a bug, every module breaks. When Terragrunt updates its CLI (as it did when run-all became run --all), you need to update your pipeline images.
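One way to make a version mismatch fail fast, rather than surface as a confusing CLI error, is a version constraint in root.hcl. A sketch using Terragrunt's terragrunt_version_constraint and terraform_version_constraint attributes. The version numbers are placeholders, not recommendations:

```hcl
# root.hcl (additional top-level attributes)

# Refuse to run with a Terragrunt binary older than the one this repo was validated against.
terragrunt_version_constraint = ">= 0.80.0" # placeholder version

# Same idea for the Terraform binary.
terraform_version_constraint = ">= 1.5.0" # placeholder version
```

Because every module includes root.hcl, a stale pipeline image fails on the first module it touches with an explicit version error.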
If your team is small and runs a single AWS account, vanilla Terraform with a good module structure is probably enough. The overhead pays off when you have multiple accounts, multiple regions, and a CI pipeline that needs to surgically plan just the changed environment.
Coming up next
Part 2 covers the module supply chain: how to version, scan, and distribute Terraform modules using semantic-release — so consumers can pin to a git tag and get Dependabot PRs when new versions drop.
Reference repos: hagzag/tf-modules (public, versioned module library) and tf-live (the companion live repo for this series).
Series Navigation → Next: Part 2 — Terraform Modules: Versioning, Scanning, and Distribution