TL;DR

Terraform is declarative but not DRY. The moment you manage a second AWS account, you’re copy-pasting backend.tf and provider.tf into every environment directory. Terragrunt solves this with a single root.hcl that every module inherits, a built-in dependency graph, per-module state isolation, and run --all fan-out. The result is a live repo where the filesystem layout is your blast-radius policy — no workspaces, no count = var.env == "prod" ? 1 : 0.

Three accounts, two regions, and a shared backend.tf

The project had three AWS accounts — management, dev, production — spread across eu-west-1 and us-east-1. That’s six distinct backend configurations before we’d written a single module. The first engineer to set it up copy-pasted. The second engineer updated one copy. Three months later, the prod state bucket key had a typo nobody noticed until a plan came back empty.

I’ve seen this exact failure mode on at least four separate teams. It’s not a discipline problem. It’s a structural one: Terraform gives you the declarative model but leaves the scaffolding repetition to you.

That’s the gap Terragrunt plugs.

The four things Terragrunt actually adds

Terragrunt is a thin wrapper around Terraform (and OpenTofu) that adds exactly four capabilities:

1. DRY configuration inheritance via root.hcl

Define your remote state and provider once. Every child terragrunt.hcl inherits them through an include block that points at root.hcl:

# root.hcl — lives at the repo root, inherited by every module
locals {
  region  = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  account = read_terragrunt_config(find_in_parent_folders("account.hcl"))
}

remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
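  config = {
    bucket  = "demo-terraform-state"
    # One state object per module: the key mirrors the module's directory path
    key     = "${path_relative_to_include()}/terraform.tfstate"
    # Region of the state bucket itself, not of the resources being deployed
    region  = "eu-west-1"
    encrypt = true
  }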
}

generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region              = "${local.region.locals.aws_region}"
  allowed_account_ids = ["${local.account.locals.account_id}"]
  assume_role {
    role_arn = "arn:aws:iam::${local.account.locals.account_id}:role/TerraformAutomationRole"
  }
}
EOF
}

The allowed_account_ids line is worth calling out: if the provider somehow targets the wrong account, Terraform aborts before touching anything. One line of config, significant blast-radius protection.
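For completeness, here is a minimal sketch of the small files root.hcl expects to find on the way up the tree, plus a child module that inherits it. The locals names (account_id, aws_region) are the ones root.hcl reads above; the account ID, the module source URL, and the cidr_block input are placeholders for illustration, not values from the real repo.

```hcl
# dev/account.hcl -- found by find_in_parent_folders("account.hcl")
locals {
  account_id = "111111111111" # placeholder account ID
}

# dev/eu-west-1/region.hcl -- found by find_in_parent_folders("region.hcl")
locals {
  aws_region = "eu-west-1"
}

# dev/eu-west-1/dev/vpc/terragrunt.hcl -- the child module
include "root" {
  # Walk up the directory tree until root.hcl is found, then merge it in
  path = find_in_parent_folders("root.hcl")
}

terraform {
  # Hypothetical module source, pinned to a git tag
  source = "git::https://example.com/your-org/tf-modules.git//vpc?ref=v1.0.0"
}

inputs = {
  cidr_block = "10.0.0.0/16" # hypothetical input of the vpc module
}
```

The three snippets are separate files; they are shown in one block only to make the hierarchy visible. The child contains no backend or provider configuration of its own, which is the point.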

2. Per-module state isolation

path_relative_to_include() returns the path of the current directory relative to the root config. For dev/eu-west-1/dev/vpc/, that’s dev/eu-west-1/dev/vpc — a unique S3 key. No module ever shares state with another. You can run terragrunt plan on the VPC without loading the EKS state.
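As a rough illustration, the backend.tf that Terragrunt generates for dev/eu-west-1/dev/vpc from the remote_state block above would look something like the following. The real file also carries a Terragrunt-generated header comment, omitted here.

```hcl
# backend.tf as generated for dev/eu-west-1/dev/vpc (sketch)
terraform {
  backend "s3" {
    bucket  = "demo-terraform-state"
    # path_relative_to_include() resolved to the module's directory path
    key     = "dev/eu-west-1/dev/vpc/terraform.tfstate"
    region  = "eu-west-1"
    encrypt = true
  }
}
```

A sibling module such as dev/eu-west-1/dev/eks gets the same file with only the key changed, so the two states never overlap.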

3. Explicit dependency graph

# dev/eu-west-1/dev/eks/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"
  # Mock outputs let plan run even before vpc exists
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
  mock_outputs = {
    vpc_id             = "vpc-00000000"
    private_subnet_ids = ["subnet-00000000", "subnet-11111111"]
  }
}

inputs = {
  vpc_id     = dependency.vpc.outputs.vpc_id
  subnet_ids = dependency.vpc.outputs.private_subnet_ids
}

Terragrunt resolves the DAG and runs modules in the correct order. The mock_outputs block means you can plan EKS on a greenfield environment before the VPC exists — useful for previewing what a full environment will look like.
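Note that dependency.vpc.outputs only exposes what the upstream module declares as Terraform outputs, so the keys in mock_outputs should mirror those names and types. A sketch of the matching outputs in the vpc module, assuming hypothetical resources named aws_vpc.this and aws_subnet.private (the latter created with count):

```hcl
# outputs.tf in the vpc module (resource names are assumptions)
output "vpc_id" {
  description = "ID of the VPC, consumed by downstream modules such as eks"
  value       = aws_vpc.this.id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets, as a list of strings"
  value       = aws_subnet.private[*].id
}
```

If an output is renamed here, the eks inputs and the mock_outputs map both need the same rename, otherwise plan on a greenfield environment and plan against real state will disagree.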

4. run --all fan-out

# Plan every module in the dev/eu-west-1/dev subtree, in dependency order
terragrunt run --all plan --working-dir dev/eu-west-1/dev

One command fans out across all modules in parallel (respecting deps), aggregates output, and reports. Apply works the same way. This is how you apply an entire environment without a 400-line pipeline file.

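ℹ️ CLI syntax changed with the Terragrunt CLI redesign (2025)

The run-all subcommand was superseded by run --all as part of the CLI redesign. Older posts and docs show terragrunt run-all plan; that legacy form kept working through the transition as a deprecated command. On releases where the redesign was still experimental, the new form had to be enabled with --experiment cli-redesign, and it later became the default. This series uses the current syntax throughout. Check the Terragrunt release notes for the first version where run --all works without the experiment flag, and pin at least that version in your CI image.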

The built-in navigation functions

Terragrunt’s power comes partly from its filesystem functions. The five you’ll use constantly:

| Function | What it does | Typical use |
| --- | --- | --- |
| find_in_parent_folders("region.hcl") | Walks up the tree until it finds the file | Loading per-level config in root.hcl |
| path_relative_to_include() | Returns the current directory's path relative to the include root | Generating unique S3 state keys |
| get_terragrunt_dir() | Absolute path of the directory containing the current terragrunt.hcl | Deriving account_alias = basename(get_terragrunt_dir()) |
| get_repo_root() | Absolute path of the git repo root | Referencing shared scripts from any depth |
| read_terragrunt_config(path) | Parses another HCL file and returns it as a map (its locals are under .locals) | Assembling the full config from parent files |

These functions are what make the account/region/env hierarchy feel seamless rather than like a folder convention you have to manually maintain. The filesystem is the config.
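To make the table concrete, here is a small sketch of a child config that combines several of these functions. The scripts directory and the name input are hypothetical; region.hcl and its aws_region local follow the root.hcl shown earlier.

```hcl
# dev/eu-west-1/dev/vpc/terragrunt.hcl (illustrative fragment)
include "root" {
  path = find_in_parent_folders("root.hcl")
}

locals {
  # Last path segment of this module's directory: "vpc"
  module_name = basename(get_terragrunt_dir())

  # Nearest region.hcl above this directory, parsed into a map
  region_cfg = read_terragrunt_config(find_in_parent_folders("region.hcl"))

  # Hypothetical shared scripts folder at the repository root
  scripts_dir = "${get_repo_root()}/scripts"
}

inputs = {
  # Hypothetical input: a name derived purely from where the file lives
  name = "${local.module_name}-${local.region_cfg.locals.aws_region}"
}
```

Moving the directory changes the derived values without editing the file, which is what lets the folder layout act as configuration.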

Why not workspaces?

Workspaces work for a single account with two structurally identical environments. They become friction the moment you have different provider configs per account (different role ARNs, different allowed_account_ids), environments that drift in resource composition, or a need to plan one environment without loading the others. You end up writing count = terraform.workspace == "prod" ? 1 : 0 everywhere, and the provider block becomes a lookup table.
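For contrast, a sketch of what that workspace-driven lookup table tends to look like in plain Terraform, keyed on the built-in terraform.workspace value. The account IDs and the bucket are placeholders chosen for illustration.

```hcl
# Workspace-based layout: one root module, behaviour switched per workspace.
locals {
  # Placeholder account IDs, one entry per workspace name
  account_ids = {
    dev  = "111111111111"
    prod = "222222222222"
  }
}

provider "aws" {
  region              = "eu-west-1"
  allowed_account_ids = [local.account_ids[terraform.workspace]]

  assume_role {
    role_arn = "arn:aws:iam::${local.account_ids[terraform.workspace]}:role/TerraformAutomationRole"
  }
}

# Exists only in prod, so every reference to it needs an index and a guard.
resource "aws_s3_bucket" "audit_logs" {
  count  = terraform.workspace == "prod" ? 1 : 0
  bucket = "example-audit-logs-prod" # hypothetical bucket name
}
```

Every new account or environment-specific resource adds another map entry or another conditional to the same shared root module.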

The Terragrunt directory-per-environment model makes each environment a first-class citizen. The CI pipeline (covered in Parts 4 and 5) can inspect which .hcl files changed and derive exactly what to plan or apply from the directory path — no external state, no manifest file.

What it costs

Terragrunt isn’t free. It adds a binary, a layer of HCL abstraction, and a debugging surface. When root.hcl has a bug, every module breaks. When Terragrunt changes its CLI (as it did when run-all gave way to run --all), you need to update your pipeline images.

If your team is small and runs a single AWS account, vanilla Terraform with a good module structure is probably enough. The overhead pays off when you have multiple accounts, multiple regions, and a CI pipeline that needs to surgically plan just the changed environment.

Coming up next

Part 2 covers the module supply chain: how to version, scan, and distribute Terraform modules using semantic-release — so consumers can pin to a git tag and get Dependabot PRs when new versions drop.

Reference repo: hagzag/tf-modules (public, versioned module library) and tf-live (the companion live repo for this series).

