FedRAMP, From the Platform Side — Part 4: Drawing the Boundary

TL;DR

The FedRAMP authorization boundary is the single most consequential decision in your entire program, and the one I see teams handle worst. It defines what’s in scope, what’s inherited from a leveraged provider like AWS GovCloud, and what’s an external system you depend on but don’t authorize. Get the boundary right and every subsequent control narrative writes itself. Get it wrong and you’ll spend the next year fighting to include or exclude systems that should never have been part of the conversation. This post is a platform engineer’s walkthrough of how to draw the line, what 20x changes about how the boundary is expressed, and the traps I’d warn anyone away from.

Introduction

Part 3 ended on the line “the perimeter has to be drawn precisely, and everything inside it has to be FIPS-clean.” I owe you the post about how that perimeter actually gets drawn. The boundary is also the prerequisite for everything I want to write next — OSCAL submissions, Kubernetes-in-FedRAMP, IaC as continuous evidence — because every one of those conversations starts with “well, which systems are we talking about?”

The boundary is where I see the most expensive mistakes get made. Not because it’s technically hard — it isn’t — but because teams treat it as a Visio exercise rather than the strategic platform decision it actually is. The boundary is the line between “I owe FedRAMP evidence for this” and “I don’t.” Move the line one box to the left and you’ve added six months of work. Move it one box to the right and your auditor will tell you the boundary doesn’t make sense.

So let’s draw it carefully.

What the Boundary Actually Is

The authorization boundary is the set of system components — the people, processes, data, and technology — that you are seeking authorization for. It’s a defined scope, formalized in your SSP, that the 3PAO assesses against and the authorizing official issues the ATO over.

A defensible boundary has three properties:

It’s complete. Every component required to deliver the service to the federal customer is either inside the boundary or explicitly accounted for as a leveraged or external system.
It’s minimal. Nothing is in the boundary that doesn’t need to be. Each component inside the boundary is a control-implementation burden you carry forever.
It’s coherent. A boundary that includes “the production API” but excludes “the database the production API reads from” isn’t defensible. The boundary needs to make architectural sense.

Most boundary mistakes I’ve seen come from one of those three failing. Incomplete boundaries get flagged by the 3PAO and force a re-scope mid-engagement. Over-broad boundaries quietly suck corporate IT into FedRAMP scope and double the cost of the program. Incoherent boundaries fail the basic sniff test of “would another engineer agree this is the system?”
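
To make the three properties concrete, here is a minimal sketch of how they could be checked against a component inventory. Everything in it is an assumption for illustration: the `Component` fields, the placement labels, and the example inventory are hypothetical, and a real inventory would come from your own asset data rather than a hand-written list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    placement: str  # hypothetical labels: "inside", "leveraged", "external"
    owned: bool  # True if your team operates it
    handles_federal_data: bool
    depends_on: tuple = ()


def check_boundary(components):
    """Return a sorted list of (property, component_name) findings."""
    by_name = {c.name: c for c in components}
    inside = [c for c in components if c.placement == "inside"]
    needed_by_inside = {dep for c in inside for dep in c.depends_on}
    findings = set()

    for c in inside:
        # Complete: every dependency of an in-boundary component is accounted for.
        for dep in c.depends_on:
            if dep not in by_name:
                findings.add(("complete", dep))
        # Minimal: an in-boundary component must touch federal data or be
        # directly needed by another in-boundary component.
        if not c.handles_federal_data and c.name not in needed_by_inside:
            findings.add(("minimal", c.name))

    for c in components:
        # Coherent: something you operate that handles federal data
        # cannot sit outside the boundary.
        if c.owned and c.handles_federal_data and c.placement != "inside":
            findings.add(("coherent", c.name))

    return sorted(findings)


# Hypothetical inventory with one violation of each property.
inventory = [
    Component("prod-api", "inside", True, True,
              depends_on=("prod-db", "ci-pipeline", "geo-lookup")),
    Component("prod-db", "external", True, True),
    Component("ci-pipeline", "inside", True, False),
    Component("team-wiki", "inside", True, False),
    Component("cloud-platform", "leveraged", False, True),
]

for prop, name in check_boundary(inventory):
    print(f"{prop}: {name}")
```

The "minimal" check here only looks one dependency hop deep, which is a deliberate simplification; the point is that each property can be phrased as a question you ask of the inventory, not just of the diagram.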

The Three Concentric Rings

When I draw a boundary on a whiteboard, I think in three concentric rings — what’s inside the boundary, what’s leveraged, and what’s external. Most teams get the inside right and the other two rings wrong.

Ring 1: Inside the boundary. This is the system you are authorizing: your production application, the workloads that process federal data, the databases that store it, the queues and caches that move it around, the build and deploy pipeline that produces the artifacts running in production, and the IAM and audit-log infrastructure that governs all of the above. The customer’s data enters here and lives inside this perimeter.

Ring 2: Leveraged authorizations. This is the cloud platform (or platforms) underneath your system that already carries its own FedRAMP authorization. AWS GovCloud (authorized at High), the AWS commercial US regions (authorized at Moderate), Azure Government, Google Cloud: each of these is a leveraged provider whose authorization you inherit. You don’t re-authorize EC2; you inherit AWS’s control implementations for the IaaS layer. What you have to do is articulate, in your SSP, which leveraged provider you’re sitting on and which of your own controls inherit from theirs. This is the single biggest leverage move in the entire program. Without inheritance, FedRAMP would be out of reach for any company smaller than Amazon.

Ring 3: External / interconnected systems. This is where it gets interesting. External systems are services your boundary talks to but doesn’t own. Examples: an identity provider (Okta, Microsoft Entra ID, Auth0), an email provider (SendGrid, SES used as an external service), a monitoring SaaS (Datadog, New Relic), a payments provider, a customer-support platform. Each external system is treated according to its own posture. Some carry their own FedRAMP authorization, which you note as a separate leveraged authorization. Some don’t, which means you have to swap them out, restrict the data flow, or carry the risk in your POA&M.

The trap I keep seeing: teams don’t realize that a SaaS dependency is an external system in FedRAMP language. They list “Datadog” as an internal tool because it lives in their SaaS bill. The auditor reads the SSP, sees the API calls leaving the production boundary, and asks “what’s the authorization status of the system you’re sending operational data to?” If that conversation hasn’t been pre-empted, it derails the engagement.
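
The ring a dependency lands in follows from a couple of yes/no questions, which can be sketched as below. This is a simplified decision rule under my own assumptions, not FedRAMP’s formal definitions: the function names and the three input flags are hypothetical, and real cases (a platform you sit on that is not authorized, for example) need a human judgment call.

```python
from enum import Enum


class Ring(Enum):
    INSIDE = "inside the boundary"
    LEVERAGED = "leveraged authorization"
    EXTERNAL = "external / interconnected system"


def classify(operated_by_you, underlying_platform, fedramp_authorized):
    """Place a component in one of the three rings (simplified rule)."""
    if operated_by_you:
        return Ring.INSIDE
    if underlying_platform and fedramp_authorized:
        return Ring.LEVERAGED
    # Anything you talk to but neither operate nor sit on top of.
    return Ring.EXTERNAL


def external_followup(fedramp_authorized):
    """What an external system's authorization status obliges you to do."""
    if fedramp_authorized:
        return "note it as a separate leveraged authorization"
    return "swap it out, restrict the data flow, or carry the risk in the POA&M"


# Hypothetical components, described only by the three flags.
production_api = classify(True, False, False)
cloud_platform = classify(False, True, True)
monitoring_saas = classify(False, False, False)

print(production_api.value)
print(cloud_platform.value)
print(monitoring_saas.value, "->", external_followup(False))
```

Note that the SaaS on your own bill still comes out as external: who pays for it has no bearing on which ring it sits in.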

Where to Draw the Line — Practical Patterns

A few patterns I keep returning to when sizing the boundary:

The data flow drives the boundary, not the org chart. Trace where federal customer data enters, where it lives, what processes it, and where it leaves. That trace is your boundary. If a microservice never touches federal data, it doesn’t belong in scope just because it’s in the same Kubernetes cluster as services that do. Conversely, a service that does touch the data belongs in scope even if it sits in a different account from the rest of the platform.
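
Tracing the data is a graph walk, and a short sketch shows the idea. The flow map and service names below are made up for illustration; in practice the edges would come from something like service-mesh telemetry or network flow logs, and anything reached that you do not operate would be recorded as an external system rather than pulled inside.

```python
from collections import deque


def trace_federal_data(flows, entry_points):
    """Breadth-first walk over data-flow edges from where federal data enters.

    `flows` maps each component to the components it sends data to.
    Returns every component the data can reach, entry points included.
    """
    reached = set(entry_points)
    queue = deque(entry_points)
    while queue:
        node = queue.popleft()
        for downstream in flows.get(node, ()):
            if downstream not in reached:
                reached.add(downstream)
                queue.append(downstream)
    return reached


# Hypothetical flow map. marketing-site and cms share the cluster with
# the rest but never receive federal data.
flows = {
    "ingress-gateway": ["orders-api"],
    "orders-api": ["orders-db", "event-queue"],
    "event-queue": ["report-worker"],  # report-worker lives in another account
    "marketing-site": ["cms"],
}

in_scope = trace_federal_data(flows, ["ingress-gateway"])
print(sorted(in_scope))
```

The walk ignores where a component is deployed, which is the point: the report worker in a separate account is reached, and the marketing site in the same cluster is not.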

Isolation patterns earn their keep here. A dedicated AWS account and Kubernetes cluster for the FedRAMP workload is one of the highest-leverage architectural decisions you can make. The boundary becomes “this account, this VPC, this cluster,” a line a fifth-grader could draw. A namespace inside a shared cluster does not give you that line. Trying to authorize a shared multi-tenant cluster where federal workloads run alongside commercial ones is technically possible, but every SSP narrative then has to explain the isolation at essay length.

Build pipelines are usually in. This catches teams off guard. If your CI/CD pipeline produces the artifacts that run inside the boundary, the pipeline itself is part of the boundary: your CI runners, your registry, your signing infrastructure. Evidence for SI-7 and for the supply chain controls (SA-12 in Rev 4, folded into the SR family in Rev 5) has to come from it. You don’t get to authorize the production cluster and leave GitHub Actions out of scope, because GitHub Actions is the supply chain. (This is where the Building for Compliance series gets very relevant; that pipeline hardening work directly serves the boundary scoping decision.)
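
One small piece of what integrity evidence from the pipeline can look like is a digest check: the pipeline records a SHA-256 digest when it builds an artifact, and the deploy step refuses anything that does not match. The sketch below shows only that comparison, under the assumption that the recorded digest is stored somewhere inside the boundary; it is not a substitute for real artifact signing, and the function names are mine.

```python
import hashlib
import hmac


def sha256_digest(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(data: bytes, recorded_digest: str) -> bool:
    """True only if the artifact matches the digest recorded at build time."""
    # compare_digest avoids leaking where the two strings first differ.
    return hmac.compare_digest(sha256_digest(data), recorded_digest)


# Stand-in bytes for a built artifact; a real pipeline would hash the image
# or package it produced and store the digest alongside the build record.
built = b"example artifact contents"
recorded = sha256_digest(built)

print(verify_artifact(built, recorded))
print(verify_artifact(built + b" tampered", recorded))
```

The check is only meaningful if the system that recorded the digest is itself in scope, which is the argument for pulling the pipeline inside the boundary in the first place.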

Corporate IT stays out. Your laptops, your email, your corporate SSO, your HR system — these are emphatically not in the FedRAMP boundary. They’re external to the system that processes federal data. The mistake some teams make is letting the boundary creep into corporate IT because “well, the engineers use their laptops to access production.” That access is governed by the boundary’s IA controls (specifically, how privileged access is authenticated and audited from outside the boundary), not by pulling corporate IT into the boundary itself.

Observability is a judgment call. If your logs, metrics, and traces flow to a SaaS provider — Datadog, New Relic, an external Grafana Cloud — that provider is an external system. If your observability stack runs inside the boundary (self-hosted Prometheus, Loki, Tempo, Grafana, all inside your authorized account), it’s in scope. There are good arguments for both. Self-hosted means you carry more control burden but a tighter boundary; SaaS means a thinner boundary but a dependency you have to document and risk-assess. I’ve seen both work.

Leveraged Authorizations — The Cloud Choice

The single biggest cost/timeline lever in your boundary decision is which leveraged authorization you sit on. The shortlist for SaaS teams today:

AWS GovCloud (US-East / US-West). Two regions, FedRAMP High and DoD IL5 authorized, isolated from AWS commercial. Account creation requires confirming US-person operation (which is a real constraint for non-US teams), but operating in GovCloud is structurally the cleanest path. FIPS-validated endpoints are the norm rather than an opt-in, although it is still worth confirming the endpoint for each service you call; the leveraged authorization is High; the boundary inherits the most.

AWS Commercial with FedRAMP Moderate inheritance. The AWS commercial US regions carry a FedRAMP Moderate authorization, so you can build a Moderate system there; if you need a High baseline, GovCloud is the leveraged authorization to sit on. The trade-off on commercial is that you have to opt into FIPS endpoints explicitly, you have to be deliberate about which services are in your boundary (not every AWS service is FedRAMP-authorized in commercial regions), and the data sovereignty story is messier.
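
Both of those commercial-region chores can be turned into checks that run before a deploy. The sketch below assumes two things: that you rely on the `AWS_USE_FIPS_ENDPOINT` environment setting that the AWS SDKs read, and that you maintain your own list of authorized services copied from AWS’s published services-in-scope information. The service names in the example are placeholders, not a statement about what is or is not authorized.

```python
import os


def fips_opted_in(env=None) -> bool:
    """True if the SDK-wide FIPS endpoint setting is switched on."""
    env = os.environ if env is None else env
    return env.get("AWS_USE_FIPS_ENDPOINT", "").strip().lower() == "true"


def unauthorized_services(used, authorized):
    """Services the workload calls that are missing from your authorized list."""
    return sorted(set(used) - set(authorized))


# Placeholder data: maintain the real list from AWS's own documentation.
authorized = {"service-a", "service-b", "service-c"}
used_by_workload = ["service-a", "service-c", "service-x"]

print(fips_opted_in({"AWS_USE_FIPS_ENDPOINT": "true"}))
print(unauthorized_services(used_by_workload, authorized))
```

Passing the environment in as a parameter keeps the check testable; in a pipeline you would call `fips_opted_in()` with no argument and fail the job when it returns False.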

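Azure Government / Google Cloud. Azure Government follows the same logic as AWS GovCloud: an isolated cloud, US-person operation, authorized at FedRAMP High and above. Google Cloud takes a different route, offering FedRAMP-authorized services on its standard infrastructure with compliance guardrails (Assured Workloads) rather than a separate government cloud. Pick based on which platform your application is already built on; switching clouds for FedRAMP alone is rarely the right move.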

For non-US teams, choosing a government cloud is also a workforce question. US-person operation isn’t a paperwork formality; it’s a real constraint that affects how you staff the on-call rotation for the FedRAMP workload. Some teams set up a separate US-staffed operations subsidiary; some partner with a US-based managed-services firm; some accept the constraint and hire accordingly. None of these are wrong, but they all need to be planned, not improvised.

What 20x Changes About the Boundary

Under legacy FedRAMP, the boundary was expressed as a diagram in a Word-format SSP — a labeled Visio with components, data flows, and trust zones. Auditors read it, asked questions, marked it up, and the diagram-plus-narrative was the boundary.

Under 20x, the boundary is expressed as OSCAL components — a structured, machine-readable representation where each system component has defined properties (type, function, status, leveraged or owned), each information flow is a typed edge between components, and each control implementation references the components it applies to. The boundary becomes a graph, not a picture.

That shift sounds bureaucratic but has real consequences for how you build:

The boundary becomes versionable. Component changes go through Git, just like code. Adding a service to the boundary is a pull request that updates the OSCAL components file.
The boundary becomes diffable. A 3PAO can diff your boundary between assessments and see exactly what changed, rather than re-reading a 600-page document looking for the new paragraph.
The boundary becomes queryable. “Show me every component inside the boundary that handles cryptographic operations” is a graph query, not a search through a PDF.
The boundary stays accurate. If your IaC generates OSCAL components as a byproduct of the deploy, your boundary documentation cannot drift from reality — because reality emits the documentation.
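
Diffable and queryable are easy to show once the boundary is data. The sketch below uses a plain dictionary as a stand-in for the components file; the field names (`ring`, `handles_crypto`) and the two versions are hypothetical and much simpler than real OSCAL, but the operations carry over.

```python
def diff_boundary(old, new):
    """Summarize what changed between two versions of the boundary."""
    old_c, new_c = old["components"], new["components"]
    return {
        "added": sorted(new_c.keys() - old_c.keys()),
        "removed": sorted(old_c.keys() - new_c.keys()),
        "changed": sorted(
            name for name in old_c.keys() & new_c.keys()
            if old_c[name] != new_c[name]
        ),
    }


def query(boundary, **props):
    """Names of components whose properties match every keyword given."""
    return sorted(
        name for name, component in boundary["components"].items()
        if all(component.get(key) == value for key, value in props.items())
    )


# Two hypothetical versions of the same boundary.
v1 = {"components": {
    "orders-api": {"ring": "inside", "handles_crypto": True},
    "orders-db": {"ring": "inside", "handles_crypto": False},
    "legacy-cache": {"ring": "inside", "handles_crypto": False},
}}
v2 = {"components": {
    "orders-api": {"ring": "inside", "handles_crypto": True},
    "orders-db": {"ring": "inside", "handles_crypto": True},
    "signing-service": {"ring": "inside", "handles_crypto": True},
}}

print(diff_boundary(v1, v2))
print(query(v2, ring="inside", handles_crypto=True))
```

The second call is the “every component inside the boundary that handles cryptographic operations” question from the list above, answered in one line instead of a document search.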

The last point is the killer feature for platform engineers. Most boundary drift in legacy programs comes from a system architecture change happening faster than the SSP being updated. If the boundary is a data structure generated from the same source-of-truth your platform deploys from, the drift problem dissolves. This is the part of 20x I find most exciting, and it’s the topic of a later post.
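
As a rough illustration of “reality emits the documentation,” here is a sketch of a deploy step that renders a components file from the resources it just deployed. The resource records are hypothetical, and the output is only loosely modeled on OSCAL component fields (`uuid`, `type`, `title`, `props`); it is not schema-valid OSCAL. The parts that matter are the deterministic IDs and the stable ordering, because those are what keep the Git diff down to real changes.

```python
import json
import uuid

# Hypothetical namespace so the same resource always gets the same UUID.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "boundary.example.invalid")


def to_component(resource):
    """Map one deployed resource to a simplified component entry."""
    return {
        "uuid": str(uuid.uuid5(NAMESPACE, resource["id"])),
        "type": resource["kind"],
        "title": resource["id"],
        "props": [{"name": "ring", "value": resource["ring"]}],
    }


def render_components(resources):
    """Render a stable JSON document: same resources in, same bytes out."""
    components = sorted(
        (to_component(r) for r in resources), key=lambda c: c["title"]
    )
    return json.dumps({"components": components}, indent=2, sort_keys=True)


# Hypothetical output of a deploy: what actually got created.
deployed = [
    {"id": "orders-db", "kind": "service", "ring": "inside"},
    {"id": "orders-api", "kind": "service", "ring": "inside"},
]

print(render_components(deployed))
```

Committing that output on every deploy means a boundary change shows up as a reviewable diff in the same pull request as the infrastructure change that caused it.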

Common Boundary Traps

A short list of the boundary mistakes I’ve seen cause the most pain. None of these are exotic. All of them are avoidable.

Trap 1 — Treating the boundary as a documentation exercise. The boundary is an architectural decision, not a writeup. If you assign it to a GRC analyst to “document” rather than treating it as a platform engineering design problem, you’ll end up with a boundary that doesn’t match how the system actually operates.

Trap 2 — Pulling corporate IT into scope by accident. As discussed above. The fix is to draw the privileged-access path (engineer’s laptop → SSO → bastion → boundary) and treat the laptop and SSO as external systems whose interactions are governed by IA controls, not as boundary components.

Trap 3 — Forgetting the build pipeline. Authorizing the production cluster but leaving CI/CD out of the boundary. Auditors catch this every time. The fix is to scope the pipeline into the boundary from day one, with the supply-chain hardening from Building for Compliance already in place.

Trap 4 — Vague SaaS dependencies. “We use Datadog” written into the SSP without specifying the data flow, the data classification, and Datadog’s own authorization status. Each external SaaS needs an interconnection table entry that names what data crosses the line and the provider’s authorization status.
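
A cheap guard against this trap is to refuse interconnection entries that leave any of those three facts blank. The field names below are my own shorthand for illustration, not a FedRAMP template, and the example entries are invented.

```python
REQUIRED_FIELDS = ("data_flow", "data_classification", "authorization_status")


def missing_fields(entry):
    """Required interconnection fields that are absent or blank."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(entry.get(field, "")).strip()
    ]


# "We use a monitoring SaaS" and nothing else.
vague_entry = {"system": "monitoring-saas"}

# The same dependency, documented well enough to assess.
complete_entry = {
    "system": "monitoring-saas",
    "data_flow": "application metrics and logs, outbound over TLS",
    "data_classification": "operational metadata, no federal customer data",
    "authorization_status": "not FedRAMP authorized; risk tracked in POA&M",
}

print(missing_fields(vague_entry))
print(missing_fields(complete_entry))
```

Run as a pre-merge check on the file that holds the interconnection table, this turns a vague dependency into a failed build instead of a finding during assessment.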

Trap 5 — Boundary creep from “in case we need it later.” Engineers including services in the boundary because “we might use this for the federal workload eventually.” Every service in the boundary is a control burden forever. If you’re not using it today, leave it out. You can add it later via the standard change-management path.

Trap 6 — Multi-tenant ambiguity. Running federal customers alongside commercial ones in the same cluster, same database, same caches. Possible to authorize, but the SSP has to explain in painful detail how the tenant isolation is enforced. The cleaner pattern, almost always, is a dedicated environment for the federal workload.

What I’d Tell My Past Self

The boundary is the first thing I’d insist on getting right. Earlier than the SSP, earlier than the 3PAO selection, earlier than the control mapping. The boundary determines everything downstream — the cost, the timeline, the operational overhead, the architectural decisions that will haunt you for years.

The reframe that helped me: the boundary isn’t a line you draw around your existing system. It’s a line you choose where to put, and then make the system match. You have real agency here. You can refactor toward a tighter boundary — split out the federal workload into its own account, swap a non-FedRAMP SaaS for a FedRAMP-authorized one, move observability inside the perimeter or push it outside — and every one of those refactors pays back through the entire authorization lifecycle.

The teams that do this well treat boundary refactoring as a Q-by-Q platform initiative before they start the formal FedRAMP engagement. The teams that do it badly try to draw the boundary around whatever the system happens to look like when the federal opportunity lands, and spend the next 18 months apologizing for that drawing.

Conclusion

That’s the boundary, from a platform engineering point of view. Three rings, a data-flow-driven scoping discipline, a leveraged authorization that does most of the heavy lifting, and a 20x shift that turns the boundary from a diagram into a graph.

The series continues. Now that we have a boundary, the next thing worth deep-diving — IMO — is what the OSCAL submission actually looks like and what your evidence pipeline emits to populate it. That’s where the “platform engineering as compliance” story gets very concrete. There’s also the Kubernetes-in-the-boundary conversation waiting, which I expect will be a meaty post in its own right.

If you’re sizing a boundary right now and hitting a question this post didn’t answer, I’d love to hear it — the series gets sharper when the conversation gets specific.
