TL;DR
There are two ways to do SOC 2: as a project (panic for 8 weeks before audit, screenshot everything, never want to do it again) or as a byproduct (compliance signal flows out of how you already build, and the audit becomes mostly an export job). The second only works if you wire policy-as-code, drift detection, and evidence collection into your platform up front. This post is about that wiring.
Introduction
In Part 3 we mapped controls to the AWS/GCP/Kubernetes stack. That gets you to a successful first audit. But there’s a quiet trap waiting on the other side of that audit: the controls that worked because everyone was paying attention will quietly degrade once everyone goes back to shipping features.
Six months later, you’re staring down the Type II observation window, scrambling to figure out why three production IAM users magically appeared, why the staging cluster’s audit logs stopped flowing two months ago, and why your access review evidence has a gap in March. This is the difference between point-in-time compliance and continuous compliance — the latter is the only kind worth pursuing.

The shift: enforcement vs. evidence
Most teams start by thinking about evidence — “how do I prove the control worked?” That’s the auditor’s view. The better engineering view is: “how do I prevent the control from being violated in the first place, and capture the prevention as evidence automatically?”
That shift — from detective to preventive controls, from manual evidence collection to automated capture — is the entire continuous-compliance playbook. It rests on three pillars:
- Policy-as-code — controls expressed as machine-checkable rules
- Drift detection — knowing when reality diverges from declared state
- Evidence automation — turning normal system activity into auditor-ready artifacts
1. Policy-as-code
The core idea: stop writing policies as Word documents that nobody reads, and start writing them as code that runs every time something changes.
OPA / Rego is the lingua franca here. You express controls as Rego rules and evaluate them at three useful points:
- In CI — block PRs that violate policy before they merge. Terraform plan output goes into `conftest`; Kubernetes manifests go into `kubeconform` or similar (sketch after this list).
- At admission — Kubernetes admission controllers (Gatekeeper, Kyverno) reject non-compliant resources before they hit the cluster.
- At runtime — periodic scans of live state to catch anything that slipped through.
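Here is what the CI hook looks like in practice: a minimal `conftest` policy evaluated over `terraform show -json` output. This is a sketch, not a drop-in policy. It assumes a recent AWS provider, where the ACL lives in a separate `aws_s3_bucket_acl` resource; a real policy set would also require `aws_s3_bucket_public_access_block` on every bucket.

```rego
package main

import rego.v1

# Evaluate with:
#   terraform plan -out=plan.out
#   terraform show -json plan.out > plan.json
#   conftest test plan.json

public_acls := {"public-read", "public-read-write"}

# Flag any S3 bucket ACL in the plan that grants public access.
deny contains msg if {
    rc := input.resource_changes[_]
    rc.type == "aws_s3_bucket_acl"
    public_acls[rc.change.after.acl]
    msg := sprintf("%s: public S3 ACL %q is not allowed", [rc.address, rc.change.after.acl])
}
```

The CI run that executes this rule, pass or fail, is exactly the kind of artifact the evidence pipeline later in this post archives.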
Concrete examples I’ve used in client engagements:
- No public S3 buckets — SCP denies the API call at the org level (the unbreakable layer), Rego rule on Terraform plans catches it in CI, AWS Config rule flags drift in live state.
- No containers running as root — a Gatekeeper constraint blocks the deployment at admission, period (sketch after this list).
- All EKS/GKE clusters must have audit logging enabled — Terraform module enforces it, OPA verifies the IaC, runtime scan confirms.
- No long-lived IAM access keys — IAM Access Analyzer + custom Rego on Terraform.
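To make the admission layer concrete, here is the no-root rule as the Rego body of a Gatekeeper ConstraintTemplate. A sketch only: it checks `spec.containers`, and a production policy would also cover `initContainers`, `ephemeralContainers`, and pod-level `securityContext` defaults.

```rego
package k8srequirenonroot

# Gatekeeper passes the admission request as input.review; the object
# under review here is the Pod being created or updated.
violation[{"msg": msg}] {
    container := input.review.object.spec.containers[_]
    not container.securityContext.runAsNonRoot
    msg := sprintf("container %q must set securityContext.runAsNonRoot: true", [container.name])
}
```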
The big shift: when an auditor asks “how do you prevent unauthorized public buckets?”, the answer isn’t “we have a policy document.” It’s “here’s the Rego rule, here’s the CI run that blocked the most recent attempt, here’s the Gatekeeper denial log.” That’s enforcement plus evidence in one motion.
Cloud-account guardrails: SCPs and Org Policies
OPA and Gatekeeper catch violations at the IaC and Kubernetes layers — but they can be bypassed by anyone with console access and bad judgment. The strongest preventive control sits one level higher: at the cloud account/organization boundary, where even an account root user can’t override it.
On AWS, that’s Service Control Policies (SCPs) applied at the AWS Organizations level. SCPs are guardrails that deny actions across every account in an OU, regardless of the user’s IAM permissions. A few SCPs every ISV should have on day one:
- Deny `s3:PutBucketPublicAccessBlock` removal and deny public ACLs/policies — makes a public S3 bucket structurally impossible, not just discouraged. This is the SOC 2 control auditors most love to test.
- Deny disabling CloudTrail, GuardDuty, Config, or Security Hub — protects your evidence pipeline from being silently turned off (sketch after this list).
- Deny IAM user creation in production accounts — forces everyone through SSO + IRSA, with no escape hatch.
- Region restrictions — deny actions outside approved regions to prevent shadow infrastructure (and unintended data residency violations).
- Deny root user actions except break-glass — root should be unusable for day-to-day work.
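To ground that list, here is the evidence-pipeline guardrail as an SCP document. The break-glass role name is a placeholder, and the action list is illustrative, not exhaustive.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ProtectEvidencePipeline",
      "Effect": "Deny",
      "Action": [
        "cloudtrail:StopLogging",
        "cloudtrail:DeleteTrail",
        "cloudtrail:UpdateTrail",
        "guardduty:DeleteDetector",
        "config:StopConfigurationRecorder",
        "config:DeleteConfigurationRecorder"
      ],
      "Resource": "*",
      "Condition": {
        "ArnNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/break-glass-*"
        }
      }
    }
  ]
}
```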
On GCP, the equivalent is Organization Policies (constraints applied at the org/folder/project level): `storage.publicAccessPrevention`, `iam.disableServiceAccountKeyCreation`, `compute.requireOsLogin`, `gcp.resourceLocations`, and so on. Same idea, different mechanism.
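These are compact enough to show whole. A sketch of the public-access-prevention constraint in the org policy YAML format, applied with `gcloud org-policies set-policy policy.yaml` (`ORG_ID` is a placeholder):

```yaml
# policy.yaml
name: organizations/ORG_ID/policies/storage.publicAccessPrevention
spec:
  rules:
    - enforce: true
```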

Why this matters for SOC 2: when the auditor asks “what prevents a developer from making a bucket public during an incident?”, the OPA answer is good, but “the SCP denies it at the API call regardless of who’s logged in” is unbeatable. CloudTrail logs the denied API call, which is itself the evidence. One control. One log line. Zero ambiguity.
The trap to avoid: SCPs and Org Policies are powerful enough that a misconfigured one can break production. Roll them out incrementally — start in non-prod OUs, observe the deny logs (use Audit mode where available), then promote. Treat the SCPs themselves as IaC: Terraform-managed, PR-reviewed, and version-controlled like any other policy-as-code artifact.
2. Drift detection
Even with strong preventive controls, drift happens. Someone clicks in the console under pressure during an incident. A break-glass process runs and never gets reverted. A managed service quietly changes its defaults. Drift detection is how you find these before the auditor does.
The patterns that actually work:
- Terraform drift detection — scheduled `terraform plan` against production with alerts on non-empty diffs. Atlantis, Terraform Cloud, and Spacelift all do this; rolling your own with GitHub Actions is fine for smaller estates.
- GitOps reconciliation logs — ArgoCD’s “OutOfSync” status is a drift signal. Alert on prolonged out-of-sync states.
- CSPM continuous scans — AWS Config, GCP Security Command Center, or third-party CSPM running 24/7 against your live cloud accounts.
- Identity drift — periodic scans of effective IAM permissions vs. declared permissions, looking for direct attachments outside your IaC.
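For that last pattern, the same Rego toolchain extends past IaC: export effective IAM state to JSON on a schedule and evaluate it with the same engine that gates your PRs. The input shape below is a hypothetical export format, not a raw AWS API response.

```rego
package identity.drift

import rego.v1

# Assumed input (hypothetical export format):
# {"users": [{"name": "ci-bot",
#             "access_keys": [{"status": "Active",
#                              "create_date": "2024-11-02T09:15:00Z"}]}]}

max_key_age_ns := 90 * 24 * 60 * 60 * 1000000000  # 90 days

deny contains msg if {
    user := input.users[_]
    key := user.access_keys[_]
    key.status == "Active"
    time.now_ns() - time.parse_rfc3339_ns(key.create_date) > max_key_age_ns
    msg := sprintf("IAM user %q has an active access key older than 90 days", [user.name])
}
```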
The medical-ISV engagement I keep referencing is a clean example: across two clouds with a non-trivial number of accounts and projects, drift was inevitable. We landed on a weekly drift report — generated automatically, reviewed by the platform team, with any non-zero findings becoming tickets. The report itself became audit evidence. The control wasn’t “we don’t drift” (impossible); it was “we detect drift weekly and remediate within SLA.”

3. Evidence automation
This is where most ISVs lose months of engineering time during their first audit. Don’t repeat that mistake.
The principle: every routine activity in your platform that is itself evidence of a control operating should be captured automatically and shipped to a durable store. Stop screenshotting after the fact.
What that looks like in practice:
- Access reviews — pull IAM state from AWS/GCP via API on a schedule, generate a CSV, route to managers for sign-off via a workflow tool (Slack approval, ServiceNow, even a GitHub PR), archive the signed result.
- Backup restoration tests — automate quarterly restore tests via a CronJob or scheduled pipeline, log the success/failure, archive the run output (sketch after this list).
- Vulnerability remediation tracking — every scanner finding becomes a ticket with severity, SLA, and resolution evidence linked to a commit or change record.
- Onboarding/offboarding — tie the HR system to identity provisioning so adding/removing access is automated; the audit log of those automated actions is the evidence.
- Change management — Git history + PR approvals + deployment logs already document every production change. Make sure they’re retained for the observation window.
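As one concrete cut of that list, the restore-test bullet as a Kubernetes CronJob. The image, flags, and evidence bucket are assumptions about your environment; the point is that both the cadence and the archived output exist as code.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: backup-restore-test
spec:
  schedule: "0 3 1 */3 *"   # 03:00 UTC on the 1st of every third month
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: restore-test
              # Hypothetical image: restores the latest snapshot into a
              # scratch database, runs integrity checks, and ships the
              # result to the evidence archive.
              image: registry.example.com/restore-test:latest
              args:
                - "--snapshot=latest"
                - "--report-bucket=s3://evidence-archive/restore-tests"
```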
Dedicated compliance automation platforms help by integrating with your cloud accounts, ticketing system, HR system, and code host, then continuously pulling evidence into a structured store. They’re worth the money for most ISVs — building this glue yourself eats three to six months of platform-team capacity that’s better spent on product.
Where humans still belong
A common over-correction: trying to automate every single control. Don’t. Some controls are inherently human and trying to automate them produces fake compliance.
- Risk assessments need humans thinking about the business, not a script.
- Incident retrospectives need actual analysis, not a template filler.
- Vendor reviews need someone reading the vendor’s SOC 2 and forming a view.
- Tabletop exercises need humans simulating the chaos.
What you can automate around these is the scheduling, tracking, and artifact storage — make sure they happen on cadence, the output gets captured, and nothing falls off the calendar. The thinking stays human.
What “continuous” actually feels like
When this is wired up well, the pre-audit experience changes dramatically. Instead of an 8-week scramble:
- The compliance platform has been collecting evidence the whole time
- Drift reports are already in the audit folder
- Policy-as-code denial logs document every prevention
- Access reviews are pre-signed and archived
- Vulnerability tickets show clean SLA tracking
The auditor walkthrough becomes a guided tour of systems that already exist, not a frantic show-and-tell. The Type II observation window becomes a non-event because the controls have been operating — and producing evidence — every day for the last 6 to 12 months.
This is the shift that makes SOC 2 sustainable as you grow. Year 1 is hard. Year 2, if you’ve built the continuous foundation, is genuinely lighter.
Conclusion
SOC 2 done well is a forcing function for the platform you should already be building: declarative, policy-enforced, GitOps-driven, observable, and self-documenting. The audit is just the moment you export that platform’s natural state and hand it to a CPA.
In Part 5 — the final post in the series — we’ll cover the audit itself: choosing an auditor, what the readiness assessment looks like, common findings, surviving the Type II observation period, and how SOC 2 becomes the foundation for ISO 27001, HIPAA, or FedRAMP later.