Overview
Drift happens. Someone edits a security group in the AWS console, an engineer scales a deployment manually, a failed terraform apply leaves state out of sync. Annie continuously correlates your declared infrastructure (Terraform code + state) with your actual infrastructure (live cloud, live Kubernetes) and surfaces the gaps.
Declared
What your Terraform modules and manifests say should exist.
Managed
What your Terraform state believes it’s tracking.
Actual
What’s really running in AWS, GCP, and Kubernetes right now.
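The three-way model above boils down to set arithmetic over resource identifiers. Here is a minimal sketch, not Annie's actual correlation logic; the function name and the resource IDs are invented for illustration:

```python
def classify_drift(declared, managed, actual):
    """Classify gaps between declared code, Terraform state, and live reality.

    Hypothetical sketch: each argument is a set of resource IDs.
    """
    return {
        "unmanaged": actual - managed,     # live, but no Terraform state entry
        "orphaned": managed - actual,      # state tracks it, but it was deleted
        "undeclared": managed - declared,  # in state, but removed from the code
        "missing": declared - actual,      # declared, but not actually running
    }

declared = {"sg-web", "sg-db"}
managed = {"sg-web", "sg-db", "sg-old"}
actual = {"sg-web", "sg-db", "sg-debug"}

gaps = classify_drift(declared, managed, actual)
print(gaps["unmanaged"])  # {'sg-debug'} — created by hand, invisible to Terraform
```

In practice correlation is harder than set difference (IDs change, resources move between states), but the four gap categories are the same ones Annie reports.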
What You Can Ask Annie
Manual changes
- “What resources were modified outside of Terraform this week?”
- “Show me security groups changed via the AWS console in the last 24 hours”
- “Which IAM roles had policies attached manually?”
Orphaned and unmanaged resources
- “Which live resources are not in any Terraform state file?”
- “Show me EC2 instances with no Terraform backing”
- “Which S3 buckets exist in AWS but aren’t declared anywhere?”
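To approximate these unmanaged-resource queries by hand, you can diff live resource IDs against what the state tracks. A minimal sketch, assuming you have `terraform show -json` output loaded as a dict and a set of live EC2 instance IDs; the example data is invented, and real state output also nests resources under child modules, which this ignores:

```python
def managed_ids(state_json):
    """Collect resource IDs tracked in `terraform show -json` output.

    Simplification: only looks at the root module, not child_modules.
    """
    resources = state_json.get("values", {}).get("root_module", {}).get("resources", [])
    return {r["values"]["id"] for r in resources if "id" in r.get("values", {})}

# Invented example data shaped like terraform/AWS output.
state = {"values": {"root_module": {"resources": [
    {"address": "aws_instance.web", "values": {"id": "i-0aaa"}},
]}}}
live = {"i-0aaa", "i-0bbb"}

print(sorted(live - managed_ids(state)))  # ['i-0bbb'] — no Terraform backing
```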
Kubernetes drift
- “Are my declared replica counts matching the pods actually running?”
- “Show me deployments where ready replicas differ from desired”
- “Which ConfigMaps were edited directly in the cluster?”
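The replica-count question comes down to comparing spec.replicas with status.readyReplicas on each Deployment. A rough sketch over objects shaped like `kubectl get deploy -o json` items; the deployment names are invented:

```python
def replica_drift(deployments):
    """Return (name, declared, ready) for deployments whose ready count lags."""
    drifted = []
    for d in deployments:
        declared = d["spec"].get("replicas", 1)          # defaults to 1 if unset
        ready = d["status"].get("readyReplicas", 0)       # absent when 0 ready
        if ready < declared:
            drifted.append((d["metadata"]["name"], declared, ready))
    return drifted

deploys = [
    {"metadata": {"name": "checkout"}, "spec": {"replicas": 8},
     "status": {"readyReplicas": 5}},
    {"metadata": {"name": "billing-worker"}, "spec": {"replicas": 4},
     "status": {"readyReplicas": 4}},
]
print(replica_drift(deploys))  # [('checkout', 8, 5)]
```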
Broken references
- “Which Terraform resources reference dependencies that no longer exist?”
- “Show me state entries pointing to deleted cloud resources”
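A broken reference in this sense is a state entry that depends on an address no longer present in the state. A toy sketch over a simplified state shape; the addresses are invented, and real state files carry far more detail:

```python
def broken_refs(resources):
    """Find resources whose depends_on targets are missing from the state."""
    addresses = {r["address"] for r in resources}
    return {
        r["address"]: missing
        for r in resources
        if (missing := [d for d in r.get("depends_on", []) if d not in addresses])
    }

state_resources = [
    # aws_subnet.main was deleted from state, so this reference dangles:
    {"address": "aws_instance.app", "depends_on": ["aws_subnet.main"]},
]
print(broken_refs(state_resources))  # {'aws_instance.app': ['aws_subnet.main']}
```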
Real-World Examples
The 'who opened port 22?' investigation
PagerDuty alerted on an unexpected SSH attempt against a dev box.
“Has any security group been changed outside Terraform this week?”
I found 2 manual changes:
- sg-dev-debug: Port 22 opened to 0.0.0.0/0 via AWS console on Tuesday 14:03 by user alice@...
- sg-prod-db: Inbound rule for 10.0.5.0/24 added via CLI on Wednesday 09:17
Neither change is reflected in the Terraform code or state.
Replica drift after a bad deploy
The checkout service is flaky. I suspect a partial rollout.
“Which deployments have fewer ready replicas than declared?”
3 deployments have drift:
- checkout: declared 8, ready: 5 (3 pods in ImagePullBackOff)
- billing-worker: declared 4, ready: 4 ✓ (no drift)
- search-indexer: declared 2, ready: 0 (OOMKilled, CrashLoopBackOff)
The checkout image tag in your Helm values doesn’t exist in the registry.
Finding orphaned cloud resources
I suspect the cloud bill has resources nobody remembers.
“Show me EC2 instances and RDS databases not backed by any Terraform state”
I found 7 unmanaged resources across your AWS accounts:
- 4 EC2 instances in us-east-1 (tagged env=experiment, launched 2024)
- 2 RDS snapshots in eu-west-1 (no source instance exists)
- 1 NAT Gateway in us-west-2 (no route table references it)
Best Practices
Check drift before planning
Ask Annie for drift in the target environment before running terraform plan. It often explains surprising diffs.
Scope by environment or module
Drift queries are more useful when narrowed: “…in the prod workspace”, “…in the networking module”.
Combine with change history
Pair drift queries with Change Management to see who made the change and when.
Investigate orphans periodically
Unmanaged resources are both a cost and a security concern. Ask Annie monthly.
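The "check drift before planning" practice can also be scripted: terraform plan -detailed-exitcode returns 0 when there are no changes, 2 when changes are pending, and 1 on error. A sketch; the wrapper function names are illustrative, and the subprocess call assumes terraform is installed and initialized in workdir:

```python
import subprocess

def interpret_plan_exitcode(rc):
    """Map terraform plan -detailed-exitcode return codes to a verdict."""
    return {0: "clean", 1: "error", 2: "changes-pending"}.get(rc, "unknown")

def check_drift(workdir):
    # Illustrative wrapper; assumes terraform is installed and `init` has run.
    rc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
    ).returncode
    return interpret_plan_exitcode(rc)

print(interpret_plan_exitcode(2))  # changes-pending
```

A nonzero "changes-pending" verdict before you have touched the code is a strong hint that something drifted underneath the state.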
Get Started
Connect Terraform
You need code + state connected for drift detection to work.
Connect Kubernetes
Install the agent to get declared-vs-actual drift inside your clusters.