- Always use a remote state backend with locking enabled: S3+DynamoDB (AWS), GCS (GCP), or Azure Blob; never use local state for shared infrastructure
- Split state by blast radius and change frequency: keep networking (rare changes), data stores, and application layers (frequent deploys) in separate state files
- Use the terraform_remote_state data source sparingly, because it creates tight coupling between stacks; prefer SSM parameters, Consul, or output files for cross-stack references
- Never hand-edit state files; use terraform state mv, terraform state rm, terraform state show, and terraform import for all state operations
- Enable S3 versioning on the state bucket for automatic backups, and take an explicit snapshot (for example, terraform state pull > backup.tfstate) before any import, move, taint, or other risky operation
- Use workspaces for identical infrastructure across environments only if configurations are truly identical
- Prefer directory-based separation (environments/prod/, environments/dev/) over workspaces for most use cases
- State file per component: one for VPC, one for RDS, one for ECS — limits blast radius of a bad apply
- Use moved blocks (Terraform 1.1+) for refactoring resource addresses without state surgery
- Use import blocks (Terraform 1.5+) to adopt existing resources declaratively instead of running the terraform import CLI command
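- Run terraform refresh cautiously: it reconciles drift by overwriting the recorded state with the actual infrastructure, with no chance to review the changes first; on Terraform 0.15.4+ prefer terraform plan -refresh-only followed by terraform apply -refresh-only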
- Monitor state file size — large state files (>10MB) slow down plans; split into smaller state files
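
As a minimal sketch of the remote-state and per-component rules above, a backend block for a networking stack might look like the following. The bucket name, lock table name, key, and region are hypothetical placeholders, and the bucket and DynamoDB table are assumed to exist already with versioning enabled on the bucket.

```hcl
# Sketch only: all names below are illustrative assumptions.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"   # hypothetical, versioned bucket
    key            = "network/terraform.tfstate" # one key per component (network, rds, ecs, ...)
    region         = "us-east-1"                 # assumed region
    dynamodb_table = "example-terraform-locks"   # hypothetical lock table
    encrypt        = true                        # encrypt state at rest
  }
}
```

Each component (for example `rds/` or `ecs/`) would carry its own copy of this block with a different `key`, so a bad apply in one directory cannot touch another component's state. Newer Terraform releases also appear to offer S3-native locking through a `use_lockfile` argument; check the S3 backend documentation for the version in use before relying on either mechanism.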
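
The cross-stack rule can be sketched with an SSM parameter: the producing stack publishes a value and the consuming stack reads it, without either one opening the other's state. The parameter path, resource names, and CIDR range are assumptions for illustration, and the two halves would live in separate configurations with separate state.

```hcl
# --- network stack (producer), sketch with hypothetical names ---
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16" # assumed address range
}

resource "aws_ssm_parameter" "vpc_id" {
  name  = "/example/network/vpc_id" # hypothetical parameter path
  type  = "String"
  value = aws_vpc.main.id
}

# --- app stack (consumer), in a separate configuration and state file ---
data "aws_ssm_parameter" "vpc_id" {
  name = "/example/network/vpc_id"
}

resource "aws_security_group" "app" {
  name   = "example-app" # hypothetical name
  vpc_id = data.aws_ssm_parameter.vpc_id.value
}
```

The consumer only needs read access to one parameter rather than to the producer's whole state file, which is the looser coupling the rule is after. Note that the AWS provider treats the parameter's `value` attribute as sensitive, so it is masked in plan output.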
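
The two declarative refactoring tools mentioned above can be sketched as follows. The resource names, AMI ID, and bucket name are hypothetical, and the bucket is assumed to exist already outside Terraform.

```hcl
# Rename a resource address without state surgery (Terraform 1.1+).
moved {
  from = aws_instance.web
  to   = aws_instance.frontend
}

resource "aws_instance" "frontend" {
  ami           = "ami-0123456789abcdef0" # hypothetical AMI ID
  instance_type = "t3.micro"
}

# Adopt an existing bucket into state declaratively (Terraform 1.5+).
import {
  to = aws_s3_bucket.logs
  id = "example-existing-logs-bucket" # hypothetical existing bucket name
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-existing-logs-bucket"
}
```

Running `terraform plan` with these blocks in place should report the move and the import as planned actions, so both can be reviewed in a pull request before anything is written to state.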