At Teckst, we use Terraform for all management and configuration of infrastructure.  The tidal boundary between infrastructure and application configuration is largely determined by which kinds of applications will be deployed on which kinds of infrastructure.  Standing up a bunch of Cassandra applications across a bunch of EC2 instances?  Something like Ansible or Chef is probably better suited, as those tools step into a running instance and create and/or update its configuration (though, certainly, this can be done via Terraform and EC2 User Data scripts).  But Teckst is 100% AWS and 100% Lambda, so we have a much more limited need.  We need Lambda Functions, API Gateways, SQS Queues, S3 Buckets, IAM Users, etc. to be created and wired together; thereafter, our Lambda Functions are uploaded by our CI system and run across the configured AWS resources.  In this case, Terraform is perfect for us as it walks our infrastructure right up to the line at which our Lambda Functions take over.

Terraform’s documentation provides little in the way of guidance on structuring larger Terraform projects.  The docs do talk about modules and outputs, but no fleshed-out examples are provided for how you should structure your project.  That said, many guides are available on the web ([1], [2], [3] are the top three Google results as of this writing).

Terraform Modules

Terraform Modules allow you to package pieces of infrastructure which accept/require specific Variables and yield specific Outputs.  Besides being a great way to utilize third-party scripts (e.g. a script you find on GitHub to build a fully configured EC2 instance with NGINX fronting a Django application), Modules allow a clean, logical separation between environments (e.g. Production and Staging).  A good example of organizing a project using Terraform Modules is given in this blog post.  Initially, we approached organizing scripts similarly:

/prod/main.tf
/prod/vpc.tf - production configs for VPC module 
/staging/main.tf
/staging/vpc.tf - staging configs for VPC module
/modules/vpc/main.tf - generic VPC module definition shared by prod and staging
/modules/vpc/variables.tf
/modules/vpc/outputs.tf

Now all of our prod configuration values were separate from our staging configuration values.  The prod and staging scripts could reference our generic vpc Module.
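As a rough sketch of what that looked like, each environment calls the shared Module with its own values.  The input names (environment, cidr_block) and the CIDR ranges below are hypothetical and only for illustration; they would have to match whatever /modules/vpc/variables.tf actually declares.

```hcl
# /prod/vpc.tf: production values passed to the shared module.
module "vpc" {
  source = "../modules/vpc"

  # Hypothetical inputs; must match /modules/vpc/variables.tf.
  environment = "prod"
  cidr_block  = "10.0.0.0/16"
}

# /staging/vpc.tf: same module, staging values.
module "vpc" {
  source = "../modules/vpc"

  # Hypothetical inputs; must match /modules/vpc/variables.tf.
  environment = "staging"
  cidr_block  = "10.1.0.0/16"
}
```

The two blocks live in different directories, so each environment has exactly one module "vpc" and its own state.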

Terraform Modules and First-Party Scripts

While Terraform Modules are a great way to use third-party Terraform scripts to build your infrastructure, they felt awkward when using only first-party scripts.  Other ways of organizing the scripts seemed to rely on Copy-and-Paste and that’s a Bad Thing.  But what if Copy-and-Paste is unavoidable?  With Terraform Modules, a lot of time is spent threading Variables down into sub-Modules only to grab Outputs from those sub-Modules and thread them back up and into other sub-Modules via other Variables.  Consider your VPC and Route53 scripts:

/main.tf
/terraform.tfvars
/vpc/main.tf
/vpc/outputs.tf
/vpc/variables.tf
/route53/main.tf
/route53/outputs.tf
/route53/variables.tf

We’ll need to:

  1. In main.tf: reference the vpc Module, passing in appropriate Variables.
  2. In /vpc/variables.tf: define Variables this Module will accept.
  3. In /vpc/outputs.tf: map the Module’s Variables to the Module’s Outputs.
  4. In /vpc/main.tf: create the VPC using the Variables specified in /vpc/variables.tf.
  5. In /main.tf: reference the route53 Module, passing in as Variables the Outputs retrieved from the vpc Module.
  6. Then repeat Steps 2-4 for the /route53 Module.
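The steps above can be sketched roughly as follows.  This is a minimal illustration, not our actual scripts: the variable names (vpc_cidr_block, cidr_block, vpc_id, root_private_domain) are assumptions, and the syntax assumes Terraform 0.12 or later (older versions need references wrapped as "${...}").

```hcl
# /main.tf: root variables, then both Modules wired together.
variable "vpc_cidr_block" {}
variable "root_private_domain" {}

module "vpc" {
  source     = "./vpc"
  cidr_block = var.vpc_cidr_block # Step 1: pass a Variable down
}

module "route53" {
  source              = "./route53"
  vpc_id              = module.vpc.vpc_id # Step 5: vpc Output becomes a Variable
  root_private_domain = var.root_private_domain
}

# /vpc/variables.tf (Step 2)
variable "cidr_block" {}

# /vpc/main.tf (Step 4)
resource "aws_vpc" "main" {
  cidr_block = var.cidr_block
}

# /vpc/outputs.tf (Step 3)
output "vpc_id" {
  value = aws_vpc.main.id
}

# /route53/variables.tf (Step 6)
variable "vpc_id" {}
variable "root_private_domain" {}

# /route53/main.tf (Step 6): a private zone attached to the VPC
resource "aws_route53_zone" "private" {
  name = var.root_private_domain

  vpc {
    vpc_id = var.vpc_id
  }
}
```

Note that the single value aws_vpc.main.id is named four times on its way from one Module to the other: as an Output, as a Module argument, as a Variable declaration, and finally as a Variable reference.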

The above has three significant issues:

  1. It’s tedious.  That’s something like 11 steps to create a VPC and Route53 domain.  That’s about 8 lines in a single Terraform script…  Here we’ve got 2-3 times that scattered across 7 files.
  2. It’s error-prone.  Didn’t get your naming convention right for one of the Outputs of some Module?  That’s going to happen.  And the Terraform errors aren’t too helpful in finding the issue.
  3. It’s based on Copy-and-Paste!  The root of all evil!  Sure, you don’t need to Copy-and-Paste but you’re going to do so as, for the hundredth time, you write into a Variable declaration the name of a Module’s Output…  And as you misspell the Output’s name…

Less-Bad Terraform Organization

So I’m not sure we can avoid Copy-and-Paste.  If we can’t avoid it, can we organize our Terraform scripts in such a way that we minimize likely harm?  We approached it as follows, attempting to:

  • Strictly limit the surface area of the production and staging configurations so that it was really obvious which was which.
  • Eliminate every difference between production and staging outside of those clearly marked configuration areas.
  • Make clear how to migrate staging infrastructure alterations forward to production infrastructure.
  • Make clear where Copy-and-Paste is allowed.
  • Simplify our usage of Terraform.

We wound up with the following organization:

/prod/vpc.tf
/prod/route53.tf
/prod/variables.tf
/staging/vpc.tf
/staging/route53.tf
/staging/variables.tf

That’s it.  What does this get us?

  • Very simple Terraform usage.  No need to trace Variables and Outputs up/down through Modules.  route53.tf can directly reference aws_vpc.main.id.
  • Very limited surface area for configuration.  All environment specific configuration values go in variables.tf.  It’s easy to compare the prod and staging variables.tf to see new or changed variables.
  • Simple migration to prod.  Just meld staging into production.  Apart from variables.tf, and with suitable review, the content of staging can be quickly and easily pulled into production.
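To make the first point concrete, here is a minimal sketch of the two staging files, assuming Terraform 0.12 or later syntax.  The vpc_cidr_block variable is hypothetical (it would need to be declared in variables.tf alongside the others), and the resource bodies are illustrative rather than our real configuration.

```hcl
# /staging/variables.tf (hypothetical addition)
variable "vpc_cidr_block" { default = "10.1.0.0/16" }

# /staging/vpc.tf
resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr_block

  tags = {
    Environment = var.environment
  }
}

# /staging/route53.tf
resource "aws_route53_zone" "private" {
  name = var.root_private_domain

  vpc {
    # Direct reference: no Outputs or Module arguments in between.
    vpc_id = aws_vpc.main.id
  }
}
```

Because every .tf file in a directory is loaded as one configuration, /prod/vpc.tf and /prod/route53.tf can be byte-for-byte copies of these two files; only variables.tf differs.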

An example of the content of /staging/variables.tf:

########################
# ACCOUNT SETTINGS
########################
variable "environment"    {default = "staging"}
variable "account_id"     {default = "acct-staging"}
# Quoted so that an account number with leading zeros survives intact.
variable "account_number" {default = "123456789"}
variable "aws_region"     {default = "us-east-1"}

########################
# DOMAIN SETTINGS
########################
variable "root_domain"         {default = "not-real.com"}
variable "root_private_domain" {default = "not-real.internal"}
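One way these values might be consumed is in the provider block, which is identical in both environments.  This is a sketch under assumptions: the file name provider.tf is hypothetical, and guarding with allowed_account_ids is a suggestion rather than something described above.  It assumes Terraform 0.12 or later syntax.

```hcl
# /staging/provider.tf (hypothetical file; identical copy in /prod)
provider "aws" {
  region = var.aws_region

  # Refuse to plan or apply if the active credentials belong to any
  # account other than the one named in this environment's variables.tf.
  allowed_account_ids = [var.account_number]
}
```

With a guard like this, running the staging scripts with production credentials fails up front instead of quietly altering the wrong account.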

Additional Considerations/Practices

Clearly, no clear-text secrets live in the Terraform scripts.  Any secrets that the Terraform scripts need in clear text (looking at you, API Gateway…) are decrypted at runtime by our tf_plan.sh and tf_apply.sh helper scripts.
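On the Terraform side, a secret like this can be modelled as a Variable with no default, so the scripts themselves never contain the value.  The sketch below is an assumption about how that might look, not a description of our helper scripts: the variable and resource names are hypothetical, and it assumes Terraform 0.12 or later syntax.

```hcl
# Hypothetical variable name.  No default is given, so Terraform must be
# handed the value at run time and it never lands in version control.
variable "api_gateway_api_key" {
  type      = string
  sensitive = true # requires Terraform 0.14+; drop on older versions
}

# Assumed use: an API key resource that needs the clear-text value.
resource "aws_api_gateway_api_key" "client" {
  name  = "client-${var.environment}"
  value = var.api_gateway_api_key
}
```

A wrapper script could supply the decrypted value through the TF_VAR_api_gateway_api_key environment variable, which Terraform reads automatically for a variable of that name.  The value still ends up in terraform.tfstate, so the state file needs the same care as the secret itself.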

Currently, one person does all of the infrastructure development, so we haven’t really wrestled with conflicts in terraform.tfstate, but that will be an issue regardless of how the Terraform files are organized.