
Terraform

Infrastructure-as-Code using HCL manifests to define cloud resources.

Terraform is idempotent: it queries the cloud APIs, detects what is missing or has changed, and then applies only the changes needed to reconcile.

Install Terraform

Quick install script is found in the DevOps-Bash-tools repo:

install_terraform.sh

Optionally specify a version argument; otherwise it defaults to finding and installing the latest version.

Terraform Code

See the HariSekhon/Terraform repo for some Terraform code and templates for common files and settings to get you started, such as backend.tf, provider.tf, main.tf etc.
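As a minimal sketch of what a backend.tf might contain - the bucket name and prefix below are hypothetical placeholders, not taken from the repo:

```hcl
# backend.tf - hypothetical example; substitute your own bucket and prefix
terraform {
  backend "gcs" {
    bucket = "my-terraform-state-bucket"  # placeholder bucket name
    prefix = "myproject/dev"              # placeholder state path within the bucket
  }
}
```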


Running Terraform

Download the providers and create or connect to the terraform.tfstate file:

terraform init

Format your code:

terraform fmt

Validate your code:

terraform validate

See the plan of additions/deletions/modifications that Terraform would do:

terraform plan

Apply the changes:

terraform apply

Terraform State

Stored in a terraform.tfstate file, either locally or, more usually, in a cloud bucket so that it can be shared among users and with a CI/CD system.

This is just a JSON file, so you can read its contents to find out which version of Terraform last wrote it.

terraform_gcs_backend_version.sh is a convenience script to determine this straight from a GCS bucket.
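For a local state file you can pull the version out with jq - a small sketch using a synthetic minimal state file (real ones also contain resources, outputs etc.):

```shell
# Synthetic minimal tfstate for illustration only
cat > /tmp/terraform.tfstate.example <<'EOF'
{"version": 4, "terraform_version": "1.5.7", "resources": []}
EOF

# The top-level terraform_version field records the version that last wrote the state
jq -r '.terraform_version' /tmp/terraform.tfstate.example
```

which prints 1.5.7 for the sample above.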

Terraform CLI Config

HariSekhon/Terraform - .terraformrc

https://developer.hashicorp.com/terraform/cli/config/config-file

Configure things like Terraform Plugin Caching.

tfenv

Install tfenv to manage multiple versions of Terraform.

When combined with direnv, this auto-switches to the version of Terraform recorded in .envrc, avoiding accidentally upgrading the tfstate file and forcing all colleagues to upgrade their Terraform versions or breaking CI/CD.
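A hypothetical .envrc sketch - TFENV_TERRAFORM_VERSION is tfenv's environment variable override, and direnv exports whatever .envrc sets when you cd into the directory (the version number below is just an example):

```shell
# .envrc - hypothetical sketch, loaded automatically by direnv
# TFENV_TERRAFORM_VERSION overrides the Terraform version tfenv selects
export TFENV_TERRAFORM_VERSION=1.5.7  # pin to the version your tfstate was written with
```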

tfswitch is another option by the same author as tgswitch below.

asdf is another option - one tool for all runtime versions.

Terragrunt

Important for modularity and performance of Terraform code bases.

See Terragrunt for more details.

tgswitch

Install tgswitch to manage multiple versions of Terragrunt.

When combined with direnv this will auto-switch to the saved version of Terragrunt recorded in .envrc.

This is more recently updated than tgenv.

asdf is another option - one tool for all runtime versions.

Linting & Security

Generate Plan JSON

terraform init
terraform plan -out tf.plan
terraform show -json tf.plan  > tf.json

You can then run linting and security scanning on the resulting JSON file:

checkov -f tf.json
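The file follows HashiCorp's documented JSON plan representation, so you can also query it directly with jq - for example, listing the addresses of resources the plan would actually change. A sketch using a synthetic fragment of that format:

```shell
# Synthetic fragment of 'terraform show -json' plan output for illustration
cat > /tmp/tf.json.example <<'EOF'
{"resource_changes": [
  {"address": "aws_s3_bucket.logs", "change": {"actions": ["create"]}},
  {"address": "aws_instance.web",   "change": {"actions": ["update"]}},
  {"address": "aws_iam_role.old",   "change": {"actions": ["no-op"]}}
]}
EOF

# List only the resources the plan would actually touch (skip no-ops)
jq -r '.resource_changes[]
       | select(.change.actions != ["no-op"])
       | .address' /tmp/tf.json.example
```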

Useful Modules

Document Your Terraform Modules

:octocat: terraform-docs/terraform-docs

brew install terraform-docs
terraform-docs markdown table --output-file README.md --output-mode inject /path/to/module

Best Practices

https://www.terraform-best-practices.com/

Caching

https://developer.hashicorp.com/terraform/cli/config/config-file#provider-plugin-cache

Note that the plugin cache is not concurrency-safe, so avoid running multiple terraform init processes against the same cache at the same time.

Configure it in .terraformrc:

plugin_cache_dir   = "$HOME/.terraform.d/plugin-cache"

This directory must already exist:

mkdir -p -v ~/.terraform.d/plugin-cache

Otherwise you'll end up with an error like this:

There are some problems with the CLI configuration:
╷
│ Error: The specified plugin cache dir /Users/hari/.terraform.d/plugin-cache cannot be opened: stat /Users/hari/.terraform.d/plugin-cache: no such file or directory
│
╵

As a result of the above problems, Terraform may not behave as intended.

To see how much space you are wasting on duplicate provider downloads across different Terraform code bases without this (Terragrunt modules make it even worse), you can run this script from DevOps-Bash-tools:

terraform_provider_count_sizes.sh

Output on my Mac:

30  597M  hashicorp/aws/5.80.0/darwin_arm64/terraform-provider-aws_v5.80.0_x5
7   637M  hashicorp/aws/5.90.1/darwin_arm64/terraform-provider-aws_v5.90.1_x5
4   637M  hashicorp/aws/5.90.0/darwin_arm64/terraform-provider-aws_v5.90.0_x5
3   599M  hashicorp/aws/5.81.0/darwin_arm64/terraform-provider-aws_v5.81.0_x5
2   593M  hashicorp/aws/5.79.0/darwin_arm64/terraform-provider-aws_v5.79.0_x5
    ...

Output on an Atlantis server pod after deleting all cached data to fix out-of-space errors, followed by a single PR run:

14  654M  hashicorp/aws/5.90.1/linux_amd64/terraform-provider-aws_v5.90.1_x5
13  14M   hashicorp/external/2.3.4/linux_amd64/terraform-provider-external_v2.3.4_x5
13  14M   hashicorp/local/2.5.2/linux_amd64/terraform-provider-local_v2.5.2_x5
13  14M   hashicorp/null/3.2.3/linux_amd64/terraform-provider-null_v3.2.3_x5
3   346M  hashicorp/aws/4.67.0/linux_amd64/terraform-provider-aws_v4.67.0_x5
3   621M  hashicorp/aws/5.80.0/linux_amd64/terraform-provider-aws_v5.80.0_x5
3   653M  hashicorp/aws/5.90.0/linux_amd64/terraform-provider-aws_v5.90.0_x5
3   14M   hashicorp/random/3.6.3/linux_amd64/terraform-provider-random_v3.6.3_x5
1   627M  hashicorp/aws/5.82.2/linux_amd64/terraform-provider-aws_v5.82.2_x5
1   630M  hashicorp/aws/5.84.0/linux_amd64/terraform-provider-aws_v5.84.0_x5

For Terragrunt, see Terragrunt Caching.

Vendor Code

Lessons learnt the hard way from a real-life project.

Do not accept vendor code unless it passes ALL of the following points:

  • it's in the same format as your internal code base eg. Terraform vs Terragrunt
  • using standard modules from the Hashicorp registry eg. hashicorp/aws
  • has passed all Checkov checks and/or any other linting / security tools you use

If you don't enforce good practices on the vendor code base before accepting it, you'll inherit more problems than you can see or accurately estimate just by reading their code base.

You'll lose tonnes of time:

  • migrating from Terraform to Terragrunt modules
  • migrating from custom modules to official portable modules to match the rest of your code base
  • migrating from Terraform-embedded Helm to a standard ArgoCD deployment using Kustomize or Helm
  • inheriting problems in the migrations above
  • debugging and fixing their code

Even simple things like an S3 bucket will then fail your Checkov PR checks for things like:

  • not having KMS encryption - you'll have to go and create the key, add the dependency and reference it yourself
  • public ACLs tripping Checkov, even if the bucket really is supposed to be public
    • this may also be blocked at the AWS Control Tower guardrail policy level, such that you cannot use public buckets
    • the workaround I did in one project was to use CloudFront in front of the bucket
  • you'll miss minor details while trying to manually migrate the whole code base, eg. missing a small aws_elasticache_cluster vs aws_elasticache_serverless_cache distinction will leave you migrating to the standard AWS ElastiCache module defaulting to the wrong type and ending up with errors like:
Error: "node_type" is required unless "global_replication_group_id" is set.

Leaving you wondering what the node_type should be, instead of realizing you're using the wrong module.

If they had used the module in the first place your brain wouldn't be fried from migrating all their modules and then missing a detail like this.

Terraform Console

Useful for testing.

terraform console

Unfortunately, it's a line-based REPL, so you can't paste multi-line inputs - see the next examples for how to work around this.

Convert Terraform jsonencode() to literal JSON

This is sometimes needed when porting a plain Terraform jsonencode() document, such as an AWS policy, into embedded literal JSON.

echo 'jsonencode({ name = "example", values = [1, 2, 3] })' | terraform console
"{\"name\":\"example\",\"values\":[1,2,3]}"

However, the above output is a quoted string rather than literal JSON, so pipe it through jq -r to remove the quoting:

echo 'jsonencode({ name = "example", values = [1, 2, 3] })' | terraform console | jq -r
{"name":"example","values":[1,2,3]}

Handling Multi-line jsonencode()

Unfortunately since terraform console is a line-based REPL you cannot do this:

terraform console <<EOF | jq -r
jsonencode(
  {
    name = "example",
    values = [1, 2, 3]
  }
)
EOF
│ Error: Missing expression
│
│   on <console-input> line 1:
│   (source code not available)
│
│ Expected the start of an expression, but found the end of the file.

So first flatten it by removing the newlines using tr or a similar command:

tr -d '\n' <<EOF | terraform console | jq -r
jsonencode(
  {
    name = "example",
    values = [1, 2, 3]
  }
)
EOF
{"name":"example","values":[1,2,3]}

Pipe it through jq once more if you want a multi-line pretty-printed JSON result:

tr -d '\n' | terraform console | jq -r | jq

eg.

tr -d '\n' <<EOF | terraform console | jq -r | jq
jsonencode(
  {
    name = "example",
    values = [1, 2, 3]
  }
)
EOF
{
  "name": "example",
  "values": [
    1,
    2,
    3
  ]
}

Handling Multi-line jsonencode() within a multi-line Terraform block that depends on newlines

Flattening with tr -d '\n' doesn't work here, because removing the newlines merges attributes that HCL requires to be separated by newlines or commas:

locals {

  result = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "dynamodb:BatchGetItem",
          "dynamodb:GetItem",
          "dynamodb:Query",
          "dynamodb:Scan",
          "dynamodb:BatchWriteItem",
          "dynamodb:PutItem",
          "dynamodb:UpdateItem"
        ]
        Effect = "Allow"
        Resource = [
          local.dynamodb_project_sources_table_arn,
          local.dynamodb_project_destinations_table_arn,
          local.dynamodb_project_subdomain_mapping_table_arn
        ]
      }
    ]
  })

}
│ Error: Missing attribute separator
│
│   on <console-input> line 1:
│   (source code not available)
│
│ Expected a newline or comma to mark the beginning of the next attribute.

Dumping it to a /tmp file and then having the Terraform console read the file using a single-line function doesn't work either:

cat > /tmp/terraform.jsonencode.txt
echo 'file("/tmp/terraform.jsonencode.txt")' | terraform console | jq -r | jq

but this outputs the file contents as a string literal instead of interpreting them as code; the output looks like this:

<<EOT
locals {

  result = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = [
          "dynamodb:BatchGetItem",
          "dynamodb:GetItem",
          "dynamodb:Query",
          "dynamodb:Scan",
          "dynamodb:BatchWriteItem",
          "dynamodb:PutItem",
          "dynamodb:UpdateItem"
        ]
        Effect = "Allow"
        Resource = [
          local.dynamodb_project_sources_table_arn,
          local.dynamodb_project_destinations_table_arn,
          local.dynamodb_project_subdomain_mapping_table_arn
        ]
      }
    ]
  })

}

EOT

So you can get the code into the Terraform console but not eval it. You might have to use an actual terraform apply with an output instead, which is problematic when trying to port a vendor's code bundle that doesn't actually execute in your local environment.

Not solved yet.

hcl2json

:octocat: tmccombs/hcl2json

Convert HCL to JSON to make it easier to work with in other languages.

brew install hcl2json
hcl2json "$file"

outputs the JSON equivalent.

Troubleshooting

Checksum Mismatch in .terraform.lock.hcl

If you get an error like this when running Terraform or Terragrunt:

Error: Required plugins are not installed

The installed provider plugins are not consistent with the packages selected
in the dependency lock file:
  - registry.terraform.io/hashicorp/aws: the cached package for registry.terraform.io/hashicorp/aws 5.80.0 (in .terraform/providers) does not match any of the checksums recorded in the dependency lock file

This is caused by the .terraform.lock.hcl file being generated and committed from a machine of a different architecture, since by default Terraform only records the checksums for the local platform.

This surfaces in Atlantis or other CI/CD systems because developers are often using macOS (or, heaven forbid, Windows) while CI/CD systems like Atlantis are invariably running on Linux.

Run this command to update the .terraform.lock.hcl file with the checksums for all 3 platforms:

terraform providers lock -platform=windows_amd64 -platform=darwin_amd64 -platform=linux_amd64

and then commit the updated .terraform.lock.hcl file:

git add .terraform.lock.hcl
git commit -m "updated .terraform.lock.hcl file with checksums for all 3 platform architectures" .terraform.lock.hcl