
Add Terraform configuration for VPC and RDS #30

Merged: 2 commits into develop from feature/rb/vpc-and-rds on May 13, 2022

Conversation

@rbreslow (Contributor) commented on May 12, 2022

Overview

  • Remove stray link from README ToC.
  • Bootstrap a Terraform project that consumes the VPC created by CHOP IS and creates an RDS instance and Bastion host.
  • Utilize our Terraform container image and the infra wrapper script to enable consistent and portable Terraform execution.

Resolves #7
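
For orientation, here is a minimal sketch of the shape the new configuration takes. The data source filter, variable names, and bastion attributes below are illustrative assumptions, not the exact code in this pull request:

# Sketch only: look up the VPC that CHOP IS created for us (the tag filter
# is a placeholder), then attach an RDS instance and a Bastion host to it.
data "aws_vpc" "chop" {
  tags = {
    Name = "example-chop-is-vpc" # hypothetical tag value
  }
}

module "database" {
  # Module pin taken from this PR; the input names below are assumptions.
  source = "github.com/azavea/terraform-aws-postgresql-rds?ref=3.0.0"

  vpc_id            = data.aws_vpc.chop.id
  database_username = var.rds_database_username
  database_password = var.rds_database_password

  project     = var.project
  environment = var.environment
}

resource "aws_instance" "bastion" {
  ami           = var.bastion_ami           # Amazon Linux 2, per the testing notes
  instance_type = var.bastion_instance_type
  subnet_id     = var.bastion_subnet_id     # a subnet in the CHOP-managed VPC
}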

Checklist

  • Squashed any fixup! commits
  • Updated README.md to reflect any changes

Testing Instructions

Launch an instance of the included Terraform container image:

$ docker-compose -f docker-compose.ci.yml run --rm terraform
bash-5.1#

Use infra to generate and apply a Terraform plan:

bash-5.1# ./scripts/infra plan
bash-5.1# ./scripts/infra apply

Note: You should see several warnings about updating resource tags and "values for undeclared variables." The first warning is a side effect of using default_tags along with the RDS module from Azavea. The second warning refers to values I declared in the terraform.tfvars file for my next pull request.
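
In other words, terraform.tfvars already carries assignments with no matching variable block in this change set. A contrived example of what triggers that warning (the name and value are hypothetical):

# terraform.tfvars
# Declared in a follow-up PR; until then, Terraform warns about a value
# supplied for an undeclared variable.
example_future_setting = "placeholder"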

Next, DM me on Slack with the public key you use for SSH so I can add it to the authorized_keys file. Then, while connected to the CHOP VPN, open a shell on the Bastion host:

$ ssh [email protected]
Last login: Fri Apr 29 18:57:13 2022 from ip-10-250-31-1.ec2.internal

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
20 package(s) needed for security, out of 36 available
Run "sudo yum update" to apply all updates.

Using the database password in the terraform.tfvars file, confirm that you can connect to the RDS instance from the Bastion:

[ec2-user@ip-172-22-24-136 ~]$ psql -U imagedeidetl -d imagedeidetl -h image-deid-etl.ctc077z3pbxl.us-east-1.rds.amazonaws.com
Password for user imagedeidetl: 
psql (13.3, server 14.2)
WARNING: psql major version 13, server major version 14.
         Some psql features might not work.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

imagedeidetl=> SELECT version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
(1 row)

imagedeidetl=> 

Note: I installed the PostgreSQL client tooling on the Bastion out-of-band from the Extras Library.

rbreslow added 2 commits May 12, 2022 15:30
- Bootstrap a Terraform project that consumes the VPC created by CHOP IS
  and creates an RDS instance and Bastion host.
- Utilize our Terraform container image and the `infra` wrapper script
  to enable consistent and portable Terraform execution.
@rbreslow self-assigned this on May 12, 2022
Comment on lines +105 to +106
project = var.project
environment = var.environment
@rbreslow (Contributor, Author):

default_tags eliminates the need to specify tags on every resource in the Terraform project. However, if you define tags twice, at both the resource level and the default_tags level, the Terraform plan will always be noisy.

This behavior is a known issue in the Azavea-series of Terraform modules: azavea/terraform-aws-postgresql-rds#42.

See: https://support.hashicorp.com/hc/en-us/articles/4406026108435-Known-issues-with-default-tags-in-the-Terraform-AWS-Provider
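
For illustration, a minimal sketch of the double tagging (the tag keys and the elided module inputs are assumptions):

provider "aws" {
  default_tags {
    tags = {
      Project     = var.project
      Environment = var.environment
    }
  }
}

module "database" {
  source = "github.com/azavea/terraform-aws-postgresql-rds?ref=3.0.0"
  # ... other inputs elided ...

  # The module tags its resources from these inputs as well, so the same
  # keys are applied twice and every plan reports tag changes.
  project     = var.project
  environment = var.environment
}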

Member:

I still think it's worth using an auto-tagger like https://github.com/bridgecrewio/yor to tag these resources. Default tags are great, but having some more granular resource tracing will help us troubleshoot when running in prod.

There's a pretty good post about using Yor to tag resources created in terraform and child modules. https://gsd.fundapps.io/how-we-make-yor-work-with-terraform-caller-and-child-modules-22216afd775d

@rbreslow (Contributor, Author) commented on May 13, 2022:

I think that an auto-tagger like Yor can make sense for some of D3b's infrastructure where Terraform code is spread across many different repositories, making it hard to see and manage everything all in one place.

I considered it here. Here's my take: since all of the Terraform code for this project lives in one place, a git blame provides as much value while keeping the code much easier to read. Think of it this way: you're in the AWS console, you see the "Image Deid ETL" project tag, and you know that anything in this repository corresponds 1:1 with the resources you're looking at.

Also, using an auto-tagger like Yor prevents other folks from spinning up the Terraform project in their own organization since every resource accumulates D3b-specific state.

Would you consider rolling with my approach, here, to see how it feels?

Comment on lines +71 to +72
# TODO: Fork this Terraform module and bring into d3b-center.
source = "github.com/azavea/terraform-aws-postgresql-rds?ref=3.0.0"
@rbreslow (Contributor, Author):

I am using this Terraform module, which I worked on at Azavea, in the interest of expediency. I think we should fork the module and bring it into our organization so we can make d3b-specific tweaks, like resolving the default_tags issue I've mentioned below.
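
Concretely, the fork would mostly be a change of the module source pin, something like the following (the d3b-center repository name and version are hypothetical):

# Today: pinned to Azavea's upstream module.
source = "github.com/azavea/terraform-aws-postgresql-rds?ref=3.0.0"

# After forking: pinned to a d3b-center copy we can patch.
# source = "github.com/d3b-center/terraform-aws-postgresql-rds?ref=X.Y.Z"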

Member:

Have you looked at https://github.com/kids-first/aws-infra-postgres-rds-module?

Contributor:

What do you think about fixing that issue in the original module? The module wouldn't need many tweaks (apart from the tags). We can decide on that later; for now, this works great!

@rbreslow (Contributor, Author):

Have you looked at https://github.com/kids-first/aws-infra-postgres-rds-module?

Yes, however, I didn't look closely at the modifications made, so I wanted to stick with a familiar interface.

What do you think, about fixing that issue in the original module? The module will not require a lot of tweaks (except tags).

I agree, I think someone should fix it, and I even opened issues on all of Azavea's Terraform modules to fix this. However, I want to fork things for two reasons:

  1. It's not as easy as dropping the project and environment tags from the module inputs. For example, at Azavea, we used environment to namespace resources because a single AWS account never housed more than one application. So, you have resources named like rds${var.environment}EnhancedMonitoringRole that would have to change, or you'd need an optional parameter like resource_namespace (see the sketch after this list). I'd want to invest time in determining an idiomatic solution.

  2. We don't have maintainer access to Azavea's repositories. It's unclear whether or not I could get that sort of access, and it's unclear whether they'd have the capacity to review our contributions.
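
To make point 1 concrete, one possible shape, where resource_namespace and the locals are hypothetical and not part of this PR or the upstream module:

variable "resource_namespace" {
  type        = string
  default     = ""
  description = "Optional override for the string embedded in resource names."
}

locals {
  # Fall back to today's behavior of namespacing by environment.
  resource_namespace = var.resource_namespace != "" ? var.resource_namespace : var.environment

  # e.g. "rdsProductionEnhancedMonitoringRole" stays the same for Azavea,
  # while other organizations can override or drop the namespace.
  enhanced_monitoring_role_name = "rds${local.resource_namespace}EnhancedMonitoringRole"
}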

An afterthought: maybe we could have a conversation with their Ops team and see if there's any interest in restructuring things as part of some "social good" shared Philly DevOps group.

deployment/terraform/firewall.tf (resolved thread)
deployment/terraform/network.tf (resolved thread)
Comment on lines +26 to +27
aws s3 cp "s3://${IMAGE_DEID_ETL_SETTINGS_BUCKET}/terraform/terraform.tfvars" \
"${IMAGE_DEID_ETL_SETTINGS_BUCKET}.tfvars"
@rbreslow (Contributor, Author):

Copy the terraform.tfvars file from the settings bucket to your local computer.

Member:

Curious, why not keep it named as terraform.tfvars? That way you don't need to pass in the -var-file parameter in your plan/apply commands.

Contributor:

Second that.

Contributor:

Also, why keep it in S3 rather than GitHub?

@rbreslow (Contributor, Author):

Curious, why not keep it named as terraform.tfvars?

To make it clear that the .tfplan and .tfvars files both target, and come from, a particular remote backend. And, if you switch between a staging/production backend, there's no chance of applying a plan generated for the wrong environment.

Also, why keep it in S3 rather than GitHub?

I understand that, for existing infrastructure like Kids First, there is a distinction between config variables and secrets. We feed both into the Terraform project as input variables. But, we store config in the d3b-deployment-config repository, while we store secrets in S3.

I don't think that we should mix Git and S3. When you scatter input variables in different places and different formats, it's hard to see and manage all the config in one place.

For this project, I'd like to treat config variables and secrets as if they're the same data type and store them in S3. I'm leaning towards S3 because of IAM and object encryption. I'm also OK with bucket versioning since you can see which IAM user modified a file, and we can correlate timestamps with deployments. You can also use CloudTrail to kick off notifications if anyone updates the config.

I'd feel better about Git if we used something like mozilla/sops to encrypt data with AWS KMS. We could even show diffs in cleartext.

However, I wouldn't want to store the encrypted config in the image-deid-etl repository because I've designed this project to be used by anyone with an AWS account. So, the state has to come from somewhere else.

scripts/infra (resolved thread)
README.md (resolved thread)

@devbyaccident (Member) left a comment:

Can you run https://github.com/bridgecrewio/checkov on this and take a look at the errors? Some of them won't be a big deal, but they will show up on our Bridgecrew scans if this is deployed before they're fixed.


case "$1" in
plan)
# Clear stale modules & remote state, then re-initialize
rm -rf .terraform terraform.tfstate*
Member:

Nitpicking here, but can you change this to remove .terraform* instead of .terraform? If/when we deploy this in parallel, we'll change the TF_DATA_DIR to avoid collisions.

@rbreslow (Contributor, Author):

I don't have experience with a "parallel" deployment. If we automate deployments via a CI process, there shouldn't be any shared working directory or accumulated state, so I don't anticipate that there could be a collision.

Are you OK with returning to this once we encounter an issue/discuss parallel deployments?

@rbreslow merged commit bf5240e into develop on May 13, 2022
@rbreslow deleted the feature/rb/vpc-and-rds branch on May 13, 2022 at 14:54

@rbreslow (Contributor, Author) left a comment:

Can you run https://github.com/bridgecrewio/checkov on this and take a look at the errors?

I've opened #33 and #32 to capture this.


Successfully merging this pull request may close: Create deployment subfolder with base Terraform configuration.

3 participants