
Add Terraform configuration for VPC and RDS #30

Merged: 2 commits into develop from feature/rb/vpc-and-rds on May 13, 2022

Conversation

@rbreslow (Contributor) commented on May 12, 2022

Overview

  • Remove stray link from README ToC.
  • Bootstrap a Terraform project that consumes the VPC created by CHOP IS and creates an RDS instance and Bastion host.
  • Utilize our Terraform container image and the infra wrapper script to enable consistent and portable Terraform execution.

Resolves #7
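
For orientation, here is a minimal sketch of the shape the new configuration takes. The data source filter, variable names, and bastion attributes below are illustrative assumptions, not the exact code in this pull request:

# Sketch only: look up the VPC that CHOP IS created for us (the tag filter
# is a placeholder), then attach an RDS instance and a Bastion host to it.
data "aws_vpc" "chop" {
  tags = {
    Name = "example-chop-is-vpc" # hypothetical tag value
  }
}

module "database" {
  # Module pin taken from this PR; the input names below are assumptions.
  source = "github.com/azavea/terraform-aws-postgresql-rds?ref=3.0.0"

  vpc_id            = data.aws_vpc.chop.id
  database_username = var.rds_database_username
  database_password = var.rds_database_password

  project     = var.project
  environment = var.environment
}

resource "aws_instance" "bastion" {
  ami           = var.bastion_ami           # Amazon Linux 2, per the testing notes
  instance_type = var.bastion_instance_type
  subnet_id     = var.bastion_subnet_id     # a subnet in the CHOP-managed VPC
}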

Checklist

  • Squashed any fixup! commits
  • Updated README.md to reflect any changes

Testing Instructions

Launch an instance of the included Terraform container image:

$ docker-compose -f docker-compose.ci.yml run --rm terraform
bash-5.1#

Use infra to generate and apply a Terraform plan:

bash-5.1# ./scripts/infra plan
bash-5.1# ./scripts/infra apply

Note: You should see several warnings about updating resource tags and "values for undeclared variables." The first warning is a side effect of using default_tags along with the RDS module from Azavea. The second warning refers to values I declared in the terraform.tfvars file for my next pull request.
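
In other words, terraform.tfvars already carries assignments with no matching variable block in this change set. A contrived example of what triggers that warning (the name and value are hypothetical):

# terraform.tfvars
# Declared in a follow-up PR; until then, Terraform warns about a value
# supplied for an undeclared variable.
example_future_setting = "placeholder"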

Next, DM me on Slack with the public key you use for SSH so I can add it to the authorized_keys file. Then, while connected to the CHOP VPN, open a shell on the Bastion host:

$ ssh [email protected]
Last login: Fri Apr 29 18:57:13 2022 from ip-10-250-31-1.ec2.internal

       __|  __|_  )
       _|  (     /   Amazon Linux 2 AMI
      ___|\___|___|

https://aws.amazon.com/amazon-linux-2/
20 package(s) needed for security, out of 36 available
Run "sudo yum update" to apply all updates.

Using the database password in the terraform.tfvars file, confirm that you can connect to the RDS instance from the Bastion:

[ec2-user@ip-172-22-24-136 ~]$ psql -U imagedeidetl -d imagedeidetl -h image-deid-etl.ctc077z3pbxl.us-east-1.rds.amazonaws.com
Password for user imagedeidetl: 
psql (13.3, server 14.2)
WARNING: psql major version 13, server major version 14.
         Some psql features might not work.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

imagedeidetl=> SELECT version();
                                                 version                                                 
---------------------------------------------------------------------------------------------------------
 PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit
(1 row)

imagedeidetl=> 

Note: I installed the PostgreSQL client tooling on the Bastion out-of-band from the Extras Library.

rbreslow added 2 commits May 12, 2022 15:30
- Bootstrap a Terraform project that consumes the VPC created by CHOP IS
  and creates an RDS instance and Bastion host.
- Utilize our Terraform container image and the `infra` wrapper script
  to enable consistent and portable Terraform execution.
@rbreslow self-assigned this on May 12, 2022
Comment on lines +105 to +106
project = var.project
environment = var.environment
@rbreslow (Contributor, Author):

default_tags eliminates the need to specify tags on every resource in the Terraform project. However, if you define tags twice, at both the resource level and the default_tags level, the Terraform plan will always be noisy.

This behavior is a known issue in the Azavea-series of Terraform modules: azavea/terraform-aws-postgresql-rds#42.

See: https://support.hashicorp.com/hc/en-us/articles/4406026108435-Known-issues-with-default-tags-in-the-Terraform-AWS-Provider
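
For illustration, a minimal sketch of the double tagging (the tag keys and the elided module inputs are assumptions):

provider "aws" {
  default_tags {
    tags = {
      Project     = var.project
      Environment = var.environment
    }
  }
}

module "database" {
  source = "github.com/azavea/terraform-aws-postgresql-rds?ref=3.0.0"
  # ... other inputs elided ...

  # The module tags its resources from these inputs as well, so the same
  # keys are applied twice and every plan reports tag changes.
  project     = var.project
  environment = var.environment
}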

Member:

I still think it's worth using an auto-tagger like https://github.com/bridgecrewio/yor to tag these resources. Default tags are great, but having some more granular resource tracing will help us troubleshoot when running in prod.

There's a pretty good post about using Yor to tag resources created in terraform and child modules. https://gsd.fundapps.io/how-we-make-yor-work-with-terraform-caller-and-child-modules-22216afd775d

@rbreslow (Contributor, Author) commented on May 13, 2022:

I think that an auto-tagger like Yor can make sense for some of D3b's infrastructure where Terraform code is spread across many different repositories, making it hard to see and manage everything all in one place.

I considered it here. Here's my take: since all of the Terraform code for this project lives in one place, a git blame provides as much value while keeping the code much easier to read. Think of it this way: you're in the AWS console, you see the "Image Deid ETL" project tag, and you know that anything in this repository corresponds 1:1 with the resources you're looking at.

Also, using an auto-tagger like Yor prevents other folks from spinning up the Terraform project in their own organization since every resource accumulates D3b-specific state.

Would you consider rolling with my approach, here, to see how it feels?

Comment on lines +71 to +72
# TODO: Fork this Terraform module and bring into d3b-center.
source = "github.com/azavea/terraform-aws-postgresql-rds?ref=3.0.0"
@rbreslow (Contributor, Author):

I am using this Terraform module, which I worked on at Azavea, in the interest of expediency. I think we should fork the module and bring it into our organization so we can make d3b-specific tweaks, like resolving the default_tags issue I've mentioned below.
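
Concretely, the fork would mostly be a change of the module source pin, something like the following (the d3b-center repository name and version are hypothetical):

# Today: pinned to Azavea's upstream module.
source = "github.com/azavea/terraform-aws-postgresql-rds?ref=3.0.0"

# After forking: pinned to a d3b-center copy we can patch.
# source = "github.com/d3b-center/terraform-aws-postgresql-rds?ref=X.Y.Z"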

Member:

Have you looked at https://github.com/kids-first/aws-infra-postgres-rds-module?

Contributor:

What do you think about fixing that issue in the original module? The module wouldn't need many tweaks (apart from the tags). We can decide on that later; for now, this works great!

@rbreslow (Contributor, Author):

Have you looked at https://github.com/kids-first/aws-infra-postgres-rds-module?

Yes, however, I didn't look closely at the modifications made, so I wanted to stick with a familiar interface.

What do you think, about fixing that issue in the original module? The module will not require a lot of tweaks (except tags).

I agree, I think someone should fix it, and I even opened issues on all of Azavea's Terraform modules to fix this. However, I want to fork things for two reasons:

  1. It's not as easy as dropping the project and environment tags from the module inputs. For example, at Azavea, we used environment to namespace resources because a single AWS account never housed more than one application. So, you have resources named like rds${var.environment}EnhancedMonitoringRole that would have to change, or you'd need an optional parameter like resource_namespace (see the sketch after this list). I'd want to invest time in determining an idiomatic solution.

  2. We don't have maintainer access to Azavea's repositories. It's unclear whether or not I could get that sort of access, and it's unclear whether they'd have the capacity to review our contributions.
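
To make point 1 concrete, one possible shape, where resource_namespace and the locals are hypothetical and not part of this PR or the upstream module:

variable "resource_namespace" {
  type        = string
  default     = ""
  description = "Optional override for the string embedded in resource names."
}

locals {
  # Fall back to today's behavior of namespacing by environment.
  resource_namespace = var.resource_namespace != "" ? var.resource_namespace : var.environment

  # e.g. "rdsProductionEnhancedMonitoringRole" stays the same for Azavea,
  # while other organizations can override or drop the namespace.
  enhanced_monitoring_role_name = "rds${local.resource_namespace}EnhancedMonitoringRole"
}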

An afterthought: maybe we could have a conversation with their Ops team and see if there's any interest in restructuring things as part of some "social good" shared Philly DevOps group.

deployment/terraform/firewall.tf (resolved thread)
deployment/terraform/network.tf (resolved thread)
Comment on lines +26 to +27
aws s3 cp "s3://${IMAGE_DEID_ETL_SETTINGS_BUCKET}/terraform/terraform.tfvars" \
"${IMAGE_DEID_ETL_SETTINGS_BUCKET}.tfvars"
@rbreslow (Contributor, Author):

Copy the terraform.tfvars file from the settings bucket to your local computer.

Member:

Curious, why not keep it named as terraform.tfvars? That way you don't need to pass in the -var-file parameter in your plan/apply commands.

Contributor:

Second that.

Contributor:

Also, why keep it in S3 rather than GitHub?

@rbreslow (Contributor, Author):

Curious, why not keep it named as terraform.tfvars?

To make it clear that the .tfplan and .tfvars files both target, and come from, a particular remote backend. And, if you switch between a staging/production backend, there's no chance of applying a plan generated for the wrong environment.

Also, why keep it in S3 rather than GitHub?

I understand that, for existing infrastructure like Kids First, there is a distinction between config variables and secrets. We feed both into the Terraform project as input variables. But, we store config in the d3b-deployment-config repository, while we store secrets in S3.

I don't think that we should mix Git and S3. When you scatter input variables in different places and different formats, it's hard to see and manage all the config in one place.

For this project, I'd like to treat config variables and secrets as if they're the same data type and store them in S3. I'm leaning towards S3 because of IAM and object encryption. I'm also OK with bucket versioning since you can see which IAM user modified a file, and we can correlate timestamps with deployments. You can also use CloudTrail to kick off notifications if anyone updates the config.

I'd feel better about Git if we used something like mozilla/sops to encrypt data with AWS KMS. We could even show diffs in cleartext.

However, I wouldn't want to store the encrypted config in the image-deid-etl repository because I've designed this project to be used by anyone with an AWS account. So, the state has to come from somewhere else.

scripts/infra (resolved thread)
README.md (resolved thread)

@devbyaccident (Member) left a comment:

Can you run https://github.com/bridgecrewio/checkov on this and take a look at the errors? Some of them won't be a big deal, but they will show up on our Bridgecrew scans if this is deployed before they're fixed.


case "$1" in
plan)
# Clear stale modules & remote state, then re-initialize
rm -rf .terraform terraform.tfstate*
Member:

Nitpicking here, but can you change this to remove .terraform* instead of .terraform? If/when we deploy this in parallel, we'll change the TF_DATA_DIR to avoid collisions.

@rbreslow (Contributor, Author):

I don't have experience with a "parallel" deployment. If we automate deployments via a CI process, there shouldn't be any shared working directory or accumulated state, so I don't anticipate that there could be a collision.

Are you OK with returning to this once we encounter an issue/discuss parallel deployments?

@rbreslow merged commit bf5240e into develop on May 13, 2022
@rbreslow deleted the feature/rb/vpc-and-rds branch on May 13, 2022 at 14:54

@rbreslow (Contributor, Author) left a comment:

Can you run https://github.com/bridgecrewio/checkov on this and take a look at the errors?

I've opened #33 and #32 to capture this.


Successfully merging this pull request may close: Create deployment subfolder with base Terraform configuration.

3 participants