# Add Terraform configuration for VPC and RDS #30
## Conversation
```hcl
project     = var.project
environment = var.environment
```
`default_tags` eliminates the need to specify tags for every resource in the Terraform project. However, if you define tags twice, at both the resource and default tags levels, then the Terraform plan will always be noisy.

This behavior is a known issue in the Azavea series of Terraform modules: azavea/terraform-aws-postgresql-rds#42.
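For illustration, a minimal sketch of the overlap that causes the noise (the provider config and resource below are examples, not code from this PR):

```hcl
variable "project" {}
variable "environment" {}

provider "aws" {
  region = "us-east-1"

  # Tags applied automatically to every resource this provider manages.
  default_tags {
    tags = {
      Project     = var.project
      Environment = var.environment
    }
  }
}

# Repeating the same keys at the resource level is what produces the
# perpetual tag diffs described above.
resource "aws_db_instance" "example" {
  identifier          = "example"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "postgres"
  password            = "placeholder-password" # placeholder only
  skip_final_snapshot = true

  tags = {
    Project     = var.project
    Environment = var.environment
  }
}
```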

---

I still think it's worth using an auto-tagger like https://github.com/bridgecrewio/yor to tag these resources. Default tags are great, but having more granular resource tracing will help us troubleshoot when running in prod.

There's a pretty good post about using Yor to tag resources created in Terraform caller and child modules: https://gsd.fundapps.io/how-we-make-yor-work-with-terraform-caller-and-child-modules-22216afd775d

---

I think that an auto-tagger like Yor can make sense for some of D3b's infrastructure, where Terraform code is spread across many different repositories, making it hard to see and manage everything in one place.

I considered it here. Here's my take: since all of the Terraform code for the project is in one place, a `git blame` would provide as much value while keeping the code much easier to read. Picture it: you're in the AWS console, you see the "Image Deid ETL" project tag, and you know that anything in this repository corresponds 1:1 with the resources you're looking at.

Also, using an auto-tagger like Yor prevents other folks from spinning up the Terraform project in their own organization, since every resource accumulates D3b-specific state.

Would you consider rolling with my approach here, to see how it feels?
```hcl
# TODO: Fork this Terraform module and bring into d3b-center.
source = "github.com/azavea/terraform-aws-postgresql-rds?ref=3.0.0"
```
I am using this Terraform module, which I worked on at Azavea, in the interest of expediency. I think we should fork the module and bring it into our organization so we can make d3b-specific tweaks, like resolving the `default_tags` issue I've mentioned above.

---

Have you looked at https://github.com/kids-first/aws-infra-postgres-rds-module?

---

What do you think about fixing that issue in the original module? The module won't require many tweaks (aside from tags). We can decide on that later; for now, this works great!

---

> Have you looked at https://github.com/kids-first/aws-infra-postgres-rds-module?

Yes; however, I didn't look closely at the modifications made, so I wanted to stick with a familiar interface.

> What do you think about fixing that issue in the original module? The module won't require many tweaks (aside from tags).

I agree; I think someone should fix it, and I even opened issues on all of Azavea's Terraform modules to address this. However, I want to fork things for two reasons:
- It's not as easy as dropping the `project` and `environment` tags from the module input. For example, at Azavea, we used `environment` to namespace resources because a single AWS account never housed more than one application. So, you have resources named like `rds${var.environment}EnhancedMonitoringRole` that would have to change. Or, you'd need an optional parameter like `resource_namespace`. I'd want to invest time in determining an idiomatic solution.
- We don't have maintainer access to Azavea's repositories. It's unclear whether or not I could get that sort of access, and it's unclear whether they'd have the capacity to review our contributions.
An afterthought: maybe we could have a conversation with their Ops team and see if there's any interest in restructuring things as part of some "social good" shared Philly DevOps group.
```bash
aws s3 cp "s3://${IMAGE_DEID_ETL_SETTINGS_BUCKET}/terraform/terraform.tfvars" \
    "${IMAGE_DEID_ETL_SETTINGS_BUCKET}.tfvars"
```
Copy the `terraform.tfvars` file from the settings bucket to your local computer.

---

Curious, why not keep it named as `terraform.tfvars`? That way you don't need to pass in the `-var-file` parameter in your plan/apply commands.

---

Second that.

---

Also, why keep it in S3 and not GitHub?

---

> Curious, why not keep it named as `terraform.tfvars`?
To make it clear that the `.tfplan` and `.tfvars` files both come from, and target, a particular remote backend. And, if you switch between a staging and production backend, there's no chance of applying a plan generated for the wrong environment.
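Concretely, something like this (a sketch; the `infra` wrapper presumably runs commands along these lines):

```bash
# The plan and variable files are named after the settings bucket, so a
# plan generated against one backend can't silently be applied to another.
terraform plan \
    -var-file="${IMAGE_DEID_ETL_SETTINGS_BUCKET}.tfvars" \
    -out="${IMAGE_DEID_ETL_SETTINGS_BUCKET}.tfplan"

terraform apply "${IMAGE_DEID_ETL_SETTINGS_BUCKET}.tfplan"
```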
> Also, why keep it in S3 and not GitHub?
I understand that, for existing infrastructure like Kids First, there is a distinction between config variables and secrets. We feed both into the Terraform project as input variables, but we store config in the d3b-deployment-config repository while we store secrets in S3.

I don't think we should mix Git and S3. When you scatter input variables across different places and formats, it's hard to see and manage all the config in one place.

For this project, I'd like to treat config variables and secrets as if they're the same data type and store them in S3. I'm leaning towards S3 because of IAM and object encryption. I'm also OK with bucket versioning, since you can see which IAM user modified a file, and we can correlate timestamps with deployments. You can also use CloudTrail to kick off notifications if anyone updates the config.
I'd feel better about Git if we used something like mozilla/sops to encrypt data with AWS KMS. We could even show diffs in cleartext.
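As a sketch of that alternative (the KMS key ARN and file names are placeholders):

```bash
# Encrypt the variables file with an AWS KMS key before committing it.
sops --encrypt --kms "arn:aws:kms:us-east-1:111111111111:key/example" \
    terraform.tfvars > terraform.tfvars.enc

# Decrypt locally (or in CI) when it's time to generate a plan.
sops --decrypt terraform.tfvars.enc > terraform.tfvars
```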
However, I wouldn't want to store the encrypted config in the image-deid-etl repository because I've designed this project to be used by anyone with an AWS account. So, the state has to come from somewhere else.

---

Can you run https://github.com/bridgecrewio/checkov on this and take a look at the errors? Some of them won't be a big deal, but they will show up on our Bridgecrew scans if this is deployed before they're fixed.
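A typical invocation, for reference (assuming Checkov is installed with pip and run from the Terraform directory):

```bash
# Install Checkov, then scan the Terraform files in the current
# directory for common misconfigurations.
pip install checkov
checkov -d .
```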
case "$1" in | ||
plan) | ||
# Clear stale modules & remote state, then re-initialize | ||
rm -rf .terraform terraform.tfstate* |
Nitpicking here, but can you change this to remove `.terraform*` instead of `.terraform`? If/when we deploy this in parallel, we'll change the `TF_DATA_DIR` to avoid collisions.
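For context, a sketch of the parallel setup being described (`DEPLOYMENT_NAME` and the directory suffix are illustrative):

```bash
# Each concurrent deployment gets its own Terraform working directory,
# e.g. .terraform-staging, .terraform-production.
export TF_DATA_DIR=".terraform-${DEPLOYMENT_NAME}"
terraform init

# ...which is why cleanup needs the wildcard:
rm -rf .terraform* terraform.tfstate*
```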

---

I don't have experience with a "parallel" deployment. If we automate deployments via a CI process, there shouldn't be any shared working directory or accumulated state, so I don't anticipate a collision.

Are you OK with returning to this once we encounter an issue or discuss parallel deployments?
## Overview
- Bootstrap a Terraform project that consumes the VPC created by CHOP IS and creates an RDS instance and Bastion host.
- Utilize our Terraform container image and the `infra` wrapper script to enable consistent and portable Terraform execution.

Resolves #7
## Checklist
- [ ] Squash `fixup!` commits

## Testing Instructions
Launch an instance of the included Terraform container image:
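A sketch of what this could look like (the image tag, build context, and mount path are placeholders, not taken from this repository):

```bash
# Build the Terraform container image and start an interactive shell
# inside it. Adjust the tag and paths to match the repository.
docker build -t image-deid-etl-terraform .
docker run --rm -it \
    -v "$(pwd):/usr/local/src" \
    -e AWS_PROFILE \
    -e IMAGE_DEID_ETL_SETTINGS_BUCKET \
    image-deid-etl-terraform /bin/bash
```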
Use `infra` to generate and apply a Terraform plan:
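Something like the following (the script path is an assumption; `plan` appears in the wrapper excerpt above, and `apply` is assumed to be its counterpart):

```bash
# From inside the container, generate a plan against the remote
# backend, then apply it.
./scripts/infra plan
./scripts/infra apply
```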
**Note:** You should see several warnings about updating resource tags and "values for undeclared variables." The first warning is a side effect of using `default_tags` along with the RDS module from Azavea. The second warning refers to values I declared in the `terraform.tfvars` file for my next pull request.

Next, DM me on Slack with the public key you use for SSH so I can add it to the `authorized_keys` file. Then, while connected to the CHOP VPN, open a shell on the Bastion host:
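Something along these lines (the user and hostname are placeholders; use the values from the Terraform outputs):

```bash
# Connect to the Bastion while on the CHOP VPN.
ssh ec2-user@<bastion-public-hostname>
```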
Using the database password in the `terraform.tfvars` file, see that you can connect to the database from the Bastion:
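For example (the endpoint and database name are placeholders; enter the password from `terraform.tfvars` when prompted):

```bash
# From the Bastion, connect to the RDS instance with psql.
psql --host=<rds-endpoint> --username=postgres --dbname=postgres
```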
**Note:** I installed the PostgreSQL client tooling on the Bastion out-of-band, from the Extras Library.