Add S3 tile caching to Terraform config #686

Merged 3 commits on Jan 31, 2019
3 changes: 3 additions & 0 deletions deployment/terraform/app.tf
@@ -153,6 +153,7 @@ data "template_file" "pfb_app_https_ecs_task" {
batch_tilemaker_job_queue_name = "${var.batch_tilemaker_job_queue_name}"
batch_tilemaker_job_definition_name_revision = "${var.batch_tilemaker_job_definition_name_revision}"
tilegarden_root = "${var.tilegarden_root}"
tilegarden_cache_bucket = "${lower(var.environment)}-pfb-tile-cache-${var.aws_region}"
}
}

@@ -199,6 +200,7 @@ data "template_file" "pfb_app_async_queue_ecs_task" {
batch_tilemaker_job_queue_name = "${var.batch_tilemaker_job_queue_name}"
batch_tilemaker_job_definition_name_revision = "${var.batch_tilemaker_job_definition_name_revision}"
tilegarden_root = "${var.tilegarden_root}"
tilegarden_cache_bucket = "${lower(var.environment)}-pfb-tile-cache-${var.aws_region}"
}
}

@@ -238,6 +240,7 @@ data "template_file" "pfb_app_management_ecs_task" {
batch_tilemaker_job_queue_name = "${var.batch_tilemaker_job_queue_name}"
batch_tilemaker_job_definition_name_revision = "${var.batch_tilemaker_job_definition_name_revision}"
tilegarden_root = "${var.tilegarden_root}"
tilegarden_cache_bucket = "${lower(var.environment)}-pfb-tile-cache-${var.aws_region}"
}
}

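The three `template_file` data sources above, the IAM policy in iam.tf, and the `tile_cache` bucket in storage.tf all build the bucket name from the same pattern, so they have to agree character for character. The sketch below mirrors that Terraform expression in JavaScript and adds a simplified check against S3's bucket-naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit). The helper names and the sample `Staging`/`us-east-1` values are hypothetical, and the check deliberately ignores some rules (for example, the ban on IP-address-style names).

```javascript
// Hypothetical helper mirroring the Terraform interpolation
// "${lower(var.environment)}-pfb-tile-cache-${var.aws_region}"
function tileCacheBucketName(environment, awsRegion) {
  return `${environment.toLowerCase()}-pfb-tile-cache-${awsRegion}`;
}

// Simplified S3 bucket-name check: 3-63 characters, lowercase letters,
// digits, dots and hyphens, beginning and ending with a letter or digit.
function looksLikeValidBucketName(name) {
  return /^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/.test(name);
}

const bucket = tileCacheBucketName("Staging", "us-east-1");
console.log(bucket, looksLikeValidBucketName(bucket));

// A stray "}" left at the end of the name is not a valid bucket name.
console.log(looksLikeValidBucketName(`${bucket}}`));
```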
48 changes: 47 additions & 1 deletion deployment/terraform/cdn.tf
@@ -5,10 +5,10 @@ resource "aws_cloudfront_distribution" "tilegarden" {
https_port = 443
origin_protocol_policy = "https-only"
origin_ssl_protocols = ["TLSv1.2", "TLSv1.1", "TLSv1"]
origin_read_timeout = 60
}

domain_name = "${var.tilegarden_api_gateway_domain_name}"
origin_path = "/latest"
Contributor Author

Now that we're routing /latest requests directly to the cache, we no longer need to set up this path in the origin config.
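CloudFront builds the path it requests from a custom origin by appending the viewer's request path to the origin's `origin_path`; the `path_pattern` that selected the cache behavior is only used for matching and is not stripped. Since the S3 redirect sends viewers to `/latest/...` paths, keeping `origin_path = "/latest"` would double the stage prefix. A minimal sketch of that concatenation (the helper name and the tile path are hypothetical):

```javascript
// Simplified model: origin request path = origin_path + viewer path.
// The path_pattern that matched the request is NOT removed from the path.
function originRequestPath(originPath, viewerPath) {
  return `${originPath}${viewerPath}`;
}

// Hypothetical tile path following the tile/{job_id}/{config}/{z}/{x}/{y} layout
const viewerPath = "/latest/tile/job-123/default/12/1171/1566";

console.log(originRequestPath("/latest", viewerPath)); // "/latest" appears twice
console.log(originRequestPath("", viewerPath));        // single "/latest" prefix
```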

origin_id = "tilegardenOriginEastId"

custom_header {
@@ -17,16 +17,57 @@ resource "aws_cloudfront_distribution" "tilegarden" {
}
}

origin {
custom_origin_config {
http_port = 80
https_port = 443

# S3 website endpoints don't support HTTPS :/
Contributor Author

Is there an established way around this? It seems icky to send the second leg of the request over plaintext.

Contributor

Not if we're using the S3 website endpoint. I haven't looked at the other parts of the request path, but has anyone considered hitting the dynamic endpoint first?

Contributor Author

I had assumed that we would need to use the S3 website endpoint in order to use redirect rules. Is there a way of accomplishing that with the dynamic endpoint?

Contributor

Since we're just hitting static content, I think going over HTTP for this leg is OK.

origin_protocol_policy = "http-only"
origin_ssl_protocols = ["TLSv1.2", "TLSv1.1", "TLSv1"]
}

domain_name = "${aws_s3_bucket.tile_cache.website_endpoint}"
origin_id = "tilegardenCacheOriginEastId"

custom_header {
name = "Accept"
value = "image/png"
}
}

aliases = ["tiles.${var.r53_public_hosted_zone}"]
price_class = "${var.cloudfront_price_class}"
enabled = true
is_ipv6_enabled = true
comment = "${var.project} (${var.environment})"

# Tilegarden cache origin
default_cache_behavior {
allowed_methods = ["GET", "HEAD", "OPTIONS"]
cached_methods = ["GET", "HEAD"]
target_origin_id = "tilegardenCacheOriginEastId"

forwarded_values {
query_string = true

cookies {
forward = "none"
}
}

viewer_protocol_policy = "redirect-to-https"
min_ttl = 0
default_ttl = "300" # Five minutes
Contributor

When I was testing out staging, I got some tiles that timed out, and then the error message was cached for 5 minutes even though the Lambda function had actually succeeded and written the tile to S3.

So that points to one issue that should be addressed, I assume in the origin definition: the "Origin Response Timeout" should be no shorter than the timeout on the Lambda function (currently 1 minute).

I also think it makes sense to set the TTL for the "new tiles" endpoint as short as is reasonable, so that if anything goes wrong, it won't be long until it tries again. There might not actually be any period during which serving a cached response from the API Gateway origin saves a Lambda invocation: before the first invocation finishes there's no cached response to return, and the tile gets written to S3 before it gets returned from the Lambda function. So there might never be a time when a request to the S3 cache would miss but one to the API Gateway origin would hit.

So 5 minutes seems fine for the S3 origin (longer would probably be fine, too, but I don't know if there would be any advantage to it), but it should be overridden for the "new tiles" cache_behavior below, probably to 1 minute or slightly more as well.
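For non-error responses, CloudFront picks a cache lifetime from the behavior's TTL settings and the origin's caching headers: with no `Cache-Control: max-age`, it uses the default TTL; otherwise it clamps `max-age` between the minimum and maximum TTL. The sketch below models only that rule, assuming no `s-maxage`, `Expires`, or `no-cache` headers are involved; 4xx/5xx responses are governed separately by CloudFront's error-caching minimum TTL. The function and field names are illustrative, not Terraform or AWS identifiers.

```javascript
// Illustrative model of CloudFront's TTL selection for non-error responses.
// Assumes minTtl <= defaultTtl <= maxTtl, which CloudFront requires.
function effectiveTtl({ minTtl, defaultTtl, maxTtl }, originMaxAge) {
  if (originMaxAge === undefined) {
    return defaultTtl; // origin sent no Cache-Control max-age
  }
  return Math.min(maxTtl, Math.max(minTtl, originMaxAge));
}

// Values from default_cache_behavior above, and an all-zero "new tiles" behavior
const s3Behavior = { minTtl: 0, defaultTtl: 300, maxTtl: 86400 };
const lambdaBehavior = { minTtl: 0, defaultTtl: 0, maxTtl: 0 };

console.log(effectiveTtl(s3Behavior));            // no header: default TTL
console.log(effectiveTtl(s3Behavior, 7 * 86400)); // capped at the max TTL
console.log(effectiveTtl(lambdaBehavior, 3600));  // all-zero TTLs: not cached
```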

Contributor Author

I'm glad you caught this; it's very subtle but important behavior. I'll bump the Origin Read Timeout to 60 seconds, which should hopefully make timeouts less common.

I agree, I don't see any benefit to caching the Lambda origin. Can you think of any downside to just setting min_ttl, default_ttl, and max_ttl to 0 for that origin?

Contributor

> Can you think of any downside to just setting min_ttl, default_ttl, and max_ttl to 0 for that origin?

Actually, now I'm thinking I might have missed an interaction in what I wrote above: would the redirect from / to /latest/ be cached using the default TTL from default_cache_behavior? I.e. having gotten a redirect on the initial attempt, will it keep redirecting until the timeout has passed? I was thinking the cached error was entirely the fault of the cache rule on the Lambda endpoint, but now I suspect I was getting two cached responses. In that case, having a longer TTL on the root/S3 endpoint than on the Lambda endpoint would mean extra invocations until the cached redirect expires.

Contributor Author

> I.e. having gotten a redirect on the initial attempt, will it keep redirecting until the timeout has passed?

Good point. I think it will indeed cache the redirect (see my note in the testing instructions above). Sounds like the TTLs should be identical for both origins.

Contributor

Yeah, it seems like it. The change to origin_read_timeout should take care of the original issue I saw, though, so hopefully there won't be any caching of bad responses anyway.
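The TTL interaction discussed in this thread can be checked with a toy simulation. It assumes a single tile requested every 10 seconds, that CloudFront caches the S3 404 redirect for the default behavior's TTL (the hypothesis raised above), and that every Lambda invocation succeeds and writes the tile to S3 before responding. Error caching and request coalescing are ignored, and all names are hypothetical.

```javascript
// Toy model of one tile requested repeatedly through the distribution.
// Assumptions: the S3 redirect is cached for s3Ttl seconds, the /latest/*
// response for lambdaTtl seconds, and every Lambda invocation succeeds and
// writes the tile to S3 before it responds.
function countLambdaInvocations({ s3Ttl, lambdaTtl, requestTimes }) {
  let tileInS3 = false;
  let s3Entry = null;     // cached default-behavior (S3) response
  let lambdaEntry = null; // cached /latest/* (Lambda) response
  let invocations = 0;

  for (const t of requestTimes) {
    if (s3Entry === null || t >= s3Entry.expires) {
      // Cache miss: S3 serves the tile, or a 302 to /latest/... on a 404
      s3Entry = { status: tileInS3 ? 200 : 302, expires: t + s3Ttl };
    }
    if (s3Entry.status === 200) continue;

    // The viewer follows the redirect, which matches the /latest/* behavior.
    if (lambdaEntry === null || t >= lambdaEntry.expires) {
      invocations += 1;
      tileInS3 = true;
      lambdaEntry = { expires: t + lambdaTtl };
    }
  }
  return invocations;
}

// Requests at t = 0, 10, ..., 290 seconds
const requestTimes = Array.from({ length: 30 }, (_, i) => i * 10);

console.log(countLambdaInvocations({ s3Ttl: 300, lambdaTtl: 0, requestTimes }));
console.log(countLambdaInvocations({ s3Ttl: 300, lambdaTtl: 300, requestTimes }));
console.log(countLambdaInvocations({ s3Ttl: 0, lambdaTtl: 0, requestTimes }));
```

Under these assumptions, pairing a 5-minute redirect TTL with an uncached Lambda behavior invokes the function on every redirected request (30 times here), while equal TTLs on both behaviors generate the tile once.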

max_ttl = "86400" # One day
}

# Tilegarden lambda origin for generating new tiles
cache_behavior {
allowed_methods = ["GET", "HEAD", "OPTIONS"]
cached_methods = ["GET", "HEAD"]
target_origin_id = "tilegardenOriginEastId"
path_pattern = "/latest/*"

forwarded_values {
query_string = true
@@ -53,4 +94,9 @@ resource "aws_cloudfront_distribution" "tilegarden" {
ssl_support_method = "sni-only"
minimum_protocol_version = "TLSv1"
}

tags {
Project = "${var.project}"
Environment = "${var.environment}"
}
}
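With this configuration, requests whose path matches the `/latest/*` cache behavior go to the Lambda (API Gateway) origin, and everything else falls through to `default_cache_behavior` and the S3 cache origin. A simplified sketch of that routing, assuming CloudFront's path-pattern wildcards (`*` for zero or more characters, `?` for exactly one) and case-sensitive matching; the helper names and sample paths are hypothetical, while the origin IDs come from the config above.

```javascript
// Convert a CloudFront-style path pattern into an anchored RegExp:
// "*" matches zero or more characters and "?" exactly one.
function patternToRegExp(pattern) {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*").replace(/\?/g, ".")}$`);
}

// Hypothetical model of this distribution: ordered cache behaviors are
// checked first, then default_cache_behavior applies.
const behaviors = [{ pathPattern: "/latest/*", originId: "tilegardenOriginEastId" }];
const defaultOriginId = "tilegardenCacheOriginEastId";

function selectOrigin(path) {
  const match = behaviors.find((b) => patternToRegExp(b.pathPattern).test(path));
  return match ? match.originId : defaultOriginId;
}

console.log(selectOrigin("/tile/job-123/default/12/1171/1566"));        // S3 cache
console.log(selectOrigin("/latest/tile/job-123/default/12/1171/1566")); // Lambda
```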
27 changes: 24 additions & 3 deletions deployment/terraform/iam.tf
@@ -79,6 +79,27 @@ data "aws_iam_policy_document" "anonymous_read_storage_bucket_policy" {
}
}

data "aws_iam_policy_document" "anonymous_read_tile_cache_bucket_policy" {
policy_id = "S3TileCacheAnonymousReadPolicy"

statement {
sid = "S3ReadOnly"

effect = "Allow"

principals {
type = "AWS"
identifiers = ["*"]
}

actions = ["s3:GetObject"]

resources = [
"arn:aws:s3:::${lower(var.environment)}-pfb-tile-cache-${var.aws_region}/*",
]
}
}
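The `aws_iam_policy_document` data source renders to an ordinary IAM policy JSON document. A rough JavaScript equivalent of what this one grants (anonymous `s3:GetObject` on every object in the tile cache bucket) is sketched below. Terraform's exact output formatting, such as whether single-element lists are collapsed to plain strings, may differ; the helper name and sample bucket name are hypothetical.

```javascript
// Hypothetical helper producing a policy equivalent in effect to the
// anonymous_read_tile_cache_bucket_policy data source above.
function anonymousReadPolicy(bucketName) {
  return {
    Version: "2012-10-17",
    Id: "S3TileCacheAnonymousReadPolicy",
    Statement: [
      {
        Sid: "S3ReadOnly",
        Effect: "Allow",
        Principal: { AWS: ["*"] },             // anyone, including anonymous users
        Action: ["s3:GetObject"],              // read objects only; no writes
        Resource: [`arn:aws:s3:::${bucketName}/*`], // every object in the bucket
      },
    ],
  };
}

console.log(JSON.stringify(anonymousReadPolicy("staging-pfb-tile-cache-us-east-1"), null, 2));
```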

#
# Custom policies
#
@@ -145,8 +166,8 @@ resource "aws_iam_role_policy_attachment" "ecs_for_ec2_policy_container_instance
}

resource "aws_iam_instance_profile" "app_container_instance" {
name = "${aws_iam_role.app_container_instance_ec2.name}"
role = "${aws_iam_role.app_container_instance_ec2.name}"
}

#
@@ -168,6 +189,6 @@ resource "aws_iam_role_policy_attachment" "batch_ec2_s3_policy" {
}

resource "aws_iam_instance_profile" "batch_container_instance" {
name = "${aws_iam_role.batch_container_instance_ec2.name}"
role = "${aws_iam_role.batch_container_instance_ec2.name}"
}
42 changes: 40 additions & 2 deletions deployment/terraform/storage.tf
@@ -34,11 +34,49 @@ resource "aws_s3_bucket" "storage" {
}

lifecycle_rule {
id = "osm_extracts"
enabled = true
prefix = "/osm-data-cache"

expiration {
days = 7
}
}
}

resource "aws_s3_bucket" "tile_cache" {
bucket = "${lower(var.environment)}-pfb-tile-cache-${var.aws_region}"
acl = "public-read"
policy = "${data.aws_iam_policy_document.anonymous_read_tile_cache_bucket_policy.json}"

cors_rule {
allowed_headers = ["Authorization"]
allowed_methods = ["GET"]
allowed_origins = ["*"]
expose_headers = []
max_age_seconds = "3000"
}

website {
index_document = "index.html"

routing_rules = <<EOF
[{
"Condition": {
"HttpErrorCodeReturnedEquals": "404"
},
"Redirect": {
"HostName": "tiles.${var.r53_public_hosted_zone}",
"HttpRedirectCode": "302",
"Protocol": "https",
"ReplaceKeyPrefixWith": "latest/"
}
}]
EOF
}

tags {
Project = "${var.project}"
Environment = "${var.environment}"
}
}
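The `routing_rules` block tells the S3 website endpoint to answer a 404 with a 302 redirect to the `tiles.` host over HTTPS, with `latest/` prepended to the requested key (the rule has no `KeyPrefixEquals` condition, so `ReplaceKeyPrefixWith` effectively adds a prefix). The redirected URL then matches the `/latest/*` cache behavior and reaches the Lambda origin. A minimal model of that rule; the helper name, the `objectExists` flag, and the `example.org` hosted zone are hypothetical stand-ins.

```javascript
// Minimal model of the S3 website routing rule above (hypothetical helper).
// `key` is the object key, i.e. the request path without its leading slash.
function websiteResponse(key, objectExists, hostedZone) {
  if (objectExists) {
    return { status: 200 }; // cached tile served directly from the bucket
  }
  // 404 -> 302 to https://tiles.<zone>/latest/<key>
  return {
    status: 302,
    location: `https://tiles.${hostedZone}/latest/${key}`,
  };
}

console.log(websiteResponse("tile/job-123/default/12/1171/1566", false, "example.org"));
console.log(websiteResponse("tile/job-123/default/12/1171/1566", true, "example.org"));
```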
4 changes: 4 additions & 0 deletions deployment/terraform/task-definitions/app.json
@@ -98,6 +98,10 @@
{
"name": "PFB_TILEGARDEN_ROOT",
"value": "${tilegarden_root}"
},
{
"name": "PFB_TILEGARDEN_CACHE_BUCKET",
"value": "${tilegarden_cache_bucket}"
}
],
"logConfiguration": {
8 changes: 6 additions & 2 deletions deployment/terraform/task-definitions/django-q.json
@@ -79,8 +79,12 @@
"value": "${batch_tilemaker_job_definition_name_revision}"
},
{
"name": "PFB_TILEGARDEN_ROOT",
"value": "${tilegarden_root}"
},
{
"name": "PFB_TILEGARDEN_CACHE_BUCKET",
"value": "${tilegarden_cache_bucket}"
}
],
"logConfiguration": {
4 changes: 4 additions & 0 deletions deployment/terraform/task-definitions/management.json
@@ -80,6 +80,10 @@
{
"name": "PFB_TILEGARDEN_ROOT",
"value": "${tilegarden_root}"
},
{
"name": "PFB_TILEGARDEN_CACHE_BUCKET",
"value": "${tilegarden_cache_bucket}"
}
],
"logConfiguration": {
1 change: 1 addition & 0 deletions deployment/terraform/variables.tf
@@ -164,6 +164,7 @@ variable "pfb_app_alb_ingress_cidr_block" {

# CloudFront distribution
variable "tilegarden_api_gateway_domain_name" {}

variable "cloudfront_price_class" {
default = "PriceClass_100"
}
1 change: 1 addition & 0 deletions src/tilegarden/.env.example
@@ -32,3 +32,4 @@ PFB_DB_DATABASE=
PFB_DB_PASSWORD=
PFB_DB_PORT=
PFB_DB_USER=
PFB_TILEGARDEN_CACHE_BUCKET=
4 changes: 2 additions & 2 deletions src/tilegarden/package.json
@@ -24,8 +24,8 @@
],
"scripts": {
"build-all-xml": "./scripts/build-all-xml.sh src/config src/config",
"deploy": "yarn compile && claudia update --no-optional-dependencies ${LAMBDA_TIMEOUT:+--timeout ${LAMBDA_TIMEOUT}} ${LAMBDA_MEMORY:+--memory ${LAMBDA_MEMORY}} ${LAMBDA_SECURITY_GROUPS:+--security-group-ids ${LAMBDA_SECURITY_GROUPS}} ${LAMBDA_SUBNETS:+--subnet-ids ${LAMBDA_SUBNETS}}",
"deploy-new": "yarn compile && claudia create --no-optional-dependencies --api-module dist/api --name ${PROJECT_NAME} --region ${LAMBDA_REGION} ${LAMBDA_ROLE:+--role ${LAMBDA_ROLE}} ${LAMBDA_TIMEOUT:+--timeout ${LAMBDA_TIMEOUT}} ${LAMBDA_MEMORY:+--memory ${LAMBDA_MEMORY}} ${LAMBDA_SECURITY_GROUPS:+--security-group-ids ${LAMBDA_SECURITY_GROUPS}} ${LAMBDA_SUBNETS:+--subnet-ids ${LAMBDA_SUBNETS}} && yarn parse-id",
"deploy": "yarn compile && claudia update --no-optional-dependencies ${LAMBDA_TIMEOUT:+--timeout ${LAMBDA_TIMEOUT}} ${LAMBDA_MEMORY:+--memory ${LAMBDA_MEMORY}} ${LAMBDA_SECURITY_GROUPS:+--security-group-ids ${LAMBDA_SECURITY_GROUPS}} ${LAMBDA_SUBNETS:+--subnet-ids ${LAMBDA_SUBNETS}} ${PFB_TILEGARDEN_CACHE_BUCKET:+--set-env PFB_TILEGARDEN_CACHE_BUCKET=${PFB_TILEGARDEN_CACHE_BUCKET}}",
Contributor Author

Following #673, Tilegarden reads the cache bucket name from process.env.PFB_TILEGARDEN_CACHE_BUCKET. In order to get environment variables loaded into the Lambda execution environment, however, Claudia needs us to configure them via the --set-env flag.
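Each `${VAR:+word}` in these scripts is shell parameter expansion: it expands to `word` only when `VAR` is set and non-empty, so `--set-env PFB_TILEGARDEN_CACHE_BUCKET=...` reaches Claudia only when the variable is defined. A JavaScript sketch of the same conditional argument building for the `deploy` script; the helper names are hypothetical, while the flags are the ones the script already uses.

```javascript
// Emulates the shell expansion ${VAR:+word}: include `word` only when VAR
// is set to a non-empty value.
function ifSet(value, ...words) {
  return value ? words : [];
}

// Hypothetical reconstruction of the `claudia update` arguments in "deploy".
function claudiaUpdateArgs(env) {
  return [
    "update",
    "--no-optional-dependencies",
    ...ifSet(env.LAMBDA_TIMEOUT, "--timeout", env.LAMBDA_TIMEOUT),
    ...ifSet(env.LAMBDA_MEMORY, "--memory", env.LAMBDA_MEMORY),
    ...ifSet(env.LAMBDA_SECURITY_GROUPS, "--security-group-ids", env.LAMBDA_SECURITY_GROUPS),
    ...ifSet(env.LAMBDA_SUBNETS, "--subnet-ids", env.LAMBDA_SUBNETS),
    ...ifSet(
      env.PFB_TILEGARDEN_CACHE_BUCKET,
      "--set-env",
      `PFB_TILEGARDEN_CACHE_BUCKET=${env.PFB_TILEGARDEN_CACHE_BUCKET}`
    ),
  ];
}

console.log(claudiaUpdateArgs({}).join(" "));
console.log(
  claudiaUpdateArgs({ PFB_TILEGARDEN_CACHE_BUCKET: "staging-pfb-tile-cache-us-east-1" }).join(" ")
);
```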

"deploy-new": "yarn compile && claudia create --no-optional-dependencies --api-module dist/api --name ${PROJECT_NAME} --region ${LAMBDA_REGION} ${LAMBDA_ROLE:+--role ${LAMBDA_ROLE}} ${LAMBDA_TIMEOUT:+--timeout ${LAMBDA_TIMEOUT}} ${LAMBDA_MEMORY:+--memory ${LAMBDA_MEMORY}} ${LAMBDA_SECURITY_GROUPS:+--security-group-ids ${LAMBDA_SECURITY_GROUPS}} ${LAMBDA_SUBNETS:+--subnet-ids ${LAMBDA_SUBNETS}} ${PFB_TILEGARDEN_CACHE_BUCKET:+--set-env PFB_TILEGARDEN_CACHE_BUCKET=${PFB_TILEGARDEN_CACHE_BUCKET}} && yarn parse-id",
"destroy": "claudia destroy",
"dev": "nodemon -e js,mss,json,mml,mss --ignore dist/ --ignore '*.temp.mml' --exec yarn local",
"lint": "eslint src",
4 changes: 4 additions & 0 deletions src/tilegarden/src/api.js
@@ -77,6 +77,10 @@ const writeToS3 = (tile, req) => {
const { z, x, y, job_id, config } = req.pathParameters
key = `tile/${job_id}/${config}/${z}/${x}/${y}`
/* eslint-enable camelcase */
} else if (key.startsWith('/')) {
// API Gateway request.path object starts with a leading slash,
// which would cause the uploaded object to have the wrong prefix.
key = key.slice(1)
Contributor Author

While debugging, I noticed that cached tiles were being written to the S3 bucket under an empty root directory (which shows up with a whitespace character as its name). I'm pretty sure this is due to another difference between how claudia-local-api and AWS represent the API Gateway request object.

Contributor

Good catch and thanks for fixing it up.

}

const upload = new aws.S3().putObject({
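S3 keys are flat strings, and the S3 console displays the segment before the first `/` as a top-level folder. A key built from a path with a leading slash therefore begins with an empty segment, which matches the blank root directory described in the review comment on `writeToS3`. A sketch of the normalization that the new `else if` branch performs; the helper names and the tile path are hypothetical.

```javascript
// Hypothetical helper mirroring the `else if (key.startsWith('/'))` branch
// in writeToS3: drop a single leading slash from the request path.
function toS3Key(path) {
  return path.startsWith("/") ? path.slice(1) : path;
}

// First path segment, i.e. the "folder" the S3 console shows at the bucket root.
function topLevelFolder(key) {
  return key.split("/")[0];
}

const apiGatewayPath = "/tile/job-123/default/12/1171/1566";

console.log(JSON.stringify(topLevelFolder(apiGatewayPath)));          // prints ""
console.log(JSON.stringify(topLevelFolder(toS3Key(apiGatewayPath)))); // prints "tile"
```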