
Add S3 tile caching to Terraform config #686

Merged 3 commits into develop on Jan 31, 2019

Conversation

jeancochrane (Contributor)

Overview

Support Tilegarden caching by creating an S3 bucket for cached tiles, configuring CloudFront to try the cache first, and updating the app containers to enable caching.

Notes

  • I had to make a few small adjustments to Tilegarden to get this up and running. Because of that, I've requested @KlaasH's review on this as well.

Testing Instructions

  • In the VM, run a plan/apply cycle and confirm that only the task definitions get updated (to match the new batch jobs that the plan creates):
vagrant@vagrant-ubuntu-trusty-64:~$ export GIT_COMMIT=a7daa2a
vagrant@vagrant-ubuntu-trusty-64:~$ export ENVIRONMENT=staging
vagrant@vagrant-ubuntu-trusty-64:~$ export PFB_AWS_ECR_ENDPOINT="950872791630.dkr.ecr.us-east-1.amazonaws.com"
vagrant@vagrant-ubuntu-trusty-64:~$ export PFB_SETTINGS_BUCKET=staging-pfb-config-us-east-1
vagrant@vagrant-ubuntu-trusty-64:~$ ./scripts/infra plan
...
vagrant@vagrant-ubuntu-trusty-64:~$ ./scripts/infra apply
  • Visit the staging site at https://staging.pfb.azavea.com, select All Places, and choose a place to load up some tiles
  • If the view has been visited before: you should see rapid tile display, and the network tab should show that the tiles are coming from tiles.staging.pfb.azavea.com/
  • If the view has not been visited before: Lambda should take a few seconds to generate the tiles, and the network tab should show that the tiles are coming from tiles.staging.pfb.azavea.com/latest (the non-cached endpoint)
    • Hard refreshing the page may not confirm that the tile has been written to the cache, since (I believe) CloudFront will cache the 302 redirect to /latest for some short amount of time. To test that the cache is getting written, you can do either of two things:
      1. Check the cache S3 bucket staging-pfb-tile-cache-us-east-1 to confirm that the path to the tile you've requested exists in the cache
      2. Create a CloudFront invalidation for /tiles/* to clear the CloudFront cache, and then check your endpoint again to confirm that the tile got cached

Closes #634.

Jean Cochrane added 2 commits January 29, 2019 12:40
Support Tilegarden caching by creating an S3 bucket for cached tiles,
configuring CloudFront to try the cache first, and updating the
app containers to enable caching.
Fix two issues with the tile cache endpoint:

* Make sure the CDN has the right endpoint
* Make sure that Tilegarden configures environment variables
@@ -8,7 +8,6 @@ resource "aws_cloudfront_distribution" "tilegarden" {
}

domain_name = "${var.tilegarden_api_gateway_domain_name}"
-   origin_path = "/latest"
Contributor Author

Now that we're routing /latest requests directly to the cache, we no longer need to set up this path in the origin config.

custom_origin_config {
http_port = 80
https_port = 443
# S3 websites don't support TLS :/
Contributor Author

Is there an established way around this? It seems icky to send the second leg of the request over plaintext.

Contributor

Not if we're using the S3 website endpoint. I haven't looked at the other parts of the request path, but have we considered hitting the dynamic endpoint first?

Contributor Author

I had assumed that we would need to use the S3 website endpoint in order to use redirect rules, is there a way of accomplishing that with the dynamic endpoint?

Contributor

Since we're just hitting static content, I think going over HTTP for this leg is OK.

@@ -24,8 +24,8 @@
],
"scripts": {
"build-all-xml": "./scripts/build-all-xml.sh src/config src/config",
-    "deploy": "yarn compile && claudia update --no-optional-dependencies ${LAMBDA_TIMEOUT:+--timeout ${LAMBDA_TIMEOUT}} ${LAMBDA_MEMORY:+--memory ${LAMBDA_MEMORY}} ${LAMBDA_SECURITY_GROUPS:+--security-group-ids ${LAMBDA_SECURITY_GROUPS}} ${LAMBDA_SUBNETS:+--subnet-ids ${LAMBDA_SUBNETS}}",
"deploy-new": "yarn compile && claudia create --no-optional-dependencies --api-module dist/api --name ${PROJECT_NAME} --region ${LAMBDA_REGION} ${LAMBDA_ROLE:+--role ${LAMBDA_ROLE}} ${LAMBDA_TIMEOUT:+--timeout ${LAMBDA_TIMEOUT}} ${LAMBDA_MEMORY:+--memory ${LAMBDA_MEMORY}} ${LAMBDA_SECURITY_GROUPS:+--security-group-ids ${LAMBDA_SECURITY_GROUPS}} ${LAMBDA_SUBNETS:+--subnet-ids ${LAMBDA_SUBNETS}} && yarn parse-id",
+    "deploy": "yarn compile && claudia update --no-optional-dependencies ${LAMBDA_TIMEOUT:+--timeout ${LAMBDA_TIMEOUT}} ${LAMBDA_MEMORY:+--memory ${LAMBDA_MEMORY}} ${LAMBDA_SECURITY_GROUPS:+--security-group-ids ${LAMBDA_SECURITY_GROUPS}} ${LAMBDA_SUBNETS:+--subnet-ids ${LAMBDA_SUBNETS}} ${PFB_TILEGARDEN_CACHE_BUCKET:+--set-env PFB_TILEGARDEN_CACHE_BUCKET=${PFB_TILEGARDEN_CACHE_BUCKET}}",
Contributor Author

Following #673, Tilegarden reads the cache bucket name from process.env.PFB_TILEGARDEN_CACHE_BUCKET. In order to get environment variables loaded into the Lambda execution environment, however, Claudia needs us to configure them via the --set-env flag.
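The flag above only injects the variable into the Lambda environment at deploy time. On the Tilegarden side, reading it might look roughly like this (a minimal sketch; the getCacheConfig name and return shape are mine, not Tilegarden's actual API):

```javascript
// Sketch: read the cache bucket name that Claudia injects via --set-env.
// If the variable is unset (e.g. in local development), caching is
// simply disabled rather than failing.
// getCacheConfig and its return shape are illustrative, not Tilegarden's API.
function getCacheConfig(env) {
    const bucket = env.PFB_TILEGARDEN_CACHE_BUCKET;
    return {
        cacheEnabled: Boolean(bucket),
        bucket: bucket || null,
    };
}
```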

} else if (key.startsWith('/')) {
// API Gateway request.path object starts with a leading slash,
// which would cause the uploaded object to have the wrong prefix.
key = key.slice(1)
Contributor Author

While debugging, I noticed that cached tiles were getting written to the S3 bucket under an empty root directory (showing up as a whitespace character in the directory name). I'm pretty sure this is due to another difference between the API Gateway request object representations in claudia-local-api and AWS.
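The fix amounts to stripping the leading slash before using the request path as an S3 key. As a standalone sketch (normalizeKey is an illustrative name, not from the Tilegarden source):

```javascript
// Sketch of the leading-slash normalization described above.
// API Gateway's request.path starts with "/", so using it directly as an
// S3 key would create an empty top-level "directory" in the bucket.
function normalizeKey(path) {
    return path.startsWith('/') ? path.slice(1) : path;
}
```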

Contributor

Good catch and thanks for fixing it up.


viewer_protocol_policy = "redirect-to-https"
min_ttl = 0
default_ttl = "300" # Five minutes
Contributor

When I was testing on staging, I got some tiles that timed out, and the error message was then cached for 5 minutes even though the Lambda function had actually succeeded and written the tile to S3.

So that points to one issue that should be addressed, I assume on the origin definition: the "Origin Response Timeout" should be no shorter than the timeout on the Lambda function (currently 1 minute).

I also think it makes sense to set the TTL for the "new tiles" endpoint as short as is reasonable, so that if anything goes wrong the retry delay is minimal. There may be no window in which a cached response from the API Gateway origin actually saves a Lambda invocation: before the first invocation finishes there is no cached response to return, and the tile is written to S3 before it is returned from the Lambda function, so there may never be a time when a request to the S3 cache would miss but one to the API Gateway origin would hit.

So 5 minutes seems fine for the S3 origin (longer would probably be fine too, but I don't know that there would be any advantage to it), but it should be overridden for the "new tiles" cache_behavior below, probably to 1 minute or slightly more.

Contributor Author

I'm glad you caught this, very subtle but important behavior here. I'll bump the Origin Read Timeout to 60 seconds, which should hopefully make timeouts less common.

I agree, I don't see any benefit to caching the Lambda origin. Can you think of any downside to just setting min_ttl, default_ttl, and max_ttl to 0 for that origin?

Contributor

Can you think of any downside to just setting min_ttl, default_ttl, and max_ttl to 0 for that origin?

Actually, now I'm thinking I might have missed an interaction in what I wrote above: would the redirect from / to /latest/ be cached using the default TTL from default_cache_behavior? That is, having gotten a redirect on the initial attempt, will it keep redirecting until the TTL has passed? I was thinking the cached error was entirely the fault of the cache rule on the Lambda endpoint, but now I suspect I was getting two cached responses. In that case, a longer TTL on the root/S3 endpoint than on the Lambda endpoint would mean extra invocations until the cached redirect expires.

Contributor Author

I.e. having gotten a redirect on the initial attempt, will it keep redirecting until the timeout has passed?

Good point, I think it will indeed cache the redirect (see my note in the testing instructions above). Sounds like the TTLs should be identical for both origins.

Contributor

Yeah, seems like. The change to origin_read_timeout should take care of the original issue I saw, though, so hopefully there won't be any caching of bad responses anyway.

* Adjust Lambda Tilegarden CDN origin so that its timeout matches
  the lambda function timeout
* Run terraform fmt to clean up some bad formatting
@jeancochrane force-pushed the feature/jfc/s3-tile-cache branch from a9a3cd3 to 766f429 on January 30, 2019 20:59
@KlaasH (Contributor) left a comment

This looks good to me 👍

@hectcastro (Contributor) left a comment

👍

@jeancochrane merged commit bf9a357 into develop on Jan 31, 2019
@jeancochrane deleted the feature/jfc/s3-tile-cache branch on January 31, 2019 17:28

4 participants