-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add S3 tile caching to Terraform config #686
Conversation
Support Tilegarden caching by creating an S3 bucket for cached tiles, configuring CloudFront to try the cache first, and updating the app containers to enable caching.
Fix two issues with the tile cache endpoint: * Make sure the CDN has the right endpoint * Make sure that Tilegarden configures environment variables
@@ -8,7 +8,6 @@ resource "aws_cloudfront_distribution" "tilegarden" { | |||
} | |||
|
|||
domain_name = "${var.tilegarden_api_gateway_domain_name}" | |||
origin_path = "/latest" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we're routing /latest
requests directly to the cache, we no longer need to set up this path in the origin config.
custom_origin_config { | ||
http_port = 80 | ||
https_port = 443 | ||
# S3 websites don't support TLS :/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an established way around this? It seems icky to send the second leg of the request over plaintext.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not if we're using the S3 website endpoint. I haven't looked at the other parts of the request path, but has there been consideration to hitting the dynamic endpoint first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had assumed that we would need to use the S3 website endpoint in order to use redirect rules, is there a way of accomplishing that with the dynamic endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're just hitting static content, I think going over HTTP for this leg is OK.
@@ -24,8 +24,8 @@ | |||
], | |||
"scripts": { | |||
"build-all-xml": "./scripts/build-all-xml.sh src/config src/config", | |||
"deploy": "yarn compile && claudia update --no-optional-dependencies ${LAMBDA_TIMEOUT:+--timeout ${LAMBDA_TIMEOUT}} ${LAMBDA_MEMORY:+--memory ${LAMBDA_MEMORY}} ${LAMBDA_SECURITY_GROUPS:+--security-group-ids ${LAMBDA_SECURITY_GROUPS}} ${LAMBDA_SUBNETS:+--subnet-ids ${LAMBDA_SUBNETS}}", | |||
"deploy-new": "yarn compile && claudia create --no-optional-dependencies --api-module dist/api --name ${PROJECT_NAME} --region ${LAMBDA_REGION} ${LAMBDA_ROLE:+--role ${LAMBDA_ROLE}} ${LAMBDA_TIMEOUT:+--timeout ${LAMBDA_TIMEOUT}} ${LAMBDA_MEMORY:+--memory ${LAMBDA_MEMORY}} ${LAMBDA_SECURITY_GROUPS:+--security-group-ids ${LAMBDA_SECURITY_GROUPS}} ${LAMBDA_SUBNETS:+--subnet-ids ${LAMBDA_SUBNETS}} && yarn parse-id", | |||
"deploy": "yarn compile && claudia update --no-optional-dependencies ${LAMBDA_TIMEOUT:+--timeout ${LAMBDA_TIMEOUT}} ${LAMBDA_MEMORY:+--memory ${LAMBDA_MEMORY}} ${LAMBDA_SECURITY_GROUPS:+--security-group-ids ${LAMBDA_SECURITY_GROUPS}} ${LAMBDA_SUBNETS:+--subnet-ids ${LAMBDA_SUBNETS}} ${PFB_TILEGARDEN_CACHE_BUCKET:+--set-env PFB_TILEGARDEN_CACHE_BUCKET=${PFB_TILEGARDEN_CACHE_BUCKET}}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following #673, Tilegarden reads the cache bucket name from process.env.PFB_TILEGARDEN_CACHE_BUCKET
. In order to get environment variables loaded into the Lambda execution environment, however, Claudia needs us to configure them via the --set-env
flag.
} else if (key.startsWith('/')) { | ||
// API Gateway request.path object starts with a leading slash, | ||
// which would cause the uploaded object to have the wrong prefix. | ||
key = key.slice(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While debugging, I noticed that cache tiles were getting written to the S3 bucket in an empty root directory (showing up as a whitespace character for the directory name). I'm pretty sure this is due to another difference between the API Gateway request object representations in claudia-local-api
and AWS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch and thanks for fixing it up.
|
||
viewer_protocol_policy = "redirect-to-https" | ||
min_ttl = 0 | ||
default_ttl = "300" # Five minutes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was testing out staging, I got some tiles that timed out, then the error message was cached for 5 minutes even though the Lambda function had actually succeeded and written the tile to S3.
So that points to one issue that should be addressed, I assume on the origin definition--the "Origin Response Timeout" should be no shorter than the timeout on the Lambda function (currently 1 minute).
I also think it makes sense to set the TTL for the "new tiles" endpoint as short as is reasonable, so that if anything goes wrong it won't be as long until it tries again. There might not actually be a period during which serving a cached response on the API Gateway origin saves a Lambda invocation--before the first one finishes there's no cached response to return, and the tile gets written to S3 before it gets returned from the Lambda function, so there might not be a time during which a request to the S3 cache would miss but one to the API Gateway origin would hit.
So 5 minutes seems fine for the S3 origin (longer would probably be fine, too, but I don't know if there would be any advantage to it) but it should be overridden for the "new tiles" cache_behavior below, probably to also be 1 minute or slightly more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad you caught this, very subtle but important behavior here. I'll bump the Origin Read Timeout to 60 seconds, which should hopefully make timeouts less common.
I agree, I don't see any benefit to caching the Lambda origin. Can you think of any downside to just setting min_ttl
, default_ttl
, and max_ttl
to 0
for that origin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you think of any downside to just setting min_ttl, default_ttl, and max_ttl to 0 for that origin?
Actually now I'm thinking I might have missed an interaction in what I wrote above--would the redirect from /
to /latest/
be cached using the default TTL from default_cache_behavior
? I.e. having gotten a redirect on the initial attempt, will it keep redirecting until the timeout has passed? I was thinking the cached error was entirely the fault of the cache rule on the Lambda endpoint, but now I'm thinking I was getting two cached responses. In which case having a longer TTL on the root/S3 endpoint than on the Lambda endpoint would mean extra invocations until the cached redirect expires.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e. having gotten a redirect on the initial attempt, will it keep redirecting until the timeout has passed?
Good point, I think it will indeed cache the redirect -- see my note in the testing instructions above. Sounds like the TTLs should be identical for both origins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, seems like. The change to origin_read_timeout
should take care of the original issue I saw, though, so hopefully there won't be any caching of bad responses anyway.
* Adjust Lambda Tilegarden CDN origin so that its timeout matches the lambda function timeout * Run terraform fmt to clean up some bad formatting
a9a3cd3
to
766f429
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Overview
Support Tilegarden caching by creating an S3 bucket for cached tiles, configuring CloudFront to try the cache first, and updating the app containers to enable caching.
Notes
Testing Instructions
plan
creates):All Places
, and choose a place to load up some tilestiles.staging.pfb.azavea.com/
tiles.staging.pfb.azavea.com/latest
(the non-cached endpoint)/latest
for some short amount of time. To test that the cache is getting written, you can do two things:staging-pfb-tile-cache-us-east-1
to confirm that the path to the tile you've requested exists in the cache/tiles/*
to clear the CloudFront cache, and then check your endpoint again to confirm that the tile got cachedCloses #634.