Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Write tiles to S3 cache #673

Merged
merged 6 commits into from
Jan 24, 2019
Merged

Write tiles to S3 cache #673

merged 6 commits into from
Jan 24, 2019

Conversation

KlaasH
Copy link
Contributor

@KlaasH KlaasH commented Jan 23, 2019

Overview

Adds a step in the tile generation process where it checks for a PFB_TILEGARDEN_CACHE_BUCKET variable in the environment and, if it's set, writes the tile into that bucket, using the request path as the key.

Currently there's no way to serve the tiles from cache. #634 covers that part.

Also adds a script to clear all files in the configured cache bucket. I'm not sure that will be the way to do it in deployed environments (where it's harder to get a shell, and where the number of items might make the "change settings to make all objects expire, change settings back" deletion strategy preferable), but it seems useful in development.

Demo

vagrant@pfb-network-connectivity:/vagrant$ aws s3 ls --recursive s3://dev-pfb-tilecache-us-east-1/
2019-01-23 17:00:57        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/census_blocks/13/2103/2894.png
2019-01-23 17:00:56        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/census_blocks/13/2103/2895.png
2019-01-23 17:00:56        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/census_blocks/13/2103/2896.png
2019-01-23 17:00:56        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/census_blocks/13/2103/2897.png
2019-01-23 17:00:57        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/census_blocks/13/2103/2898.png
2019-01-23 17:00:56        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/census_blocks/13/2104/2898.png
2019-01-23 17:00:57        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/census_blocks/13/2105/2898.png
2019-01-23 17:00:57       3395 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/census_blocks/13/2106/2898.png
2019-01-23 17:00:56       4447 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/census_blocks/13/2107/2898.png
2019-01-23 17:00:56        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/census_blocks/13/2108/2898.png
2019-01-23 17:00:57        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/census_blocks/13/2109/2898.png
2019-01-23 17:52:21        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/ways/13/2103/2894.png
2019-01-23 17:52:20        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/ways/13/2103/2895.png
2019-01-23 17:52:20        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/ways/13/2103/2896.png
2019-01-23 17:52:21        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/ways/13/2103/2897.png
2019-01-23 17:52:21        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/ways/13/2103/2898.png
2019-01-23 17:52:19        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/ways/13/2104/2898.png
2019-01-23 17:52:20        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/ways/13/2105/2898.png
2019-01-23 17:52:19       4204 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/ways/13/2106/2898.png
2019-01-23 17:52:19       7566 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/ways/13/2107/2898.png
2019-01-23 17:52:19      16902 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/ways/13/2108/2898.png
2019-01-23 17:52:21        334 tile/80bbb2a8-5399-40f2-95a8-537917f4197c/ways/13/2109/2898.png

Notes

  • Since I'm expecting serving from cache to be set up as an S3 Website, the key has to match the full request path exactly. So the tiles have to go in the root of the bucket, not a subdirectory. Or possibly there's a way to configure an S3 website to strip off a prefix, and it could be more flexible. But I didn't manage to find an option like that in my limited looking.
  • I don't think it's going to be possible to test the full caching setup locally, since Cloudfront can't use your local machine as an origin. See 593.
  • This also switches to using the AWS_PROXY request format for Claudia API Builder. See docs.
  • Unfortunately claudia-local-api doesn't fully reproduce the AWS Proxy-style request object (see Support AWS_PROXY request format claudia-local-api#1). As noted in the code, this works around that by reconstructing the path from the parameters.

Testing Instructions

  • In the VM, run ./scripts/update and ./scripts/server
  • Go to a neighborhood in your dev front-end. Make sure the tiles are coming from Tilegarden, not S3 (i.e. that your TILEGARDEN_ROOT variable is set for the container).
  • Any tiles you've just generated should now be in the dev-pfb-tilecache-us-east-1 bucket (you can run ./scripts/clear-tile-cache to clear out the clutter if it's hard to tell).

Resolves #635
Resolves #636

Copy link
Contributor

@jeancochrane jeancochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me! I was able to push tiles up in dev, and also remove the env var to block the upload. The clear script worked fine, too. Seems like there are a few linter errors in the Jenkins log, but I trust you'll fix those.

Good work reconstructing the path from parameters, that looks like it was slightly tricky.


set -e

BUCKET_NAME=$(docker-compose run --rm tilegarden bash -c 'echo -n $PFB_TILEGARDEN_CACHE_BUCKET')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's worth allowing the user to set this bucket themselves, without having to edit common.yml? e.g.:

if [[ -z "${BUCKET_NAME}" ]]; then
    BUCKET_NAME=$(docker-compose run --rm tilegarden bash -c 'echo -n $PFB_TILEGARDEN_CACHE_BUCKET')
fi

I agree with your comment in the PR body that it's hard to think of cases where this script would be useful other than clearing the preconfigured dev tile cache, so I leave the decision up to you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, on reflection that's probably worth having. It might not be useful, but then again it might, in which case it's a lot easier to write it in now than muddle through without it later.

@KlaasH
Copy link
Contributor Author

KlaasH commented Jan 23, 2019

Whoops, I guess I rushed this out the door a bit. Will clean up linting and tests.

@@ -18,6 +18,7 @@ services:
- PFB_AWS_BATCH_TILEMAKER_JOB_DEFINITION_NAME_REVISION
- PFB_AWS_BATCH_TILEMAKER_JOB_DEFINITION_NAME
- PFB_TILEGARDEN_ROOT=http://localhost:9400
- PFB_TILEGARDEN_CACHE_BUCKET=dev-pfb-tilecache-us-east-1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not put the tiles for development users somewhere in the existing ${DEV_USER}-pfb-storage-us-east-1 bucket we're using for other S3 storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm expecting that for serving tiles from the cache we'll want to use an S3 Website, since that's what I did in the Tilegarden branch where I've been working on caching. And as far as I can tell that serves the whole bucket. If there turns out to be a way to restrict it and use the existing bucket, that would be worth pursuing.

It's true that a shared dev cache could cause some inconvenience, though given that tiles are namespaced by job ID, it would be more in the way of clutter than any weird tiler behavior. I'm inclined to run with it, though--with the fallback of making a personal bucket and overriding the variable if one is needed--until we know whether it's a problem in practice.

Copy link
Contributor

@flibbertigibbet flibbertigibbet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on appeasing the linter.

Tested this and it seems to be working well. Consider using the existing user buckets instead of a shared new environment bucket, which may ease debugging in development.

See https://claudiajs.com/news/2016/09/27/claudia-2.0.html#request

The `req` object we've been using is described as "for backwards
compatibility" and it seems like the AWS_PROXY one is potentially
more robust and portable.
Adds a function to write a tile to S3 if a `PFB_TILEGARDEN_CACHE_BUCKET`
variable is set in the environment. If the variable isn't truthy, it does
nothing.
Uses the full request path as the key.
@KlaasH KlaasH force-pushed the feature/kjh/write-tiles-to-cache branch from d64edd6 to a402de0 Compare January 24, 2019 14:09
@KlaasH
Copy link
Contributor Author

KlaasH commented Jan 24, 2019

Linter appeased, bucket argument added to clear-tile-cache. Will merge shortly, if the build succeeds.

@KlaasH KlaasH merged commit 257a883 into develop Jan 24, 2019
@KlaasH KlaasH deleted the feature/kjh/write-tiles-to-cache branch January 24, 2019 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add script to clear S3 tile cache Write tiles to S3 cache bucket
4 participants