Caching Feature #126

Open
mattmeye opened this issue Sep 29, 2018 · 5 comments
@mattmeye

Hello, I've been thinking about using AWS Lambda for tile generation for a few days now. In my case I'd prefer to cache the generated tiles with CloudFront and store the generated tiles in an S3 bucket. With AWS Lambda@Edge I would generate the missing tiles (or start another process to do that). Eventually I also have to think about map and tile updates. Do you plan a feature like this in the future, or do you know of an existing project?
Kind regards, matt

@mattdelsordo
Contributor

I'm not sure if this is a feature that's being planned, but you can definitely use Tilegarden to do this. It would look something like:

  1. Deploy a Tilegarden instance.
  2. Write a second lambda function that gets triggered on an S3 event (I'm not sure about the specifics here, but I'm pretty sure AWS has a system in place for this — S3 event notifications can invoke Lambda functions).
  3. With the second function: if the tile is missing at the desired spot in the bucket (or if it's out of date), fetch it from your Tilegarden instance and save it to the bucket.

I've seen some articles online that discuss Lambda interaction with S3 more in-depth, but to my knowledge there isn't a current project that handles this. I hope that helps!
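A minimal sketch of step 3, with the S3 client and tile fetcher injected as plain functions so the control flow stays visible and testable. The helper names, key layout, and URL shape here are my assumptions for illustration, not Tilegarden's actual API:

```javascript
// Hypothetical backfill logic for the second Lambda: given tile
// coordinates, make sure the tile exists in the S3 cache bucket.
// `s3` and `fetchTile` are injected stand-ins for an AWS SDK client
// and an HTTP client pointed at the Tilegarden instance.
async function ensureTile({ s3, fetchTile, bucket, tilegardenUrl }, z, x, y) {
    const key = `${z}/${x}/${y}.png`;
    if (await s3.exists(bucket, key)) {
        return { key, cached: true };   // already in the bucket, nothing to do
    }
    const tile = await fetchTile(`${tilegardenUrl}/tile/${key}`);
    await s3.put(bucket, key, tile);    // save it for next time
    return { key, cached: false };
}
```

Injecting the clients also makes the logic easy to exercise locally with stubs before wiring it to real AWS resources.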

@KlaasH
Collaborator

KlaasH commented Oct 1, 2018

Yeah, we are planning to add this feature, though I'm not sure what the timeline will be, exactly.

The example I know of is this: WikiWatershed/model-my-watershed#1215, which stores tiles in S3 and serves them with a CloudFront distribution that redirects to the actual tile server when a tile is missing. In that case the tile server is a Windshaft instance, which handles the redirected request and also writes the tile into the S3 bucket for next time.

Cache invalidation is tricky, and depends heavily on aspects of the data that these components won't necessarily know about. I think the above example deals with data that's updated infrequently and only by maintainers, so the cache invalidation strategy is "manually clear the bucket when necessary." The next-simplest approach would probably be adding a TTL parameter to the S3 bucket, though it's easy to imagine situations where that would be too aggressive and also ones where it would not be aggressive enough.
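For the TTL approach, the check itself is trivial; the hard part, as noted above, is choosing a TTL that matches how often the underlying data actually changes. A sketch (the function name is mine, not from any branch):

```javascript
// Treat a cached tile as expired once it is older than the TTL.
// lastModifiedMs would come from the S3 object's LastModified metadata.
function isStale(lastModifiedMs, nowMs, ttlSeconds) {
    return nowMs - lastModifiedMs > ttlSeconds * 1000;
}
```

In practice you would more likely configure an S3 lifecycle expiration rule on the bucket and let AWS delete old tiles, rather than checking object age on every request.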

@mattmeye
Author

mattmeye commented Oct 2, 2018

Thank you very much for your feedback and the link to that example. Currently I think it is possible to check invalidation with some logic in a Lambda@Edge function. I'm not an expert in the OSM data yet, but I think I'll be able to reach (or make reachable) the invalidation status in the Postgres database from inside a Lambda@Edge function too. Otherwise I will try to extend the OSM update process to store this information, or have that process delete the tile. I will work out some solutions and report back. I also noticed that Lambda@Edge functions are a lot cheaper than normal Lambda, so I will start by exploring that route.

@KlaasH
Collaborator

KlaasH commented Nov 26, 2018

The feature/kjh/s3-tile-cache branch makes some changes to api.js and adds additional Terraform config to get this mostly working.

The basic structure is:

  • Creates an S3 bucket to hold cached tiles
  • Configures an S3 Website to serve tiles from that bucket
  • Adds an origin to the CloudFront distribution to point to that S3 website
  • Configures a fallback behavior on the S3 website such that, when a tile is not found, the S3 website redirects back to the CloudFront distribution, adding a latest/ prefix to the request path
  • Adds a cache rule for the CloudFront distribution so that the API Gateway origin, which was formerly the only origin, now only handles requests that start with latest/
  • Adds code to api.js so that, if there is a CACHE_BUCKET configured in the environment, it writes all tiles to that bucket, using the request path as the key

So the effect is that the CloudFront distribution now serves tiles from S3 when they exist, with a seamless fallback to API Gateway/Lambda when they don't, and the Lambda function adds a tile to the cache the first time it's generated.
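The redirect-on-miss piece (the fourth bullet) can be expressed as an S3 website routing rule. A hedged sketch of what that might look like in Terraform, using the current AWS provider syntax; the resource names and domain are placeholders, not the actual branch's config:

```hcl
# Placeholder sketch: when the tile bucket's website endpoint returns a
# 404, redirect back to the CloudFront distribution with a latest/
# prefix so the request is routed to the API Gateway origin.
resource "aws_s3_bucket_website_configuration" "tile_cache" {
  bucket = aws_s3_bucket.tile_cache.id

  index_document {
    suffix = "index.html"
  }

  routing_rule {
    condition {
      http_error_code_returned_equals = "404"
    }
    redirect {
      host_name               = "tiles.example.com" # the CloudFront domain
      replace_key_prefix_with = "latest/"
    }
  }
}
```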

This is all great, but there's a fatal flaw: the S3 website uses the request path as the key but ignores the querystring. This means that all the configuration parameters we put in the querystring (layers/layers, filter/filters, utfFields, config, s3bucket) need to be converted to path parameters.

I will make issues for making that transition. In the meantime, S3 caching works to the extent that either 1) all the defaults are acceptable, so you can get the tiles you want with no querystring, or 2) you're comfortable fudging it because you're confident the parameters given in the querystring won't change, so the fact that they're ignored for caching purposes won't produce cached tiles that don't match the provided parameters. In other words, it doesn't completely fail, but it is broken.
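One hypothetical shape for that transition: fold the known querystring parameters into the path in a fixed order, so that requests with different parameters map to different S3 keys. Nothing here reflects an actual scheme from the branch; it is just one way the cache key could be made parameter-aware:

```javascript
// Hypothetical: encode known query parameters as name/value path
// segments in a fixed order, so the S3 key captures them.
const PATH_PARAMS = ['layers', 'filters', 'utfFields', 'config', 's3bucket'];

function toPathKey(tilePath, query) {
    const segments = PATH_PARAMS
        .filter((name) => query[name] !== undefined)
        .map((name) => `${name}/${encodeURIComponent(query[name])}`);
    // Tiles requested with no parameters keep their plain z/x/y key.
    return [...segments, tilePath].join('/');
}
```

The fixed parameter order matters: it ensures that two requests for the same tile with the same parameters always produce the same key, regardless of querystring ordering.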

Note also: it works to write tiles to S3 in local development, but not to read them from there. Or at least, I doubt it's possible to get an S3 website redirecting to localhost, and I didn't try.

@mattmeye
Author

mattmeye commented Nov 26, 2018 via email
