This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

NG: caching as a first class citizen #647

Closed
bshi opened this issue Oct 22, 2014 · 9 comments

@bshi
Contributor

bshi commented Oct 22, 2014

Caching seems to be an important feature and it might be nice to elevate it to work out of the box in a registry cluster. If content addressable storage is a goal (#616 (comment)), it seems https://github.com/golang/groupcache would be an ideal candidate as it's specifically designed for this class of distribution problem.

@dmp42
Contributor

dmp42 commented Oct 22, 2014

Thanks! How does this compare to, say, redis?

@wking
Contributor

wking commented Oct 22, 2014

On Wed, Oct 22, 2014 at 01:55:25PM -0700, Olivier Gambier wrote:

Thanks! How does this compare to, say, redis?

It's content-addressable, so (from its README):

  • does not support versioned values. If key "foo" is value "bar",
    key "foo" must always be "bar".

That means it's not going to work for anything we edit (e.g. tag files
which list the tagged image id associated with that tag).
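To make the constraint concrete, here is a minimal sketch of the difference between a content-addressed key and a mutable tag. The `contentKey` helper, the `sha256:` prefix, and the image IDs are assumptions for illustration, not the registry's actual scheme.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// contentKey derives a cache key from the bytes themselves, so a given key
// can only ever map to one value. The "sha256:" prefix is illustrative.
func contentKey(data []byte) string {
	sum := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	layer := []byte("layer tarball bytes")
	// Safe to cache indefinitely: different bytes would yield a different key.
	fmt.Println(contentKey(layer))

	// A tag is a user-chosen name whose target can change. Stored under a
	// fixed key in an immutable cache, it would keep returning the old ID
	// after the tag moves. The IDs below are hypothetical.
	tags := map[string]string{"latest": "image-id-1"}
	tags["latest"] = "image-id-2" // same key, new value
	fmt.Println(tags["latest"])
}
```

Under this scheme a cache never needs invalidation for layer data, which is exactly the property that tag files lack.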

@bshi
Contributor Author

bshi commented Oct 22, 2014

groupcache solves a narrower class of problems but, by virtue of the extra constraints, addresses several difficult caching problems (hot spots, thundering herd, etc.) inherent in more general-purpose caches like memcache or redis. The two are not mutually exclusive. As @wking points out, groupcache, unmodified, will not be suitable for mutable state.
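As a rough illustration of the thundering-herd point, the sketch below collapses concurrent cache misses for the same key into a single backend load. It is a simplified, standard-library-only take on the duplicate-suppression idea, not groupcache's own code; `flightGroup`, `simulateHerd`, and the key are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight load that other callers can wait on.
type call struct {
	wg  sync.WaitGroup
	val string
	err error
}

// flightGroup suppresses duplicate concurrent loads for the same key.
type flightGroup struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn once per key at a time; callers that arrive while a load is in
// flight wait for it and share its result.
func (g *flightGroup) Do(key string, fn func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // another caller is already loading this key
		return c.val, c.err
	}
	c := new(call)
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.val, c.err
}

// simulateHerd fires n concurrent requests for one key and reports how many
// times the slow backend load actually ran.
func simulateHerd(n int) (loads int32, values []string) {
	var g flightGroup
	var wg sync.WaitGroup
	values = make([]string, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			v, _ := g.Do("sha256:abc", func() (string, error) {
				atomic.AddInt32(&loads, 1)
				time.Sleep(50 * time.Millisecond) // stand-in for a slow storage read
				return "blob bytes", nil
			})
			values[i] = v
		}(i)
	}
	wg.Wait()
	return atomic.LoadInt32(&loads), values
}

func main() {
	loads, values := simulateHerd(10)
	fmt.Printf("%d callers, %d backend load(s)\n", len(values), loads)
}
```

The number of loads depends on goroutine scheduling: callers that arrive during the in-flight load share it, while a caller arriving after it finishes would trigger a new one. A cache in front of `Do` would absorb those later callers.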

@wking
Contributor

wking commented Oct 22, 2014

On Wed, Oct 22, 2014 at 02:28:44PM -0700, Bo Shi wrote:

The two are not mutually exclusive.

So the questions seem to be:

  • Do we have content-addressable data besides the image tarballs? I
    can't think of any off the top of my head, but I haven't looked
    through our current storage data in detail, and I'm not fluent in
    the v2 stuff.
  • Is it worth caching image tarballs? I expect many tarballs will be
    large, and optional caching based on size seems like more trouble
    than it's worth.

If we do have content-addressable non-image data or want to cache
small images, groupcache sounds like a good fit.

@bshi
Contributor Author

bshi commented Oct 22, 2014

Is it worth caching image tarballs? I expect many tarballs will be large,
and optional caching based on size seems like more trouble than it's worth.

Good point - it also occurred to me that the benefit may be marginal. Anecdotally at least, a lot of images I've interacted with are composed of small numbers of large layers and many tiny layers. Would it be difficult to crawl the public index to generate a histogram of layers and their sizes?

Do we have content-addressable data besides the image tarballs?

Coming from a different angle (assuming V2 is still under design), is there data that isn't content-addressable that could be made content-addressable?
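For the layer-size histogram mentioned above, the bucketing step could look roughly like the following once sizes have been collected. This is only a sketch: `bucket` and `histogram` are hypothetical helpers, the crawl itself is not shown, and the sizes in `main` are made up for illustration rather than measured.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bucket returns the power-of-two bucket index for a size in bytes: sizes in
// [2^k, 2^(k+1)) land in bucket k. A size of 0 is also counted in bucket 0.
func bucket(size uint64) int {
	if size == 0 {
		return 0
	}
	return bits.Len64(size) - 1
}

// histogram counts how many sizes fall into each power-of-two bucket.
func histogram(sizes []uint64) map[int]int {
	h := make(map[int]int)
	for _, s := range sizes {
		h[bucket(s)]++
	}
	return h
}

func main() {
	// Made-up sizes for illustration only, not measurements.
	sizes := []uint64{512, 700, 1024, 150 << 20, 200 << 20}
	h := histogram(sizes)
	for k := 0; k < 64; k++ {
		if n := h[k]; n > 0 {
			fmt.Printf("[2^%d, 2^%d) bytes: %d layer(s)\n", k, k+1, n)
		}
	}
}
```

Power-of-two buckets keep tiny and very large layers visible on the same scale, which is what the "few large, many tiny" hypothesis needs.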

@wking
Contributor

wking commented Oct 22, 2014

On Wed, Oct 22, 2014 at 03:04:57PM -0700, Bo Shi wrote:

Do we have content-addressable data besides the image tarballs?

Coming from a different angle (assuming V2 is still under design),
is there data that isn't content-addressable that could be made
content-addressable?

My (sadly dead) detached signatures (moby/moby#6070) would
have let the image metadata and detached signatures both be
content-addressable. With the current embedded signatures
(moby/moby#8093), I think the image metadata at least will need to
be mutable.

@dmp42
Contributor

dmp42 commented Oct 28, 2014

A couple notes:

Content-addressability is for layers only. There doesn't seem to be a benefit in content-addressability for manifest files.

"Caching" layers is not something we have been looking into - many people offload actual delivery to a CDN (we do), which does provide more benefits than caching big objects on the service would - and this pattern is likely to be kept for v2.

Making it possible to use alternative cache engines for manifest files (memcache or otherwise) is something that we should consider, though (being lazy) I would likely support redis as the default, officially maintained engine. Alternatives would be nice to have, sure (provided it's easy to mutate the objects).

@dmp42
Contributor

dmp42 commented Oct 28, 2014

Bottom line: I would rather focus on making the content easy to cache at the transport layer (HTTP) than put too much intelligence into application-level caching.
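As one possible reading of "easy to cache at the transport layer", the sketch below serves an immutable blob with a digest-based `ETag` and a long-lived `Cache-Control` header, so a CDN or HTTP cache in front of the service can store and revalidate it. The handler, the URL path, and the header values are assumptions for illustration, not the registry's actual behavior.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// blobHandler serves one immutable blob. The ETag is the content digest, so
// intermediaries can revalidate without transferring the body again.
func blobHandler(blob []byte) http.Handler {
	sum := sha256.Sum256(blob)
	etag := `"sha256:` + hex.EncodeToString(sum[:]) + `"`
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		// Content never changes for a given digest, so a long max-age is safe.
		w.Header().Set("Cache-Control", "public, max-age=31536000, immutable")
		// Simplified check: a full implementation would also handle lists
		// of validators and weak comparison.
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Write(blob)
	})
}

// request issues an in-memory GET and returns the status code and ETag.
// The path is hypothetical.
func request(h http.Handler, ifNoneMatch string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, "/blobs/example", nil)
	if ifNoneMatch != "" {
		req.Header.Set("If-None-Match", ifNoneMatch)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("ETag")
}

func main() {
	h := blobHandler([]byte("layer tarball bytes"))
	status, etag := request(h, "")
	fmt.Println(status, etag) // first fetch returns the body
	status, _ = request(h, etag)
	fmt.Println(status) // revalidation with a matching ETag
}
```

Mutable objects such as tags or manifests would need a shorter lifetime or revalidation on every request, since their content can change under the same URL.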

@stevvooe
Contributor

stevvooe commented Jan 9, 2015

The next generation registry and the distribution project are making plans to ensure content addressability across layers and manifests. There are no immediate plans to integrate groupcache. As the V2 registry currently stands, distribution can benefit directly from HTTP caching, so groupcache would only help in addition.

This issue is going to be closed. A proposal and PR are welcome for groupcache integration in docker/distribution.

@stevvooe stevvooe closed this as completed Jan 9, 2015