Skip to content
This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

NG: storage driver model #616

Closed
dmp42 opened this issue Oct 8, 2014 · 26 comments
Closed

NG: storage driver model #616

dmp42 opened this issue Oct 8, 2014 · 26 comments

Comments

@dmp42
Copy link
Contributor

dmp42 commented Oct 8, 2014

Designing the communication API with the drivers is actually the easy part.

The hard part is:

  • actual implementation (resumable push is a challenge)
  • find the proper abstraction in go world

Here are some things that we want to take into account:

  • the possibility to have language-agnostic drivers might be a plus - that would obviously mean we need an agnostic transport (http, spdy, ...)
  • we want to have a communication channel that is extremely robust and reasonably "simple" to debug in order not to add too much complexity on top of the driver itself
  • ability to NOT having to recompile the registry itself to enjoy using new drivers is definitely something to consider

@noxiouz @bacongobbler

This was referenced Oct 8, 2014
@bacongobbler
Copy link
Contributor

find the proper abstraction in go world

The standard idiom for this seems to be something along using an interface (following your simplification in #612's OP):

type Filestore interface {
    Get(key string) (string, error)
    Put(key string, data string) error
    List(delimiter string) []string
    Move(src, dest string) error
    WriteStream(path string, r io.Reader) error
    ReadStream(path string, w io.Writer) error
}

Then, in the registry we can just reference that interface and perform operations against it:

var fileStore store.Filestore

func main() {
    fileStore = store.NewS3FileStore(...)
}

func putImage(key string, datastream io.Reader) error {
    if err := fileStore.WriteStream(key, datastream); err != nil {
        return err
    }
    return nil
}

This is just off the top of my head, but it seems to be the standard idiom in this case :)

As for the case with resumable push, I'm not too sure on how it can be performed in the context of a file store. A filestore is simply a glorified key/value store which reads and writes to an external filesystem. This is probably the registry's territory, as it would know which bits of data still need to be written and which bits have already been pushed. There could be a way to accomplish this. Maybe by doing a server-side check to see what bits have been pushed, and resume pushing the ones that have not already been pushed?

@wking
Copy link
Contributor

wking commented Oct 9, 2014

On Wed, Oct 08, 2014 at 08:51:39PM -0700, Matthew Fisher wrote:

As for the case with resumable push, I'm not too sure on how it can
be performed in the context of a file store. A filestore is simply a
glorified key/value store which reads and writes to an external
filesystem. This is probably the registry's territory, as it would
know which bits of data still need to be written and which bits have
already been pushed. There could be a way to accomplish this. Maybe
by doing a server-side check to see what bits have been pushed, and
resume pushing the ones that have not already been pushed?

For resumable-push, I think the storage needs to be more than a
glorified key/value store. You also need stat (or some analog) to see
how big the already-stored object is. Then you can return that size
to the client, and the client can start pushing bytes after that
offset. The backend then needs to be able to append the new bytes to
the original, half-finished push. So I'd append the API to:

type Filestore interface {
Get(key string) (string, error)
Put(key string, data string) error
List(delimiter string) []string
GetSize(path string) int
GetCurrentSize(path string) int
WriteStream(path string, data chan string, int size, int offset=0) error
ReadStream(path string, data chan string) error
}

Where GetSize returns the total allocated size and GetCurrentSize
returns the amount actually stored on disk (so GetSize() ==
GetCurrentSize() means the file is stored in its entirety). I see no
need to use the atomic Get/Put interface and the streaming
GetSize/GetCurrentSize/WriteStream/ReadStream interfaces on the same
paths, and would be happy with Filestore subclassing both an
AtomicStore interface and a StreamingStore interface (although I don't
know how to write that in Go).

I also dropped the Move because I we don't need that for
content-addressable storage.

@wking
Copy link
Contributor

wking commented Oct 9, 2014

On Thu, Oct 09, 2014 at 08:26:14AM -0700, W. Trevor King wrote:

would be happy with Filestore subclassing both an AtomicStore
interface and a StreamingStore interface (although I don't know how
to write that in Go).

Actually, I'd be happy with Store subclassing (or providing, or
whatever) both an AtomicStore interface and a StreamingStore interface
;). We shouldn't care if the backend happens to use files or not.
Splitting it up like this would let us use Redis (or whatever) as the
AtomicStore, and S3 (or local storage or whatever) for the
StreamingStore.

@bacongobbler
Copy link
Contributor

would be happy with Filestore subclassing both an
AtomicStore interface and a StreamingStore interface (although I don't
know how to write that in Go).

There certainly are ways to do this:

type FileStore interface {
    Get
    Put
    List
    ...
}

type StreamingFileStore interface {
    FileStore
    ReadStream
    WriteStream
}

type AtomicFileStore interface {
    FileStore
    ...
}

At that point you can use the power of duck-typing to figure out what interface the filestore implements :)

@wking
Copy link
Contributor

wking commented Oct 9, 2014

On Thu, Oct 09, 2014 at 08:26:14AM -0700, W. Trevor King wrote:

type Filestore interface {
Get(key string) (string, error)
Put(key string, data string) error
List(delimiter string) []string
GetSize(path string) int
GetCurrentSize(path string) int
WriteStream(path string, data chan string, int size, int offset=0) error
ReadStream(path string, data chan string) error
}

And if we're serious about the content-addressable part, we should
probably have:

Put(data string) string
WriteStream(data chan string, int size) string
AppendStream(key string, data string) string

and have the storage return an arbitrary key for each stored object
(e.g. by hashing it). In the case of a partial write, WriteStream
would return a temporary key for use by AppendStream, and you'd only
get the final key from WriteStream/AppendStream once you'd finished
writing the object.

@wking
Copy link
Contributor

wking commented Oct 9, 2014

On Thu, Oct 09, 2014 at 08:39:07AM -0700, Matthew Fisher wrote:

would be happy with Filestore subclassing both an AtomicStore
interface and a StreamingStore interface (although I don't know
how to write that in Go).

There certainly are ways to do this:

Actually, I'm fine with just having my AtomicStore and StreamingStore
configured separately. I don't see much gain to wrapping them up in a
single Store class, and it would be nice to be able to mix and match
the backends for both types independently.

@dmp42
Copy link
Contributor Author

dmp42 commented Oct 9, 2014

FWIW, S3 does currently support multipart, resumable push.

@BrianBland
Copy link

Is there still a need for the Move command in this API with all large files being content-addressable?

Also, I'm not aware of any distributed storage systems which benefit from a first-class move command. Most seem to approximate this behavior with a full copy followed by a deletion of the old version.

@BrianBland
Copy link

I'm proposing that we add an offset parameter to the ReadStream method for resumable downloads if we're already planning on supporting resumable uploads.
I see this as looking like ReadStream(path string, offset uint64) (io.ReadCloser, error)

@wking
Copy link
Contributor

wking commented Oct 15, 2014

On Tue, Oct 14, 2014 at 11:36:28AM -0700, Brian Bland wrote:

I'm proposing that we add an offset parameter to the ReadStream
method for resumable downloads if we're already planning on
supporting resumable uploads.

Another alternative would be to set these up as seek-able files.

@visualphoenix
Copy link
Contributor

Youtube handles resumable uploads in an interesting way

@wking
Copy link
Contributor

wking commented Oct 29, 2014

On Wed, Oct 08, 2014 at 08:51:39PM -0700, Matthew Fisher wrote:

List(delimiter string) []string

The Docker hub had 14000 “applications” ($namespace/$repository?) in
June 1. I don't know how many layers that translates to, or how
many namespaces there were, but I'm concerned about performance with a
flat List endpoint 2. I'd prefer:

Count(prefix string) int
List(prefix string, size int, from int) []string

to let you see how many objects matched a (possibly empty) prefix and
iterate through them without making massive requests. I borrowed
size/from from the Elasticsearch API 3, but I don't really care what
the strings are.

Anyhow, something like this would allow the (streaming) storage driver
to shard or fan-out transparently for efficient lookups and listings,
while still preserving a flat namespace for object lookups.

@visualphoenix
Copy link
Contributor

Fwiw, the ability to shard/fan out consistently is a huge +1 for me

On Wednesday, October 29, 2014, W. Trevor King [email protected]
wrote:

On Wed, Oct 08, 2014 at 08:51:39PM -0700, Matthew Fisher wrote:

List(delimiter string) []string

The Docker hub had 14000 “applications” ($namespace/$repository?) in
June 1. I don't know how many layers that translates to, or how
many namespaces there were, but I'm concerned about performance with a
flat List endpoint 2. I'd prefer:

Count(prefix string) int
List(prefix string, size int, from int) []string

to let you see how many objects matched a (possibly empty) prefix and
iterate through them without making massive requests. I borrowed
size/from from the Elasticsearch API 3, but I don't really care what
the strings are.

Anyhow, something like this would allow the (streaming) storage driver
to shard or fan-out transparently for efficient lookups and listings,
while still preserving a flat namespace for object lookups.


Reply to this email directly or view it on GitHub
#616 (comment)
.

Raymond "VisualPhoenix" Barbiero

@wking
Copy link
Contributor

wking commented Oct 29, 2014

On Wed, Oct 29, 2014 at 07:20:26AM -0700, W. Trevor King wrote:

Count(prefix string) int
List(prefix string, size int, from int) []string

Redis lets you do this for lists with 1:

LRANGE key start stop

(but not for sets, where SMEMBERS returns all the values 2). I
expect most of the metadata-side requests will use a stand-alone
search engine (connected by event webhooks [3,4]?) for listings like
this, but a Redis implementation of atomic storage could use lists for
anything where listing would be important (e.g. like we currently list
a repository's tags, #614).

@wking
Copy link
Contributor

wking commented Nov 3, 2014

From #docker-dev today:

11:00 < dmp42> 5. would love some gcs comments regarding the new
driver API before we freeze

I'm not sure how solid this freeze is intended to be, but I'd love to
see semantic versioning for the API, with a version endpoint:

version() → [major, minor, patch]

so the registry could ask what version of the API the driver was
implementing and decide if it was compatible. That would also give us
a clean way to transition to new APIs while maintaining some backwards
compatibility, if we ever need to make a backwards-incompatible bump.

@BrianBland
Copy link

I'm not sure how solid this freeze is intended to be, but I'd love to
see semantic versioning for the API, with a version endpoint:

version() → [major, minor, patch]

I'm pretty sure the freeze is a 0.X version, so still subject to tweaks before final release.

I like the idea of checking storage driver compatibility at startup, but I'm not sure that semantic versioning is necessary or real backwards compatibility is worth the complexity.

We can use a [Major, Minor] version scheme, as patch version doesn't seem to provide much value when applied to an API (correct me if I'm wrong here). When the driver has a newer minor version than the registry, this should still be usable, but should emit some sort of warning, but if the driver is running an older version than the registry or a newer major version, we should probably exit with a compatibility error for sanity purposes.

@dmp42
Copy link
Contributor Author

dmp42 commented Nov 3, 2014

The way registry V1 worked was semver compatible (your driver would depend on core>=X,<X+1 and would work that way).
That might be trickier to get right with go, but then I'm ok bumping major versions more often (understood that it's the driver interface version, not the main registry version).

@visualphoenix
Copy link
Contributor

I think semver is the way to go and we should have backwards compatibility
if possible for at least a major version.

On Monday, November 3, 2014, Olivier Gambier [email protected]
wrote:

The way registry V1 worked was semver compatible (your driver would depend
on core>=X,<X+1 and would work that way).
That might be trickier to get right with go, but then I'm ok bumping major
versions more often (understood that it's the driver interface version, not
the main registry version).


Reply to this email directly or view it on GitHub
#616 (comment)
.

Raymond "VisualPhoenix" Barbiero

@wking
Copy link
Contributor

wking commented Nov 3, 2014

On Mon, Nov 03, 2014 at 12:40:50PM -0800, Olivier Gambier wrote:

(understood that it's the driver interface version, not the main
registry version).

Yeah, and I think the driver API should be versioned independently of
the registry implementation.

@BrianBland, good point about dropping the patch from the API version.
I don't think we need a warning if the driver uses and older minor
version, but an info-level notice would be nice. Here's how I'd
handle differences:

  • Storage versions match: no message:
  • Major versions match, but minor versions differ: log an info-level
    notice.
  • Major versions differ: log an error-level notice, and:
    • if the registry has appropriate compatibility code: carry on.
    • otherwise: exit with an error.

That works whether the driver or the registry has the newer API, and
supports registry-side cross-major compatibility code.

It doesn't allow for negotiation though, which we'd need if we
wanted to support driver-side compatibility code (e.g. if the driver
has endpoints for both the 1.x and 2.x API versions). I don't think
we'd have much need for that though.

@wking
Copy link
Contributor

wking commented Nov 3, 2014

On Mon, Nov 03, 2014 at 01:06:08PM -0800, W. Trevor King wrote:

  • Major versions differ: log an error-level notice, and:
    • if the registry has appropriate compatibility code: carry on.
    • otherwise: exit with an error.

Oops, this should be:

  • Major versions differ:
    • if the registry has appropriate compatibility code: log a
      warning-level notice and carry on.
    • otherwise: log an error-level notice and exit with an error.

@BrianBland
Copy link

If the registry has a newer minor version than the storage driver, this could mean that the registry added a new method to the driver api that the storage driver does not respect, as per semantic versioning. We would either have to keep a mapping of which versions support which methods and operate differently depending on the driver version, or we could just reject the storage driver for being out of date.

@wking
Copy link
Contributor

wking commented Nov 3, 2014

On Mon, Nov 03, 2014 at 01:15:32PM -0800, Brian Bland wrote:

If the registry has a newer minor version than the storage driver,
this could mean that the registry added a new method to the driver
api that the storage driver does not respect, as per semantic
versioning. We would either have to keep a mapping of which versions
support which methods and operate differently depending on the
driver version, or we could just reject the storage driver for being
out of date.

Ah, right. I think I'd turned this around in my head, and had
major-version backwards compatibility applying to the registry, when
it's actually for the API/storage-driver. So we should be dying with
an error for any case where the registry version is greater than the
storage version, unless the registry has appropriate
backwards-compatibility code.

@wking
Copy link
Contributor

wking commented Nov 3, 2014

On Mon, Nov 03, 2014 at 01:06:08PM -0800, W. Trevor King wrote:

It doesn't allow the for negotiation though, which we'd need if we
wanted to support driver-side compatibility code (e.g. if the driver
has endpoints for both the 1.x and 2.x API versions). I don't think
we'd have much need for that though.

It also doesn't handle API bifurcation, e.g. if we want to use
separate atomic and streaming storage APIs 1 down the line. That's
not a problem if we never split the API, or if splitting is infrequent
enough that we don't mind an awkward, manual handoff when we do. If
we did want to detect and handle API bifurcation, we'd want:

version() → {api_name: [major, minor], …}

with a dictionary of supported APIs and their versions. If we extend
that slightly to:

version() → {api_name: [[major, minor], …], …}

it would also allow us to have drivers that simultaneously supported
several major versions of the same API 2, they'd just return
something like:

{'storage': [[1, 8], [2, 3]]}

@BrianBland
Copy link

I'm not sure how a multi-version storage driver would function. How would we handle changing the behavior, parameters, or return values of an API method without having to rename the method each time or providing a version parameter with each API call? I'm not a fan of either of these approaches...

@wking
Copy link
Contributor

wking commented Nov 3, 2014

On Mon, Nov 03, 2014 at 03:39:10PM -0800, Brian Bland wrote:

I'm not sure how a multi-version storage driver would function. How
would we handle changing the behavior, parameters, or return values
of an API method without having to rename the method each time or
providing a version parameter with each API call? I'm not a fan of
either of these approaches...

I don't see major-version bumps as a frequent feature, and the web
endpoints are already namespaced like this (/v1/_ping), so I'm fine
with new function names for anything that would break backwards
compatibility.

@dmp42
Copy link
Contributor Author

dmp42 commented Nov 5, 2014

I believe this, to a large extent, should be closed given #643 was merged.

Special thanks to @BrianBland for his continuous work on this, to @wking @bacongobbler @noxiouz @visualphoenix and others who took time for the thorough reviews and discussions.

What we have certainly still has some rough edges and can use some tire-kicking and bullet tests, but this is a very good first shot.

One single char to sum this first "next-generation" journey: U+2661 ♡

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants