Friday, 2023/02/03
3:00 pm ET
Terrell Russell, Kory Draughn, Alan King, Violet White, Tim Cutts (Amazon), Illyoung Choi (CyVerse), Tony Edgin (CyVerse), Nirav Merchant (CyVerse)
DISCUSSION
- Native C++ S3 Implementation
- Working endpoints: CopyObject, ListObjectsV2, GetObject, DeleteObject, HeadObject, PutObject
- Two plugin interfaces defined:
- Authentication - user mapping
- Bucket Resolution - bucket mapping
- Skipping implementation of multipartUpload for now
- Focused on testing and packaging
- S3 frontend to SFTPGo
- Author doesn't currently have bandwidth for this project
- Multipart Options
- a) Multiobject - write all parts individually to iRODS, then complete triggers copy/concatenate/whatever
- pro - relatively simple
- con - lots of extra policy, could trigger replication to multiple continents (just a config option)
- requires API plugin for concatenate()
- b) Store-and-forward - write it all down in the bridge, then send it to iRODS
- pro - simple, no extra policy
- con - slow/delayed, need POTENTIALLY HUGE disk
- c) Efficient store-and-forward - write down / hold non-contiguous parts in bridge - send contiguous parts to iRODS when ready
- pro - elegant, single write
- con - more complexity, need biggish disk
- maybe off the table because a client can re-send the same numbered part and it should overwrite the earlier same part
- OR... new thread! offset, overwrite, who cares, same size, magic/perfect, just works, don't look at me…
- d) Store-and-register - write it all down where iRODS can see it, then just register it in iRODS
- pro - simple, fastest
- con - just reg policy?, adds dependency on co-visibility of bridge and iRODS
- cannot continue on failure (incomplete writes)
- iRODS doesn't know what happened
- Client has no way to recover
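As a rough illustration of option (c) and the concern about re-sent parts, the following is a minimal sketch, not the bridge's actual design. The class name `ContiguousForwarder`, its methods, and the use of an in-memory stream in place of an iRODS write are all assumptions made for illustration. It holds out-of-order parts and forwards each part as soon as it becomes contiguous with what has already been sent.

```python
import io


class ContiguousForwarder:
    """Hypothetical sketch: hold out-of-order parts, forward the contiguous prefix."""

    def __init__(self, sink):
        self.sink = sink      # writable binary stream, standing in for a write to iRODS
        self.pending = {}     # part number -> bytes still held in the bridge
        self.next_part = 1    # lowest part number not yet forwarded

    def receive(self, part_number, data):
        if part_number < self.next_part:
            # The part was already forwarded, so a re-send cannot simply replace it.
            raise RuntimeError(f"part {part_number} was already forwarded")
        # A re-send of a part that is still held just overwrites the held copy.
        self.pending[part_number] = data
        # Forward every part that is now contiguous with what was already sent.
        while self.next_part in self.pending:
            self.sink.write(self.pending.pop(self.next_part))
            self.next_part += 1


sink = io.BytesIO()
fwd = ContiguousForwarder(sink)
fwd.receive(3, b"C")   # held: part 1 has not arrived yet
fwd.receive(2, b"b")   # held
fwd.receive(2, b"B")   # re-send while still held replaces the earlier copy
fwd.receive(1, b"A")   # parts 1, 2, 3 are now contiguous and get forwarded
```

In this sketch a re-sent part is only safe while it is still held in the bridge. Once a part has been forwarded, replacing it would need an overwrite at a known offset, which is the complication raised in the notes for this option.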
- Illyoung - HDF5 file format? Because we don't know the sizes ahead of time
- Just write them down as they come, document the parts, and then register the whole thing in iRODS as HDF5 type… would require a server-translation thinger to give the file back to the user - same as option a) concatenate()
- Possible polling/delay option - glacier-like until it's ready
- Feels that (a) is probably the best path forward
- Tim also agrees that this approach is closer to how S3 works
- Core idea is similar to iput/ibun
- Violet - plans on writing to a secret collection before concatenating all parts
- Kory - What happens if policy exists and it modifies permissions on the parts?
- Is multipart upload required?
- No. MinIO doesn't implement it
- Possible 'option' flag to pick a/b/c/d mode… we could implement more than one of these
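Option (a), which the group leaned toward, can be sketched roughly as follows. This is an in-memory model only, under stated assumptions: the `MultipartUpload` class and its methods are hypothetical, a dictionary stands in for the per-part data objects written to a staging collection, and `complete()` stands in for the proposed concatenate() API plugin. It is not the implementation discussed in the meeting.

```python
from dataclasses import dataclass, field


@dataclass
class MultipartUpload:
    """Hypothetical in-memory stand-in for one in-progress multipart upload."""

    key: str
    # part number -> bytes; models one data object per part in a staging collection
    parts: dict = field(default_factory=dict)

    def upload_part(self, part_number: int, data: bytes) -> None:
        if part_number < 1:
            raise ValueError("part numbers start at 1")
        # Re-sending the same part number replaces the earlier copy.
        self.parts[part_number] = data

    def complete(self, part_numbers: list) -> bytes:
        # Refuse to complete if any listed part was never written.
        missing = [n for n in part_numbers if n not in self.parts]
        if missing:
            raise KeyError(f"missing parts: {missing}")
        # Concatenate the listed parts in ascending part-number order.
        return b"".join(self.parts[n] for n in sorted(part_numbers))


upload = MultipartUpload(key="bucket/big.bin")
upload.upload_part(2, b"world")
upload.upload_part(1, b"hello ")
upload.upload_part(2, b"iRODS")     # overwrites the earlier part 2
result = upload.complete([1, 2])    # b"hello iRODS"
```

Because every part is stored on its own until completion, parts can arrive in any order and be re-sent freely. The cost, as noted for option (a), is that each part write may trigger whatever policy applies to the staging location.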
- Nirav - people will use well known tooling to communicate with this
- We can try to support a wide range of tools, but we may hit a wall
- OR, we can list a set of tools which are supported
- Common tools appear to be the AWS client and Boto3
- Nirav - What versions of iRODS does this target?
- Kory - We're targeting versions of iRODS that support the necessary features
- Meaning 4.2.11+ possibly
- We need people who are able to test it out
- Next Meeting
- Mar 2023