-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8066268338 Key segment pair refactor - do not move the segment out of the KeySegmentPair as it is a shared resource #2166
base: master
Are you sure you want to change the base?
Conversation
@@ -85,7 +85,7 @@ class S3ClientInterface { | |||
|
|||
virtual S3Result<std::monostate> put_object( | |||
const std::string& s3_object_name, | |||
Segment&& segment, | |||
Segment& segment, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would be a const Segment&
(also for other storages). However this might require a much bigger refactor and is likely out of scope for this change.
It would make writing put_object(*key_seg.segment_ptr())
much less worrysome.
I had to spend quite a bit of time verifying the put_object
doesn't somehow move some buffers outside of the segment.
The reason it works at all is because the aws-sdk-cpp's PutObjectRequest::SetBody
doesn't take an owning copy of the underlying buffer and copies from it to send the request to S3. This is also probably why we weren't seeing the use-after-move issues manifest all the time in our tests before.
Maybe it's just worth a comment saying why it should not move data out of the segment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah this is where we chose to draw the line and stop refactoring, making all the consumers of the Segment
const will be an even bigger change. But isn't it implied that this shouldn't move data out of the segment, given the segment is passed by lvalue reference?
const auto total_size = segment.calculate_size(); | ||
/*thread_local*/ std::vector<uint8_t> buffer{}; | ||
buffer.resize(total_size); | ||
bsoncxx::types::b_binary data = {}; | ||
kv.segment().write_to(buffer.data()); | ||
kv.segment_ptr()->write_to(buffer.data()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Just use the reference above and do segment.write_to()
kv.segment().force_own_buffer(); | ||
return std::move(kv.segment()); | ||
kv.segment_ptr()->force_own_buffer(); | ||
return kv.segment().clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need the force_own_buffer
if we're doing clone
. See here, clone will copy underlying buffer even if not owning.
@@ -225,7 +225,7 @@ void do_read_impl( | |||
KeyDecoder&& key_decoder, | |||
ReadKeyOpts opts) { | |||
auto key_seg = do_read_impl(std::move(variant_key), root_folder, bucket_name, s3_client, std::forward<KeyBucketizer>(bucketizer), std::forward<KeyDecoder>(key_decoder), opts); | |||
visitor(key_seg.variant_key(), std::move(key_seg.segment())); | |||
visitor(key_seg.variant_key(), *key_seg.segment_ptr()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO perhaps when reading with a visitor it does make sense to move.
I.e. makes sense to move the ownership to the visitor instead of giving it a reference to something which will soon get destructed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to undo this we should do it for all storages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point thank you, I'll try this
@@ -54,7 +54,7 @@ std::optional<Azure::Core::RequestFailedException> has_failure_trigger(const std | |||
|
|||
void MockAzureClient::write_blob( | |||
const std::string& blob_name, | |||
arcticdb::Segment&& segment, | |||
arcticdb::Segment& segment, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should clone for all mock storages. Looks like we've forgotten it for azure.
@@ -161,7 +165,7 @@ folly::Future<folly::Unit> write_compressed(storage::KeySegmentPair ks) override | |||
} | |||
|
|||
void write_compressed_sync(storage::KeySegmentPair ks) override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change this to take a KeySegmentPair&
to be consistent with the others and to make clear that it does not copy the underlying segment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same goes for non-sync variant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to keep it like this, this is what this bit of the PR description is referring to,
Pass KeySegmentPair by lvalue rather than rvalue reference where possible. The Task layer still needs rvalue references as that is what Folly expects (rightly, the Folly executors need to have ownership).
ARCTICDB_DEBUG(log::storage(), "DecodeTimeseriesDescriptorTask decoding segment with key {}", variant_key_view(key_seg.variant_key())); | ||
|
||
auto maybe_desc = decode_timeseries_descriptor(key_seg.segment()); | ||
auto maybe_desc = decode_timeseries_descriptor(*key_seg.segment_ptr()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe again a bit out of scope but I think decode_timeseries_descriptor
decode_metadata_and_descriptor_fields
etc. can be changed to just take a const Segment&
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can't be, they modify the segment in place
…mentPair as it is a shared resource
f8744b2
to
56a1983
Compare
This allows us to re-enable some tests in
arcticdb-enterprise
https://github.com/man-group/arcticdb-enterprise/pull/new/aseaton/keysegmentpair-refactor-enterprise .The changes are:
Segment& segment()
in theKeySegmentPair
- this was often used to move out of theSegment
owned by the shared pointer inKeySegmentPair
KeySegmentPair
by lvalue rather than rvalue reference where possible. TheTask
layer still needs rvalue references as that is what Folly expects (rightly, the Folly executors need to have ownership).lookup_match_in_dedup_match
to return a variantDecodeMetadataTask
clone()
segments when adding them to the in in-memory and mock storagesIn particular this lets
CopyCompressedInterstore
safely copy the sameKeySegmentPair
to several targets, whereas before theSegment
could be moved from during the copy to each target.