Remote fdb #54

Merged
merged 219 commits into from
Jan 14, 2025

Conversation

Contributor

@Ozaq Ozaq commented Nov 26, 2024

No description provided.

codecov-commenter commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 17.31297% with 2111 lines in your changes missing coverage. Please review.

Project coverage is 50.31%. Comparing base (a8591b3) to head (c2b6477).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/fdb5/remote/server/ServerConnection.cc | 0.00% | 331 Missing ⚠️ |
| src/fdb5/remote/client/ClientConnection.cc | 0.00% | 276 Missing ⚠️ |
| src/fdb5/remote/server/CatalogueHandler.cc | 0.00% | 264 Missing ⚠️ |
| src/fdb5/remote/client/RemoteStore.cc | 0.00% | 227 Missing ⚠️ |
| src/fdb5/remote/server/StoreHandler.cc | 0.00% | 149 Missing ⚠️ |
| src/fdb5/api/RemoteFDB.cc | 0.00% | 128 Missing ⚠️ |
| src/fdb5/remote/client/RemoteCatalogue.cc | 0.00% | 98 Missing ⚠️ |
| src/fdb5/remote/Connection.cc | 0.00% | 89 Missing ⚠️ |
| src/fdb5/remote/client/ClientConnectionRouter.cc | 0.00% | 66 Missing ⚠️ |
| src/fdb5/remote/RemoteFieldLocation.cc | 0.00% | 44 Missing ⚠️ |

... and 62 more
Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #54      +/-   ##
===========================================
- Coverage    52.72%   50.31%   -2.42%     
===========================================
  Files          233      259      +26     
  Lines        13200    14270    +1070     
  Branches      1288     1428     +140     
===========================================
+ Hits          6960     7180     +220     
- Misses        6240     7090     +850     

@tbkr tbkr left a comment

Quite a lot of remarks about obsolete comments and commented-out code.

There are also some comments on changed logic, which may have slipped through while writing the code.

if (location_) {
if (withLocation) {
out << sep << *location_;
} else if (withLength) {

Logic changed here:

Now only true if location_ and withLength are both true.
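
To make the concern concrete, here is a minimal sketch. It assumes the previous code was a flat `if (location_ && withLocation) ... else if (withLength)` chain; the original lines are not quoted in this thread, so that shape is an assumption. The `Printed` enum and both function names are hypothetical and only model which branch fires.

```cpp
// Hypothetical model of which branch of the print logic is taken.
enum class Printed { Location, Length, Nothing };

// Assumed previous shape: the length branch does not depend on location_.
Printed flatChain(bool hasLocation, bool withLocation, bool withLength) {
    if (hasLocation && withLocation) {
        return Printed::Location;
    } else if (withLength) {
        return Printed::Length;
    }
    return Printed::Nothing;
}

// Shape quoted in the diff: the length branch is nested under location_.
Printed nestedChain(bool hasLocation, bool withLocation, bool withLength) {
    if (hasLocation) {
        if (withLocation) {
            return Printed::Location;
        } else if (withLength) {
            return Printed::Length;
        }
    }
    return Printed::Nothing;
}
```

Under that assumption the two versions differ only when there is no location but `withLength` is set: the flat chain takes the length branch, the nested one takes neither.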

@@ -27,9 +27,9 @@ ControlVisitor::ControlVisitor(eckit::Queue<ControlElement>& queue,
identifiers_(identifiers) {}


- bool ControlVisitor::visitDatabase(const Catalogue& catalogue, const Store& store) {
+ bool ControlVisitor::visitDatabase(const Catalogue& catalogue) { //, const Store& store) {

Remove the unneeded comments.

Contributor Author

done

@@ -37,7 +37,8 @@ class ControlVisitor : public QueryVisitor<ControlElement> {
bool visitIndexes() override { return false; }
bool visitEntries() override { return false; }

- bool visitDatabase(const Catalogue& catalogue, const Store& store) override;
+ bool visitDatabase(const Catalogue& catalogue) override;
+ // bool visitDatabase(const Catalogue& catalogue, const Store& store) override;

Remove

Contributor Author

done

@@ -44,7 +44,8 @@ class DumpVisitor : public QueryVisitor<DumpElement> {
bool visitIndexes() override { return false; }
bool visitEntries() override { return false; }

- bool visitDatabase(const Catalogue& catalogue, const Store& store) override {
+ // bool visitDatabase(const Catalogue& catalogue, const Store& store) override {

Remove

Contributor Author

done

@@ -43,14 +43,16 @@ struct ListVisitor : public QueryVisitor<ListElement> {

/// Make a note of the current database. Subtract its key from the current
/// request so we can test request is used in its entirety
- bool visitDatabase(const Catalogue& catalogue, const Store& store) override {
+ // bool visitDatabase(const Catalogue& catalogue, const Store& store) override {

Remove, below as well.

Contributor Author

done

@@ -32,7 +32,8 @@ class TocPurgeVisitor : public PurgeVisitor, public TocStatsReportVisitor {
TocPurgeVisitor(const TocCatalogue& catalogue, const Store& store);
~TocPurgeVisitor() override;

- bool visitDatabase(const Catalogue& catalogue, const Store& store) override;
+ // bool visitDatabase(const Catalogue& catalogue, const Store& store) override;

Remove

Contributor Author

done

@@ -218,7 +218,8 @@ TocStatsReportVisitor::TocStatsReportVisitor(const TocCatalogue& catalogue, bool

TocStatsReportVisitor::~TocStatsReportVisitor() {}

- bool TocStatsReportVisitor::visitDatabase(const Catalogue& catalogue, const Store& store) {
+ //bool TocStatsReportVisitor::visitDatabase(const Catalogue& catalogue, const Store& store) {

Remove

Contributor Author

done

@@ -159,7 +159,8 @@ class TocStatsReportVisitor : public virtual StatsReportVisitor {

private: // methods

- bool visitDatabase(const Catalogue& catalogue, const Store& store) override;
+ // bool visitDatabase(const Catalogue& catalogue, const Store& store) override;

Remove

Contributor Author

done

@@ -100,14 +100,16 @@ TocWipeVisitor::TocWipeVisitor(const TocCatalogue& catalogue,
TocWipeVisitor::~TocWipeVisitor() {}


- bool TocWipeVisitor::visitDatabase(const Catalogue& catalogue, const Store& store) {
+ // bool TocWipeVisitor::visitDatabase(const Catalogue& catalogue, const Store& store) {

Remove, also below

Contributor Author

done

@@ -37,7 +37,8 @@ class TocWipeVisitor : public WipeVisitor {

private: // methods

- bool visitDatabase(const Catalogue& catalogue, const Store& store) override;
+ // bool visitDatabase(const Catalogue& catalogue, const Store& store) override;

Remove

Contributor Author

done

@@ -47,7 +47,7 @@ class Store {

virtual std::string type() const = 0;
virtual bool open() = 0;
- virtual void flush() = 0;
+ virtual size_t flush() = 0;
Contributor Author

This looks innocent at first, but to me this implies that I now have to check the result here and ensure it is what the calling code expects.

EDIT: Ok I have looked at the use and I think this should be changed back to return void. The use I see is only to add an assert to ensure the flush call to the store matches (in terms of number of flushed locations).

src/fdb5/database/Archiver.cc (outdated, resolved)
public:
ConnectionError(const int);
ConnectionError(const int, const eckit::net::Endpoint&);
static size_t bufferSize() { return 1024*1024; }
Contributor Author

That seems a bit excessive. Looking at the mars request logs from one day (2024-11-11), I can see that <1% of requests exceed 4 KB (as text) and the largest is ~245 KB. This probably should not be a one-size-fits-all approach. It would be very nice if a MarsRequest type were able to give an upper bound / exact size for its serialization methods.

Member

IIRC, this was increased from 4096 over the summer when we needed to list/read from 40 years of climate-dt data on the lumi databridge.

Probably would be sensible to have a smaller buffer, and transfer larger requests in chunks if need be.
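
A minimal sketch of the chunking idea, assuming the request has already been serialized to a byte string. `splitIntoChunks` and `reassemble` are hypothetical helpers written for illustration; they are not part of fdb5 or eckit, and framing details such as sequence numbers or an end-of-message marker are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a serialized request into pieces of at most
// chunkSize bytes, so a small fixed buffer can carry a large request.
std::vector<std::string> splitIntoChunks(const std::string& payload, std::size_t chunkSize) {
    std::vector<std::string> chunks;
    if (chunkSize == 0) {
        return chunks;  // nothing sensible to do with a zero-sized buffer
    }
    for (std::size_t offset = 0; offset < payload.size(); offset += chunkSize) {
        const std::size_t len = std::min(chunkSize, payload.size() - offset);
        chunks.emplace_back(payload, offset, len);  // copy of payload[offset, offset + len)
    }
    return chunks;
}

// Receiver side: concatenate the chunks in arrival order.
std::string reassemble(const std::vector<std::string>& chunks) {
    std::string out;
    for (const auto& chunk : chunks) {
        out += chunk;
    }
    return out;
}
```

With a 4096-byte chunk size, the common case from the log figures above (requests under 4 KB) would still travel as a single chunk, and only the rare large request would pay for extra round trips.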

@@ -57,7 +62,7 @@ class EntryVisitor : public eckit::NonCopyable {

// n.b. non-owning
const Catalogue* currentCatalogue_ = nullptr;
- const Store* currentStore_ = nullptr;
+ mutable Store* currentStore_ = nullptr;
Contributor Author

Since this now gets deleted, the above comment no longer fits. Also, this should be a unique_ptr, as it is even initialized from the release of a unique_ptr in EntryVisitor.cc:46.

Contributor Author

@Ozaq Ozaq Dec 11, 2024

Fixed comments, created a ticket for the unique_ptr

Store& EntryVisitor::store() const {
if (!currentStore_) {
ASSERT(currentCatalogue_);
currentStore_ = currentCatalogue_->buildStore().release();
Contributor Author

This should not be a raw pointer but a unique_ptr
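
A minimal sketch of what that could look like. `Store`, `Catalogue` and `EntryVisitorSketch` below are hypothetical stand-ins, not the fdb5 classes; the only thing taken from the quoted code is that `buildStore()` returns a `std::unique_ptr` and that the store is built lazily inside a `const` accessor.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins for the fdb5 types; only the ownership pattern matters.
struct Store {
    int id = 0;
};

struct Catalogue {
    std::unique_ptr<Store> buildStore() const { return std::make_unique<Store>(); }
};

class EntryVisitorSketch {
public:
    void setCatalogue(const Catalogue* catalogue) {
        currentCatalogue_ = catalogue;
        currentStore_.reset();  // a store built for the previous catalogue is freed here
    }

    // Lazily builds the store on first use; ownership stays with the visitor.
    Store& store() const {
        if (!currentStore_) {
            if (!currentCatalogue_) {
                throw std::logic_error("no current catalogue");
            }
            currentStore_ = currentCatalogue_->buildStore();  // no release(), no manual delete
        }
        return *currentStore_;
    }

private:
    const Catalogue* currentCatalogue_ = nullptr;  // non-owning
    mutable std::unique_ptr<Store> currentStore_;  // owning, built on demand
};
```

The `mutable` qualifier is kept so the accessor can stay `const`, but cleanup no longer depends on a hand-written `delete` in the destructor or in the code that switches catalogues.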

Comment on lines +128 to +132
eckit::Buffer buf = controlWriteReadResponse(remote::Message::Stores, generateRequestID());
eckit::MemoryStream s(buf);
size_t numStores;
s >> numStores;
ASSERT(numStores > 0);
Contributor Author

I don't like that we deserialize from a stream here although we send a specific message on the control channel. I would expect a specific response type for each request, so the response should arrive back at the caller already deserialised.

E.g.

StoreResponse response = controlWriteReadResponse(remote::Message::Stores, generateRequestID());
if (response.stores == 0) {
// and so on
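
One way the typed-response idea could be fleshed out, as a sketch only. `StoreResponse`, `decodeStoreResponse` and `requestStores` are hypothetical names, and the wire format used here (a single 64-bit count in host byte order) is invented for illustration; it is not the encoding eckit streams use.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical typed response for a "Stores" request.
struct StoreResponse {
    std::uint64_t stores = 0;
};

// Decoding lives in one place. The layout (one 64-bit count, host byte order)
// is an assumption made for this sketch.
StoreResponse decodeStoreResponse(const std::vector<unsigned char>& payload) {
    StoreResponse response;
    if (payload.size() < sizeof(response.stores)) {
        throw std::runtime_error("truncated Stores response");
    }
    std::memcpy(&response.stores, payload.data(), sizeof(response.stores));
    return response;
}

// Stand-in for a request-specific controlWriteReadResponse: the transport is
// any callable that returns the raw reply bytes.
template <typename Transport>
StoreResponse requestStores(Transport&& sendAndReceive) {
    return decodeStoreResponse(sendAndReceive());
}
```

The caller then works with `response.stores` directly and never touches a buffer or stream, which is the property the comment asks for.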


if (!archiveFuture_.valid()) {
// Client
bool RemoteFDB::handle(remote::Message message, bool control, uint32_t requestID) {
Contributor Author

The control parameter is unused

Comment on lines +358 to +386
if (hdr.clientID()) {
bool handled = false;

ASSERT(hdr.control() || single_);

auto pp = promises_.find(hdr.requestID);
if (pp != promises_.end()) {
std::lock_guard<std::mutex> lock(promisesMutex_);
if (hdr.payloadSize == 0) {
ASSERT(hdr.message == Message::Received);
pp->second.set_value(eckit::Buffer(0));
} else {
pp->second.set_value(std::move(payload));
}
promises_.erase(pp);
handled = true;
} else {
Client* client = nullptr;
{
std::lock_guard<std::mutex> lock(clientsMutex_);

auto it = clients_.find(hdr.clientID());
if (it == clients_.end()) {
std::stringstream ss;
ss << "ERROR: connection=" << controlEndpoint_ << " received [clientID="<< hdr.clientID() << ",requestID="<< hdr.requestID << ",message=" << hdr.message << ",payload=" << hdr.payloadSize << "]" << std::endl;
ss << "Unexpected answer for clientID recieved (" << hdr.clientID() << "). ABORTING";
eckit::Log::status() << ss.str() << std::endl;
eckit::Log::error() << "Retrieving... " << ss.str() << std::endl;
throw eckit::SeriousBug(ss.str(), Here());
Contributor Author

At the end of this code block it is considered to be a serious bug if the client id is unknown, however we only enter this code if there is ANY client ID. If there is no client ID the code will just continue with the next iteration of the for-ever loop. Is this intentional? I would argue that no client id is the same error case as an unknown client id.
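
A sketch of the alternative argued for here, in which a missing client ID and an unknown client ID fail the same way. `resolveClient` is a hypothetical helper, the client table is modelled as a plain `std::map`, and whether a zero client ID is ever legitimate (for example for connection-level messages) is not settled in this thread.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical lookup: maps the client ID from a message header to a client name.
// A zero ID (no client) and an unknown ID are both reported as errors.
const std::string& resolveClient(const std::map<std::uint32_t, std::string>& clients,
                                 std::uint32_t clientID) {
    if (clientID == 0) {
        throw std::runtime_error("message received without a client ID");
    }
    auto it = clients.find(clientID);
    if (it == clients.end()) {
        std::ostringstream ss;
        ss << "unexpected answer for unknown clientID " << clientID;
        throw std::runtime_error(ss.str());
    }
    return it->second;
}
```

Routing both cases through one function makes the decision explicit: if silently skipping messages without a client ID is intended, that would show up as a deliberate branch rather than a fall-through.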

src/fdb5/remote/client/RemoteCatalogue.h (outdated, resolved)
void checkUID() const override;
eckit::URI uri() const override;

void sendArchiveData(uint32_t id, const Key& key, std::unique_ptr<FieldLocation> fieldLocation);
Contributor Author

This method is unused as far as I can tell.

src/fdb5/remote/client/RemoteCatalogue.cc (outdated, resolved)
Comment on lines 45 to 58
Buffer keyBuffer(4096);
MemoryStream keyStream(keyBuffer);
keyStream << currentIndexKey_;
keyStream << key;

Buffer locBuffer(4096);
MemoryStream locStream(locBuffer);
locStream << *fieldLocation;

std::vector<std::pair<const void*, uint32_t>> payloads;
payloads.push_back(std::pair<const void*, uint32_t>{keyBuffer, keyStream.position()});
payloads.push_back(std::pair<const void*, uint32_t>{locBuffer, locStream.position()});

dataWrite(Message::Blob, id, payloads);
Contributor Author

There is serialization and deserialization code in many places and I think this needs to be abstracted away into the read/write methods.

With the current implementation I see problems if we ever want to evolve the protocol, either by changing the serialization format or by introducing new / deprecating old fields.

I would like to see something along the lines of:

struct MessageBlob {
  // Contains everything the message entails
};

auto msg = createMessage(...); // Or use a builder
write(msg); // This takes care of buffer handling & serialization
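
A slightly fuller sketch of the same idea. Everything here is hypothetical: `ArchiveBlobMessage`, `serialize` and `writeMessage` are illustrative names, the fields are modelled as already-encoded strings, and the length-prefixed layout is invented for this example rather than taken from the FDB remote protocol.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical message type: holds the fields, knows nothing about buffers.
struct ArchiveBlobMessage {
    std::uint32_t requestID = 0;
    std::string indexKey;
    std::string key;
    std::string fieldLocation;
};

// Appends one field as a 4-byte little-endian length followed by the bytes.
// This layout is an assumption made for the sketch.
inline void appendField(std::vector<std::uint8_t>& out, const std::string& field) {
    const std::uint32_t n = static_cast<std::uint32_t>(field.size());
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((n >> shift) & 0xFF));
    }
    out.insert(out.end(), field.begin(), field.end());
}

// The only place that defines the wire layout of the message.
std::vector<std::uint8_t> serialize(const ArchiveBlobMessage& msg) {
    std::vector<std::uint8_t> out;
    appendField(out, msg.indexKey);
    appendField(out, msg.key);
    appendField(out, msg.fieldLocation);
    return out;
}

// Owns buffer handling; callers never size a buffer themselves.
// The sink is any callable taking (requestID, bytes).
template <typename Sink>
void writeMessage(Sink& sink, const ArchiveBlobMessage& msg) {
    sink(msg.requestID, serialize(msg));
}
```

Because the buffer grows with the message, the fixed `Buffer keyBuffer(4096)` sizing in the quoted code disappears, and a protocol change would touch `serialize` (and its matching decoder) rather than every call site.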

@danovaro danovaro marked this pull request as ready for review January 14, 2025 09:07
@danovaro danovaro merged commit 6883e58 into develop Jan 14, 2025
137 of 140 checks passed
5 participants