[test: shm] Add testcase for fast reconnection #1916

DownerCase · 2025-01-17T13:11:27Z

Description

Will eventually track down and resolve an issue with subscribers connecting but not receiving data.

Current status: Test case refined, problem understood (see issue for details).

Related issues

Fixes [eCAL 6] Linux - SHM Subscribers can appear connected but do not receive data #1914

Cherry-pick to

github-actions

clang-tidy made some suggestions

ecal/tests/cpp/pubsub_test/src/pubsub_test_shm.cpp

github-actions

clang-tidy made some suggestions

ecal/tests/cpp/pubsub_test/src/pubsub_test_shm.cpp

github-actions

clang-tidy made some suggestions

ecal/core/src/ecal_event.cpp

DownerCase · 2025-01-24T13:56:21Z

Responding to @hannemn's comment from here:

@KerstinKeller Yes, this definitely needs to be solved more cleanly, even if it is the safest solution for a short term release now. I still had an approach where the validity of the event handle is checked, but I like idea with reference counter that @DownerCase proposed even more. :-) However, this requires most likely a larger test loop.

I explored a few different options targeting either the unlinking of the shm file or interrupting the file observer. I was unable to find a solution for the later option that didn't involve adding something to named_event_t, at which point you totally break compatibility with things on eCAL v5.X

So, unless you changed the scheme used for the shm files since v5, trying to use a v6 subscriber with a v5 publisher will have the subscriber try and remove the shm file out from under the publisher as it'll see a 0 refcount. I don't know what your policy is w.r.t compatibility between v5 and v6 (I also haven't tested anything).

Regarding checking the validity of the event handle, that prevents you from doing pthread_*_destroy (not that it was being done previously anyway!) because at destructor time you can't know if the resource is in use elsewhere.

KerstinKeller · 2025-01-24T14:12:03Z

Hi @DownerCase, thanks for digging so deep into it.
At the moment, we would like to avoid to break wire compatibility (that's what we call the possibility to communicate with eCAL over different eCAL versions).
And when we do break it, it would be great to put in some additional features, like Shared Locks for reading them SHM files, etc.

@hannemn also tried interrupting the Memfile Observer, this is another area where it might be worth refactoring / restructuring the code a bit.

The current solution, to go back to eCAL 5 behavior, and not to unlink any file until process destruction, also has the benefit that it will work seemlessly with eCAL 5.

However, it would be great, if we could just take your testcase and see if this now works with current master.

DownerCase · 2025-01-24T15:02:35Z

Hi @DownerCase, thanks for digging so deep into it. At the moment, we would like to avoid to break wire compatibility (that's what we call the possibility to communicate with eCAL over different eCAL versions). And when we do break it, it would be great to put in some additional features, like Shared Locks for reading them SHM files, etc.

However, it would be great, if we could just take your testcase and see if this now works with current master.

@KerstinKeller Totally understandable, I've rebased onto master and removed the ref counting changes. I believe the test will pass now.

KerstinKeller

The testcase looks good, I am just wondering about printing to std::cerr. Does this show up in the test logs?

KerstinKeller · 2025-01-27T08:15:32Z

ecal/tests/cpp/pubsub_test/src/pubsub_test_shm.cpp

+            const eCAL::SDataTypeInformation & /*data_type_info_*/,
+            const eCAL::SReceiveCallbackData & /*data_*/) {
+          {
+            std::cerr << "Callback received message\n";


@DownerCase why are you printing to cerr here?

Force of habit mostly, cerr's unbuffered so don't need to pepper std::endl/flush.

Shows up in the logs all the same: https://github.com/eclipse-ecal/ecal/actions/runs/12952082976/job/36128611552#step:11:805

I am already not so happy with all the tests logging in the unittests (I know we also do have quite a bunch of tests that do it).

I'd prefer if we only see what's the problem on test failures, e.g. you could log to the google test command:
EXPECT_TRUE(data_received) << "Missing data from publisher" ;

Understood, I tried to keep it to a minimum but found the extra messages useful when diagnosing a failure (particularly when finding when internal eCAL debug logs are occurring relative to the test).

As an in-between what about logging via the eCAL logging functionality on the debug4 level (or similar)? So that normally no logs will be visible but they're there when you need to dig into it.

I'll add a descriptive error message to the EXPECT_TRUE

DownerCase changed the title ~~WIP: Resolve topic resubscribing creating "zombie subscribers"~~ WIP: Resolve SHM topic resubscribing creating "zombie subscribers" Jan 17, 2025

github-actions bot reviewed Jan 17, 2025

View reviewed changes

ecal/tests/cpp/pubsub_test/src/pubsub_test_shm.cpp Show resolved Hide resolved

ecal/tests/cpp/pubsub_test/src/pubsub_test_shm.cpp Outdated Show resolved Hide resolved

DownerCase changed the title ~~WIP: Resolve SHM topic resubscribing creating "zombie subscribers"~~ WIP: Resolve SHM topic resubscribing in less than the registration timeout period creating "zombie subscribers" Jan 18, 2025

github-actions bot reviewed Jan 18, 2025

View reviewed changes

ecal/tests/cpp/pubsub_test/src/pubsub_test_shm.cpp Show resolved Hide resolved

github-actions bot reviewed Jan 24, 2025

View reviewed changes

ecal/core/src/ecal_event.cpp Outdated Show resolved Hide resolved

ecal/core/src/ecal_event.cpp Outdated Show resolved Hide resolved

DownerCase changed the title ~~WIP: Resolve SHM topic resubscribing in less than the registration timeout period creating "zombie subscribers"~~ Resolve SHM topic resubscribing in less than the registration timeout period creating "zombie subscribers" Jan 24, 2025

DownerCase marked this pull request as ready for review January 24, 2025 13:57

DownerCase changed the title ~~Resolve SHM topic resubscribing in less than the registration timeout period creating "zombie subscribers"~~ [shm: linux] Resolve SHM topic resubscribing in less than the registration timeout period creating "zombie subscribers" Jan 24, 2025

DownerCase added 4 commits January 24, 2025 14:58

Create reproducible test case

ebe987b

Simplify test with new understanding

70e59a9

Refactor test

4da28bc

Fixup for namespace change

97ba864

DownerCase force-pushed the zombie_subscriber branch from 986fe0a to 97ba864 Compare January 24, 2025 14:59

DownerCase changed the title ~~[shm: linux] Resolve SHM topic resubscribing in less than the registration timeout period creating "zombie subscribers"~~ [test: shm] Add testcase for fast reconnection Jan 24, 2025

KerstinKeller added the cherry-pick-to-NONE Don't cherry-pick these changes label Jan 24, 2025

KerstinKeller reviewed Jan 27, 2025

View reviewed changes

Replace callback log with better test failure message

be2237e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[test: shm] Add testcase for fast reconnection #1916

[test: shm] Add testcase for fast reconnection #1916

DownerCase commented Jan 17, 2025 •

edited

Loading

github-actions bot left a comment

github-actions bot left a comment

github-actions bot left a comment

DownerCase commented Jan 24, 2025

KerstinKeller commented Jan 24, 2025

DownerCase commented Jan 24, 2025

KerstinKeller left a comment

KerstinKeller Jan 27, 2025

DownerCase Jan 27, 2025

KerstinKeller Jan 27, 2025

DownerCase Jan 27, 2025

[test: shm] Add testcase for fast reconnection #1916

Are you sure you want to change the base?

[test: shm] Add testcase for fast reconnection #1916

Conversation

DownerCase commented Jan 17, 2025 • edited Loading

Description

Related issues

Cherry-pick to

github-actions bot left a comment

Choose a reason for hiding this comment

github-actions bot left a comment

Choose a reason for hiding this comment

github-actions bot left a comment

Choose a reason for hiding this comment

DownerCase commented Jan 24, 2025

KerstinKeller commented Jan 24, 2025

DownerCase commented Jan 24, 2025

KerstinKeller left a comment

Choose a reason for hiding this comment

KerstinKeller Jan 27, 2025

Choose a reason for hiding this comment

DownerCase Jan 27, 2025

Choose a reason for hiding this comment

KerstinKeller Jan 27, 2025

Choose a reason for hiding this comment

DownerCase Jan 27, 2025

Choose a reason for hiding this comment

DownerCase commented Jan 17, 2025 •

edited

Loading