loadbalancer: Use a sequential execution concurrency model in DefaultLoadBalancer #2768
Conversation
Motivation:

We use an atomic field and CAS operations to manage concurrency in the DefaultLoadBalancer. Because the CAS operation has to succeed before changes take effect, it's not easy to make events coordinate with that model. A few examples:

- It will be awkward to add an event observer to the load balancer since we may end up re-doing server-set updates if we fail a CAS.
- There is also a question of ordering of observer events since the observer interactions may get out of order with respect to the state mutations that induced them.
- We can't currently send a LOAD_BALANCER_NOT_READY_EVENT when the last host expires because we would risk the event racing with SD updates that triggered related events.

CAS failures may also be relatively common due to re-entrance: SD update events can cause hosts to be closed, which may synchronously cause the hosts' `.afterFinally` listeners to remove themselves from the host list, resulting in a follow-up CAS failure.

Modifications:

- Use an unbounded processor as a queue for Runnables that are executed sequentially and funnel sensitive operations through that queue.
- Send the LOAD_BALANCER_NOT_READY_EVENT when a host expires and results in an empty host set.

Result:

DefaultLoadBalancer state mutations and the events they generate are executed sequentially, so they no longer race or get re-ordered, and the load balancer now emits LOAD_BALANCER_NOT_READY_EVENT when its last host expires.
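For context, here is a minimal standalone sketch of the sequencing pattern described above. The actual implementation uses an unbounded processor as the queue and lives in `SequentialExecutor.java` in this PR; the class, field, and method names below are illustrative, and the exception-handler constructor argument reflects the review discussion further down.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Illustrative sketch: tasks are enqueued and drained by whichever thread wins the
// `draining` flag, so tasks run one at a time and in submission order without blocking
// submitters. Re-entrant submissions (a task that enqueues another task) simply fail
// the flag CAS and are picked up by the drain that is already in progress.
final class SequentialExecutorSketch {
    private final Consumer<Throwable> exceptionHandler;
    private final Queue<Runnable> queue = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean draining = new AtomicBoolean();

    SequentialExecutorSketch(final Consumer<Throwable> exceptionHandler) {
        this.exceptionHandler = exceptionHandler;
    }

    void execute(final Runnable task) {
        queue.add(task);
        // Keep trying to drain while there is work and nobody else is draining: this
        // closes the race where a task is enqueued just as another drain finishes.
        while (!queue.isEmpty() && draining.compareAndSet(false, true)) {
            try {
                Runnable next;
                while ((next = queue.poll()) != null) {
                    try {
                        next.run();
                    } catch (Throwable t) {
                        // The owner decides what a failure means (log, clean up, close, ...).
                        exceptionHandler.accept(t);
                    }
                }
            } finally {
                draining.set(false);
            }
        }
    }
}
```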
```java
@@ -179,6 +154,34 @@ private void subscribeToEvents(boolean resubscribe) {
        }
    }

    // This method is called eagerly, meaning the completable will be immediately subscribed to,
    // so we don't need to do any Completable.defer business.
    private Completable doClose(final boolean graceful) {
```
This is ugly and looks a lot like the `io.servicetalk.concurrent.api.Executor.submit` method. I didn't go with that because all submissions to that type are cancellable (but that's intrinsically racy so I suppose we could just ignore cancels) and there are a bunch of other methods on there I don't think we want to support. However, maybe it's fine: we might have a time source in the healthCheck already. Opinions appreciated.
I took a quick peek ... iiuc this won't impact connection selection on the hot path and is targeted at discovery/control events (closure, ..)?
Yes, that's right: it's entirely for sequencing control events. The hot path remains the same volatile read pattern that existed before.
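To make the split concrete, here is a hypothetical shape of the two paths, building on the sketch above. These names are illustrative, not the real DefaultLoadBalancer fields.

```java
import java.util.Collections;
import java.util.List;

// Illustrative only: the hot path does a single volatile read of the current snapshot,
// while every mutation is funnelled through the sequential executor, which is the only
// writer of `usedHosts`.
final class HostListHolder {
    private final SequentialExecutorSketch sequentialExecutor =
            new SequentialExecutorSketch(Throwable::printStackTrace);
    private volatile List<String> usedHosts = Collections.emptyList();

    // Hot path, e.g. connection selection: one volatile read, no locks, no queueing.
    List<String> currentHosts() {
        return usedHosts;
    }

    // Control path: service-discovery updates, host expiry, close, etc.
    void updateHosts(final List<String> newHosts) {
        sequentialExecutor.execute(() -> usedHosts = List.copyOf(newHosts));
    }
}
```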
general approach lgtm, few comments/suggestions
```java
try {
    runnable.run();
} catch (Exception ex) {
    logger.error("Exception caught in sequential execution", ex);
```
Do we always want to swallow/log exceptions, or should we save/propagate them after we are done draining? It is simpler to swallow/log, but it may also limit visibility and error handling outside. That way we also wouldn't need to take a logger as a constructor arg.
There is some ambiguity in the propagation of exceptions since it may not be the thread that requested the work that runs it. We could add an exception handler to make it configurable.
I added an exception handler as a constructor arg. It is handy for performing resource disposal if we get an exception.
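For illustration, here is how such a handler behaves against the sketch above (the real constructor signature may differ): the handler sees failures from any task, and a failing task does not stop later tasks from running.

```java
// Illustrative usage only: the owner decides what a failure means. The actual
// DefaultLoadBalancer handler (quoted further down) logs and closes the load balancer.
SequentialExecutorSketch executor = new SequentialExecutorSketch(uncaught ->
        System.err.println("Uncaught exception in sequential execution: " + uncaught));
executor.execute(() -> {
    throw new IllegalStateException("boom"); // routed to the handler
});
executor.execute(() -> System.out.println("still runs after the earlier failure"));
```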
```java
CompletableSource.Processor processor = Processors.newCompletableProcessor();
sequentialExecutor.execute(() -> {
    if (!isClosed) {
        discoveryCancellable.cancel();
```
Let's put the other code in try/finally since this may call externally and throw.
You're right. I think I've done what you've suggested but it's worth a double check.
I also feel this is messier than it needs to be, but otoh the io.st.concurrent.api.Executor is a bit thicker of an API than I think we want. Any ideas appreciated.
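A hedged sketch of the shape being suggested, not the actual doClose body: the names `sequentialExecutor`, `isClosed`, `discoveryCancellable`, and `eventStreamProcessor` come from the snippets quoted in this review, while the `SourceAdapters.fromSource` bridge and the placement of the try/finally are my assumptions. The real close also has to deal with closing hosts and graceful vs. hard close, which is omitted here. The only point is that the externally visible close signal must terminate even if the guarded calls throw.

```java
private Completable doClose(final boolean graceful) {
    final CompletableSource.Processor processor = Processors.newCompletableProcessor();
    sequentialExecutor.execute(() -> {
        try {
            if (!isClosed) {
                discoveryCancellable.cancel();      // may call externally and throw
                eventStreamProcessor.onComplete();  // may call externally and throw
            }
            isClosed = true;
        } finally {
            // Always terminate the close signal so subscribers to closeAsync() aren't left hanging.
            processor.onComplete();
        }
    });
    return SourceAdapters.fromSource(processor);
}
```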
No comments here, I like the new approach. My only consideration would be whether we should extract the "sequential" execution into an operator; it feels awkward to have a custom execution model localized here, and it seems like a feature in itself. See https://reactivex.io/documentation/operators/serialize.html (we would still have cancellation/close to race with).
This has a few reviews and it seems like the pattern, which is the most important part, is desirable, so I'm going to merge this to unblock other work and we can iterate on it more in the future.
The approach LGTM, have a few questions and also opened a follow-up with minor corrections: #2775
```java
this.sequentialExecutor = new SequentialExecutor((uncaughtException) -> {
    LOGGER.error("{}: Uncaught exception in SequentialExecutor triggered closing of the load balancer.",
            this, uncaughtException);
    closeAsync().subscribe();
```
Seems dangerous that one slipped exception can close the entire LB, stopping all traffic. I understand that this way we will be notified immediately, but taking into account that a fix might take time to deliver, would it be safer to keep the traffic flowing and just log the exception?
I suppose it depends on the reason for the failure. Host-set updates may have been corrupted, and because we use a differential stream it may never recover. This is catastrophizing a bit, and this problem exists to a small extent even without update corruption, but failing on a host removal could mean that we may forever try sending traffic to a destination that may not be who we intended.
That said, I'm happy to just log for now.
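The log-only alternative would be roughly the snippet quoted above minus the close, for illustration:

```java
// Illustrative log-only handler: keep serving traffic and surface the bug through logs
// instead of closing the load balancer on the first uncaught exception.
this.sequentialExecutor = new SequentialExecutor(uncaughtException ->
        LOGGER.error("{}: Uncaught exception in SequentialExecutor.", this, uncaughtException));
```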
```java
try {
    if (!isClosed) {
        discoveryCancellable.cancel();
        eventStreamProcessor.onComplete();
```
Why are these 2 lines protected by `if (!isClosed)`? `toAsyncCloseable` has a CAS internally to make sure it's executed only once.
I believe in `toAsyncCloseable` the `CloseableResource` can be called twice: once for a graceful close and then again for a hard close.
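For illustration, the wiring that makes this possible looks roughly like the following; the signatures here are from memory and may not match `AsyncCloseables` exactly.

```java
// Illustrative: both close paths delegate to the same CloseableResource, so
// doClose(graceful) can legitimately run twice — first with graceful = true, then again
// with graceful = false for the hard close — hence the isClosed guard in doClose.
private final ListenableAsyncCloseable asyncCloseable =
        AsyncCloseables.toAsyncCloseable(this::doClose);

@Override
public Completable closeAsync() {
    return asyncCloseable.closeAsync();             // hard close, may follow a graceful close
}

@Override
public Completable closeAsyncGracefully() {
    return asyncCloseable.closeAsyncGracefully();   // graceful close
}
```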
```java
private volatile List<Host<ResolvedAddress, C>> usedHosts = emptyList();
private volatile boolean isClosed;
```
What is the motivation to introduce an additional `volatile` boolean instead of using `ClosedList` like it was before?
To delete code and be more explicit. In a follow-up here I make `isClosed` non-volatile.
```java
            event.address(), event);
}

private void sequentialOnNext(Collection<? extends ServiceDiscovererEvent<ResolvedAddress>> events) {
    assert events != null && !events.isEmpty();
```
Technically, it's possible to receive an empty collection. It's not necessary to fail on it.
The filtering for null or empty updates still happens in `onNext`. This was to assert that connection, but maybe it's unnecessary.
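In other words, the shape is roughly the following. This is a hypothetical sketch of the relationship between the two methods, not the actual DefaultLoadBalancer code.

```java
// Degenerate updates are filtered in onNext before handing off to the sequential
// executor, which is why sequentialOnNext can assert a non-empty collection.
@Override
public void onNext(final Collection<? extends ServiceDiscovererEvent<ResolvedAddress>> events) {
    if (events == null || events.isEmpty()) {
        return; // dropping here keeps the assertion below valid
    }
    sequentialExecutor.execute(() -> sequentialOnNext(events));
}

private void sequentialOnNext(final Collection<? extends ServiceDiscovererEvent<ResolvedAddress>> events) {
    assert events != null && !events.isEmpty();
    // ... apply the host-set diff on the sequential executor ...
}
```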
```java
import static org.hamcrest.Matchers.contains;
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;
import static org.junit.jupiter.api.Assertions.assertNull;
```
Please prefer using hamcrest for new tests; it generates much better exception messages compared to JUnit asserts.
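For example, the matcher form reads left to right and its failure message describes both the expectation and the actual value (the test name and values here are made up for illustration):

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.contains;
import static org.hamcrest.Matchers.is;
import static org.hamcrest.Matchers.notNullValue;

import java.util.List;
import org.junit.jupiter.api.Test;

class HamcrestStyleExample {
    @Test
    void preferHamcrestMatchers() {
        final List<String> events = List.of("expand", "shrink");
        // Instead of assertEquals/assertNotNull, use matchers: on failure they print the
        // matcher description alongside the actual value, not just "expected X but was Y".
        assertThat(events, contains("expand", "shrink"));
        assertThat(events.get(0), is(notNullValue()));
    }
}
```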
```java
executor = new SequentialExecutor(exceptionHandler);
final RuntimeException ex = new RuntimeException("expected");
executor.execute(() -> {
    throw ex;
```
We have a special `DELIBERATE_EXCEPTION` type for cases like this.
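A sketch of the suggested change to the quoted test, assuming the constant lives in `io.servicetalk.concurrent.internal.DeliberateException` (the import path may differ depending on the test-support module used):

```java
// Same test as quoted above, but using the shared deliberate-exception constant instead
// of constructing a new RuntimeException, so the failure is clearly intentional.
import static io.servicetalk.concurrent.internal.DeliberateException.DELIBERATE_EXCEPTION;

executor = new SequentialExecutor(exceptionHandler);
executor.execute(() -> {
    throw DELIBERATE_EXCEPTION;
});
```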