
loadbalancer: better asymptotic behavior of host updates #2745

Merged

Conversation

bryce-anderson (Contributor):

Motivation:

Our current host update process is quadratic in both time and space because it searches for hosts via list iteration and performs a copy-on-write (COW) for each entry of the SD event set.

Modifications:

Build a Map<Address, SDEvent>. This lets us iterate through existing events with O(M+N) complexity instead of O(M*N). As a bonus, we can now easily detect when we get multiple SD events for the same address in a batch.

Result:

We should see slightly worse performance in the small M*N case but dramatically better performance in the large case. Since SD events shouldn't be terribly common and building a hashmap is cheap relative to the total cost of an SD update, this is the safer tradeoff.
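
For illustration, here is a rough, self-contained sketch of the O(M+N) merge shape described above. The Host and Event types below are simplified stand-ins, not the actual classes touched by this PR:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: simplified stand-ins for the real ServiceTalk types.
final class HostMergeSketch {
    record Host(String address) { }
    record Event(String address, boolean available) { }

    static List<Host> merge(List<Host> usedHosts, List<Event> events) {
        // Index the incoming batch by address once: O(N).
        Map<String, Event> eventMap = new HashMap<>(events.size());
        for (Event event : events) {
            // put() returns any earlier event for the same address,
            // which makes in-batch duplicates trivially detectable.
            eventMap.put(event.address(), event);
        }

        // Walk the existing hosts once: O(M). remove() marks an event as consumed.
        List<Host> nextHosts = new ArrayList<>(usedHosts.size() + events.size());
        for (Host host : usedHosts) {
            Event event = eventMap.remove(host.address());
            if (event == null || event.available()) {
                nextHosts.add(host); // unchanged or re-announced host is kept
            }
            // otherwise the host became unavailable and is dropped (or expired, per policy)
        }

        // Whatever remains in the map is a brand new address.
        for (Event event : eventMap.values()) {
            if (event.available()) {
                nextHosts.add(new Host(event.address()));
            }
        }
        return nextHosts;
    }
}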

@idelpivnitskiy (Member) left a comment:


My 2c: This feels like a premature optimization that could be done later if it shows up on a flame graph. Taking into account that:

  1. This is not a hot path; it runs on a separate SD thread.
  2. Most of the time SD does not send new events frequently.
  3. The overall number of hosts is expected to be reasonably low. If that's not true, we must subset, which is a different story. When we are talking about a small number of hosts, a quadratic approach can actually have advantages. For example, Java does the same for sorting, and the threshold for switching away from the quadratic algorithm is quite high: in DualPivotQuicksort it's 47.

At this point, I would value more that the pre-existing code was well tested and production proven.

Overall, I'm not opposed to this change, just sharing an opinion. If you would like to proceed, these are my comments:

for (ServiceDiscovererEvent<ResolvedAddress> event : events) {
    ServiceDiscovererEvent<ResolvedAddress> old = eventMap.put(event.address(), event);
    if (old != null) {
        LOGGER.error("Multiple ServiceDiscoveryEvent's detected for address {}. Event: {}.",
Member:


According to the current agreement, this is not an error. It's ok to receive multiple events for the same address. The LB impl can choose how to treat that behavior:

  1. RR always takes the latest one.
  2. If the LB considers meta-data, it should either take the latest meta-data or combine all meta-data between events.
  3. Some other LB may consider multiple appearances of the same address as an increased priority for that address.

If you would like to say that this particular LB doesn't like that behavior, consider using debug or info level. error will scare users because they have queries that monitor for any errors.

bryce-anderson (Contributor, Author):


I switched it to behave the same as RR and log at debug.
As an aside, the lack of a contract here is kind of difficult to program around and blurs the separation of SD and LB a good amount, since they can start to be behavior-dependent, not just API-dependent. What are the odds we could/want to tighten up the contract?
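
A sketch of what that could look like, in the style of the snippet quoted earlier (illustrative only, not the exact code that was merged):

for (ServiceDiscovererEvent<ResolvedAddress> event : events) {
    // put() returns the previously mapped event; a non-null result means the batch
    // already contained an event for this address, and the latest one wins (as in RR).
    ServiceDiscovererEvent<ResolvedAddress> previous = eventMap.put(event.address(), event);
    if (previous != null) {
        LOGGER.debug("Multiple events for address {} in a single batch, using the latest: {}.",
                event.address(), event);
    }
}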

Member:


IIRC, the first problem was that it was hard to make sure all SD implementations behave identically (I know a few). It's not trivial to explain to users how to program it correctly and filter out duplicates, taking into account retries and preserving state between retries in reactive streams. So, we made the contract easier and said that it's simpler to take care of it at the LB layer. In addition, we considered the use-cases described in my first message, where the LB has flexibility in how to interpret events.

We can absolutely tighten things up for cases where we provide both implementations, but it's hard to tighten something that we do not control.

if (nextHosts.isEmpty()) {
    eventStreamProcessor.onNext(LOAD_BALANCER_NOT_READY_EVENT);
} else if (sendReadyEvent) {
    eventStreamProcessor.onNext(LOAD_BALANCER_READY_EVENT);
Member:


We should not send this event if the LB already had hosts before and this event was already sent.
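
A sketch of the kind of guard being suggested here, assuming a hypothetical flag that records whether the host list was empty before this update (the actual fix may look different):

// previousHostsWereEmpty is a hypothetical flag captured before applying the update.
if (nextHosts.isEmpty()) {
    eventStreamProcessor.onNext(LOAD_BALANCER_NOT_READY_EVENT);
} else if (sendReadyEvent && previousHostsWereEmpty) {
    // Only emit READY on the transition from "no hosts" to "some hosts".
    eventStreamProcessor.onNext(LOAD_BALANCER_READY_EVENT);
}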

bryce-anderson (Contributor, Author):


Good point. On that topic, we don't currently send a NOT_READY event when a host drops from, say, the EXPIRED state to the UNAVAILABLE state and that results in an empty list.

Member:


That's a good catch! Please file a ticket.

List<Host<ResolvedAddress, C>> usedHosts = usedHostsUpdater.get(NewRoundRobinLoadBalancer.this);
if (isClosedList(usedHosts)) {
    // We don't update if the load balancer is closed.
    return;
}
nextHosts.clear();
nextHosts = new ArrayList<>(usedHosts.size() + events.size());
Member:


Consider adding a comment to describe the rationale for this size estimate; it might be helpful for someone who comes to this piece later without context.
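
One possible wording for such a comment (an assumed reading of the size estimate's intent, not the comment that was actually added):

// Upper bound: every existing host may survive this update and every SD event may
// introduce a new address, so the merged list needs at most
// usedHosts.size() + events.size() slots; sizing it up front avoids resizing.
nextHosts = new ArrayList<>(usedHosts.size() + events.size());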

@chrisvest (Contributor) left a comment:


I think it's a good idea to avoid quadratic algorithms where possible, since a rare order-of-magnitude-bigger-N event can potentially trip something from fine to not-fine. Compared to something like DualPivotQuicksort, we have allocations going on, so a threshold for where quadratic would be "fine" would likely be lower.

The only comment I have is about the added TODO comment.

@tkountis (Contributor) left a comment:


Nothing more to add here; final revision LGTM.

bryce-anderson merged commit 936e01b into apple:main on Nov 17, 2023.
15 checks passed.
bryce-anderson deleted the bl_anderson/host_list_optimization branch on November 17, 2023 at 16:32.