
Introduce an optional thread pool for DSL rules and events #3890

Merged · 1 commit · Apr 29, 2024

Conversation

@joerg1985 (Contributor) commented Nov 26, 2023

This is basically a reincarnation of the idea in #2182 with a different approach.

The SequentialScheduledExecutorService in this PR acts like an Executors.newSingleThreadScheduledExecutor, but uses a thread pool to execute the tasks. The mechanism that prevents the ScheduledExecutorService from running tasks concurrently is based on a chain of CompletableFutures. Each instance holds a reference to the last CompletableFuture and calls handleAsync to append a new task. This ensures tasks run sequentially.
A .schedule call registers a callback that submits the task once it is due, ending in the same handleAsync call on the CompletableFuture chain, so scheduled tasks also run sequentially.
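The chaining idea can be sketched as follows. This is a minimal illustration, not the actual openHAB code; the class name SequentialExecutorSketch is made up. Each submitted task is appended to the tail of a CompletableFuture chain via handleAsync, so tasks execute strictly one after another even though a shared multi-threaded pool runs them.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

/**
 * Minimal sketch of the chaining mechanism (illustration only, not the
 * actual openHAB implementation): each submitted task is hooked onto the
 * tail of a CompletableFuture chain via handleAsync, so tasks run one
 * after another even though a shared multi-threaded pool executes them.
 */
class SequentialExecutorSketch {
    private final ExecutorService pool;
    // Tail of the chain; every new task is appended to this future.
    private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

    SequentialExecutorSketch(ExecutorService pool) {
        this.pool = pool;
    }

    /** Appends a task; it runs on the pool after all previously submitted tasks. */
    synchronized CompletableFuture<Void> submit(Runnable task) {
        // handleAsync fires after the predecessor completed (normally or
        // exceptionally), so a failing task does not stall the chain.
        tail = tail.handleAsync((ignored, error) -> {
            task.run();
            return null;
        }, pool);
        return tail;
    }
}
```

Because each link is scheduled only after the previous one completes, ordering is preserved even when the backing pool has many threads.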

The only side effect for a rule is that it no longer runs inside the same thread again and again. This might break things like a ThreadLocal; therefore a Rule has to opt in to this new behavior (see Rule.isThreadBound).
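Why thread hopping can break ThreadLocal-based state can be seen in a toy example (the class ThreadLocalHopDemo is hypothetical, not from this PR): a value set on one thread is invisible to code that happens to run on another thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy demonstration (hypothetical class, not from this PR) that ThreadLocal
// state written on one thread is not visible to code on another thread.
class ThreadLocalHopDemo {
    static final ThreadLocal<String> STATE = new ThreadLocal<>();

    // Runs the task on a freshly created thread and returns its result.
    static String runOnFreshThread(Callable<String> task) throws Exception {
        ExecutorService single = Executors.newSingleThreadExecutor();
        try {
            return single.submit(task).get();
        } finally {
            single.shutdown();
        }
    }
}
```

A rule that stashes state in a ThreadLocal and later runs on a different pool thread would read null instead of its stored value, which is why an opt-in flag was considered.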

This feature is currently disabled by default and has to be enabled by setting the system property org.openhab.core.common.SequentialScheduledExecutorService to true.

When deserializing an existing RuleTemplate, the threadBound flag should be set to true. I am not sure how to do this; I tried in the RuleTemplateDTO by setting threadBound to true for a new instance.

Further work: The DSLRuleProvider could inspect the DSL and set threadBound to true when a ThreadLocal or Thread is used by the rule. This will ensure no existing DSL rule will break.

@joerg1985 joerg1985 requested a review from a team as a code owner November 26, 2023 14:15
@joerg1985 joerg1985 marked this pull request as draft November 26, 2023 14:32
@joerg1985 joerg1985 force-pushed the sequential-executor branch 2 times, most recently from c308ac1 to 507590c Compare November 28, 2023 20:11
@joerg1985 joerg1985 marked this pull request as ready for review November 28, 2023 20:39
@joerg1985 (Contributor, Author)

It might be better to use virtual threads as soon as Java 21 is the minimum version; this would make things easier: less code, and ThreadLocals would still work without special handling.

@joerg1985 joerg1985 marked this pull request as draft December 4, 2023 19:40
@kaikreuzer (Member)

It will still take quite some time before we move to Java 21 as a minimal requirement...

@joerg1985 (Contributor, Author)

@kaikreuzer okay, so I will push this further. Do you think it makes sense to have this Rule.isThreadBound logic, or is this over-engineered? The JRuby rule support that requested it in the old tickets is deprecated, so maybe no one else needs it?

@kaikreuzer (Member)

@joerg1985 I was asking myself where we would need it. For the DSL rules I don't think it is necessary; besides scheduling timers, there is no thread support available and in the past we already had thread pools for the execution in place.
So unless anyone from the scripting side (@rkoshak?) asks for it, I would leave this feature aside.

@rkoshak commented Dec 5, 2023

I'm not sure I understand what the "it" is here under discussion.

From my perspective, in the old days there was a relatively limited thread pool for rules, and any individual rule could be running multiple times across more than one thread. This could and often did lead to all the rules failing to run because just one long-running rule ran amok. I want to avoid reintroducing that behavior.

It's also my understanding that before the introduction of the cache, the only way users of Nashorn JS scripts in the UI could keep state between runs was a side effect: the rule's context was reused for each run of the rule: this.foo = (this.foo === undefined) ? 'initial value' : this.foo;. This line means: if this.foo already exists, use that, otherwise initialize it to 'initial value'.

I don't know if that worked because the thread was reused or for some other reason. I don't think we necessarily need to be overly concerned with how Nashorn UI rules worked, but it would break some users' rules if this behavior were to change.

Finally, there is an issue where the first time a JS Scripting (and therefore Blockly) script is loaded, it can take up to 30 seconds to run on 32-bit ARM machines (i.e. the recommended openHABian config). If reusing threads also means reloading rules, this problem will become really bad: it won't just affect the first run, it will happen randomly or on every run.

So, as long as we retain the situation where one rule running amok can't impact other rules' ability to run, I'm happy. As long as using a thread pool doesn't require reloading a script for each thread it runs in, I'm happy. I do not want to go back to the days where we needed to implement reentrant locks in our rules to prevent them from running more than once at a time, and where, with or without that, a poorly written rule could bring down all rules by exhausting the thread pool.

@joerg1985 (Contributor, Author)

@rkoshak thanks for the feedback on this. The PR will not allow a rule to block the others by consuming all the threads or running concurrently. The 'it' question was, as I understood it:
Is there any need to ensure some rules must be called on the same thread all the time?
The current idea to realize this is to have a flag for each rule and keep the one rule == one thread logic for rules with this flag set.
In case this logic is not needed, the flag could be removed.

@rkoshak commented Dec 6, 2023

Is there any need to ensure some rules must be called on the same thread all the time?

I don't know enough about the implications.

Does running the rule on another thread require the Script Actions/Conditions to be reloaded or reinitialized?

  • If not, this should be transparent to the users and it's all good
  • If so, the change may break some older Nashorn JS Script Actions/Conditions and it will cause problems for JS Scripting and Blockly users.

There is a problem in JS Scripting right now where it can take 20-40 seconds to run a rule the first time after it's loaded/reloaded. Having to randomly reload the scripts whenever one is put on a different thread would be a nightmare.

@joerg1985 (Contributor, Author)

@kaikreuzer Okay, it looks like there might be issues, but as long as the user has to opt in via the org.openhab.core.common.SequentialScheduledExecutorService system property, everything should be fine.
In some years, when JDK 21 is used, the system property can be removed and ThreadPoolManager.newSequentialScheduledExecutorService could return an Executors.newSingleThreadExecutor backed by a virtual thread by default.
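That future variant can be sketched as follows, assuming JDK 21+ (the helper class VirtualThreadSequentialSketch is hypothetical): a single-threaded executor backed by a virtual-thread factory keeps the one-executor-one-thread semantics, so ThreadLocals keep working, while the carrier platform threads are shared by the JVM.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch, assuming JDK 21+ (hypothetical helper class): a sequential
// executor backed by a single virtual thread. Tasks still run one after
// another on one (virtual) thread, so ThreadLocals keep working, while
// the underlying carrier platform threads are shared by the JVM.
class VirtualThreadSequentialSketch {
    static ExecutorService newSequentialExecutor() {
        return Executors.newSingleThreadExecutor(Thread.ofVirtual().factory());
    }
}
```
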

So i would suggest to:

  • move the org.openhab.core.common.SequentialScheduledExecutorService check inside the ThreadPoolManager.newSequentialScheduledExecutorService
  • update the arguments of ThreadPoolManager.newSequentialScheduledExecutorService to switch to the virtual threads without changing the signature of the method.
  • remove the threadBound flag from the Rule

Any concerns about this?

@kaikreuzer (Member)

If I get it right, with the new system property, it would be a "hidden" feature that brave users can try out by activating it. For all others there won't be any change, correct?
If so, yes: With this approach we could get some experience with the behavior and see whether it has any negative impact on the initialization of JS rules. If this is not the case, we might one day decide to make this the default and only allow opt-outs.

@joerg1985 (Contributor, Author) commented Dec 7, 2023

@kaikreuzer you are right, this will protect from trouble.

@joerg1985 joerg1985 force-pushed the sequential-executor branch 3 times, most recently from 7d7d306 to db4a88c Compare December 7, 2023 19:10
@joerg1985 joerg1985 marked this pull request as ready for review December 7, 2023 19:37
@joerg1985 (Contributor, Author)

@kaikreuzer I just updated the PR; the flag is gone and the review should be easier now.

@joerg1985 (Contributor, Author)

@kaikreuzer is there anything left that I need to do before this can be merged?
Just asking, because I have some time over the next days.

@morph166955

So while I don't disagree with the concept, please, please, please make sure that there is a way to turn this off for power users. I've had so many issues with thread pools causing congestion over the last few years that the concept of adding one is terrifying to me. I have a ton of rules that run concurrently. I also understand that I'm a power user, and as such my OH is running on a relatively beefy device that can handle such a load. Again, I can absolutely understand the reason for this, especially on lower-end platforms; it's why I wrote the original PR that you noted above. Just please make sure there is a way to run this without a pool as well. Thank you!

@joerg1985 (Contributor, Author)

@morph166955 the current plan is to opt in via a system property to avoid general issues.

Regarding the congestion of the pool: this PR uses an unlimited pool, so in the worst case the number of threads in the pool is equal to the number of rules. There will be some latency to start new threads, but this is only milliseconds.
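The unlimited-pool behavior can be illustrated with a small example (the class UnboundedPoolDemo is hypothetical, not from this PR): a cached thread pool creates an extra thread when all existing ones are busy, so a blocked task cannot starve an unrelated task of a thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustration of the unlimited-pool point (hypothetical class, not from
// this PR): a cached thread pool creates an extra thread when all existing
// ones are busy, so a blocked task cannot starve an unrelated task.
class UnboundedPoolDemo {
    static String runQuickTaskWhileBlocked() throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        CountDownLatch gate = new CountDownLatch(1);
        // Long-running task occupies one thread until the gate opens.
        pool.submit(() -> {
            gate.await();
            return null;
        });
        // The quick task gets its own, freshly created thread and completes
        // promptly even though the first task is still blocked.
        Future<String> quick = pool.submit(() -> "done");
        String result = quick.get(5, TimeUnit.SECONDS);
        gate.countDown();
        pool.shutdown();
        return result;
    }
}
```
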

This sounds like you could be the perfect tester for this, or at least share some numbers? The total number of rules and the number of concurrently executing rules would be interesting for me.

@joerg1985 joerg1985 force-pushed the sequential-executor branch from db4a88c to 847a582 Compare January 10, 2024 18:45
@J-N-K (Member) commented Feb 13, 2024

@kaikreuzer What is the state here?

@kaikreuzer (Member) left a comment

Sorry that it took me so long - please find some review comments below.
I have to admit that I do not fully understand the implemented mechanism here. In general, my experience with implementing own executor services is that it is a very tricky job and a lot of things can go wrong. So I would definitely consider this feature experimental for now, and we should indeed only enable it on an opt-in basis for testing. Nonetheless, I'd love to see it working and reducing the number of threads that are used, so let's move forward with it!

@joerg1985 joerg1985 force-pushed the sequential-executor branch from 847a582 to 1cbead1 Compare April 11, 2024 19:44
@joerg1985 (Contributor, Author)

@kaikreuzer thank you for the review; I addressed most of the points. There are two points left open, with questions on how to continue.

@joerg1985 joerg1985 force-pushed the sequential-executor branch from 1cbead1 to b18c48c Compare April 28, 2024 10:22
@joerg1985 (Contributor, Author)

@kaikreuzer now all the points are addressed, so I think this is ready 🚀

@kaikreuzer (Member)

Thanks @joerg1985!
Do I understand the code correctly that you didn't introduce any new configuration parameter, but instead allow the "events" and "rules" pools to be defined with a size > 0, and that this setting makes the new logic become active? If I understand this correctly, this looks like a very neat solution!

@joerg1985 (Contributor, Author)

@kaikreuzer yes, this is correct. To enable the new executor the pool size must be set > 0; this allows enabling it only for specific pools.

@kaikreuzer (Member) left a comment

Thanks, let's start testing it!

@kaikreuzer kaikreuzer merged commit c3ada84 into openhab:main Apr 29, 2024
5 checks passed
@kaikreuzer kaikreuzer added the enhancement An enhancement or new feature of the Core label Apr 29, 2024
@kaikreuzer kaikreuzer added this to the 4.2 milestone Apr 29, 2024
@kaikreuzer kaikreuzer changed the title Use a thread pool backed sequential executor for DSL rules and events Introduce an optional thread pool for DSL rules and events Apr 29, 2024
@lolodomo (Contributor)

yes, this is correct. To enable the new executor the pool size must be set > 0, this allows to enable it only for specific pools.

Is it documented somewhere?

@joerg1985 (Contributor, Author)

@lolodomo no, I am currently testing (and fixing #4247). As soon as there is a stable version, I will write to the Milestone topic to find more testers.

Do you have a concrete place where this should be documented in the future?

6 participants