Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gce] Loss of connection and growing CPU usage when disabling / enabling a Thing #17940

Open
Jetblack31 opened this issue Dec 21, 2024 · 20 comments · May be fixed by #17949
Open

[gce] Loss of connection and growing CPU usage when disabling / enabling a Thing #17940

Jetblack31 opened this issue Dec 21, 2024 · 20 comments · May be fixed by #17949
Assignees
Labels
bug An unexpected problem or unintended behavior of an add-on

Comments

@Jetblack31
Copy link

Hello,

a bug that appears since my last upgrade of OH (currently running 4.3, I guess the issue appeared with 4.2).

Behavior

When disabling / enabling a IPX800 Thing using OH UI, the CPU load increases.
Each time I do it, CPU load increases more (see picture).
The only solution I found to reduce the CPU load is to restart openHAB.

Why do I disable / enable my IPX Thing ?
Actually, since I upgraded from OH2 to OH4 months ago, sometimes IPX800 stops working (connection issue). My solution is to disable/enable once a day the IPX800 Thing to be sure to keep it working.
This connection issue didn't exist with OH2.

Steps to Reproduce

  1. Disable IPX800 Thing with UI
  2. Wait 10 sec
  3. Enable IPX800 Thing with UI

Environment

  • openHAB 4.3 (but same issue with 4.2)
  • Raspian Bulleyes with Raspberry 4
Capture d’écran 2024-12-20 à 22 20 53
@Jetblack31 Jetblack31 added the bug An unexpected problem or unintended behavior of an add-on label Dec 21, 2024
@clinique clinique self-assigned this Dec 21, 2024
@clinique
Copy link
Contributor

clinique commented Dec 21, 2024

I observe 2 problems issues raised in this issue:

  • Connection lost issue, has to be resolved but can be tedious to reproduce. Would you have any evidence in your log file of the error message ?
  • Resources not being freed upon handler disabling. I think I already have some clues on what can be done for this.

@Jetblack31
Copy link
Author

Jetblack31 commented Dec 21, 2024

Regarding the connection lost issue, I've found these kind of messages in the logs :

2024-10-02 04:24:22.007 [WARN ] [.gce.internal.model.M2MMessageParser] - Error received from connector : Connection timed out

2024-10-02 05:00:03.849 [ERROR] [nal.common.AbstractInvocationHandler] - An error occurred while calling method 'ThingHandler.handleCommand()' on 'org.openhab.binding.gce.internal.handler.Ipx800v3Handler@19d5670': Cannot invoke "java.io.PrintWriter.write(String)" because "this.out" is null
at org.openhab.binding.gce.internal.handler.Ipx800DeviceConnector.send(Ipx800DeviceConnector.java:64) ~[?:?]
at org.openhab.binding.gce.internal.model.M2MMessageParser.setOutput(M2MMessageParser.java:110) ~[?:?]
at org.openhab.binding.gce.internal.handler.Ipx800v3Handler.lambda$8(Ipx800v3Handler.java:333) ~[?:?]
at org.openhab.binding.gce.internal.handler.Ipx800v3Handler.handleCommand(Ipx800v3Handler.java:333) ~[?:?]

2024-12-13 04:24:26.483 [WARN ] [ternal.handler.Ipx800DeviceConnector] - Communication error: 'java.net.ConnectException: Connection timed out'. Will retry in 5000 ms
2024-12-13 04:24:26.484 [WARN ] [.gce.internal.model.M2MMessageParser] - Error received from connector : Connection timed out

2024-12-13 04:51:03.996 [WARN ] [ternal.handler.Ipx800DeviceConnector] - Trying to send 'Set010' while the output stream is closed.
2024-12-13 04:52:03.854 [WARN ] [ternal.handler.Ipx800DeviceConnector] - Trying to send 'Set010' while the output stream is closed.
2024-12-13 04:53:03.864 [WARN ] [ternal.handler.Ipx800DeviceConnector] - Trying to send 'Set010' while the output stream is closed.

I hope this can help

@clinique clinique linked a pull request Dec 21, 2024 that will close this issue
@clinique
Copy link
Contributor

Hello @Jetblack31 , I have prepared an updated version to ensure resources are properly freed. Could you please try it and observe how it behaves over time : https://drive.google.com/file/d/1tl9zOYQe9XuEOGZINy6B_iUgUGZsLFzt/view?usp=drive_link .
It is compiled for JDK 17 and labelled OH 5.0.0 but should work on 4.3. Let me know.

@Jetblack31
Copy link
Author

Hello @clinique,

Thank you very much for the binding. However, it doesn't work. What I did :

  • Suppression of the current GCE binding from UI
  • Stop OH
  • place your binding jar file in the add ons folder
  • Start OH

In the logs, I obtain this message :

2024-12-28 10:30:20.882 [WARN ] [org.apache.felix.fileinstall ] - Error while starting bundle: file:/usr/share/openhab/addons/org.openhab.binding.gce-5.0.0-SNAPSHOT.jar
org.osgi.framework.BundleException: Could not resolve module: org.openhab.binding.gce [333]
Unresolved requirement: Require-Capability: osgi.ee; filter:="(&(osgi.ee=JavaSE)(version=21))

at org.eclipse.osgi.container.Module.start(Module.java:463) ~[org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:445) ~[org.eclipse.osgi-3.18.0.jar:?]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundle(DirectoryWatcher.java:1260) [bundleFile:3.7.4]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundles(DirectoryWatcher.java:1233) [bundleFile:3.7.4]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.doProcess(DirectoryWatcher.java:520) [bundleFile:3.7.4]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.process(DirectoryWatcher.java:365) [bundleFile:3.7.4]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:316) [bundleFile:3.7.4]

See you!

Jetblack

@clinique
Copy link
Contributor

Hello @Jetblack31 , could you please try this new one : https://drive.google.com/file/d/1Q1DKBQLQzWru8zQBqrcsrDOmw8Sii8Cy/view?usp=drive_link

I installed it on my prod OH4.3 and works.

@Jetblack31
Copy link
Author

Hello @clinique,

I installed the binding, and it works. I will now monitor the CPU usage and keep you posted.
Many thanks,

Jetblack

@Jetblack31
Copy link
Author

Hi @clinique,

I made some tests with the new binding.
The CPU usage is quite high even after a fresh reboot (about 100% for openhab / java read with top)
Each time I disable the IPX800 Thing, the CPU usage increases (by about 50%)
If I repeat disable/enable Thing, the CPU usage increases more, going close to 400% read with top (I have 4 CPU).

So the issue doesn't seem to be fixed unfortunately.

Below the logs when the Thing is disabled/enabled

2024-12-31 10:23:08.476 [DEBUG] [ternal.handler.Ipx800DeviceConnector] - Closing socket
2024-12-31 10:23:08.505 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202412302357 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : ServiceFactory.ungetService()
2024-12-31 10:23:08.506 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202412302357 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : DependencyManager: osgi.ds.satisfying.condition close component unbinding from org.apache.felix.scr.impl.manager.ComponentContextImpl@1fd2413 at tracking count 1 refpairs: [[RefPair: ref: [{org.osgi.service.condition.Condition}={service.id=6, service.bundleid=0, service.scope=singleton, service.pid=0.org.osgi.service.condition.ConditionImpl, osgi.condition.id=true}] service: [null]]]
2024-12-31 10:23:08.507 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202412302357 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Querying state active
2024-12-31 10:23:08.508 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202412302357 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Changed state from active to satisfied
2024-12-31 10:23:12.097 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202412302357 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : ServiceFactory.getService()
2024-12-31 10:23:12.098 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202412302357 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : This thread collected dependencies
2024-12-31 10:23:12.099 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202412302357 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : getService (ServiceFactory) dependencies collected.
2024-12-31 10:23:12.136 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202412302357 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Querying state satisfied
2024-12-31 10:23:12.137 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202412302357 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : For dependency osgi.ds.satisfying.condition, optional: false; to bind: [[RefPair: ref: [{org.osgi.service.condition.Condition}={service.id=6, service.bundleid=0, service.scope=singleton, service.pid=0.org.osgi.service.condition.ConditionImpl, osgi.condition.id=true}] service: [null]]]
2024-12-31 10:23:12.137 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - IPX800 actions service instanciated
2024-12-31 10:23:12.138 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202412302357 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Changed state from satisfied to active
2024-12-31 10:23:12.224 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - Initializing IPX800 handler for uid 'gce:ipx800v3:9bfa8aa821'
2024-12-31 10:23:14.231 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C1' with data '0.0'
2024-12-31 10:23:14.233 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C2' with data '0.0'
2024-12-31 10:23:14.233 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C3' with data '0.0'
2024-12-31 10:23:14.234 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C4' with data '0.0'
2024-12-31 10:23:14.234 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C5' with data '0.0'
2024-12-31 10:23:14.235 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C6' with data '0.0'
2024-12-31 10:23:14.236 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C7' with data '0.0'
2024-12-31 10:23:14.237 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C8' with data '0.0'
2024-12-31 10:23:14.238 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'A1' with data '0.0'
2024-12-31 10:23:14.238 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'A2' with data '0.0'
2024-12-31 10:23:14.239 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'A3' with data '0.0'
2024-12-31 10:23:14.244 [DEBUG] [ternal.handler.Ipx800DeviceConnector] - Connecting to 192.168.2.61:9870...
2024-12-31 10:23:14.240 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'A4' with data '0.0'

@clinique
Copy link
Contributor

Thanks for your feed-back. I have to get back to the drawing table to figure where this CPU drain comes from...

@Jetblack31
Copy link
Author

Jetblack31 commented Dec 31, 2024

An update.
The high CPU usage just after a fresh reboot vanishes after a few minutes. So I guess it is not related to the GCE binding.
Sorry for that.

But when I disable the IPX Thing, CPU usage increases by 100% and continue to increase with further disable/enable.

I tried to disable/enable other Things depending on other bindings (onewire, http...) : no issues with them.

Jetblack

@clinique
Copy link
Contributor

clinique commented Jan 1, 2025

@lsiepel
Copy link
Contributor

lsiepel commented Jan 1, 2025

Do you use the network binding?

@Jetblack31
Copy link
Author

Jetblack31 commented Jan 1, 2025

@clinique : Thank you for the new binding. I've tested it : same behaviour, CPU usage grows each time the Thing is disabled. (I restarted OH and cleaned the cache : no change).

@lsiepel : Indeed, I use the network binding for many devices, including the GCE IPX800. Do you think that there is a link with the issue ?

Below the logs when the thing is disabled/enabled.
It seems that some methods are not found when disabling the Thing.

Jetblack

EDIT : I tried to uninstall the network binding : no effect, same behavior

2025-01-01 09:48:38.759 [DEBUG] [ternal.handler.Ipx800DeviceConnector] - SocketException raised by streams while closing socket
2025-01-01 09:48:38.759 [DEBUG] [ternal.handler.Ipx800DeviceConnector] - Closing socket
2025-01-01 09:48:38.769 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : ServiceFactory.ungetService()
2025-01-01 09:48:38.770 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : getting deactivate: deactivate
2025-01-01 09:48:38.771 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Locating method deactivate in class org.openhab.binding.gce.internal.action.Ipx800Actions
2025-01-01 09:48:38.771 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Declared Method org.openhab.binding.gce.internal.action.Ipx800Actions.deactivate([interface org.osgi.service.component.ComponentContext]) not found
2025-01-01 09:48:38.772 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Locating method deactivate in class java.lang.Object
2025-01-01 09:48:38.772 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Declared Method java.lang.Object.deactivate([interface org.osgi.service.component.ComponentContext]) not found
2025-01-01 09:48:38.773 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : deactivate method [deactivate] not found, ignoring
2025-01-01 09:48:38.773 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : DependencyManager: osgi.ds.satisfying.condition close component unbinding from org.apache.felix.scr.impl.manager.ComponentContextImpl@1f05a2b at tracking count 1 refpairs: [[RefPair: ref: [{org.osgi.service.condition.Condition}={service.id=6, service.bundleid=0, service.scope=singleton, service.pid=0.org.osgi.service.condition.ConditionImpl, osgi.condition.id=true}] service: [null]]]
2025-01-01 09:48:38.774 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Querying state active
2025-01-01 09:48:38.775 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Changed state from active to satisfied
2025-01-01 09:48:42.741 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : ServiceFactory.getService()
2025-01-01 09:48:42.742 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : This thread collected dependencies
2025-01-01 09:48:42.742 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : getService (ServiceFactory) dependencies collected.
2025-01-01 09:48:42.743 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Querying state satisfied
2025-01-01 09:48:42.743 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : For dependency osgi.ds.satisfying.condition, optional: false; to bind: [[RefPair: ref: [{org.osgi.service.condition.Condition}={service.id=6, service.bundleid=0, service.scope=singleton, service.pid=0.org.osgi.service.condition.ConditionImpl, osgi.condition.id=true}] service: [null]]]
2025-01-01 09:48:42.744 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - IPX800 actions service instanciated
2025-01-01 09:48:42.745 [DEBUG] [ng.gce.internal.action.Ipx800Actions] - bundle org.openhab.binding.gce:5.0.0.202501010052 (247)[org.openhab.binding.gce.internal.action.Ipx800Actions(2)] : Changed state from satisfied to active
2025-01-01 09:48:42.785 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - Initializing IPX800 handler for uid 'gce:ipx800v3:9bfa8aa821'
2025-01-01 09:48:44.689 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C1' with data '0.0'
2025-01-01 09:48:44.692 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C2' with data '0.0'
2025-01-01 09:48:44.693 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C3' with data '0.0'
2025-01-01 09:48:44.694 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C4' with data '0.0'
2025-01-01 09:48:44.694 [DEBUG] [ternal.handler.Ipx800DeviceConnector] - Connecting to 192.168.2.61:9870...
2025-01-01 09:48:44.695 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C5' with data '0.0'
2025-01-01 09:48:44.697 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C6' with data '0.0'
2025-01-01 09:48:44.698 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C7' with data '0.0'
2025-01-01 09:48:44.700 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'C8' with data '0.0'
2025-01-01 09:48:44.701 [DEBUG] [gce.internal.handler.Ipx800v3Handler] - About to update port 'A1' with data '0.0'

@lsiepel
Copy link
Contributor

lsiepel commented Jan 1, 2025

I wonder if you experience the same cpu load / performance issue when the network binding is uninstalled and you restart openHAB.

@Jetblack31
Copy link
Author

Hi @lsiepel

I uninstalled the network binding and restarted OH.
Then I disabled the GCE IPX Thing and CPU usage increased immediately by 100%.
So disabling Network bonding does not solve my issue :(

Jetblack

@clinique
Copy link
Contributor

clinique commented Jan 1, 2025

@Jetblack31 : I don't have the solution right now but I found a way to reproduce the problem. Stay tuned and happy new year.

@clinique
Copy link
Contributor

clinique commented Jan 4, 2025

@Jetblack31 : I think I'm done. Can you test this please : https://drive.google.com/file/d/1ulXpIKixaL0bLILgFVeJaOu3qqnexFIF/view?usp=drive_link

@Jetblack31
Copy link
Author

@clinique

It works perfectly ! CPU usage does not increase anymore when disabling a Thing.
Thank you so much !

Do you think it also solves the issue of loss of connection ?

Jetblack

@clinique
Copy link
Contributor

clinique commented Jan 6, 2025

I did not try to address anything on this side in this PR but maybe as a side effects of the big revamp I did.
Let's observe how it behaves over time.

@clinique
Copy link
Contributor

clinique commented Jan 9, 2025

@Jetblack31 : would you already have some feed-backs ?

@Jetblack31
Copy link
Author

@clinique

So far, so good ! CPU usage does not increase anymore and I had no connection loss.

I seems it's fixed. Thank you so much.

Jetblack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An unexpected problem or unintended behavior of an add-on
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants