-
-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unregister BLE notifications to improve stability #659
Comments
Is this a stack trace of an "out of memory (OOM) exception"? |
no syssi it's just a crash trace, that refers to out of memory as the reason for the crash. This may or may not have anything to do with your component itself (I'm zero expert on stacktrace interpretation, but is probably to do with the esphome ble implementation which is very very memory intensive. One thing I do know for sure.. with 3 ble clients on my system, if I turn off the jk_bms ble client system is stable. It could just be a straw / camels back thing. |
@syssi if I throttle to 60s these crashes / memory leaks slow down. Is there any method we can call to adjust throttle in lambda? also, these frequent crashes have only started in the last two weeks. They were occasional previously, now minute by minute. |
Please enable the |
Hi @syssi Syssi, Here's the memory graphs. I'm on a 60 second throttle at the momentb and its been stable for 12 hours. Regarding 'this code path requires too much dynamic memory', do you mean in your codebade or esphome's? Is there any method to dynamically change the throttle in lambda? |
The underlying BLE stack (ESP-IDF) consumes the memory. I assume it cannot be improved here. May be it can be adressed at the ESPHome core but I don't think so. |
Syssi I don't know if this is all gobbledigook but some AI output about your ble connection handling... The provided code potentially introduces a memory leak during BLE reconnection due to the improper handling of resources when connections are dropped and re-established. To address this issue, consider the following changes: 1. Proper Resource Cleanup on DisconnectionWhen a BLE device disconnects, ensure that all allocated resources related to the BLE connection, such as registered notifications and dynamic memory, are properly released. For instance:
Update the case ESP_GATTC_DISCONNECT_EVT: {
this->node_state = espbt::ClientState::IDLE;
this->status_notification_received_ = false;
// Cleanup notification registration if necessary
esp_ble_gattc_unregister_for_notify(this->parent()->get_gattc_if(), this->parent()->get_remote_bda(), this->notify_handle_);
// Clear the frame buffer to free memory
this->frame_buffer_.clear();
break;
} 2. Avoid Repeated Notification RegistrationsRepeated calls to
3. Validate Buffers and Free Unused MemoryIn the if (this->frame_buffer_.size() > MAX_RESPONSE_SIZE) {
ESP_LOGW(TAG, "Frame dropped because of invalid length");
this->frame_buffer_.clear();
} Additionally, ensure no dangling references to the buffer persist across connections. 4. Debugging and LoggingAdd detailed logging to track BLE connection states, resource usage, and cleanup operations. This can help identify memory leaks during runtime. 5. Use Smart Pointers or Scoped AllocationsIf the code uses dynamically allocated resources (e.g., Example Updates to the CodeBelow is a streamlined example: case ESP_GATTC_DISCONNECT_EVT: {
ESP_LOGI(TAG, "Device disconnected");
this->node_state = espbt::ClientState::IDLE;
this->status_notification_received_ = false;
// Cleanup notifications
auto status = esp_ble_gattc_unregister_for_notify(this->parent()->get_gattc_if(),
this->parent()->get_remote_bda(),
this->notify_handle_);
if (status != ESP_OK) {
ESP_LOGW(TAG, "Failed to unregister notifications, status=%d", status);
}
// Clear buffers
this->frame_buffer_.clear();
break;
} By following these steps, you can mitigate the risk of memory leaks caused by improper cleanup during BLE reconnections. Ensure that all cleanup operations are robust and thoroughly tested under various scenarios, such as rapid connect/disconnect cycles. |
Good catch! I wasn't aware of the unregister method for BLE notifications. I will give it a try later the week. The other advices aren't important IMO. |
thanks mate... ps.. there's an esp-idf nimble implemntation that I've seen others use at https://github.com/h2zero/esp-nimble-cpp/tree/master - but I don't even have the foggiest idea where to start to integrate it. one dude has a go - vgijssel/setup#127 |
pps - throttle can be set dynamically in lambda by using id(bms0).set_throttle(number of milliseconds) |
@syssi - here's some other comments, that, again, will need to be filtered. As an aside, since I've implemented variable timing using the set_throttle method, the memory leak seems to have largely become a non-issue. I have times in my implementation where I'm using the cell voltage delta with a PID to let the low cells 'catch up' by keeping the high cell at 3.41v. When I'm doing that, I need a refresh of about 2s to feed the PID, so I'm now using set_throttle to up the refresh rate to 2s at those times and leaving it at 60s the rest of the time. It's heaps more stable. In any case, here's the AI thoughts... I asked it whether it thought your ble code could be the cause of the following errors...
Debugging Tips
ConclusionThis code is functional but has some gaps in error handling, resource management, and logging. These issues might contribute to the connection timeouts you're experiencing. Implementing the suggestions above should help stabilize the BLE connection and make troubleshooting easier. |
Just for fun: Ask the AI whether it is important to unregister the notifications and whether leaks really occur here? IMO the Bluetooth stack takes care on disconnect in the last resort to avoid memory leaks here. Nevertheless better safe than sorry: #666 |
Yes, unregistering notifications is important for maintaining proper BLE resource management and avoiding potential issues such as resource leaks or unexpected behavior. While the ESP-IDF BLE stack handles some cleanup when connections are closed, it's good practice to explicitly unregister notifications when they are no longer needed or when the connection is about to be terminated. Why Unregistering Notifications is Important
How to Unregister NotificationsThe ESP-IDF provides the Example Code:Here’s how you can modify the case ESP_GATTC_DISCONNECT_EVT: {
ESP_LOGW(TAG, "Disconnected from device. Cleaning up resources.");
// Unregister for notifications
auto status = esp_ble_gattc_unregister_for_notify(this->parent()->get_gattc_if(),
this->parent()->get_remote_bda(),
this->notify_handle_);
if (status) {
ESP_LOGW(TAG, "Failed to unregister notifications, status=%d", status);
} else {
ESP_LOGI(TAG, "Successfully unregistered notifications.");
}
// Clean up internal state
this->node_state = espbt::ClientState::IDLE;
this->status_notification_received_ = false;
break;
} Are Leaks Likely Without Explicit Unregistration?While the BLE stack should clean up resources on disconnection, there’s still a risk of memory leaks or orphaned states if notifications are not explicitly unregistered. The risk depends on:
Best Practice Summary
|
@syssi, also, the jk bms ble disconnects are down to exactly one per hour, on the hour, which I assume is something that the bms itself does. It was previously more regular. |
I've merged the change into |
Well it has, @syssi - 24 hours now running with no throttle, zero crashes. |
@syssi now 48hrs... no throttle, zero crashes. |
May be we should implement a "reconnect counter" for easier performance measurements. |
Previously this was not a workable solution for me, it would work for a few hours (if lucky) and then crash and needed to restart and repeat... Now it works way better, only some occasional resets. But those are probably normal. |
Great!!! Yes on mine the connection resets every hour, exactly 60 minutes - I think it must be something the BMS does. I agree regarding the rest though - super stable now. You don't even need the throttle anymore. I'm glad that my little finding has been helpful, as @syssi has given me so so so much help over the last year or more. |
@EasilyBoredEngineer Could you add something like this to your YAML to get an better idea about the number of disconnects per day:
A chart showing the value over a few days would be awesome! |
@syssi, I can do, but it will be a few days as I'm currently tuning a PID - pushing shit uphill for a deployment to India - which means lots of deliberate reboots. What I can say, though, is the graphs would simply show one thing - Now, The JK disconnects on the hour, every hour. Nothing in my Yaml accounts for that. Prior to this change the reboots were random, often many many times per hour. It's rock solid now. One an hour. Almost unnoticeable as with 320ms/320ms ble tracker all that happens is you hear the beep, then it's back. |
So I see a reset after precisely 5hours. If there is no throttle it will go to 3600 in an hour. But this is normal behaviour I would say. I will update to previous version(commit) with the counter to see the difference. |
Please keep in mind: The throttle mechanism just discards (doesn't process) received frames. The JK-BMS sends/pushs two cell info frames per second (no polling required!). If you aren't interested in such a high update interval the I don't expect a connection between the |
My BLE connection is stable yet. No disconnect for 5 hours. |
my bms is JK_BD6A17S6P @syssi that information about the throttle mechanism is interesting. I think the relationship between throttle time and stability was definitely beyond doubt in the previous incarnation of this code. I wonder what the mechanism for that was? Keep in mind I also have two other ble-clients on this device, all updating every two seconds (they are other esps) and they don't disconnect at all - so occams would say it's likely a jk thing rather than a core esphome thing. |
In my case it's a:
|
No BLE reconnects for 25 hours. |
@Kappa71 Could you change |
I understand your calculation now! A reconnect occurs every 3600 processed notifications. If I would try to prove this. :-) |
Please keep in mind: If no throttle is used at the YAML the default (
|
Hmmm... but no disconnects on my site using |
Maybe then it's the PB2A16S20P (inverter BMS) which triggers the disconnection every 5hours. |
Hi kappa - alas not in my case as I’m just using the bms - however I can report I’m up with no crashes for 5 days after syssi changes - great work Seb. 👍👍 |
Now running this via BLE on a c3 talking to two Victron sccs and running a WireGuard client. Not even raising a sweat. So impressed. |
G'day Syssi,
I've been having a few issues with stability when using your jk ble code.
I just thought I'd pass on the stacktrace, and an AI analysis of it :)
The text was updated successfully, but these errors were encountered: