
EVerest crash issue observed when using v0.19.1 framework #223

Open
dev-enphase opened this issue Dec 21, 2024 · 7 comments

@dev-enphase

We are currently using the latest commit of everest-core (EVerest/everest-core@6e669fd) in our repository, and we are experiencing a crash. After debugging, we found that the crash occurs when using everest-framework v0.19.1 with this everest-core commit. With everest-framework v0.18.1, everything works as expected and no crash is observed.

I suspect that a change in the framework is causing the crash immediately after EVerest starts. One observation: the crash happens only when the OCPP module is loaded; if it is not loaded, the system runs fine.
We are using security profile 3 for the OCPP server connection, and that same connection works fine with the October EVerest Core release.

We are compiling and creating the EVerest binaries on Ubuntu 22.04.3 against the 2024.10.0 stable release. The crash appears to have been introduced with the following commit:

Commit ID: 50d8bacaab06dea0b5a58f5efd212ff1d25de239
Commit Message: "Bump Everest-Framework to 0.19.1 (#987)"
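
For anyone reproducing this, such an offending commit is typically found by bisecting everest-core between the known-good release and the crashing commit, roughly as follows (a sketch; the release tag name is an assumption):

    git bisect start
    git bisect bad 6e669fd
    git bisect good 2024.10.0    # known-good release; tag name assumed
    # rebuild and run at each step, then mark it: git bisect good|bad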

There are several changes between everest-framework v0.18.1 and v0.19.1, and we have been unable to narrow them down to a root cause. Could someone from your team assist with investigating this crash?

Please let me know if you need any further information from my side.

@hikinggrass (Contributor)

Can you capture a core dump of the crash and post the resulting backtrace?
Is the crash reliably reproducible, or does it appear to happen randomly?
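
For reference, one way to do that on Linux (a sketch; the module binary path is a placeholder):

    # Allow core dumps in the shell that launches EVerest
    ulimit -c unlimited
    # Write cores to a predictable location (example pattern)
    sudo sysctl -w kernel.core_pattern=/tmp/core.%e.%p
    # After the crash, dump a backtrace of all threads
    gdb -batch -ex "thread apply all bt" /path/to/module-binary /tmp/core.<module>.<pid>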

As a workaround you can downgrade everest-framework to v0.18.1; there are no API changes between those versions that necessitate changes in everest-core.

@dev-enphase (Author)

This is reproducible 100% of the time. We also checked with a Docker image based on Ubuntu 22.04 and ran EVerest standalone with our config file, with the same result.
We are using a third-party repo that has everest-core as a dependency.

In our setup, when we run EVerest with a config file that has OCPP enabled with security profile 3, we get this crash; the same happens with security profile 2.

I will try to capture a core dump and update here.

@dev-enphase (Author)

@hikinggrass This issue is not related to OCPP or any other module. It is caused by system resources being exhausted when using the new everest-framework: with v0.18.1 far fewer threads are created than with v0.19.1. (The std::system_error "Resource temporarily unavailable" in the log below is the typical signature of thread creation failing with EAGAIN once a limit is hit.)
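
For reference, the per-process thread counts below can be collected with procps ps (a sketch; the exact invocation we used is an assumption, NLWP being the per-process thread count):

    # Thread count (NLWP) per process, highest first
    ps -eo pid,comm,nlwp --sort=-nlwp | head -n 15
    # Rough total number of threads on the system
    ps -eLf | tail -n +2 | wc -l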

Below is the analysis:
##########################################################################
With EVerest framework V0.19.1 and EVerest Commit 6e669fd867afaaca3ab45deaefea1f2a13998998
Total threads: 1763

Memory consumption:
MemTotal:   1924648 kB
MemFree:     598392 kB
Buffers:       9412 kB
Cached:      911500 kB
SwapCached:       0 kB

Per-process thread counts:
   PID COMMAND         NLWP
490424 ocpp:OCPP        152
490369 connector_1:Evs  129
490408 occp_custom_dat  126
490475 power_manager:P  123
490516 ui_event_manage  123
490395 host_interface_   67
490367 auth:Auth         66
490431 ota_update_mana   62
490372 enphase_bsp:Enp   54

Last log before the crash:
2024-12-23 06:35:31.372848 [INFO] connector_1:Evs :: Cleaning up any other transaction on start up
2024-12-23 06:35:31.377006 [INFO] enphase_bsp:Enp :: Publishing stop transaction event over MQTT, data : {"key":"TransactionStatus","value":{"Started":false}}
2024-12-23 06:35:31.499042 [INFO] ocpp:OCPP :: All EVSE ready. Starting OCPP1.6 service
2024-12-23 06:35:31.593364 [INFO] ocpp:OCPP :: OCPP not fully initialized, but received a session event on evse_id: 1 that will be queued up: Enabled
2024-12-23 06:35:31.596535 [INFO] ocpp:OCPP :: Starting connection attempts to uri: ws://127.0.0.1:8180/steve/websocket/CentralSystemService/cp001 with security-profile 1
2024-12-23 06:35:31.597484 [INFO] ocpp:OCPP :: Init client loop with ID: ffff5227f180
2024-12-23 06:35:31.615091 [INFO] ocpp:OCPP :: LWS connect with info port: [8180] address: [127.0.0.1] path: [/steve/websocket/CentralSystemService/cp001] protocol: [ocpp1.6] security profile: [1]
2024-12-23 06:35:31.616608 [ERRO] ocpp:OCPP int ocpp::WebsocketLibwebsockets::process_callback(void*, int, void*, void*, size_t) :: CLIENT_CONNECTION_ERROR: [conn fail: 111]. Attempting reconnect
2024-12-23 06:35:31.617164 [ERRO] ocpp:OCPP void ocpp::WebsocketLibwebsockets::on_conn_fail(ocpp::ConnectionData*) :: OCPP client connection to server failed
2024-12-23 06:35:31.618026 [INFO] ocpp:OCPP :: Connection not successful, attempting internal reconnect in: 2000ms
2024-12-23 06:35:31.746245 [INFO] ocpp:OCPP :: Security Event in OCPP occured: StartupOfTheDevice
2024-12-23 06:35:31.746887 [INFO] ocpp:OCPP :: OCPP initialized
2024-12-23 06:35:31.747448 [INFO] ocpp:OCPP :: Processing queued event for evse_id: 1, event: Enabled
2024-12-23 06:35:31.795314 [INFO] connector_1:Evs :: Received external ready to start charging command.
terminate called after throwing an instance of 'std::system_error'
what(): Resource temporarily unavailable
2024-12-23 06:35:31.832137 [CRIT] manager int boot(const boost::program_options::variables_map&) :: Module connector_1 (pid: 490369) exited with status: 134. Terminating all modules.

#############################################################################
With EVerest framework V0.18.1 and EVerest Commit 6e669fd867afaaca3ab45deaefea1f2a13998998
Total threads: 877

Memory consumption:
MemTotal:   1924648 kB
MemFree:     621264 kB
Buffers:       9432 kB
Cached:      913760 kB
SwapCached:       0 kB

Per-process thread counts:
   PID COMMAND         NLWP
494783 ocpp:OCPP        120
494764 connector_1:Evs   91
494775 occp_custom_dat   86
494788 power_manager:P   83
494791 ui_event_manage   83
494773 host_interface_   27
494762 auth:Auth         26
494733 manager           25
494785 ota_update_mana   22

With the same code base and config file, we see far more threads created with v0.19.1:

With v0.18.1:
494775 occp_custom_dat   86
494788 power_manager:P   83

With v0.19.1:
490408 occp_custom_dat  126
490475 power_manager:P  123

Due to this increase in thread count with the v0.19.1 everest-framework, EVerest is crashing. We tried increasing the maximum thread count, the stack size, and pid_max, but none of these resolved the issue.
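
For reference, those tuning attempts correspond to knobs like the following (a sketch; the exact values we used are illustrative):

    # System-wide limit on the number of threads
    sudo sysctl -w kernel.threads-max=100000
    # Maximum PID/TID value, which also bounds thread creation
    sudo sysctl -w kernel.pid_max=4194304
    # Smaller per-thread stacks leave headroom for more threads
    ulimit -s 2048
    # The per-user process/thread limit can also be the bottleneck
    ulimit -u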

@hikinggrass Why has the number of threads increased with v0.19.1, given the same code base and config file?

We need v0.19.1, as EVerest startup is quite fast with this version. @Pietfried

Please let me know if you need more data on this.

@hikinggrass (Contributor)

Ah interesting, thanks for the detailed reply, this is really helpful!
The number of threads probably increases temporarily at startup because of the way the new config distribution re-uses parts of the existing message processing. I will take a closer look at this; in the meantime, can you check whether this branch already improves things? #224
It doesn't address the high thread usage yet, but it contains other improvements to the startup procedure.
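
To try that branch directly, the PR head can be fetched into a local everest-framework checkout (a sketch; assumes the origin remote points at EVerest/everest-framework):

    # Fetch the head of PR #224 into a local branch and check it out
    git fetch origin pull/224/head:pr-224
    git checkout pr-224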

@dev-enphase (Author) commented Dec 23, 2024

@hikinggrass It is confirmed that this EVerest crash is caused by the large number of threads spawned at EVerest startup. To resolve the crash we had to raise the system-wide thread limit by adding the following to /etc/sysctl.conf:
kernel.threads-max = 100000

This change requires a system restart to take effect.
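
For what it's worth, the value can usually also be applied at runtime without a reboot (assuming standard sysctl behaviour):

    # Apply immediately to the running kernel
    sudo sysctl -w kernel.threads-max=100000
    # Or re-read everything from /etc/sysctl.conf
    sudo sysctl -p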

With the shared pull request and v0.19.1, EVerest works without any crash.

@hikinggrass (Contributor) commented Jan 2, 2025

@dev-enphase Can you re-test this on your system with the default max-threads value and the latest commit (7c539ac) from #224?

There was a bug where MessageHandlers for topics were being kept around even though no actual handlers were registered anymore, resulting in a substantial number of idle threads.

For comparison, here's the sil-ocpp config without the fix:

COMMAND         THCNT
manager            51
controller          2
yeti_driver_2:J    70
yeti_driver_1:J    70
token_provider_    60
slac:SlacSimula    62
ocpp:OCPP         308
iso15118_charge    64
persistent_stor    60
python3            60
evse_security:E    61
ev_manager_2:Ev    73
error_history:E   282
auth:Auth          92
evse_manager_2:   148
system:System      63
api:API            99
evse_manager_2_    63
grid_connection    65
energy_manager:    62
ev_manager_1:Ev    73
display_message    61
evse_manager_1:   149
evse_manager_1_    65
evse_manager_2_    65
evse_manager_1_    63

And here after the fix:

COMMAND         THCNT
manager            51
controller          2
yeti_driver_2:J    15
yeti_driver_1:J    15
token_provider_     5
slac:SlacSimula     7
ocpp:OCPP         246
iso15118_charge     8
persistent_stor     5
python3             5
evse_security:E     6
ev_manager_2:Ev    16
error_history:E   227
auth:Auth          35
evse_manager_2:    91
system:System       8
api:API            43
evse_manager_2_     7
grid_connection     8
energy_manager:     6
ev_manager_1:Ev    16
display_message     6
evse_manager_1:    91
evse_manager_1_     9
evse_manager_2_     9
evse_manager_1_     7

Some of the modules still have a suspiciously high thread count even after the fix; that still has to be investigated.
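
To keep an eye on the remaining offenders, per-module counts like the tables above can be watched live (a sketch; thcount is the procps alias behind the THCNT column):

    # Refresh per-process thread counts every second, highest first
    watch -n1 'ps -eo comm,thcount --sort=-nlwp | head -n 20'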
