Suricata service can be stuck for hours if suricata didn't start #217
Comments
Can you confirm if this is still happening on more recent versions of the Suricata service? |
I've just disabled my workaround (script looking for stuck Suricata services every 30 minutes, it was impossible to work without it) - I'll come back in a few days with results. |
I'm confirming this is still an issue - with Suricata 4.5.0.10 and AL v26 |
Want to try testing with the latest stable? The service was updated to use Suricata 8.0-dev and a flag was added to the command to run as daemon which I'm hoping should resolve the issue mentioned (unsure if that flag was available in previous versions). |
So... I'm not really able to test it, as on my server the newest Suricata (service v16 or even v15) doesn't even start:
This is caused by:
This indicates an incompatible binary instruction, which is strange. To be honest, although I believe that finding the root cause is important, my suggestion is rather to prepare for cases such as a Suricata process that dies unexpectedly. Since it is not the main process in the Docker container, it can be killed for any reason (even manually, which is good for testing) while the container lives on. I'd add a check in the error handling to see whether the process is still running or has exited. |
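The check suggested in this comment could look roughly like the sketch below. It assumes the service keeps a `subprocess.Popen` handle to a Suricata process that was not daemonized; with a daemonized process, the PID file would have to be checked instead. `SuricataNotRunning` and `ensure_running` are hypothetical names used only for illustration and are not part of the service.

```python
import subprocess
import sys


class SuricataNotRunning(Exception):
    """Hypothetical error raised when the child process has exited."""


def ensure_running(proc: subprocess.Popen) -> None:
    """Raise if the child process has exited; return silently otherwise."""
    returncode = proc.poll()  # None while the process is still alive
    if returncode is not None:
        raise SuricataNotRunning(f"process exited with code {returncode}")


if __name__ == "__main__":
    # Stand-in for Suricata: a child process that just sleeps for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    ensure_running(child)  # still alive, so nothing is raised

    child.kill()  # simulate the process dying unexpectedly
    child.wait()  # reap the child so poll() reports the exit code
    try:
        ensure_running(child)
    except SuricataNotRunning as exc:
        print(f"detected: {exc}")
```

Calling such a check inside the connection retry loop would let the service stop retrying as soon as the process is known to be gone, rather than waiting on a socket that will never appear.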
@kam193 Would you be able to evaluate the latest stable release of Suricata after the patch that @jasper-vdhoven made? I believe this should resolve the longstanding issue we've had with Suricata. |
Hey, I re-enabled Suricata (I previously had to disable it, as it was almost always producing nothing but errors) and let it run for a while. Unfortunately, it doesn't look fixed at all :( I see the log message about removing the stale PID, but tasks are again not being processed. AL keeps pre-empting tasks with an error:
|
Is there anything you can tell me about your deployment of Suricata, so I can try to re-create the conditions that are causing it to fail?
|
Sure, let's see: ad. 1: ad. 2: ad. 3: I don't currently have any pcap file that I've seen processed at all 😅 From what I've observed, it doesn't matter: files are pre-empted because Suricata wasn't working and never started doing anything. I don't have a recent analysis; my previous understanding was that starting the Suricata process was not always successful and that it was dying. I didn't see any clear reason why (no logs in Suricata, no OOM kills, etc.), and the service wasn't checking whether the process still existed, just trying to connect in a loop. For some time, I tried to work around it by regularly checking the logs and, when needed, killing the container, but some time ago something changed and it basically stopped processing anything. Finally, I gave up and disabled the service. |
Describe the bug
Recently, I've noticed a few times that the Suricata service stopped processing files. After analysing what was going on, I found that Suricata was not running and that the service seems to have no timeout for the initialization. I'm not sure why Suricata isn't running; I haven't found any traces in the logs, neither from the service nor from Suricata inside the container. Restarting the container is enough to fix it.
The behaviour of the service looks a little unhealthy, as it tries to reach Suricata in an infinite loop, throwing `RecoverableError`, which prevents the scaler from stepping in and recreating the service.
To Reproduce
Steps to reproduce the behavior:
Unfortunately, I don't know an easy way. I assume killing the Suricata process or modifying the service not to start Suricata at all could reproduce the behaviour.
Expected behavior
I would like the service to have a timeout / limit for Suricata to start, and to just give up by throwing a normal error. Using `RecoverableError` during initialization is perfectly fine, but it should stop after some time.
Screenshots
If applicable, add screenshots to help explain your problem.
Environment (please complete the following information if pertinent):
Additional context
The service container has logs like the ones below. Note the time: between the start and the last sample there are two hours of throwing nothing but `RecoverableError`.
The `/var/log/suricata/suricata.log` ends at 14:30 with a failure to load some signatures, which isn't a reason to stop Suricata, and looks the same as when everything works.
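The timeout requested under "Expected behavior" could be sketched as follows. This is only an illustration under stated assumptions: `RecoverableError` here is a local stand-in, not the framework's real class, and `StartupTimeout`, `StartupGuard`, and `grace_seconds` are hypothetical names. The idea is that a retryable error is allowed only until a deadline, after which a normal error is raised so the service can be recreated.

```python
import time
from typing import Callable


class RecoverableError(Exception):
    """Local stand-in for the framework's retryable error (not the real class)."""


class StartupTimeout(Exception):
    """Hypothetical non-retryable error raised once the grace period is over."""


class StartupGuard:
    """Allow retryable errors only during a bounded startup grace period."""

    def __init__(self, grace_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # The deadline is fixed at the moment the guard is created,
        # which should be when the Suricata process is launched.
        self._deadline = clock() + grace_seconds

    def check(self, ready: bool) -> None:
        """Return if ready; otherwise raise a retryable or a final error."""
        if ready:
            return
        if self._clock() < self._deadline:
            raise RecoverableError("Suricata is still starting, retry later")
        raise StartupTimeout("Suricata did not become ready within the grace period")
```

In this sketch, the guard would be created once when the process is launched, for example `guard = StartupGuard(grace_seconds=120)`, and `guard.check(ready=...)` would be called before each task. Using a monotonic clock avoids surprises if the wall clock is adjusted while the container is running.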