
[Bug]: Need to make sure monit is initialized before container stop test #16489

Open
weiguo-nvidia opened this issue Jan 13, 2025 · 0 comments


weiguo-nvidia commented Jan 13, 2025

Issue Description

The test case container_checker/test_container_checker.py::test_container_checker fails because the expected log message matching "containers not running.*syncd.*" is recorded about 10 minutes after syncd is stopped.

Results you see

Root cause

Based on the logic of test_container_checker, the script checks whether the tested container is running before stopping it. If the tested container is not running, the script executes config reload to recover it.

When swss is stopped, syncd is stopped as well. So when the syncd stop test runs, config reload is executed first to recover syncd. After config reload, monit takes some time to initialize, which delays generation of the message.
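The pre-check described above can be sketched as follows. This is a minimal illustration, not the actual test code: `run` is a hypothetical callable that executes a shell command on the DUT and returns its stdout, and `ensure_container_running` is a hypothetical helper name. It uses `docker inspect -f '{{.State.Running}}'` to read the container state and `sudo config reload -y` to recover it, which is the step that forces monit to re-initialize.

```python
from typing import Callable

# Hypothetical: `run` executes a command on the DUT and returns stdout.
Runner = Callable[[str], str]


def is_container_running(run: Runner, container: str) -> bool:
    # `docker inspect -f '{{.State.Running}}' <name>` prints "true" or "false".
    out = run("docker inspect -f '{{.State.Running}}' " + container)
    return out.strip() == "true"


def ensure_container_running(run: Runner, container: str) -> bool:
    """Recover `container` via config reload if it is down.

    Returns True when a config reload was issued. In that case monit is
    restarted too and needs time to initialize before container_checker
    can log "containers not running" again.
    """
    if is_container_running(run, container):
        return False
    run("sudo config reload -y")
    return True
```

When the syncd test follows the swss test, syncd is already down, so this pre-check takes the reload branch; for the other containers it returns False and monit is left untouched.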

Why the issue does not occur on other containers

In the other container stop tests, the tested container is already running before the test, so no config reload is needed and monit does not have to re-initialize. The message is therefore recorded in syslog immediately.

Next action

After config reload, use the command sudo monit summary to make sure monit is initialized before performing any other operations, for example:

admin@r-panther-03:~$ sudo monit summary
Monit 5.20.0 uptime: 8h 26m
 Service Name                     Status                      Type          
 r-panther-03                     Running                     System        
 rsyslog                          Running                     Process       
 root-overlay                     Accessible                  Filesystem    
 var-log                          Accessible                  Filesystem    
 routeCheck                       Status ok                   Program       
 dualtorNeighborCheck             Status ok                   Program       
 diskCheck                        Status ok                   Program       
 container_checker                Status ok                   Program       
 vnetRouteCheck                   Status ok                   Program       
 memory_check                     Status ok                   Program       
 arp_update_checker               Status ok                   Program       
 controlPlaneDropCheck            Status ok                   Program       
 container_memory_snmp            Status ok                   Program       
 container_memory_gnmi            Status ok                   Program       
 container_eventd                 Status ok                   Program       
 container_memory_bmp             Status ok                   Program       
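A readiness wait based on the output above could look like the following sketch. It assumes that monit reports services it has not yet checked with the status "Initializing" and that `sudo monit summary` fails or prints no service table while monit is still starting; both are assumptions, not verified behavior of every monit version. `run`, `monit_ready` and `wait_for_monit_ready` are hypothetical names, and the default timeout and interval are illustrative only.

```python
import re
import time
from typing import Callable, Dict, Iterable


def parse_monit_summary(text: str) -> Dict[str, str]:
    """Map each service name in `monit summary` output to its status."""
    statuses: Dict[str, str] = {}
    in_table = False
    for line in text.splitlines():
        if line.strip().startswith("Service Name"):
            in_table = True  # service rows follow the column header
            continue
        if not in_table or not line.strip():
            continue
        # Columns are padded with runs of spaces, while statuses such as
        # "Status ok" contain a single space, so split on 2+ spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            statuses[fields[0]] = fields[1]
    return statuses


def monit_ready(text: str, required: Iterable[str] = ("container_checker",)) -> bool:
    statuses = parse_monit_summary(text)
    if not statuses:
        return False  # no service table yet
    if any(status == "Initializing" for status in statuses.values()):
        return False  # assumption: not-yet-checked services show "Initializing"
    return all(name in statuses for name in required)


def wait_for_monit_ready(run: Callable[[str], str],
                         timeout: float = 600, interval: float = 10) -> bool:
    """Poll `sudo monit summary` until monit looks initialized or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if monit_ready(run("sudo monit summary")):
                return True
        except Exception:
            pass  # e.g. monit is not accepting requests yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_monit_ready right after config reload, and only then stopping the tested container, would keep the monit start-up time out of the window in which the test waits for the syslog message.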

Results you expected to see

The test case always passes.

Is it platform specific

generic

Relevant log output

No response

Output of show version

No response

Attach files (if any)

No response
