Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve SONiC disk checker to handle disk full case and mount overlay fs to allow remote user login. #3700

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

liuh-80
Copy link
Contributor

@liuh-80 liuh-80 commented Dec 27, 2024

Improve SONiC disk checker to handle disk full case and mount overlay fs to allow remote user login.

This PR depends on DB schema change: sonic-net/sonic-buildimage#21351

What I did

Currently disk checker only handle RO disk case, but when disk no free space, remote user also can't login.

How I did it

Check disk free space and mount overlay fs to allow TACACS login.

How to verify it

Create big file on device, make device no free space. then login with remote user should success.
Work item tracking
  • Microsoft ADO: 30608100

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@patch('os.access', return_value=True)
@patch('os.statvfs', return_value=os.statvfs_result((4096, 4096, 1909350, 1491513, 4096,
971520, 883302, 883302, 4096, 255)))
def test_unmount_disk_full(self, mock_os_statvfs, mock_os_access, mock_rmtree, mock_proc, mock_log):
Copy link
Contributor

@qiluo-msft qiluo-msft Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_unmount_disk_full

As discussed in internal meeting, disk-full is special compared with disk-readonly is that it can recover to normal (delete large files). Does this testcase cover the recovery? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this test will cover recovery case.
I also will add a sonic-mgmt test to cover the disk-full scenario.

@qiluo-msft
Copy link
Contributor

There have been cases, where disk turns Read-only due to kernel bug.

The file level comment (seems a design doc) does not complete anymore, please update.


Refers to: scripts/disk_check.py:6 in 52a715e. [](commit_id = 52a715e, deletion_comment = False)

@liuh-80 liuh-80 removed the request for review from maipbui January 20, 2025 07:25
VladimirKuk pushed a commit to Marvell-switching/sonic-buildimage that referenced this pull request Jan 21, 2025
Add disk full event to sonic-events-host yang model.

Why I did it
This PR need publish disk full event:
sonic-net/sonic-utilities#3700

Work item tracking
Microsoft ADO: 30608100
How I did it
Add disk full event to yang model.

How to verify it
Pass all UT.

Description for the changelog
Add disk full event to sonic-events-host yang model.
for d in dirs:
d_name = get_dname(d)

ret = run_cmd(["umount", "-l", "{}_{}".format(overlay_prefix, d_name)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth noting that if there's some process in /etc or /home (including any user's shell being in their home directory), then this will most likely fail. I guess since disk_check.py gets executed frequently, it'll eventually work, at least.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not fail because unmount with -l, lazy mode will wait and unmount in background.

@liuh-80 liuh-80 removed the request for review from renukamanavalan January 24, 2025 00:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants