DAOS-16209 control: Add MD-on-SSD resp flag for display mode #15695

Open: wants to merge 6 commits into base: master

Contributor

@tanabarr tanabarr commented Jan 7, 2025

Rather than mutating mem_file_bytes to indicate PMem/MD-on-SSD mode in
pool query and create, use an explicit flag in the response instead.
This flag is then used to trigger a display style in the presentation layer.

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follow the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket?
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

@tanabarr tanabarr self-assigned this Jan 7, 2025

github-actions bot commented Jan 7, 2025

Ticket title is 'Return VOS file capacity in addition to meta blob size on pool query'
Status is 'In Progress'
Labels: 'md_on_ssd2'
https://daosio.atlassian.net/browse/DAOS-16209

@daosbuild1
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/1/execution/node/360/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/1/execution/node/261/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/1/execution/node/336/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/1/execution/node/306/log

@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/1/execution/node/322/log

@tanabarr tanabarr force-pushed the tanabarr/control-memfilebytes-mode-mdonssd branch from 95f188d to 5c94599 Compare January 7, 2025 21:04
@tanabarr tanabarr changed the title DAOS-16209 control: Add MD-on-SSD response flag to trigger display sw… DAOS-16209 control: Add MD-on-SSD resp flag for display mode Jan 7, 2025
@tanabarr tanabarr marked this pull request as ready for review January 7, 2025 21:21
@tanabarr tanabarr requested review from a team as code owners January 7, 2025 21:21
@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/2/execution/node/370/log

@daosbuild1
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15695/2/testReport/

@daosbuild1
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15695/2/testReport/

@tanabarr tanabarr requested review from mjmac, kjacque and knard38 January 8, 2025 11:21
@tanabarr tanabarr added the control-plane and meta-on-ssd labels Jan 8, 2025
@tanabarr tanabarr requested a review from NiuYawei January 8, 2025 11:21
@tanabarr
Contributor Author

tanabarr commented Jan 8, 2025

PMem mode output with PR applied:

[tanabarr@wolf-311 daos]$ install-rocky/bin/dmg system query -v -i
Rank UUID                                 Control Address Fault Domain                  State  Reason
---- ----                                 --------------- ------------                  -----  ------
0    39d3d06f-dc11-45bd-8d7e-c09bc2f8dbcf 10.8.3.99:10001 /wolf-311.wolf.hpdd.intel.com Joined
1    77e52fbb-56c2-4b62-a142-eb40d05b594a 10.8.3.99:10001 /wolf-311.wolf.hpdd.intel.com Joined

[tanabarr@wolf-311 daos]$ install-rocky/bin/dmg -i pool create bob -z 50% --mem-ratio 50%
Creating DAOS pool with 50% of all storage
ERROR: dmg: pool create failed: server: code = 620 description = "pool create request contains MD-on-SSD parameters but MD-on-SSD has not been enabled"
ERROR: dmg: server: code = 620 resolution = "either remove MD-on-SSD-specific options from the command request or set bdev_roles in server config file to enable MD-on-SSD"
[tanabarr@wolf-311 daos]$ install-rocky/bin/dmg -i pool create bob -z 50%
Creating DAOS pool with 50% of all storage
Pool created with 38.24%,61.76% storage tier ratio
--------------------------------------------------
  UUID                 : 124b6556-eddb-4a80-9bd8-5c73c3c218cb
  Service Leader       : 0
  Service Ranks        : [0-1]
  Storage Ranks        : [0-1]
  Total Size           : 2.6 TB
  Storage tier 0 (SCM) : 989 GB (494 GB / rank)
  Storage tier 1 (NVMe): 1.6 TB (799 GB / rank)

[tanabarr@wolf-311 daos]$ install-rocky/bin/dmg -i pool query bob -e
Pool 124b6556-eddb-4a80-9bd8-5c73c3c218cb, ntarget=16, disabled=0, leader=0, version=1, state=Ready
Pool health info:
- Enabled ranks: 0-1
- Rebuild idle, 0 objs, 0 recs
Pool space info:
- Target count:16
- Storage tier 0 (SCM):
  Total size: 989 GB
  Free: 939 GB, min:59 GB, max:59 GB, mean:59 GB
- Storage tier 1 (NVME):
  Total size: 1.6 TB
  Free: 1.6 TB, min:100 GB, max:100 GB, mean:100 GB
[tanabarr@wolf-311 daos]$ install-rocky/bin/dmg -i storage query usage
Hosts     SCM-Total SCM-Free SCM-Used NVMe-Total NVMe-Free NVMe-Used
-----     --------- -------- -------- ---------- --------- ---------
localhost 2.1 TB    989 GB   52 %     3.2 TB     1.6 TB    50 %

MD-on-SSD mode output with PR applied:

[tanabarr@wolf-310 daos]$ install-rocky/bin/dmg -i pool create bob -z 50% --mem-ratio 50%
Creating DAOS pool with 50% of all storage
Pool created with 8.65%,91.35% storage tier ratio
-------------------------------------------------
  UUID             : f3931322-14f8-4c47-9b1e-204d3b2f6ac5
  Service Leader   : 0
  Service Ranks    : [0-1]
  Storage Ranks    : [0-1]
  Total Size       : 1.5 TB
  Metadata Storage : 129 GB (64 GB / rank)
  Data Storage     : 1.4 TB (681 GB / rank)
  Memory File Size : 64 GB (32 GB / rank)

[tanabarr@wolf-310 daos]$ install-rocky/bin/dmg -i pool query bob -e
Pool f3931322-14f8-4c47-9b1e-204d3b2f6ac5, ntarget=32, disabled=0, leader=0, version=1, state=Ready
Pool health info:
- Enabled ranks: 0-1
- Rebuild idle, 0 objs, 0 recs
Pool space info:
- Target count:32
- Total memory-file size: 64 GB
- Metadata storage:
  Total size: 129 GB
  Free: 115 GB, min:3.6 GB, max:3.6 GB, mean:3.6 GB
- Data storage:
  Total size: 1.4 TB
  Free: 1.4 TB, min:42 GB, max:42 GB, mean:42 GB
[tanabarr@wolf-310 daos]$ install-rocky/bin/dmg -i storage query usage
Tier Roles
---- -----
T1   data,meta,wal

Rank T1-Total T1-Free T1-Usage
---- -------- ------- --------
0    1.6 TB   749 GB  53 %
1    1.6 TB   749 GB  53 %

kjacque
kjacque previously approved these changes Jan 8, 2025
Contributor

@kjacque kjacque left a comment

Thanks for this cleanup.

@tanabarr
Contributor Author

tanabarr commented Jan 9, 2025

I added bio.h to srv_drpc.c in order to access bio_configured_nvme(), as we discussed. This allows a flag indicating MD-on-SSD or PMem mode to be populated and returned in the pool create and query dRPC responses. The change adds a dependency on libbio for srv_drpc_tests, so to run the test binary I have to prefix the command with "LD_LIBRARY_PATH=install/lib64/daos_srv". How do I adjust things so that run_test.py can run the test with the added dependency? Currently it fails with: /var/lib/jenkins/jenkins-1/docker_1/workspace/daos-stack_daos_PR-15695@2/build/dev/gcc/src/mgmt/tests/srv_drpc_tests: error while loading shared libraries: libbio.so: cannot open shared object file: No such file or directory (https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15695/2/artifact/unit_test_logs/src-mgmt-tests-srv_drpc_tests_31/output.log/*view*/). @jolivier23 @NiuYawei
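
For reference, the manual workaround amounts to prepending the library directory to LD_LIBRARY_PATH in the environment of the test process. The sketch below shows that step in isolation; `withLibraryPath` is a hypothetical helper and this is not how run_test.py is implemented.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// withLibraryPath returns a copy of env in which dir is prepended to
// LD_LIBRARY_PATH, adding the variable if it is not already present.
func withLibraryPath(env []string, dir string) []string {
	const key = "LD_LIBRARY_PATH="
	out := make([]string, 0, len(env)+1)
	found := false
	for _, kv := range env {
		if strings.HasPrefix(kv, key) {
			found = true
			if rest := strings.TrimPrefix(kv, key); rest != "" {
				kv = key + dir + ":" + rest
			} else {
				kv = key + dir
			}
		}
		out = append(out, kv)
	}
	if !found {
		out = append(out, key+dir)
	}
	return out
}

func main() {
	// The directory is the one quoted in the comment above.
	env := withLibraryPath(os.Environ(), "install/lib64/daos_srv")
	for _, kv := range env {
		if strings.HasPrefix(kv, "LD_LIBRARY_PATH=") {
			fmt.Println(kv)
		}
	}
}
```

The resulting slice could be assigned to the `Env` field of an `exec.Cmd` from `os/exec` before launching the test binary, so that only the child process sees the modified search path.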

@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/3/execution/node/340/log

@daosbuild1
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15695/3/testReport/

@daosbuild1
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15695/3/testReport/

Features: pool
Signed-off-by: Tom Nabarro <[email protected]>
@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/4/execution/node/353/log

@daosbuild1
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15695/4/testReport/

@tanabarr
Contributor Author

Despite trying the approaches in both 30bf30e and 6388819 to look up the bio library path when running srv_drpc_tests, the same "No such file or directory" error still appears when the shared library is opened. Building and running the tests on wolf without manually supplying the library path seems to work with both methods. @jolivier23, any ideas on what else I could try?

@daosbuild1
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15695/4/testReport/

Features: pool
Signed-off-by: Tom Nabarro <[email protected]>
@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/5/execution/node/333/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/5/execution/node/361/log

@daosbuild1
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15695/5/testReport/

@daosbuild1
Collaborator

Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15695/5/testReport/

Features: pool
Signed-off-by: Tom Nabarro <[email protected]>
@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/6/execution/node/346/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15695/6/execution/node/355/log

Labels: control-plane, meta-on-ssd
3 participants