DAOS-16916 container: use ABT mutex to serialize cont open #15742
base: master
Use ABT mutex to serialize cont open. Signed-off-by: Di Wang <[email protected]>
Ticket title is 'ds_cont_local_open() Assertion 'hdl->sch_cont->sc_open == 1' failed: Unexpected open count for cont c6822a9a: 307'
src/include/daos_srv/container.h (outdated)
@@ -71,7 +71,7 @@ struct ds_cont_child {
 sc_dtx_delay_reset : 1, sc_dtx_registered : 1, sc_props_fetched : 1, sc_stopping : 1,
 sc_destroying : 1, sc_vos_agg_active : 1, sc_ec_agg_active : 1,
 /* flag of CONT_CAPA_READ_DATA/_WRITE_DATA disabled */
-sc_rw_disabled : 1, sc_scrubbing : 1, sc_rebuilding : 1, sc_open_initializing : 1;
+sc_rw_disabled : 1, sc_scrubbing : 1, sc_rebuilding : 1;
As mentioned on your last patch: if you change these commas to semicolons, clang-format will lay the declarations out better, and the behavior is the same.
ah, sure, missed that. Thanks!
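The review point that comma-separated and semicolon-separated bit-field declarations behave the same can be checked with a small sketch. The field names come from the container.h hunk; the two trimmed-down structs and the uint32_t base type are assumptions, since the diff does not show where the declaration starts.

```c
#include <stdint.h>

/* Comma-separated declarators sharing one type specifier, as in the hunk.
 * (Trimmed to three fields; uint32_t as the base type is an assumption.) */
struct flags_commas {
	uint32_t sc_rw_disabled : 1, sc_scrubbing : 1, sc_rebuilding : 1;
};

/* The same bit-fields, one declaration per line (the semicolon form). */
struct flags_semicolons {
	uint32_t sc_rw_disabled : 1;
	uint32_t sc_scrubbing   : 1;
	uint32_t sc_rebuilding  : 1;
};
```

In C a declarator list such as `a : 1, b : 1;` declares each member with the same specifiers, exactly as separate declarations would, so splitting the list changes only the formatting that clang-format sees.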
Signed-off-by: Di Wang <[email protected]>
641f9f3
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/3/execution/node/371/log
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/3/execution/node/375/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/3/execution/node/363/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/3/execution/node/358/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/3/execution/node/359/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/3/execution/node/327/log
Run-GHA: true Allow-unstable-test: true Required-githooks: true
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/4/execution/node/360/log
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/4/execution/node/375/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/4/execution/node/350/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/4/execution/node/334/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/4/execution/node/354/log
Since the cont open process may yield if the IV fetch has to fetch the property remotely, the sc_open counter may be larger than 1 if the property IV fetch fails. Run-GHA: true Allow-unstable-test: true Required-githooks: true Signed-off-by: Di Wang <[email protected]>
@Nasf-Fan it looks like the sc_open counter might be larger than 1 if the IV fetch fails, so I removed those assertions. Thoughts?
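A minimal single-threaded simulation of that interleaving, under a simplified model: the hypothetical open_unserialized() increments the counter first, and only the opener that sees sc_open == 1 fetches the properties. A callback stands in for the ULT yield during the remote IV fetch, and 306 simulated openers run inside it, a number chosen only to mirror the count of 307 in the ticket title. The names, the struct and where exactly the real assertion sits are assumptions, not the actual ds_cont_local_open().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified container state (not the real struct ds_cont_child). */
struct cont {
	int  sc_open;     /* number of local opens */
	bool props_ready; /* set once the property fetch has completed */
};

/* Stand-in for a ULT yield inside the remote IV fetch: the callback runs
 * "in the middle" of the first opener, as other ULTs would. */
typedef void (*yield_fn)(struct cont *c);

static int observed_open_after_yield = -1; /* sc_open seen by the first opener after the yield */
static int openers_before_ready;           /* openers that returned before props were ready */

/* Hypothetical unserialized open: increment first, initialize only if first. */
static int open_unserialized(struct cont *c, yield_fn on_yield)
{
	c->sc_open++;
	if (c->sc_open == 1) {
		/* Simulated remote property fetch; the ULT may yield here. */
		if (on_yield != NULL)
			on_yield(c);
		/* Record what an "sc_open == 1" assertion would see at this point. */
		observed_open_after_yield = c->sc_open;
		c->props_ready = true;
		return 0;
	}
	/* Not the first opener: assumes initialization is done and returns. */
	if (!c->props_ready)
		openers_before_ready++;
	return 0;
}

/* Simulates 306 other ULTs opening the same container during the yield. */
static void others_open(struct cont *c)
{
	for (int i = 0; i < 306; i++)
		open_unserialized(c, NULL);
}
```

Calling `open_unserialized(&c, others_open)` on a zeroed container leaves the first opener observing sc_open == 307 after it resumes, so an `sc_open == 1` check at that point would fire, and all 306 interleaved openers return before the properties are ready.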
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/5/execution/node/361/log
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/5/execution/node/376/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/5/execution/node/327/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/5/execution/node/328/log
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/5/execution/node/359/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15742/5/execution/node/350/log
fix build failure Run-GHA: true Allow-unstable-test: true Signed-off-by: Di Wang <[email protected]>
Functional on EL 8.8 Test Results: 131 tests, 127 ✅, 1h 29m 51s ⏱️. Results for commit abaea27.
Functional Hardware Large Test Results: 65 tests, 65 ✅, 31m 5s ⏱️. Results for commit abaea27.
Functional Hardware Medium Verbs Provider Test Results: 55 tests, 54 ✅, 4h 39m 1s ⏱️. Results for commit abaea27.
There is a conflict to be resolved.
@@ -1630,59 +1630,55 @@ ds_cont_local_open(uuid_t pool_uuid, uuid_t cont_hdl_uuid, uuid_t cont_uuid,
*/
D_ASSERT(hdl->sch_cont != NULL);
D_ASSERT(hdl->sch_cont->sc_pool != NULL);

hdl->sch_cont->sc_open++;
Suggestion: simply moving ABT_mutex_lock() before sc_open++ would make the code simpler, and then you don't have to change the assertions in the DTX code.
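The suggested ordering can be sketched as follows, assuming a simplified model in which only the first opener fetches properties and a failed fetch rolls the counter back. In DAOS the lock would be an Argobots mutex (ABT_mutex_lock()/ABT_mutex_unlock()); this sketch substitutes a POSIX pthread mutex so it builds without Argobots, and struct cont, fetch_props(), cont_open_serialized() and open_worker() are hypothetical names, not the PR's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified container. pthread_mutex_t stands in for the
 * ABT_mutex used in the PR so the sketch builds without Argobots. */
struct cont {
	pthread_mutex_t open_lock;
	int             sc_open;
	bool            props_ready;
};

static int init_calls; /* how many times the simulated property fetch ran */
static int fetch_rc;   /* result the simulated fetch returns (0 = success) */

/* Hypothetical stand-in for the property IV fetch, which may yield in DAOS. */
static int fetch_props(struct cont *c)
{
	init_calls++;
	if (fetch_rc != 0)
		return fetch_rc;
	c->props_ready = true;
	return 0;
}

static int cont_open_serialized(struct cont *c)
{
	int rc = 0;

	/* Take the lock before the increment, as the review suggests, so the
	 * increment and the first-open initialization form one critical section. */
	pthread_mutex_lock(&c->open_lock);
	c->sc_open++;
	if (c->sc_open == 1) {
		rc = fetch_props(c);
		/* No other opener can bump the counter while we hold the lock. */
		assert(c->sc_open == 1);
		if (rc != 0)
			c->sc_open--; /* roll back so the next opener retries the fetch */
	}
	pthread_mutex_unlock(&c->open_lock);
	return rc;
}

/* Thread body: returns non-NULL if the open failed or props were not ready. */
static void *open_worker(void *arg)
{
	struct cont *c = arg;

	if (cont_open_serialized(c) != 0 || !c->props_ready)
		return (void *)1;
	return NULL;
}
```

Because the counter only changes under the lock, the first opener still sees sc_open == 1 after the fetch, which is the reviewer's point about leaving the DTX assertions alone. The trade-off is that every later opener waits on the mutex while a slow remote fetch is in flight.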
Use ABT mutex to serialize cont open.
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used, or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: