fix process group init logic and add pg_desc parsing (#186) #190
Summary:
This diff includes the following changes:
- Parse the process group (pg) description.
- Fix the process group creation logic.
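As an illustration of the two changes listed above, here is a minimal sketch of reading process group records that carry a description and then planning group creation. It is not the code from this diff: the JSON layout, the field names (`pg_name`, `pg_desc`, `ranks`), and the helpers `parse_pg_entries` and `groups_to_create` are all assumptions made for the example. The one general fact it relies on is that `torch.distributed.new_group` must be called by every process, in the same order, even for groups the process does not belong to.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PgInfo:
    """One process group record (hypothetical layout, not the trace schema)."""
    pg_name: str
    pg_desc: str
    ranks: List[int]


def parse_pg_entries(raw: str) -> Dict[str, PgInfo]:
    """Parse a JSON list of process group records keyed by pg_name."""
    groups: Dict[str, PgInfo] = {}
    for entry in json.loads(raw):
        name = str(entry["pg_name"])
        groups[name] = PgInfo(
            pg_name=name,
            # Assumption: records without a description fall back to "".
            pg_desc=entry.get("pg_desc", ""),
            ranks=sorted(entry.get("ranks", [])),
        )
    return groups


def groups_to_create(
    groups: Dict[str, PgInfo], world_size: int
) -> List[Tuple[str, str, List[int]]]:
    """Return (name, desc, ranks) in an order that is identical on every rank.

    Assumption: an empty rank list means "all ranks in the job".
    """
    plan = []
    for name in sorted(groups):
        info = groups[name]
        ranks = info.ranks or list(range(world_size))
        plan.append((name, info.pg_desc, ranks))
    return plan
```

Each process would then walk the plan and create every group in it, whether or not its own rank appears in `ranks`, so that group creation stays in step across the job.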
Test Plan: buck2 run -c fbcode.nvcc_arch='h100a' mode/opt -c hpc_comms.use_ncclx=2.21.5 param_bench/train/comms/pt:launcher -- --launcher mast --dp networkai_mast_job_identity --cluster MastProdCluster --hw grandteton --nnode 8 --ppn 8 --module commsTraceReplay_v2 --trace-path manifold://pytorch_execution_trace/tree/traces/shengfu/nv/pattern2-64gpu --trace-type et --json_mast_flex_pool_id_override_map ~/flex_pool_long_cable.json --reuse-tensors --warmup-iter 5 --num-replays 10
Differential Revision: D67071961
Pulled By: shengfukevin