
Questions on large model inference / finetuning #3353

Open · kalyani7195 opened this issue Jan 18, 2025 · 2 comments

kalyani7195 commented Jan 18, 2025

Hi @muellerzr!
I am trying to run the Llama 8B model on A40 GPUs using Accelerate. I want to first evaluate the model and then add a few trainable parameters and train only those. Since the Llama 8B checkpoint cannot fit on a single A40, I am using an FSDP configuration (is that the correct choice?).
When I run `accelerate launch`, the code enters the following method from utils/fsdp_utils.py:

def load_fsdp_model(fsdp_plugin, accelerator, model, input_dir, model_index=0, adapter_only=False):
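    # (Per the traceback below, this path reaches fsdp_utils.py line 144,
    # where torch.load() is called on <input_dir>/pytorch_model_fsdp.bin.)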

and it then raises the error shown in the traceback below.
I went through the documentation -- https://huggingface.co/docs/accelerate/en/usage_guides/distributed_inference
as well as https://huggingface.co/docs/accelerate/en/usage_guides/fsdp -- am I missing something here? Any help, documentation, or tutorial on how to run, fine-tune, or train large models when GPU memory is not sufficient, using some form of model sharding with Accelerate, would be really helpful!
Thanks,
Kalyani

[rank3]: Traceback (most recent call last):
[rank3]:   File "/gscratch/zlab/kmarathe/models/xyz/xyz/scripts/llama/xyz_llama3_1_8b_config_extracted.py", line 375, in <module>
[rank3]:     main(args.ablation_config_path)
[rank3]:   File "/gscratch/zlab/kmarathe/models/xyz/xyz/scripts/llama/xyz_llama3_1_8b_config_extracted.py", line 170, in main
[rank3]:     base_exp_name = evaluate_base_model(exp_ids, exp_names, display_names, base_model_args)
[rank3]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/gscratch/zlab/kmarathe/models/xyz/xyz/scripts/llama/xyz_llama3_1_8b_config_extracted.py", line 362, in evaluate_base_model
[rank3]:     train(args)
[rank3]:   File "/gscratch/zlab/kmarathe/models/xyz/xyz/train.py", line 390, in train
[rank3]:     setup_state(tc)
[rank3]:   File "/gscratch/zlab/kmarathe/models/xyz/xyz/train.py", line 234, in setup_state
[rank3]:     load_state(tc.accelerator, tc.state_path)
[rank3]:   File "/gscratch/zlab/kmarathe/models/xyz/xyz/utils.py", line 553, in load_state
[rank3]:     accelerator.load_state(state_path)
[rank3]:   File "/gscratch/zlab/kmarathe/miniconda3/envs/py312/lib/python3.12/site-packages/accelerate/accelerator.py", line 3186, in load_state
[rank3]:     load_fsdp_model(self.state.fsdp_plugin, self, model, input_dir, i)
[rank3]:   File "/gscratch/zlab/kmarathe/miniconda3/envs/py312/lib/python3.12/site-packages/accelerate/utils/fsdp_utils.py", line 144, in load_fsdp_model
[rank3]:     state_dict = torch.load(input_model_file)
[rank3]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/gscratch/zlab/kmarathe/miniconda3/envs/py312/lib/python3.12/site-packages/torch/serialization.py", line 1319, in load
[rank3]:     with _open_file_like(f, "rb") as opened_file:
[rank3]:          ^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/gscratch/zlab/kmarathe/miniconda3/envs/py312/lib/python3.12/site-packages/torch/serialization.py", line 659, in _open_file_like
[rank3]:     return _open_file(name_or_buffer, mode)
[rank3]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank3]:   File "/gscratch/zlab/kmarathe/miniconda3/envs/py312/lib/python3.12/site-packages/torch/serialization.py", line 640, in __init__
[rank3]:     super().__init__(open(name, mode))
[rank3]:                      ^^^^^^^^^^^^^^^^
[rank3]: FileNotFoundError: [Errno 2] No such file or directory: '/gscratch/zlab/kmarathe/models/xyz/Experiments/Llama/Llama3_1_8b/experiments/llama3_1_8b_num_experts_ablation/num_experts_64/xyz_runs/c4_llama_dsti_relu_args_num_experts_64_1/state/pytorch_model_fsdp.bin'
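The FileNotFoundError itself means `accelerator.load_state()` was pointed at a directory that does not contain a `pytorch_model_fsdp.bin`, i.e. no matching `accelerator.save_state()` ever wrote one there. A minimal sketch of the expected save/load pairing (the `Linear` model and the `state/` path are placeholders, not the actual training code):

```python
import torch
from accelerate import Accelerator

# Placeholder module standing in for the Llama model in this sketch.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

accelerator = Accelerator()  # FSDP settings are picked up from the accelerate config
model, optimizer = accelerator.prepare(model, optimizer)

# save_state() is what creates <dir>/pytorch_model_fsdp.bin under
# FSDP + FULL_STATE_DICT; load_state() can only restore a directory
# that such a save actually produced.
accelerator.save_state("state/")
accelerator.load_state("state/")
```

If the run is meant to start from the raw Hugging Face checkpoint rather than resume from a saved state, skipping the `load_state()` call on the first run avoids this error.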
kalyani7195 changed the title from "Questions on large model inference on finetuning" to "Questions on large model inference / finetuning" on Jan 18, 2025

kalyani7195 commented Jan 18, 2025

Here is my accelerate environment --

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: FSDP
downcast_bf16: 'no'
enable_cpu_affinity: false
fsdp_config:
  fsdp_activation_checkpointing: false
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: true
  fsdp_forward_prefetch: false
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sync_module_states: true
  fsdp_use_orig_params: true
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
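
For reference, the same kind of plugin can be built in code instead of YAML. A hedged sketch following the Accelerate FSDP guide (field names and accepted string values are version-dependent, so double-check against the installed accelerate):

```python
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Mirrors the key YAML fields above in code. For checkpoints this large,
# the Accelerate FSDP guide recommends SHARDED_STATE_DICT over
# FULL_STATE_DICT when saving/loading state.
fsdp_plugin = FullyShardedDataParallelPlugin(
    sharding_strategy="FULL_SHARD",        # fsdp_sharding_strategy
    use_orig_params=True,                  # fsdp_use_orig_params
    state_dict_type="SHARDED_STATE_DICT",  # alternative to FULL_STATE_DICT
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```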

kalyani7195 commented

I think feature request #1890 is highly relevant to my question.
