Hi @muellerzr!
I am trying to run the Llama 8B model on A40 GPUs using Accelerate. I want to first evaluate the model, then add a few trainable parameters and train only those. Since the Llama 8B checkpoint cannot fit on a single A40, I am using an FSDP configuration (is that the correct choice?).
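For reference, my config is roughly the kind of file that `accelerate config` generates for FSDP (a sketch, not my exact file; key names and accepted values vary across Accelerate versions, and `num_processes` should match the GPU count):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
mixed_precision: bf16
num_processes: 4  # illustrative; set to the number of A40s available
fsdp_config:
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_offload_params: false
```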
When I run `accelerate launch`, the code enters the following method from `utils/fsdp_utils.py`:
```python
def load_fsdp_model(fsdp_plugin, accelerator, model, input_dir, model_index=0, adapter_only=False):
```
and then raises the following error:
I went through the documentation -- https://huggingface.co/docs/accelerate/en/usage_guides/distributed_inference
as well as https://huggingface.co/docs/accelerate/en/usage_guides/fsdp -- am I missing something here? Any help, documentation, or tutorials on how to run, fine-tune, or train large models when a single GPU's memory is not sufficient, using some form of model sharding with Accelerate, would be really helpful!
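For context, this is the rough shape of what I am trying to do (a minimal sketch; the checkpoint name is illustrative, and I have left out the dataloaders and the new trainable modules):

```python
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

accelerator = Accelerator()  # picks up the FSDP settings from the config file

# Checkpoint name is illustrative -- this is where I load the Llama 8B weights.
model_name = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# With FSDP the model is prepared first, so that each rank only holds
# a shard of the parameters.
model = accelerator.prepare(model)

# Step 1: evaluate the base model.
model.eval()
inputs = tokenizer("Hello, world!", return_tensors="pt").to(accelerator.device)
with torch.no_grad():
    outputs = model(**inputs)

# Step 2 (the part I am unsure about): freeze the base weights, add a few
# trainable parameters, build an optimizer over just those, and train with
# accelerator.backward(loss).
for param in model.parameters():
    param.requires_grad = False
```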
Thanks,
Kalyani