Accelerate with DeepSpeed does not work #3337
Most of the code you post appears to be unrelated. If you just have the following code, do you still get the error?

```python
import logging

from accelerate import Accelerator

accelerator = Accelerator()

logger = logging.getLogger(__name__)
logging.basicConfig(
    format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
    datefmt="%m/%d/%Y %H:%M:%S",
    level=logging.INFO,
)
logger.info(accelerator.state)
accelerator.state.deepspeed_plugin
```
As @BenjaminBossan hints at, the problem is you're using the `HfArgumentParser`.
Yes, I am getting the error if I am using the Hugging Face parser, @BenjaminBossan:

```python
def main():
    accelerator = Accelerator()
    num_proc = os.cpu_count()
    logger = logging.getLogger(__name__)
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO,
    )
    # parse the arguments
    parser = transformers.HfArgumentParser(
        (ModelConfig, DataConfig, PeftConfig, AccelerateConfig, TrainConfig))
    model_config, data_config, peft_config, accelerate_config, training_config = parser.parse_args_into_dataclasses()
    logger.info(accelerator.state)
```

When I use this code I get an error from `File "/opt/conda/lib/python3.11/site-packages/accelerate/state.py", line 1131, in __getattr__`. And even if I try to use this line to log the DeepSpeed config:

```python
def main():
    accelerator = Accelerator()
    num_proc = os.cpu_count()
    logger = logging.getLogger(__name__)
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO,
    )
    # parse the arguments
    parser = transformers.HfArgumentParser(
        (ModelConfig, DataConfig, PeftConfig, AccelerateConfig, TrainConfig))
    model_config, data_config, peft_config, accelerate_config, training_config = parser.parse_args_into_dataclasses()
    if accelerator.state.deepspeed_plugin:
        logger.info(accelerator.state.deepspeed_plugin.debugging)
```

I also get an error. I think the error is caused after using the `HfArgumentParser`.
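As an aside, a hedged way to avoid the immediate `AttributeError` (a sketch only; it sidesteps the symptom rather than fixing the underlying state reset) is to query the attribute defensively:

```python
# getattr with a default avoids the AttributeError raised by
# AcceleratorState.__getattr__ when the shared state has been cleared.
deepspeed_plugin = getattr(accelerator.state, "deepspeed_plugin", None)
if deepspeed_plugin is not None:
    logger.info(deepspeed_plugin)
```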
@khalil-Hennara Could you please provide a complete reproducer that we can run on our machines to reproduce the error?
@BenjaminBossan I am going to provide the code and the error, and also the accelerate env. First of all, `accelerate env`:

Second, the code. I have removed any dependencies of my code to make it easy to reproduce:

```python
import logging
import math
import os
import sys

import fire
import numpy as np
import torch
# third party dependencies
import transformers
from accelerate import Accelerator
from accelerate.logging import get_logger
from accelerate.utils import set_seed, DummyScheduler, DummyOptim
from transformers import default_data_collator, TrainingArguments
from transformers.utils import check_min_version, is_flash_attn_2_available
from peft import LoraConfig, get_peft_model
import datasets
from torch.utils.data import DataLoader


def main():
    # parse the arguments
    accelerator = Accelerator()
    parser = transformers.HfArgumentParser((TrainingArguments))
    training_config = parser.parse_args_into_dataclasses()[0]
    logger = get_logger(__name__)
    logging.basicConfig(
        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
        datefmt="%m/%d/%Y %H:%M:%S",
        level=logging.INFO,
    )
    logger.info(accelerator.state)


if __name__ == "__main__":
    main()
```

The error message:

The launch script:
My training script has been built using run_clm_no_trainer.py as an official example, and I have used it many times before to train many models without facing any such problems. Also, in https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm_no_trainer.py you can notice that the logger is initialized before the accelerator, and the accelerator is also modified according to the user's custom arguments.
Thanks @khalil-Hennara. I could simplify the script as follows:

```python
import transformers
from accelerate import Accelerator
from transformers import TrainingArguments


def main1():
    accelerator = Accelerator()
    parser = transformers.HfArgumentParser(TrainingArguments)
    # parser.parse_args_into_dataclasses()
    repr(accelerator.state)
    print("main1 passes")


def main2():
    parser = transformers.HfArgumentParser(TrainingArguments)
    parser.parse_args_into_dataclasses()
    accelerator = Accelerator()
    repr(accelerator.state)
    print("main2 passes")


def main3():
    accelerator = Accelerator()
    parser = transformers.HfArgumentParser(TrainingArguments)
    parser.parse_args_into_dataclasses()
    repr(accelerator.state)
    print("main3 should pass but fails")


if __name__ == "__main__":
    main1()
    main2()
    main3()
```

Note that I have 3 different conditions:

In accelerate/src/accelerate/state.py, line 969 in f0b0305, because the …
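Given the three conditions above (main2 passes while main3 fails), a workaround sketch is to finish argument parsing before constructing the `Accelerator`, so that whatever state `TrainingArguments` sets up is not invalidated afterwards. This mirrors main2 rather than being an official recommendation:

```python
import transformers
from accelerate import Accelerator
from transformers import TrainingArguments


def main():
    # Parse first, create the Accelerator second (the ordering that passes).
    parser = transformers.HfArgumentParser(TrainingArguments)
    (training_args,) = parser.parse_args_into_dataclasses()
    accelerator = Accelerator()
    print(repr(accelerator.state))


if __name__ == "__main__":
    main()
```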
Thank you @BenjaminBossan for your explanation, but I still get error messages:

```python
import transformers
from accelerate import Accelerator
from transformers import TrainingArguments
import argparse


def get_wandb_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Add Weights & Biases arguments to parser"""
    wandb_group = parser.add_argument_group('Weights & Biases Arguments', 'W&B configuration parameters')
    # Project name
    wandb_group.add_argument('--output_dir', type=str, default='kawn_vision',
                             help='W&B project name (default: kawn_vision)')
    return parser


def main1():
    accelerator = Accelerator()
    parser = transformers.HfArgumentParser(TrainingArguments)
    # parser.parse_args_into_dataclasses()
    repr(accelerator.state)
    print("main1 passes")


def main2():
    parser = argparse.ArgumentParser()
    args = get_wandb_args(parser).parse_args()
    accelerator = Accelerator()
    repr(accelerator.state)
    print("main2 passes")


def main3():
    accelerator = Accelerator()
    parser = transformers.HfArgumentParser(TrainingArguments)
    parser.parse_args_into_dataclasses()
    repr(accelerator.state)
    print("main3 should pass but fails")


if __name__ == "__main__":
    main1()
    main2()
    main3()
```

The previous code fails in main3 with the following error:
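One way to narrow down where the state is lost is to probe the singleton's storage around the parse call; `AcceleratorState._shared_state` is an internal implementation detail of accelerate, so treat this purely as a debugging aid, not a supported API:

```python
import transformers
from accelerate import Accelerator
from accelerate.state import AcceleratorState
from transformers import TrainingArguments

accelerator = Accelerator()
parser = transformers.HfArgumentParser(TrainingArguments)

# The state singletons share one dict; if parsing TrainingArguments
# resets it, the dict is empty again afterwards.
print("state populated before parse:", bool(AcceleratorState._shared_state))
parser.parse_args_into_dataclasses()
print("state populated after parse:", bool(AcceleratorState._shared_state))
```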
System Info

Information

Tasks

A `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

Reproduction
```python
accelerator = Accelerator()
# parse the arguments
parser = transformers.HfArgumentParser(
    (ModelConfig, DataConfig, PeftConfig, AccelerateConfig, TrainConfig))
model_config, data_config, peft_config, accelerate_config, training_config = parser.parse_args_into_dataclasses()
```
Expected behavior
I expected it to work fine. I've used this script many times before and never faced any problems, but I've been trying to solve this for the last two days; the problem is new. Every time I run the code I get this error:
```
--- Logging error ---
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/logging/__init__.py", line 1110, in emit
    msg = self.format(record)
          ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/logging/__init__.py", line 953, in format
    return fmt.format(record)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/logging/__init__.py", line 687, in format
    record.message = record.getMessage()
                     ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/logging/__init__.py", line 375, in getMessage
    msg = str(self.msg)
          ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/accelerate/state.py", line 949, in __repr__
    repr = PartialState().__repr__() + f"\nMixed precision type: {self.mixed_precision}\n"
           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/accelerate/state.py", line 1131, in __getattr__
    raise AttributeError(f"'AcceleratorState' object has no attribute '{name}'")
AttributeError: 'AcceleratorState' object has no attribute 'mixed_precision'
```
Also, the script keeps running, and then when I try to access `accelerator.state.deepspeed_plugin` as recommended in the official docs, I get the next error:

```
AttributeError: `AcceleratorState` object has no attribute `deepspeed_plugin`. This happens if `AcceleratorState._reset_state()` was called and an `Accelerator` or `PartialState` was not reinitialized. Did you mean: 'get_deepspeed_plugin'?
```

even though there is no such line in my code.
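The wording of that error points at the mechanism: something called `AcceleratorState._reset_state()` after the `Accelerator` was created. A minimal sketch that reproduces the same message directly, with the explicit `_reset_state()` call standing in for whatever happens inside the argument parsing:

```python
from accelerate import Accelerator
from accelerate.state import AcceleratorState

accelerator = Accelerator()
AcceleratorState._reset_state()       # simulate the reset the error describes
accelerator.state.deepspeed_plugin   # raises the AttributeError quoted above
```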