Hi all,
I'm trying to replicate the example below on SageMaker with the distilbert-base-uncased model, but the training job fails with the error and log shown here. Any suggestions?
Code base: https://github.com/huggingface/notebooks/blob/master/sagemaker/06_sagemaker_metrics/sagemaker-notebook.ipynb
Error: UnexpectedStatusException: Error for Training job huggingface-pytorch-training-2022-01-25-19-23-38-888: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage "raise EnvironmentError(msg)
OSError: Can't load config for 'None'. Make sure that: - 'None' is a correct model identifier listed on 'https://huggingface.co/models' (make sure 'None' is not a path to a local directory with something else, in that case) - or 'None' is the correct path to a directory containing a config.json file"
Command "/opt/conda/bin/python3.8 train.py --epochs 1 --eval_batch_size 20 --fp16 True --hub_model_id sagemaker-distilbert-emotion --hub_strategy every_save --hub_token --learning_rate 3e-05 --model_id distilbert-base-uncased --push_to_hub True --train_batch_size 10"
Log:
2022-01-25 19:29:48,286 sagemaker-training-toolkit INFO Imported framework sagemaker_pytorch_container.training
2022-01-25 19:29:48,307 sagemaker_pytorch_container.training INFO Block until all host DNS lookups succeed.
2022-01-25 19:29:51,328 sagemaker_pytorch_container.training INFO Invoking user training script.
2022-01-25 19:29:51,774 sagemaker-training-toolkit INFO Invoking user script
Training Env:
{
"additional_framework_parameters": {},
"channel_input_dirs": {
"test": "/opt/ml/input/data/test",
"train": "/opt/ml/input/data/train"
},
"current_host": "algo-1",
"framework_module": "sagemaker_pytorch_container.training:main",
"hosts": [
"algo-1"
],
"hyperparameters": {
"hub_token": null,
"model_id": "distilbert-base-uncased",
"eval_batch_size": 20,
"train_batch_size": 10,
"push_to_hub": true,
"hub_model_id": "sagemaker-distilbert-emotion",
"epochs": 1,
"learning_rate": 3e-05,
"hub_strategy": "every_save",
"fp16": true
},
"input_config_dir": "/opt/ml/input/config",
"input_data_config": {
"test": {
"TrainingInputMode": "File",
"S3DistributionType": "FullyReplicated",
"RecordWrapperType": "None"
},
"train": {
"TrainingInputMode": "File",
"S3DistributionType": "FullyReplicated",
"RecordWrapperType": "None"
}
},
"input_dir": "/opt/ml/input",
"is_master": true,
"job_name": "huggingface-pytorch-training-2022-01-25-19-23-38-888",
"log_level": 20,
"master_hostname": "algo-1",
"model_dir": "/opt/ml/model",
"module_dir": "s3://sagemaker-eu-west-2-352316401451/huggingface-pytorch-training-2022-01-25-19-23-38-888/source/sourcedir.tar.gz",
"module_name": "train",
"network_interface_name": "eth0",
"num_cpus": 8,
"num_gpus": 1,
"output_data_dir": "/opt/ml/output/data",
"output_dir": "/opt/ml/output",
"output_intermediate_dir": "/opt/ml/output/intermediate",
"resource_config": {
"current_host": "algo-1",
"hosts": [
"algo-1"
],
"network_interface_name": "eth0"
},
"user_entry_point": "train.py"
}
Environment variables:
SM_HOSTS=["algo-1"]
SM_NETWORK_INTERFACE_NAME=eth0
SM_HPS={"epochs":1,"eval_batch_size":20,"fp16":true,"hub_model_id":"sagemaker-distilbert-emotion","hub_strategy":"every_save","hub_token":null,"learning_rate":3e-05,"model_id":"distilbert-base-uncased","push_to_hub":true,"train_batch_size":10}
SM_USER_ENTRY_POINT=train.py
SM_FRAMEWORK_PARAMS={}
SM_RESOURCE_CONFIG={"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"eth0"}
SM_INPUT_DATA_CONFIG={"test":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}}
SM_OUTPUT_DATA_DIR=/opt/ml/output/data
SM_CHANNELS=["test","train"]
SM_CURRENT_HOST=algo-1
SM_MODULE_NAME=train
SM_LOG_LEVEL=20
SM_FRAMEWORK_MODULE=sagemaker_pytorch_container.training:main
SM_INPUT_DIR=/opt/ml/input
SM_INPUT_CONFIG_DIR=/opt/ml/input/config
SM_OUTPUT_DIR=/opt/ml/output
SM_NUM_CPUS=8
SM_NUM_GPUS=1
SM_MODEL_DIR=/opt/ml/model
SM_MODULE_DIR=s3://sagemaker-eu-west-2-352316401451/huggingface-pytorch-training-2022-01-25-19-23-38-888/source/sourcedir.tar.gz
SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1","framework_module":"sagemaker_pytorch_container.training:main","hosts":["algo-1"],"hyperparameters":{"epochs":1,"eval_batch_size":20,"fp16":true,"hub_model_id":"sagemaker-distilbert-emotion","hub_strategy":"every_save","hub_token":null,"learning_rate":3e-05,"model_id":"distilbert-base-uncased","push_to_hub":true,"train_batch_size":10},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"huggingface-pytorch-training-2022-01-25-19-23-38-888","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-eu-west-2-352316401451/huggingface-pytorch-training-2022-01-25-19-23-38-888/source/sourcedir.tar.gz","module_name":"train","network_interface_name":"eth0","num_cpus":8,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"eth0"},"user_entry_point":"train.py"}
SM_USER_ARGS=["--epochs","1","--eval_batch_size","20","--fp16","True","--hub_model_id","sagemaker-distilbert-emotion","--hub_strategy","every_save","--hub_token","","--learning_rate","3e-05","--model_id","distilbert-base-uncased","--push_to_hub","True","--train_batch_size","10"]
SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
SM_CHANNEL_TEST=/opt/ml/input/data/test
SM_CHANNEL_TRAIN=/opt/ml/input/data/train
SM_HP_HUB_TOKEN=
SM_HP_MODEL_ID=distilbert-base-uncased
SM_HP_EVAL_BATCH_SIZE=20
SM_HP_TRAIN_BATCH_SIZE=10
SM_HP_PUSH_TO_HUB=true
SM_HP_HUB_MODEL_ID=sagemaker-distilbert-emotion
SM_HP_EPOCHS=1
SM_HP_LEARNING_RATE=3e-05
SM_HP_HUB_STRATEGY=every_save
SM_HP_FP16=true
PYTHONPATH=/opt/ml/code:/opt/conda/bin:/opt/conda/lib/python38.zip:/opt/conda/lib/python3.8:/opt/conda/lib/python3.8/lib-dynload:/opt/conda/lib/python3.8/site-packages
Invoking script with the following command:
/opt/conda/bin/python3.8 train.py --epochs 1 --eval_batch_size 20 --fp16 True --hub_model_id sagemaker-distilbert-emotion --hub_strategy every_save --hub_token --learning_rate 3e-05 --model_id distilbert-base-uncased --push_to_hub True --train_batch_size 10
2022-01-25 19:30:06 Uploading - Uploading generated training model
2022-01-25 19:30:06 Failed - Training job failed
ProfilerReport-1643138618: Stopping
2022-01-25 19:29:56,293 - main - INFO - loaded train_dataset length is: 16000
2022-01-25 19:29:56,293 - main - INFO - loaded test_dataset length is: 2000
404 Client Error: Not Found for url: https://huggingface.co/None/resolve/main/config.json
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/transformers/configuration_utils.py", line 550, in get_config_dict
404 Client Error: Not Found for url: https://huggingface.co/None/resolve/main/config.json
resolved_config_file = cached_path(
File "/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py", line 1491, in cached_path
output_path = get_from_cache(
File "/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py", line 1663, in get_from_cache
r.raise_for_status()
File "/opt/conda/lib/python3.8/site-packages/requests/models.py", line 953, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/None/resolve/main/config.json
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 57, in <module>
model = AutoModelForSequenceClassification.from_pretrained(args.model_name)
File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 396, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 558, in from_pretrained
config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/opt/conda/lib/python3.8/site-packages/transformers/configuration_utils.py", line 575, in get_config_dict
raise EnvironmentError(msg)
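OSError: Can't load config for 'None'. Make sure that:
- 'None' is a correct model identifier listed on 'https://huggingface.co/models'
(make sure 'None' is not a path to a local directory with something else, in that case)
- or 'None' is the correct path to a directory containing a config.json file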
2022-01-25 19:29:57,197 sagemaker-training-toolkit ERROR Reporting training FAILURE
2022-01-25 19:29:57,197 sagemaker-training-toolkit ERROR ExecuteUserScriptError:
ExitCode 1
ErrorMessage "raise EnvironmentError(msg)
OSError: Can't load config for 'None'. Make sure that: - 'None' is a correct model identifier listed on 'https://huggingface.co/models' (make sure 'None' is not a path to a local directory with something else, in that case) - or 'None' is the correct path to a directory containing a config.json file"
Command "/opt/conda/bin/python3.8 train.py --epochs 1 --eval_batch_size 20 --fp16 True --hub_model_id sagemaker-distilbert-emotion --hub_strategy every_save --hub_token --learning_rate 3e-05 --model_id distilbert-base-uncased --push_to_hub True --train_batch_size 10"
2022-01-25 19:29:57,197 sagemaker-training-toolkit ERROR Encountered exit_code 1
UnexpectedStatusException Traceback (most recent call last)
in
----> 1 huggingface_estimator.fit({'train': training_input_path, 'test': test_input_path})
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
690 self.jobs.append(self.latest_training_job)
691 if wait:
--> 692 self.latest_training_job.wait(logs=logs)
693
694 def _compilation_job_name(self):
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/estimator.py in wait(self, logs)
1665 # If logs are requested, call logs_for_jobs.
1666 if logs != "None":
-> 1667 self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
1668 else:
1669 self.sagemaker_session.wait_for_job(self.job_name)
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in logs_for_job(self, job_name, wait, poll, log_type)
3783
3784 if wait:
-> 3785 self._check_job_status(job_name, description, "TrainingJobStatus")
3786 if dot:
3787 print()
~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
3341 ),
3342 allowed_statuses=["Completed", "Stopped"],
-> 3343 actual_status=status,
3344 )
3345
UnexpectedStatusException: Error for Training job huggingface-pytorch-training-2022-01-25-19-23-38-888: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage "raise EnvironmentError(msg)
OSError: Can't load config for 'None'. Make sure that: - 'None' is a correct model identifier listed on 'https://huggingface.co/models' (make sure 'None' is not a path to a local directory with something else, in that case) - or 'None' is the correct path to a directory containing a config.json file"
Command "/opt/conda/bin/python3.8 train.py --epochs 1 --eval_batch_size 20 --fp16 True --hub_model_id sagemaker-distilbert-emotion --hub_strategy every_save --hub_token --learning_rate 3e-05 --model_id distilbert-base-uncased --push_to_hub True --train_batch_size 10"
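One thing that stands out in the log: the job is invoked with `--model_id distilbert-base-uncased`, while line 57 of train.py reads `args.model_name`. If the script only defines `--model_name` and never sees a value for it, `args.model_name` would stay `None`, which would match the `Can't load config for 'None'` message. The following is a minimal sketch of argument parsing that accepts either name and fails early. It is an assumption about how train.py parses its arguments, not the notebook's actual script, and `resolve_model_id` is a hypothetical helper name.

```python
import argparse


def resolve_model_id(argv):
    """Return the model identifier from SageMaker-style CLI arguments."""
    parser = argparse.ArgumentParser()
    # Name used by the hyperparameters in the log above (--model_id).
    parser.add_argument("--model_id", type=str, default=None)
    # Assumed alternative name, inferred from `args.model_name` in the traceback.
    parser.add_argument("--model_name", type=str, default=None)
    # parse_known_args leaves the other hyperparameters (epochs, fp16, ...) alone.
    args, _unknown = parser.parse_known_args(argv)

    model_id = args.model_id or args.model_name
    if not model_id:
        # Fail with a clear message instead of calling from_pretrained(None).
        raise ValueError("No model given: pass --model_id or --model_name")
    return model_id


if __name__ == "__main__":
    # Arguments copied from SM_USER_ARGS in the log (shortened).
    argv = [
        "--epochs", "1",
        "--hub_token", "",
        "--model_id", "distilbert-base-uncased",
        "--push_to_hub", "True",
    ]
    print(resolve_model_id(argv))  # prints "distilbert-base-uncased"
```

The simpler fix, if this reading is right, is to make the two names agree: either pass the hyperparameter under the name the script expects, or change line 57 to read the argument that is actually passed. The log also shows `hub_token` as null together with `push_to_hub` set to true, which may need attention once the model loads.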