Replies: 1 comment
I resolved the problem. The issue was in sm_client.create_model():
I used Deep Learning Container images to train the model, so for PrimaryContainer['Image'] (in the code above) I have to pass an inference image, not a training image. As long as my training image does not change, my inference image should not change either. I chose the correct inference image from the AWS SageMaker Deep Learning Container Images and passed it to the
PROBLEM RESOLVED :)
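For anyone hitting the same error, here is a minimal sketch of the fix described above, not the exact code from my Lambda. It builds the arguments for sm_client.create_model() and refuses an image that looks like a training image. The helper name build_create_model_request and the placeholder values are hypothetical, and the check assumes the Deep Learning Container repository name contains "training" or "inference", which is a naming heuristic rather than a guarantee.

```python
def build_create_model_request(model_name, image_uri, model_data_url, role_arn):
    """Build keyword arguments for sm_client.create_model().

    Hypothetical helper: rejects image URIs whose repository name looks like
    a training image (heuristic based on the repository name only).
    """
    # Take the part after the last "/" and drop the ":tag" suffix.
    repository = image_uri.rsplit("/", 1)[-1].split(":", 1)[0]
    if "training" in repository:
        raise ValueError(
            "PrimaryContainer['Image'] looks like a training image: " + repository
        )
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,              # must be an inference image
            "ModelDataUrl": model_data_url,  # S3 path to model.tar.gz
        },
        "ExecutionRoleArn": role_arn,
    }


# Usage (requires boto3 and AWS credentials; values are placeholders):
#   import boto3
#   sm_client = boto3.client("sagemaker")
#   request = build_create_model_request(
#       "my-model", "<inference-image-uri>", "s3://<bucket>/model.tar.gz", "<role-arn>"
#   )
#   sm_client.create_model(**request)
```

The point of the check is only to fail early in the Lambda instead of waiting for the endpoint update to fail the ping health check.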
I trained a model and deployed it successfully just by running the notebook (https://github.com/huggingface/notebooks/blob/master/sagemaker/01_getting_started_pytorch/sagemaker-notebook.ipynb).
However, I am now trying to rerun the training job and deploy the new model to the endpoint. I can create a new training job whose specs are identical to those of the training job from the example notebook, and I can create a new model and a new endpoint configuration. But when I update the endpoint with the new endpoint configuration, I get the following error:
The primary container for production variant did not pass the ping health check
How I create the model and the endpoint configuration, and update the endpoint:
I have the following code in my AWS Lambda to trigger the creation process:
Using Python 3.6
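The Lambda code itself is not reproduced in this post. Purely as an illustration of the sequence described above (create model, create endpoint configuration, update endpoint), a rough sketch with the boto3 SageMaker client could look like the following. The function name redeploy, the variant name "AllTraffic", the instance type, and the instance count are assumptions for the example, not values from my setup.

```python
def redeploy(sm_client, endpoint_name, model_name, config_name,
             image_uri, model_data_url, role_arn,
             instance_type="ml.m5.xlarge"):
    """Create a model and endpoint config, then point the endpoint at them.

    sm_client is expected to be boto3.client("sagemaker"); all other
    arguments are caller-supplied names, URIs and ARNs.
    """
    # 1. Register the model: container image plus trained artifacts in S3.
    sm_client.create_model(
        ModelName=model_name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    # 2. Describe how the model should be hosted.
    sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",  # assumed variant name
                "ModelName": model_name,
                "InitialInstanceCount": 1,    # assumed count
                "InstanceType": instance_type,
            }
        ],
    )
    # 3. Switch the existing endpoint over to the new configuration.
    sm_client.update_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
```

The ping health check is run against the container started in step 3, so an error at that step can originate from what was passed to create_model in step 1.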
Other Approaches:
I received the same error at the endpoint when I tried to create a model and an endpoint configuration using the AWS SageMaker GUI.
Kindly assist.