Not able to deploy pretrained Pytorch Model #2373
Replies: 3 comments
-
Hi @mukeshyadav, did you import ... When you init the ...
-
Hi @ChuyangDeng, thanks for sharing the link. I resolved that issue, but now when calling the predict function I get the error below:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from the model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch,
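A common cause of this timeout is doing heavy work on every request: real-time SageMaker invocations are cut off after roughly 60 seconds, so the expensive model load belongs in `model_fn` (run once at container startup), not in `predict_fn` (run per request). A minimal sketch of that serving contract, with `load_heavy_model` as a hypothetical stand-in for the real torch/transformers loading code:

```python
# Sketch of the SageMaker PyTorch serving handler contract.
# model_fn runs once when the container starts; predict_fn runs per request.
import time


def load_heavy_model(model_dir):
    # Stand-in (assumption) for e.g. loading pytorch_model.bin from model_dir.
    time.sleep(0.1)
    return {"dir": model_dir}


def model_fn(model_dir):
    # Called once at endpoint startup: pay the loading cost here.
    return load_heavy_model(model_dir)


def predict_fn(input_data, model):
    # Called per invocation: should only run inference on the loaded model.
    return {"input": input_data, "model_dir": model["dir"]}


model = model_fn("/opt/ml/model")
result = predict_fn("hello", model)
print(result)
```

If the load itself is slow, a small instance type such as ml.t2.medium can also struggle; the CloudWatch `ModelLatency` metric mentioned in the error message shows where the time goes.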
-
Could you share your CloudWatch logs from the endpoint?
-
I have a pre-trained model and am trying to create an endpoint for it using SageMaker. My folder structure looks like this:
"model.tar.gz" looks like this:
model
|- config.json
|- pytorch_model.bin
|- special_tokens_map.json
|- spiece.model
|- tokenizer_config.json
|- training_args.bin
code
|- inference.py
|- requirements.txt
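For the PyTorch serving container to find inference.py and pip-install requirements.txt, the archive is generally expected to hold the model artifacts at its top level and the inference code under a code/ directory. A sketch of packing the tree above into that layout (the exact expected layout depends on the framework version, so treat this as an assumption to verify against your container):

```python
# Build model.tar.gz with model artifacts at the archive root and code/
# alongside them, mirroring the folder listing in the question.
import os
import tarfile
import tempfile


def build_model_archive(root, out_path):
    """Pack files under root/model at the archive top level, plus root/code."""
    with tarfile.open(out_path, "w:gz") as tar:
        for fname in sorted(os.listdir(os.path.join(root, "model"))):
            # Model artifacts (pytorch_model.bin, config.json, ...) go at the
            # top level of the archive, not under a model/ prefix.
            tar.add(os.path.join(root, "model", fname), arcname=fname)
        # Inference code and requirements.txt go under code/.
        tar.add(os.path.join(root, "code"), arcname="code")
    return out_path


# Demo with a throwaway tree mirroring the listing above.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "model"))
os.makedirs(os.path.join(root, "code"))
for rel in ["model/pytorch_model.bin", "model/config.json",
            "code/inference.py", "code/requirements.txt"]:
    with open(os.path.join(root, rel), "w") as f:
        f.write("placeholder")

archive = build_model_archive(root, os.path.join(root, "model.tar.gz"))
with tarfile.open(archive) as tar:
    members = sorted(m.name for m in tar.getmembers())
print(members)
```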
I am running the following script to create the endpoint:
pytorch_model = PyTorchModel(
    model_data='s3://mck-dl-ai-studio/answer_card/answercard.tar.gz',
    role=role,
    entry_point='inference.py',
    framework_version="1.3.1",
)
predictor = pytorch_model.deploy(instance_type='ml.t2.medium', initial_instance_count=1)
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "No module named 'transformers'". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/pytorch-inference-2020-07-20-16-45-51-564 in account xxxxxx for more information.
What am I missing here? I tried adding source_dir and py_version, but with no success.
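The 500 "No module named 'transformers'" error means the serving container image simply does not ship that library: the PyTorch inference container pip-installs whatever is listed in the requirements.txt that lives next to inference.py (in the code/ directory of model.tar.gz, or in source_dir). A minimal sketch of that file — sentencepiece is an assumption, suggested by the spiece.model tokenizer file in the listing, and versions should be pinned to match what the model was trained with:

```
transformers
sentencepiece
```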