502 bad gateway? #2384
Replies: 17 comments
-
Actually, the 502 error was already there when running predictor.predict(test) before deploy. But my model performed well on my own machine and was saved exactly the same way as in https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/
-
Hi @Xixiong-Guo, if you are following the example from https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/, it's likely that you've used the wrong model class. For framework versions 1.11 and above, we've split the TensorFlow container into separate training and serving containers. For deploying the model, please use this class instead: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/serving.py#L121
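For reference, a minimal deployment sketch using that serving Model class. The S3 path, role lookup, framework version, and instance type below are placeholders, not values from this thread:

```python
import sagemaker
from sagemaker.tensorflow.serving import Model

# Placeholder S3 location of the packaged SavedModel (model.tar.gz).
model = Model(
    model_data='s3://my-bucket/models/model.tar.gz',
    role=sagemaker.get_execution_role(),
    framework_version='1.12',
)

# Deploys onto the dedicated sagemaker-tensorflow-serving container
# rather than the legacy combined training/serving image.
predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.large')
```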
-
Hi @ChuyangDeng, thanks for your reply. I did encounter this problem, and after I found this reference (https://sagemaker.readthedocs.io/en/stable/using_tf.html#deploying-directly-from-model-artifacts), I changed to the serving Model class described there.
I guess this should not be the reason for the 502 error now? Thanks!
-
Hi @Xixiong-Guo, how did you tar the model? When you tar your model, please make sure the top-level directory inside the archive is a numeric version number, e.g.:
$ ls -al
00000123  # version number (not model name)
-
Hi @ChuyangDeng, my tar.gz currently looks like this: ... You mean the directory should look like this instead: ...? Thanks.
-
Yes, SageMaker expects the model to be extracted directly under the "/opt/ml/<model_name>/" directory inside the container. The sagemaker-tensorflow-serving container will look for the model version directly under "<model_name>/". So your tar structure should be: model.tar.gz containing 1/saved_model.pb
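As a concrete illustration, a minimal packaging sketch, assuming the SavedModel was exported locally to a directory such as export/Servo/1/ (the local path is hypothetical; what matters is that the numeric version directory ends up at the top of the archive):

```python
import tarfile

# Hypothetical local export directory produced by the Keras/TF SavedModel export.
export_dir = 'export/Servo/1'

with tarfile.open('model.tar.gz', 'w:gz') as tar:
    # arcname='1' places saved_model.pb (and variables/) under "1/" inside
    # the archive, which is the layout the serving container looks for.
    tar.add(export_dir, arcname='1')
```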
-
I've been having the same issue after following the same examples. I've also checked my tar and used the serving model. The CloudWatch log is as follows, from the moment I invoke the endpoint until it goes back to the regular pinging. (I used ...)
-
For me this ended up being an issue with the shape of the input. I was uploading an individual sample, but the endpoint expects a batch, so I needed to make my input one layer deeper (as described here). Could this be happening for you as well, @Xixiong-Guo?
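To make the shape difference concrete, here is a hedged sketch; the feature values are made up, and `predictor` is assumed to be the endpoint deployed earlier in this thread:

```python
import numpy as np

# Made-up single sample with 4 features, shape (4,).
sample = np.array([0.1, 0.2, 0.3, 0.4])

# Sending the bare sample gives TensorFlow Serving no batch dimension:
# predictor.predict(sample)                          # shape (4,) -- this is what failed here

# Wrapping it in a list adds the batch dimension the endpoint expects:
prediction = predictor.predict([sample.tolist()])    # shape (1, 4)
```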
-
Hi @Sbrikky, did you encounter the same 502 issue?
-
@Xixiong-Guo Yes, I had the exact same error in my notebook as the one you posted, so I didn't bother posting it again.
That suggested that maybe there was something wrong with the shape of the request. Why on earth it ends up being thrown as a 502, I have no clue.
-
Hi @Sbrikky, got it. In your case, was there any difference in the error info when you tried predict(input) versus predict([input.tolist()])?
-
When I use predict([input.tolist()]) it works and I get a prediction back. No 502.
-
Hi @Xixiong-Guo, it looks like you are using csv_serializer. Note here (https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/predictor.py#L325) that the serializer will serialize your input row by row, delimited by ",", if you are passing a Python list: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/predictor.py#L363
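A quick way to sanity-check this locally is to call the serializer directly and look at the payload it produces before invoking the endpoint (the values below are made up, and the exact behavior depends on the SDK version referenced above):

```python
from sagemaker.predictor import csv_serializer

# Made-up feature vectors, just to inspect the wire format.
single = [0.1, 0.2, 0.3]
batch = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

print(csv_serializer(single))   # one comma-delimited row
print(csv_serializer(batch))    # one row per inner list, newline-separated
```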
-
Hi all, I'm hitting the same response: 502 Bad Gateway (nginx/1.16.1).
-
For me this ended up being an issue with the directory structure of the saved model. So your tar structure should be the one described above, with the numeric version directory at the top level of the archive.
-
Following https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/, I tried to save two different models (a sentiment analysis model and a simple regression model) trained with TensorFlow + Keras and uploaded them to SageMaker, but encountered the same 502 error, which is seldom reported here or on Stack Overflow. Any thoughts?
import boto3

runtime = boto3.client('sagemaker-runtime')
# Flatten the padded sequence into a single CSV row.
Body_review = ','.join([str(val) for val in padded_pred]).encode('utf-8')
response = runtime.invoke_endpoint(EndpointName=predictor.endpoint,
                                   ContentType='text/csv',
                                   Body=Body_review)
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (502) from model with message "502 Bad Gateway (nginx/1.16.1)".
I searched CloudWatch and found this:
2020/05/10 15:53:27 [error] 35#35: *187 connect() failed (111: Connection refused) while connecting to upstream, client: 10.32.0.1, server: , request: "POST /invocations HTTP/1.1", subrequest: "/v1/models/export:predict", upstream: "http://127.0.0.1:27001/v1/models/export:predict", host: "model.aws.local:8080"
I tried another regression model (trained outside SageMaker, saved, uploaded to S3, and deployed on SageMaker, following https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/).
Still the same issue when using the predictor:
from sagemaker.predictor import csv_serializer
predictor.content_type = 'text/csv'
predictor.serializer = csv_serializer
Y_pred = predictor.predict(test.tolist())
Error:
---------------------------------------------------------------------------
ModelError                                Traceback (most recent call last)
<ipython-input> in <module>()
      4 predictor.serializer = csv_serializer
      5
----> 6 Y_pred = predictor.predict(test)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model)
    108
    109         request_args = self._create_request_args(data, initial_args, target_model)
--> 110         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    111         return self._handle_response(response)
    112

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    314                     "%s() only accepts keyword arguments." % py_operation_name)
    315                 # The "self" in this scope is referring to the BaseClient.
--> 316                 return self._make_api_call(operation_name, kwargs)
    317
    318             _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    624             error_code = parsed_response.get("Error", {}).get("Code")
    625             error_class = self.exceptions.from_code(error_code)
--> 626             raise error_class(parsed_response, operation_name)
    627         else:
    628             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (502) from model with message "502 Bad Gateway
nginx/1.16.1".