Replies: 1 comment 1 reply
-
Hey, how are you setting the `TS_DEFAULT_RESPONSE_TIMEOUT` config?
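For reference, a minimal sketch of one way this TorchServe setting can be passed to a SageMaker PyTorch endpoint, assuming deployment through the SageMaker Python SDK; the model data path, role ARN, entry point, versions, and timeout value below are placeholders, not details from this thread:

```python
# Sketch only: TS_DEFAULT_RESPONSE_TIMEOUT is a TorchServe option that can be supplied
# through the model's environment variables. All paths, versions, and the ARN are placeholders.
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",              # placeholder
    role="arn:aws:iam::123456789012:role/SageMakerRole",   # placeholder
    entry_point="inference.py",
    framework_version="1.12",
    py_version="py38",
    env={
        # Give long-running video inference more time before TorchServe times out (seconds).
        "TS_DEFAULT_RESPONSE_TIMEOUT": "900",
    },
)
```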
-
I've deployed a custom model with an async endpoint. I want to process video files with it, and because videos can be ~5-10 minutes long I can't load all the frames into memory. Of course, I want to run inference on each frame.
I've written:

- `input_fn` - downloads the video file from S3 using boto and creates a generator (written with OpenCV) that loads video frames with a given batch size, then returns the generator
- `predict_fn` - iterates over the generator's batched frames, runs the model on each batch, and saves the predictions in a list
- `output_fn` - transforms the predictions into JSON format and gzips everything to reduce the size
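Roughly, the handler layout described above might look like the sketch below, assuming the standard SageMaker PyTorch inference toolkit contract (`model_fn` omitted); the request payload shape, batch size, and preprocessing are illustrative assumptions, not the actual code from this post:

```python
# Sketch only: an input_fn/predict_fn/output_fn layout matching the description above.
import gzip
import json
import tempfile

import boto3
import cv2
import numpy as np
import torch

BATCH_SIZE = 16  # assumed frame batch size


def input_fn(request_body, content_type):
    # Assume the async request body carries the S3 location of the video.
    payload = json.loads(request_body)
    local_path = tempfile.NamedTemporaryFile(suffix=".mp4", delete=False).name
    boto3.client("s3").download_file(payload["bucket"], payload["key"], local_path)

    def frame_batches():
        # Stream frames with OpenCV so the whole video never sits in memory.
        cap = cv2.VideoCapture(local_path)
        batch = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            batch.append(frame)
            if len(batch) == BATCH_SIZE:
                yield np.stack(batch)
                batch = []
        if batch:
            yield np.stack(batch)
        cap.release()

    return frame_batches()  # a generator of frame batches


def predict_fn(batches, model):
    predictions = []
    with torch.no_grad():
        for batch in batches:
            # NHWC uint8 -> NCHW float; adapt preprocessing to the actual model.
            x = torch.from_numpy(batch).permute(0, 3, 1, 2).float() / 255.0
            predictions.append(model(x).cpu().tolist())
    return predictions


def output_fn(predictions, accept):
    # Serialize to JSON and gzip the payload to keep the async output object small.
    return gzip.compress(json.dumps(predictions).encode("utf-8"))
```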
The endpoint works well, but the problem is concurrency: the SageMaker endpoint processes one request after another (judging from CloudWatch and the S3 output file timestamps), and I don't know why this happens. `max_concurrent_invocations_per_instance` is set to 1000. The other settings for PyTorch serving are as follows:

And still, it doesn't work. So how can I create an async inference endpoint with PyTorch that handles requests concurrently?
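For context, `max_concurrent_invocations_per_instance` is normally passed through `AsyncInferenceConfig` when the endpoint is deployed with the SageMaker Python SDK; a minimal sketch, where the output path and instance type are placeholders and `model` is assumed to be an already-constructed `PyTorchModel`, could look like this:

```python
# Sketch only: the concurrency limit from the post is supplied via AsyncInferenceConfig
# at deploy time. Output path and instance type are placeholders, and `model` is assumed
# to be an already-constructed PyTorchModel.
from sagemaker.async_inference import AsyncInferenceConfig

async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-output/",   # placeholder
    max_concurrent_invocations_per_instance=1000,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",               # placeholder
    async_inference_config=async_config,
)
```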