Dynamic batching for PyTorch inference (not batch transform!) #2462
Unanswered · johann-petrak asked this question in Help
Replies: 2 comments
- Does anybody know if there is a way to get dynamic batching to work with PyTorch inference, and what the conventions are, so that AWS automatically groups multiple requests arriving within a certain time window into one request carrying a list of request data?
- For PyTorch inference, the AWS container relies essentially 100% on TorchServe, where configuring this via a config file has not been possible so far.
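For context on what TorchServe itself offers outside of the SageMaker container: TorchServe's management API accepts `batch_size` and `max_batch_delay` when a model is registered, which is the dynamic-batching mechanism the question is about. Below is a minimal sketch, assuming a TorchServe instance with its management port on `localhost:8081` and a placeholder archive `my_model.mar`; whether the SageMaker PyTorch container exposes these settings is exactly the open question here.

```python
# Minimal sketch: register a model with TorchServe's management API and request
# dynamic batching. "my_model.mar" and the URL are placeholders.
import requests

MANAGEMENT_URL = "http://localhost:8081"  # TorchServe management port (default 8081)

resp = requests.post(
    f"{MANAGEMENT_URL}/models",
    params={
        "url": "my_model.mar",     # model archive to register (placeholder name)
        "initial_workers": 1,      # start one worker so the model can serve requests
        "batch_size": 8,           # maximum number of requests aggregated into one batch
        "max_batch_delay": 50,     # ms to wait for a full batch before running a partial one
    },
)
resp.raise_for_status()
print(resp.json())
```

The same `batchSize` / `maxBatchDelay` values can also be set per model in TorchServe's `config.properties`; the point of the reply above is that the SageMaker PyTorch container has not offered a supported way to pass such a file through so far.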
- Does this work, and if yes, how does it work and how can I configure it?
I think deep down in the nested design of the inference image there is a component where the inferencer takes an array of inputs, and where it is possible to configure a batch size and a maximum timeout. When the inference endpoint receives several requests within that timeout, they get grouped (up to the batch size) into a single call to the inferencer, so that the whole batch can be sent through at once.
However, I cannot see how this works with the PyTorch inferencer. Could you shed some light on this?
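To illustrate the part about "the inferencer takes an array of inputs": when TorchServe batching is active, a custom handler's `handle(data, context)` is called with a list containing one entry per aggregated request, and it must return a list of results in the same order. The sketch below is a bare-bones illustration of that contract; `BatchedHandler`, the `{"inputs": [...]}` payload layout, and the `torch.nn.Identity` stand-in model are assumptions for illustration, not the SageMaker container's actual handler.

```python
# Sketch of a TorchServe-style custom handler that processes a dynamic batch.
# `data` arrives as a list with one entry per aggregated request; the handler
# must return a list of results of the same length and order.
import json
import torch

class BatchedHandler:
    def __init__(self):
        self.model = None
        self.initialized = False

    def initialize(self, context):
        # A real handler would load the model from the model directory given in
        # context.system_properties; here a placeholder module stands in.
        self.model = torch.nn.Identity()
        self.model.eval()
        self.initialized = True

    def handle(self, data, context):
        # `data` is the list of requests TorchServe grouped within max_batch_delay
        # (at most batch_size of them). Each entry carries the raw request payload.
        inputs = []
        for request in data:
            payload = request.get("body") or request.get("data")
            if isinstance(payload, (bytes, bytearray)):
                payload = json.loads(payload)
            inputs.append(payload["inputs"])  # assumed JSON layout: {"inputs": [...]}

        batch = torch.tensor(inputs, dtype=torch.float32)
        with torch.no_grad():
            outputs = self.model(batch)

        # One response per request, in the same order as `data`.
        return [{"outputs": row.tolist()} for row in outputs]
```

TorchServe expects the number of returned items to match the number of requests in the batch, so the per-request results cannot be collapsed into a single response.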