Dynamic batching for PyTorch inference (not batch transform!) #2462
Unanswered · johann-petrak asked this question in Help
Replies: 2 comments
- Does anybody know if there is a way to get dynamic batching to work with PyTorch inference, and what the conventions are, so that AWS automatically groups multiple requests arriving within a certain time window into one request carrying a list of request data?
- For PyTorch inference, the AWS container relies essentially 100% on TorchServe, where configuring this via a config file has not been possible so far.
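For context on what TorchServe itself offers outside of the SageMaker container: TorchServe's management API accepts `batch_size` and `max_batch_delay` when a model is registered, which is the dynamic-batching mechanism the question is about. Below is a minimal sketch, assuming a TorchServe instance with its management port on `localhost:8081` and a placeholder archive `my_model.mar`; whether the SageMaker PyTorch container exposes these settings is exactly the open question here.

```python
# Minimal sketch: register a model with TorchServe's management API and request
# dynamic batching. "my_model.mar" and the URL are placeholders.
import requests

MANAGEMENT_URL = "http://localhost:8081"  # TorchServe management port (default 8081)

resp = requests.post(
    f"{MANAGEMENT_URL}/models",
    params={
        "url": "my_model.mar",     # model archive to register (placeholder name)
        "initial_workers": 1,      # start one worker so the model can serve requests
        "batch_size": 8,           # maximum number of requests aggregated into one batch
        "max_batch_delay": 50,     # ms to wait for a full batch before running a partial one
    },
)
resp.raise_for_status()
print(resp.json())
```

The same `batchSize` / `maxBatchDelay` values can also be set per model in TorchServe's `config.properties`; the point of the reply above is that the SageMaker PyTorch container has not offered a supported way to pass such a file through so far.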
- Does this work, and if yes, how does it work and how can I configure it?
I think deep down in the nested design of the inference image there is a component where the inferencer takes an array of inputs, and where it is possible to configure a batch size and a maximum timeout. When the inference endpoint receives several requests within that timeout, they get grouped (up to the batch size) into a single call to the inferencer, so that the whole batch can be sent through at once.
However, I cannot see how this works with the PyTorch inferencer. Could you shed some light on this?
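To illustrate the part about "the inferencer takes an array of inputs": when TorchServe batching is active, a custom handler's `handle(data, context)` is called with a list containing one entry per aggregated request, and it must return a list of results in the same order. The sketch below is a bare-bones illustration of that contract; `BatchedHandler`, the `{"inputs": [...]}` payload layout, and the `torch.nn.Identity` stand-in model are assumptions for illustration, not the SageMaker container's actual handler.

```python
# Sketch of a TorchServe-style custom handler that processes a dynamic batch.
# `data` arrives as a list with one entry per aggregated request; the handler
# must return a list of results of the same length and order.
import json
import torch

class BatchedHandler:
    def __init__(self):
        self.model = None
        self.initialized = False

    def initialize(self, context):
        # A real handler would load the model from the model directory given in
        # context.system_properties; here a placeholder module stands in.
        self.model = torch.nn.Identity()
        self.model.eval()
        self.initialized = True

    def handle(self, data, context):
        # `data` is the list of requests TorchServe grouped within max_batch_delay
        # (at most batch_size of them). Each entry carries the raw request payload.
        inputs = []
        for request in data:
            payload = request.get("body") or request.get("data")
            if isinstance(payload, (bytes, bytearray)):
                payload = json.loads(payload)
            inputs.append(payload["inputs"])  # assumed JSON layout: {"inputs": [...]}

        batch = torch.tensor(inputs, dtype=torch.float32)
        with torch.no_grad():
            outputs = self.model(batch)

        # One response per request, in the same order as `data`.
        return [{"outputs": row.tolist()} for row in outputs]
```

TorchServe expects the number of returned items to match the number of requests in the batch, so the per-request results cannot be collapsed into a single response.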