torch.cuda.OutOfMemoryError: CUDA out of memory. Why isn't it handled by the queue system? #2417

Open

JustAnotherVeryNormalDeveloper opened this issue Aug 14, 2024 · 0 comments

System Info

text-generation-inference v2.2.0

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Exception ignored in: <function Server.__del__ at 0xXXXXXXXXX>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/grpc/aio/_server.py", line 194, in __del__
    cygrpc.schedule_coro_threadsafe(
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 120, in grpc._cython.cygrpc.schedule_coro_threadsafe
  File "src/python/grpcio/grpc/_cython/_cygrpc/aio/common.pyx.pxi", line 112, in grpc._cython.cygrpc.schedule_coro_threadsafe
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 436, in create_task
    self._check_closed()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
sys:1: RuntimeWarning: coroutine 'AioServer.shutdown' was never awaited
Task exception was never retrieved
Error: ShardFailed
future: <Task finished name='HandleExceptions[/generate.v2.TextGenerationService/Prefill]' coro=<()> exception=SystemExit(1)>
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 142, in Prefill
    generations, next_batch, timings = self.model.generate_token(batch)
  File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 1141, in generate_token
    prefill_logprobs_tensor = torch.log_softmax(out, -1)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.50 GiB. GPU has a total capacity of 79.15 GiB of which 6.94 GiB is free. Process 63385 has 72.21 GiB memory in use. Of the allocated memory 69.81 GiB is allocated by PyTorch, and 309.29 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```

Expected behavior

Currently, the service restarts, so all the requests in the queue, as well as the ones already running, fail with:

openai.InternalServerError: upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: delayed connect error: 111

To avoid this, it would be really good to check the expected memory footprint up front, before admitting a request that is too large. This could serve as a criterion for deciding whether a request should enter the queue at all, instead of processing a request that we already know will blow up the system.
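As a rough illustration only (this is not TGI's actual API; every name below is hypothetical), such a criterion could estimate the extra memory the prefill step would need and only admit the request when it fits into the currently free GPU memory:

```python
# Hypothetical sketch of a memory-aware admission check; none of these helpers
# exist in text-generation-inference, and the estimate is deliberately crude.
import torch

def estimated_prefill_bytes(n_tokens: int, vocab_size: int, dtype_bytes: int = 2) -> int:
    # The failing allocation above is the log_softmax over a [n_tokens, vocab_size]
    # logits tensor, so one extra logits-sized buffer is a rough lower bound.
    return n_tokens * vocab_size * dtype_bytes

def fits_in_free_memory(n_tokens: int, vocab_size: int, safety_factor: float = 2.0) -> bool:
    # torch.cuda.mem_get_info() returns (free_bytes, total_bytes) for the current device.
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return estimated_prefill_bytes(n_tokens, vocab_size) * safety_factor < free_bytes
```

A request that fails such a check could stay in the queue (or be rejected with a clear error) instead of taking down the whole shard.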

For now, I'm doing a manual pre-check on the length of every payload I send in order to avoid this.
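For reference, a minimal sketch of that client-side pre-check, assuming the model's tokenizer is available locally; the model id and the token limit below are placeholders to tune for your own hardware, not values recommended by TGI:

```python
# Minimal sketch of the manual payload-length pre-check described above.
from transformers import AutoTokenizer

MAX_SAFE_INPUT_TOKENS = 8192  # assumption: tune to what the GPU can prefill safely

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

def is_safe_to_send(prompt: str) -> bool:
    """Return True if the prompt's token count is below the empirically safe limit."""
    return len(tokenizer.encode(prompt)) <= MAX_SAFE_INPUT_TOKENS

# Usage: gate every request before it ever reaches the server.
prompt = "..."  # the payload about to be sent
if is_safe_to_send(prompt):
    ...  # send the request to the TGI / OpenAI-compatible endpoint
else:
    ...  # truncate, split, or reject the request up front
```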
