Multi-LORA feature question-2 #2506

Open

imran3180 opened this issue Sep 9, 2024 · 0 comments

imran3180 commented Sep 9, 2024

Hey team, I'm using the multi-LoRA adapter deployment feature from the latest code, and I have a couple of questions about it.

My questions are:

What is the maximum number of local adapters that can be deployed on a single instance? Is there a limit imposed by the TGI engine, or does it depend only on the instance's capacity?
How are these adapters stored? Are they kept on disk or in GPU memory?
What happens if the requested adapter_id is not currently in GPU memory? Will it be loaded from disk?
Can we explicitly specify the number of adapters that are allowed to live in GPU memory?
