Multi-LORA feature question-2 #2506

Open
imran3180 opened this issue Sep 9, 2024 · 0 comments

Comments

@imran3180

Hey team, I'm using the multi-LoRA adapter deployment feature from the latest code, and I have a couple of questions about it.

My questions are:

  1. What is the maximum number of local adapters that can be deployed on an instance? Does the TGI engine impose a limit, or does it depend on the instance capacity?
  2. How are these adapters stored? Are they kept on disk or in GPU memory?
  3. What happens if the requested adapter_id is not currently in GPU memory? Will it be loaded from disk?
  4. Can we explicitly specify the number of adapters that can reside in GPU memory?