
Dynamically serve LoRA modules #2860

Open
rikardradovac opened this issue Dec 20, 2024 · 1 comment

Comments

@rikardradovac

Feature request

Do you plan on integrating dynamic serving of LoRA modules, so that new modules can be added or removed at runtime instead of having to restart the engine and add the new modules to the LORA_ADAPTERS env variable?

Motivation

I am training multiple LoRA modules and want to serve them ASAP through my inference endpoint, without having to manually restart and add the new modules there. For example, one could send a request to some load_lora endpoint with a URL or path pointing to the new module.
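To make the proposal concrete, here is a minimal sketch of what such a request payload could look like. Note that the `/load_lora` endpoint, its name, and its fields are all hypothetical illustrations of the feature request; TGI does not expose this API.

```python
import json

def build_load_lora_request(adapter_id: str, source: str) -> str:
    """Serialize a hypothetical request asking the server to load a new LoRA adapter.

    adapter_id: name used to select the adapter in subsequent generate requests
    source:     Hub repo ID or local path to the adapter weights
    """
    # Both field names are illustrative, not part of any real TGI API.
    return json.dumps({"adapter_id": adapter_id, "source": source})

# Example payload for a hypothetical POST /load_lora call:
payload = build_load_lora_request("my-sql-adapter", "my-org/my-sql-adapter")
```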

Your contribution

I could open a PR.

@drbh
Collaborator

drbh commented Jan 13, 2025

Hi @rikardradovac, thank you for opening this issue. Currently we are not planning to support dynamic LoRA loading in TGI, because we load all of the weights into memory at startup to ensure optimal performance.

It's possible to load many LoRAs at startup, but TGI does not provide a way to add or remove them afterwards. Might I recommend checking out Predibase's LoRAX inference server (https://github.com/predibase/lorax)? I believe it supports dynamic LoRA adapters and is built on top of TGI foundations.
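For reference, loading several adapters at startup is done through the `LORA_ADAPTERS` environment variable (or the equivalent `--lora-adapters` launcher flag). A rough sketch, with placeholder adapter and model IDs:

```shell
# Load multiple LoRA adapters at TGI startup via LORA_ADAPTERS
# (comma-separated Hub IDs or local paths). IDs below are placeholders.
LORA_ADAPTERS=my-org/adapter-one,my-org/adapter-two \
  text-generation-launcher --model-id meta-llama/Llama-2-7b-hf
```

Changing this set requires restarting the launcher, which is exactly the limitation this issue describes.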

I hope this is helpful! Thank you
