Concurrency in stable-diffusion image generation #1475
base: master
Conversation
@@ -29,6 +29,7 @@ class OPENVINO_GENAI_EXPORTS AutoencoderKL {
    std::vector<size_t> block_out_channels = { 64 };

    explicit Config(const std::filesystem::path& config_path);
    Config() = default;
Why is it required? I think you can initialize m_config in the copy constructor via the constructor initializer list.
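A minimal sketch of that suggestion, assuming the member names shown in the diff (the full class layout is an assumption):

// Sketch only: initialize members in the copy constructor's initializer list
// so Config no longer needs a default constructor. Member names follow the
// diff; the rest of the class is assumed.
AutoencoderKL::AutoencoderKL(const AutoencoderKL& original_model)
    : m_encoder_model(original_model.m_encoder_model),
      m_decoder_model(original_model.m_decoder_model),
      m_config(original_model.m_config),
      encoder_compiled_model(original_model.encoder_compiled_model),
      decoder_compiled_model(original_model.decoder_compiled_model) {
    // empty body; Config() = default becomes unnecessary
}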
AutoencoderKL::AutoencoderKL(const AutoencoderKL& original_model) {
    encoder_compiled_model = original_model.encoder_compiled_model;
    decoder_compiled_model = original_model.decoder_compiled_model;
    m_decoder_request = original_model.decoder_compiled_model->create_infer_request();
What if the model is not compiled yet?
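One hedged way to handle that, assuming decoder_compiled_model is a std::shared_ptr<ov::CompiledModel> as in this diff: only create the request when the source object has actually been compiled.

// Sketch: guard against copying from an object whose decoder was never
// compiled; the request stays empty until compilation happens.
if (original_model.decoder_compiled_model) {
    m_decoder_request = original_model.decoder_compiled_model->create_infer_request();
}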
}
    m_encoder_model = original_model.m_encoder_model;
    m_decoder_model = original_model.m_decoder_model;
    m_config = original_model.m_config;
It does not look safe that the copy constructor performs infer-request creation. We have code like:
StableDiffusionPipeline(
PipelineType pipeline_type,
const CLIPTextModel& clip_text_model,
const UNet2DConditionModel& unet,
const AutoencoderKL& vae)
: StableDiffusionPipeline(pipeline_type) {
m_clip_text_encoder = std::make_shared<CLIPTextModel>(clip_text_model); // LEADS TO RE_CREATION OF REQUEST
m_unet = std::make_shared<UNet2DConditionModel>(unet); // LEADS TO RE_CREATION OF REQUEST
m_vae = std::make_shared<AutoencoderKL>(vae); // LEADS TO RE_CREATION OF REQUEST
const bool is_lcm = m_unet->get_config().time_cond_proj_dim > 0;
const char * const pipeline_name = is_lcm ? "LatentConsistencyModelPipeline" : "StableDiffusionPipeline";
initialize_generation_config(pipeline_name);
}
which means the inference request will be re-created, even though that is not the goal.
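One possible fix, sketched under the assumption that the request can live in a std::optional and that get_decoder_request() is a hypothetical helper: copy only the (cheap, shared) compiled model and create the infer request lazily on first use, so copying a component never creates a request as a side effect.

// Sketch: lazy infer-request creation (get_decoder_request is hypothetical,
// and decoder_compiled_model is assumed to be a plain ov::CompiledModel).
// The copy constructor then only copies models/config and leaves the
// optional empty.
std::optional<ov::InferRequest> m_decoder_request;

ov::InferRequest& AutoencoderKL::get_decoder_request() {
    if (!m_decoder_request.has_value()) {
        m_decoder_request = decoder_compiled_model.create_infer_request();
    }
    return *m_decoder_request;
}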
ov::CompiledModel encoder_compiled_model = core.compile_model(m_encoder_model, device, properties);
ov::genai::utils::print_compiled_model_properties(encoder_compiled_model, "Auto encoder KL encoder model");
m_encoder_request = encoder_compiled_model.create_infer_request();
encoder_compiled_model = std::make_shared<ov::CompiledModel>(core.compile_model(m_encoder_model, device, properties));
ov::CompiledModel is already shared by its implementation: it's a thin wrapper around a p_impl (a shared pointer to the implementation), so wrapping it in std::shared_ptr is unnecessary.
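To illustrate the point: a by-value copy of ov::CompiledModel already shares the underlying implementation, so the std::make_shared wrapper from the diff adds nothing. Variable names mirror the diff above.

// ov::CompiledModel is a thin handle over a shared implementation pointer,
// so copying it by value is cheap and both handles refer to the same
// compiled model.
ov::CompiledModel encoder_compiled_model = core.compile_model(m_encoder_model, device, properties);
ov::CompiledModel same_model = encoder_compiled_model;           // shares the same impl
ov::InferRequest request = same_model.create_infer_request();    // works as before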
void runPipeline(std::string prompt, std::filesystem::path root_dir, ov::genai::CLIPTextModel& text_encoder, ov::genai::UNet2DConditionModel& unet, ov::genai::AutoencoderKL& vae, std::promise<ov::Tensor>& Tensor_prm) {
    std::cout << "create pipeline " << prompt << std::endl;
    auto scheduler = ov::genai::Scheduler::from_config(root_dir / "scheduler/scheduler_config.json");
    auto pipe2 = ov::genai::Text2ImagePipeline::stable_diffusion(scheduler, text_encoder, unet, vae);
The problem with such an approach is that it will be hard to apply LoRA adapters here in the generic case. E.g. SD 1.5 has a simple LoRA configuration, while FLUX and other more complex models require code like this: https://github.com/openvinotoolkit/openvino.genai/pull/1602/files

An alternative approach is to have an API like:

Text2ImagePipeline pipeline( .. );  // similar to compile model

Text2ImagePipeline::GenerationRequest request = pipeline.create_generation_request();  // holds inference request
request.update_generation_config(guidance_scale(5.0));
Tensor image = request.generate("cat", callback(my_callback));

Text2ImagePipeline::GenerationRequest request2 = pipeline.create_generation_request();  // holds inference request
request2.update_generation_config(guidance_scale(6.0));
Tensor image2 = request2.generate("cat", width(200), height(200));

In this case all the complexity with LoRA is hidden inside, and clients can even use the same API (e.g. generate different images with different LoRAs / alphas in parallel).
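For illustration, a sketch of how the proposed API could drive concurrent generation from two threads. GenerationRequest, create_generation_request(), and update_generation_config() are all part of the proposal above, not an existing API:

// Hypothetical usage of the proposed API; needs <thread>, and root_dir is
// assumed to point at an exported Stable Diffusion model directory.
ov::genai::Text2ImagePipeline pipeline(root_dir, "CPU");  // compiled once, shared

auto worker = [&pipeline](float guidance, ov::Tensor& out) {
    // each thread owns its own generation request, and therefore
    // its own infer requests
    auto request = pipeline.create_generation_request();                     // proposed
    request.update_generation_config(ov::genai::guidance_scale(guidance));   // proposed
    out = request.generate("cat");                                           // proposed
};

ov::Tensor image1, image2;
std::thread t1(worker, 5.0f, std::ref(image1));
std::thread t2(worker, 6.0f, std::ref(image2));
t1.join();
t2.join();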