Concurrency in stable-diffusion image generation #1475
base: master
Conversation
@@ -29,6 +29,7 @@ class OPENVINO_GENAI_EXPORTS AutoencoderKL {
    std::vector<size_t> block_out_channels = { 64 };

    explicit Config(const std::filesystem::path& config_path);
    Config() = default;
Why is it required? I think you can initialize m_config in the copy constructor via the constructor initializer list.
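A minimal sketch of that suggestion, assuming the member names shown in the diff (the full class layout is an assumption):

// Sketch only: initialize members in the copy constructor's initializer list
// so Config no longer needs a default constructor. Member names follow the
// diff; the rest of the class is assumed.
AutoencoderKL::AutoencoderKL(const AutoencoderKL& original_model)
    : m_encoder_model(original_model.m_encoder_model),
      m_decoder_model(original_model.m_decoder_model),
      m_config(original_model.m_config),
      encoder_compiled_model(original_model.encoder_compiled_model),
      decoder_compiled_model(original_model.decoder_compiled_model) {
    // empty body; Config() = default becomes unnecessary
}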
AutoencoderKL::AutoencoderKL(const AutoencoderKL& original_model) {
    encoder_compiled_model = original_model.encoder_compiled_model;
    decoder_compiled_model = original_model.decoder_compiled_model;
    m_decoder_request = original_model.decoder_compiled_model->create_infer_request();
What if the model is not compiled yet?
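One hedged way to handle that, assuming decoder_compiled_model is a std::shared_ptr<ov::CompiledModel> as in this diff: only create the request when the source object has actually been compiled.

// Sketch: guard against copying from an object whose decoder was never
// compiled; the request stays empty until compilation happens.
if (original_model.decoder_compiled_model) {
    m_decoder_request = original_model.decoder_compiled_model->create_infer_request();
}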
}
    m_encoder_model = original_model.m_encoder_model;
    m_decoder_model = original_model.m_decoder_model;
    m_config = original_model.m_config;
It does not look safe that the copy constructor performs infer-request creation. We have code like:
StableDiffusionPipeline(
PipelineType pipeline_type,
const CLIPTextModel& clip_text_model,
const UNet2DConditionModel& unet,
const AutoencoderKL& vae)
: StableDiffusionPipeline(pipeline_type) {
m_clip_text_encoder = std::make_shared<CLIPTextModel>(clip_text_model); // LEADS TO RE_CREATION OF REQUEST
m_unet = std::make_shared<UNet2DConditionModel>(unet); // LEADS TO RE_CREATION OF REQUEST
m_vae = std::make_shared<AutoencoderKL>(vae); // LEADS TO RE_CREATION OF REQUEST
const bool is_lcm = m_unet->get_config().time_cond_proj_dim > 0;
const char * const pipeline_name = is_lcm ? "LatentConsistencyModelPipeline" : "StableDiffusionPipeline";
initialize_generation_config(pipeline_name);
}
which means the inference request will be re-created, even though that is not the goal.
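One possible fix, sketched under the assumption that the request can live in a std::optional and that get_decoder_request() is a hypothetical helper: copy only the (cheap, shared) compiled model and create the infer request lazily on first use, so copying a component never creates a request as a side effect.

// Sketch: lazy infer-request creation (get_decoder_request is hypothetical,
// and decoder_compiled_model is assumed to be a plain ov::CompiledModel).
// The copy constructor then only copies models/config and leaves the
// optional empty.
std::optional<ov::InferRequest> m_decoder_request;

ov::InferRequest& AutoencoderKL::get_decoder_request() {
    if (!m_decoder_request.has_value()) {
        m_decoder_request = decoder_compiled_model.create_infer_request();
    }
    return *m_decoder_request;
}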
ov::CompiledModel encoder_compiled_model = core.compile_model(m_encoder_model, device, properties);
ov::genai::utils::print_compiled_model_properties(encoder_compiled_model, "Auto encoder KL encoder model");
m_encoder_request = encoder_compiled_model.create_infer_request();
encoder_compiled_model = std::make_shared<ov::CompiledModel>(core.compile_model(m_encoder_model, device, properties));
ov::CompiledModel is already shared by its implementation: it's a thin wrapper around a p_impl (a shared pointer to the implementation), so wrapping it in std::shared_ptr is unnecessary.
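To illustrate the point: a by-value copy of ov::CompiledModel already shares the underlying implementation, so the std::make_shared wrapper from the diff adds nothing. Variable names mirror the diff above.

// ov::CompiledModel is a thin handle over a shared implementation pointer,
// so copying it by value is cheap and both handles refer to the same
// compiled model.
ov::CompiledModel encoder_compiled_model = core.compile_model(m_encoder_model, device, properties);
ov::CompiledModel same_model = encoder_compiled_model;           // shares the same impl
ov::InferRequest request = same_model.create_infer_request();    // works as before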
void runPipeline(std::string prompt, std::filesystem::path root_dir, ov::genai::CLIPTextModel& text_encoder, ov::genai::UNet2DConditionModel& unet, ov::genai::AutoencoderKL& vae, std::promise<ov::Tensor>& Tensor_prm) {
    std::cout << "create pipeline " << prompt << std::endl;
    auto scheduler = ov::genai::Scheduler::from_config(root_dir / "scheduler/scheduler_config.json");
    auto pipe2 = ov::genai::Text2ImagePipeline::stable_diffusion(scheduler, text_encoder, unet, vae);
The problem with such an approach is that it will be hard to apply LoRA adapters here in the generic case. E.g. SD 1.5 has a simple LoRA configuration, while FLUX and other more complex models require code like this: https://github.com/openvinotoolkit/openvino.genai/pull/1602/files

An alternative approach is to have an API like:

Text2ImagePipeline pipeline( .. );  // similar to compile model

Text2ImagePipeline::GenerationRequest request = pipeline.create_generation_request();  // holds inference request
request.update_generation_config(guidance_scale(5.0));
Tensor image = request.generate("cat", callback(my_callback));

Text2ImagePipeline::GenerationRequest request2 = pipeline.create_generation_request();  // holds inference request
request2.update_generation_config(guidance_scale(6.0));
Tensor image2 = request2.generate("cat", width(200), height(200));

In this case all the complexity with LoRA is hidden inside, and clients can even use the same API (e.g. generate different images with different LoRAs / alphas in parallel).
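For illustration, a sketch of how the proposed API could drive concurrent generation from two threads. GenerationRequest, create_generation_request(), and update_generation_config() are all part of the proposal above, not an existing API:

// Hypothetical usage of the proposed API; needs <thread>, and root_dir is
// assumed to point at an exported Stable Diffusion model directory.
ov::genai::Text2ImagePipeline pipeline(root_dir, "CPU");  // compiled once, shared

auto worker = [&pipeline](float guidance, ov::Tensor& out) {
    // each thread owns its own generation request, and therefore
    // its own infer requests
    auto request = pipeline.create_generation_request();                     // proposed
    request.update_generation_config(ov::genai::guidance_scale(guidance));   // proposed
    out = request.generate("cat");                                           // proposed
};

ov::Tensor image1, image2;
std::thread t1(worker, 5.0f, std::ref(image1));
std::thread t2(worker, 6.0f, std::ref(image2));
t1.join();
t2.join();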