[Diffusers] Regression of CPU memory usage #738
Comments
Experiment 1: compiled with Neuron SDK 2.15.0 + Optimum Neuron 0.0.26
Experiment 2: Neuron SDK 2.16.1 + Optimum Neuron 0.0.18
CPU OOM. Next step: Neuron SDK 2.16.1 with Optimum Neuron 0.0.13.
Experiment 3: Neuron SDK 2.16.1 + Optimum Neuron 0.0.13
Optimum Neuron v0.0.14 + Neuron SDK 2.16.1: ✔️
Optimum Neuron v0.0.15 + Neuron SDK 2.16.1: 👎 (compiled artifacts: Jingya/sd_neuron_xl_2.16.1_0.0.15)
The regression must have been introduced between v0.0.14 and v0.0.15.
The regression comes from a change in the order in which submodels are loaded: we need to load the UNet first, otherwise we hit CPU OOM when loading the UNet while the other models (VAE, text encoders) already sit in host memory. This regression was already corrected during the diffusion pipeline refactoring (#711) and is included since the Optimum Neuron v0.0.26 release. Besides, a PR is open to further improve the loading order: #742. Closing this issue as solved.
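To illustrate the idea of the fix, here is a minimal sketch; the function, the `load_fn` callback, and the submodel folder names are hypothetical stand-ins, not the actual optimum-neuron code from #711:

```python
from pathlib import Path


def load_submodels(model_dir: Path, load_fn):
    """Load compiled submodels with the UNet first to keep peak CPU RSS low.

    Loading the UNet (by far the largest submodel) while the VAE and text
    encoders already occupy host memory pushes peak RSS past what an
    inf2.xlarge offers, so it has to come first. `load_fn` stands in for
    whatever deserializes one compiled artifact into memory.
    """
    order = ["unet", "text_encoder", "text_encoder_2", "vae_decoder", "vae_encoder"]
    return {
        name: load_fn(model_dir / name)
        for name in order
        if (model_dir / name).exists()
    }
```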
Issue
We were able to run SDXL artifacts compiled on inf2.8xlarge on an inf2.xlarge instance (as stated in the blog post). However, we recently found that SDXL's CPU memory usage has increased, leading to OOM during inference on inf2.xlarge. In this issue we note down some experiment results to trace where the regression was introduced.
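To quantify the regression, peak host memory during loading can be sampled from a background thread. A minimal sketch, assuming `psutil` is installed; the helper and its names are ours, not part of optimum-neuron:

```python
import threading
import time

import psutil


def peak_rss_during(fn, interval=0.1):
    """Run fn() while polling this process's RSS; return (result, peak bytes)."""
    proc = psutil.Process()
    peak = [proc.memory_info().rss]
    done = threading.Event()

    def poll():
        while not done.is_set():
            peak[0] = max(peak[0], proc.memory_info().rss)
            time.sleep(interval)

    t = threading.Thread(target=poll, daemon=True)
    t.start()
    try:
        result = fn()
    finally:
        done.set()
        t.join()
    return result, peak[0]


# Example: wrap the pipeline load from the Reproduction section below.
# from optimum.neuron import NeuronStableDiffusionXLPipeline
# pipe, peak = peak_rss_during(
#     lambda: NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
# )
# print(f"peak RSS while loading: {peak / 2**30:.1f} GiB")
```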
Tasks
Reproduction (minimal, reproducible, runnable)
```
optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0 --batch_size 1 --height 1024 --width 1024 --num_images_per_prompt 4 --auto_cast matmul --auto_cast_type bf16 sd_neuron_xl/
```
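After export, loading the compiled artifacts for inference looks roughly like this (`NeuronStableDiffusionXLPipeline` is optimum-neuron's SDXL pipeline class; the prompt and output handling are illustrative):

```python
from optimum.neuron import NeuronStableDiffusionXLPipeline

# Load the pre-compiled artifacts produced by the export command above.
pipe = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")

images = pipe(prompt="a photo of an astronaut riding a horse on mars").images
images[0].save("astronaut.png")
```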
Expected behavior
The compiled artifacts should fit into the host memory of an inf2.xlarge instance after being compiled on inf2.8xlarge.