[Diffusers] Regression of CPU memory usage #738

Closed · 3 tasks done
JingyaHuang opened this issue Nov 18, 2024 · 6 comments
Labels: bug (Something isn't working)

JingyaHuang (Collaborator) commented Nov 18, 2024

Issue

We were able to run SDXL artifacts compiled on inf2.8xlarge on an inf2.xlarge instance (as stated in the blog). However, we recently found that SDXL's CPU memory usage has increased, leading to OOM during inference on inf2.xlarge. In this issue, we will note down some experiment results to trace where the regression was introduced.

Tasks

  • Latest Optimum Neuron (0.0.26) on Neuron SDK 2.15.0
  • Other Neuron SDK versions
  • PyTorch 1.13.1 vs. 2.1?

Reproduction (minimal, reproducible, runnable)

optimum-cli export neuron --model stabilityai/stable-diffusion-xl-base-1.0  --batch_size 1 --height 1024 --width 1024 --num_images_per_prompt 4 --auto_cast matmul --auto_cast_type bf16 sd_neuron_xl/
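
The contents of the inference script (test_sdxl.py in the traceback below) are not included in this issue; a minimal sketch of what is run on inf2.xlarge, assuming the artifacts exported by the command above are in sd_neuron_xl/, would look like this:

# Minimal inference sketch (assumed, not the exact test_sdxl.py used for these experiments).
from optimum.neuron import NeuronStableDiffusionXLPipeline

# Load the precompiled artifacts exported by the optimum-cli command above.
stable_diffusion = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")

prompt = "a photo of an astronaut riding a horse on mars"
image = stable_diffusion(prompt).images[0]
image.save("astronaut.png")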

Expected behavior

The pipeline should fit on inf2.xlarge after being compiled on inf2.8xlarge.

JingyaHuang added the bug label on Nov 18, 2024
JingyaHuang (Collaborator, Author)

Experiment 1: compiled with Neuron SDK 2.15.0 + Optimum Neuron 0.0.26:

Loading only U-Net into both Neuron Cores...
You have disabled the safety checker for <class 'optimum.neuron.modeling_diffusion.NeuronStableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
  0%|                                                                                                       | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/ubuntu/optimum-neuron/test_sdxl.py", line 5, in <module>
    image = stable_diffusion(prompt).images[0]
  File "/home/ubuntu/optimum-neuron/optimum/neuron/modeling_diffusion.py", line 1108, in __call__
    return self.auto_model_class.__call__(self, height=height, width=width, *args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_2.15.0/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_2.15.0/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 1020, in __call__
    noise_pred = self.unet(
  File "/home/ubuntu/optimum-neuron/optimum/neuron/modeling_diffusion.py", line 1137, in __call__
    return self.forward(*args, **kwargs)
  File "/home/ubuntu/optimum-neuron/optimum/neuron/modeling_diffusion.py", line 1228, in forward
    outputs = self.model(*inputs)
  File "/home/ubuntu/aws_neuron_venv_2.15.0/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_2.15.0/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_2.15.0/lib/python3.10/site-packages/torch_neuronx/xla_impl/data_parallel.py", line 254, in forward
    outputs = parallel_apply(
  File "/home/ubuntu/aws_neuron_venv_2.15.0/lib/python3.10/site-packages/torch_neuronx/xla_impl/data_parallel.py", line 404, in parallel_apply
    output.reraise()
  File "/home/ubuntu/aws_neuron_venv_2.15.0/lib/python3.10/site-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError on neuroncore 0.
Original Traceback (most recent call last):
  File "/home/ubuntu/aws_neuron_venv_2.15.0/lib/python3.10/site-packages/torch_neuronx/xla_impl/data_parallel.py", line 390, in _worker
    output = module(*input)
  File "/home/ubuntu/aws_neuron_venv_2.15.0/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/aws_neuron_venv_2.15.0/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
RuntimeError: forward() is missing value for argument 'argument_4'. Declaration: forward(__torch__.torch_neuronx.xla_impl.trace.___torch_mangle_7.NeuronModule self, Tensor argument_1, Tensor argument_2, Tensor argument_3, Tensor argument_4, Tensor argument_5) -> ((Tensor))

Segmentation fault (core dumped)
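
As a side note for tracing the regression, a rough way to see how much host RAM the load itself consumes is to log the process RSS before and after loading the pipeline (a sketch, not from the original report; assumes psutil is installed and the artifacts are in sd_neuron_xl/):

# Rough RSS probe (illustrative sketch only; requires `pip install psutil`).
import os
import psutil
from optimum.neuron import NeuronStableDiffusionXLPipeline

proc = psutil.Process(os.getpid())

def rss_gb():
    # Resident set size of the current process, in GiB.
    return proc.memory_info().rss / 1024**3

print(f"RSS before load: {rss_gb():.2f} GiB")
pipe = NeuronStableDiffusionXLPipeline.from_pretrained("sd_neuron_xl/")
print(f"RSS after load:  {rss_gb():.2f} GiB")  # compare against the 16 GiB of host RAM on inf2.xlarge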

JingyaHuang (Collaborator, Author) commented Nov 20, 2024

Experiment 2: Neuron SDK 2.16.1 + Optimum Neuron 0.0.18

model_index.json: 100%|███████████████████████████████████████████████████████████████████████████| 779/779 [00:00<00:00, 8.42MB/s]
tokenizer/special_tokens_map.json: 100%|██████████████████████████████████████████████████████████| 472/472 [00:00<00:00, 6.37MB/s]
text_encoder_2/config.json: 100%|█████████████████████████████████████████████████████████████| 1.42k/1.42k [00:00<00:00, 19.9MB/s]
tokenizer/tokenizer_config.json: 100%|████████████████████████████████████████████████████████████| 704/704 [00:00<00:00, 10.2MB/s]
tokenizer/merges.txt: 100%|█████████████████████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 28.2MB/s]
tokenizer_2/special_tokens_map.json: 100%|████████████████████████████████████████████████████████| 460/460 [00:00<00:00, 6.92MB/s]
text_encoder/config.json: 100%|███████████████████████████████████████████████████████████████| 1.41k/1.41k [00:00<00:00, 20.1MB/s]
scheduler/scheduler_config.json: 100%|████████████████████████████████████████████████████████████| 582/582 [00:00<00:00, 9.11MB/s]
tokenizer_2/tokenizer_config.json: 100%|██████████████████████████████████████████████████████████| 855/855 [00:00<00:00, 13.5MB/s]
unet/config.json: 100%|███████████████████████████████████████████████████████████████████████| 2.82k/2.82k [00:00<00:00, 33.1MB/s]
vae_decoder/config.json: 100%|████████████████████████████████████████████████████████████████| 1.43k/1.43k [00:00<00:00, 21.5MB/s]
vae_encoder/config.json: 100%|████████████████████████████████████████████████████████████████| 1.44k/1.44k [00:00<00:00, 21.8MB/s]
tokenizer/vocab.json: 100%|███████████████████████████████████████████████████████████████████| 1.06M/1.06M [00:00<00:00, 9.99MB/s]
model.neuron: 100%|█████████████████████████████████████████████████████████████████████████████| 426M/426M [00:56<00:00, 7.54MB/s]
model.neuron: 100%|█████████████████████████████████████████████████████████████████████████████| 376M/376M [00:59<00:00, 6.28MB/s]
model.neuron: 100%|█████████████████████████████████████████████████████████████████████████████| 825M/825M [02:04<00:00, 6.64MB/s]
model.neuron: 100%|███████████████████████████████████████████████████████████████████████████| 1.79G/1.79G [02:52<00:00, 10.4MB/s]
model.neuron: 100%|███████████████████████████████████████████████████████████████████████████| 4.18G/4.18G [06:41<00:00, 10.4MB/s]
Fetching 20 files: 100%|███████████████████████████████████████████████████████████████████████████| 20/20 [06:41<00:00, 20.07s/it]
Passing the argument `library_name` to `get_supported_tasks_for_model_type` is required, but got library_name=None. Defaulting to `transformers`. An error will be raised in a future version of Optimum if `library_name` is not provided.
Passing the argument `library_name` to `get_supported_tasks_for_model_type` is required, but got library_name=None. Defaulting to `transformers`. An error will be raised in a future version of Optimum if `library_name` is not provided.
Passing the argument `library_name` to `get_supported_tasks_for_model_type` is required, but got library_name=None. Defaulting to `transformers`. An error will be raised in a future version of Optimum if `library_name` is not provided.
Passing the argument `library_name` to `get_supported_tasks_for_model_type` is required, but got library_name=None. Defaulting to `transformers`. An error will be raised in a future version of Optimum if `library_name` is not provided.
Passing the argument `library_name` to `get_supported_tasks_for_model_type` is required, but got library_name=None. Defaulting to `transformers`. An error will be raised in a future version of Optimum if `library_name` is not provided.
Loading only U-Net into both Neuron Cores...
Killed

CPU OOM. Next step: Neuron SDK 2.16.1 with Optimum Neuron 0.0.13.

JingyaHuang (Collaborator, Author)

Experiment 3: Neuron SDK 2.16.1 + Optimum Neuron 0.0.13

JingyaHuang (Collaborator, Author) commented Nov 22, 2024

Optimum Neuron v0.0.14 + Neuron SDK 2.16.1 ✔️

https://huggingface.co/Jingya/sd_neuron_xl_2.16.1_0.0.14
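
For anyone wanting to reproduce the working setup, these precompiled artifacts can presumably be loaded straight from the Hub with the matching versions (a sketch, assuming Neuron SDK 2.16.1 + Optimum Neuron 0.0.14 are installed):

# Sanity-check sketch for the known-good artifacts published on the Hub.
from optimum.neuron import NeuronStableDiffusionXLPipeline

pipe = NeuronStableDiffusionXLPipeline.from_pretrained("Jingya/sd_neuron_xl_2.16.1_0.0.14")
image = pipe("a photo of an astronaut riding a horse on mars").images[0]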

JingyaHuang (Collaborator, Author)

Optimum Neuron v0.0.15 + Neuron SDK 2.16.1 👎

Jingya/sd_neuron_xl_2.16.1_0.0.15

Loading only U-Net into both Neuron Cores...
Killed

The regression must have been introduced between v0.0.14 and v0.0.15.

JingyaHuang (Collaborator, Author)

The regression comes from a change in the order in which submodels are loaded: we need to load the UNet first, otherwise loading it causes CPU OOM once the other models (VAE, text encoders) are already in memory.

This regression was already corrected during the diffusion pipeline refactoring (#711) and has been included since the Optimum Neuron v0.0.26 release. In addition, a PR is open to further improve the loading order: #742.
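
To illustrate the idea (a sketch of the principle only, not the actual code in #711 or #742; the file layout is assumed from the download log above): the largest artifact, the UNet, is loaded while the most host RAM is still free, before the text encoders and VAE.

# Illustrative loading-order sketch (not the actual implementation from #711 / #742).
# Compiled Neuron artifacts are TorchScript modules and can be loaded with torch.jit.load.
import torch

submodel_paths = {
    "unet": "sd_neuron_xl/unet/model.neuron",                    # largest artifact, load it first
    "text_encoder": "sd_neuron_xl/text_encoder/model.neuron",
    "text_encoder_2": "sd_neuron_xl/text_encoder_2/model.neuron",
    "vae_encoder": "sd_neuron_xl/vae_encoder/model.neuron",
    "vae_decoder": "sd_neuron_xl/vae_decoder/model.neuron",
}

# dicts preserve insertion order, so the UNet is loaded before the smaller submodels.
submodels = {name: torch.jit.load(path) for name, path in submodel_paths.items()}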

Closing this issue as solved.
