Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for lumina2 #10642

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open

Conversation

zhuole1025
Copy link
Contributor

What does this PR do?

This PR will add the official Lumina-Image 2 to the diffusers. Lumina-Image 2.0 is the latest model in the Lumina family and will be released very soon (https://huggingface.co/Alpha-VLLM/Lumina-Image-2.0). It is a 2B parameter Diffusion Transformer that significantly improves instruction-following and generates higher-quality, more diverse images. Our paper will be released soon, and we have finished the diffuser pipeline for Lumina-Image 2.0.

Core library:

cap_freqs_cis[i, :cap_len] = freqs_cis[i, :cap_len]
img_freqs_cis[i, :img_len] = freqs_cis[i, cap_len:cap_len+img_len]

for layer in self.context_refiner:
Copy link
Collaborator

@yiyixuxu yiyixuxu Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think context_refiner and noise_refiner should be moved inside foward for better readbility
does it make sense to wrap the code to create freqs_cis inside a class like this

class LTXVideoRotaryPosEmbed(nn.Module):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I have moved refiner into forward() for better readbility. As for the freqs_cis, we already have a class Lumina2PosEmbed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants