
Consider allowing hidden state initialisation via ssm_state input parameter for selective_scan_fn #258

Open
govorunov opened this issue Mar 20, 2024 · 6 comments

Comments

@govorunov

Please, please consider adding an ssm_state input parameter to selective_scan_fn so that the hidden state of the Mamba block can be initialised.
Please also consider making the hidden state differentiable. Currently selective_scan_fn notes:

Note that the gradient of the last state is not considered in the backward pass.
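To make the request concrete, here is the backward recursion written out for one channel. It assumes a recurrence of the form used by the reference scan (diagonal $A$ stored as a vector $a$, decay $\exp(\Delta_t a)$, input term $\Delta_t u_t B_t$); the notation is ours, not the library's. $\lambda_t$ is the total gradient of the loss $\ell$ with respect to $h_t$, and $\gamma$ is the gradient that reaches the last state directly (e.g. from a decoder or a classification head). As we read the quoted note, the current backward pass corresponds to $\gamma = 0$, and a trainable initial state additionally needs the last line.

```math
\begin{aligned}
h_t &= \bar{a}_t \odot h_{t-1} + \Delta_t u_t B_t, \qquad \bar{a}_t = \exp(\Delta_t a), \qquad t = 1, \dots, L,\\
y_t &= C_t^{\top} h_t + D u_t,\\[4pt]
\lambda_L &= C_L \frac{\partial \ell}{\partial y_L} + \gamma, \qquad \gamma = \frac{\partial \ell}{\partial h_L}\Big|_{y\ \text{fixed}},\\
\lambda_t &= C_t \frac{\partial \ell}{\partial y_t} + \bar{a}_{t+1} \odot \lambda_{t+1}, \qquad t = L-1, \dots, 1,\\
\frac{\partial \ell}{\partial u_t} &= \Delta_t B_t^{\top} \lambda_t + D \frac{\partial \ell}{\partial y_t},\\
\frac{\partial \ell}{\partial h_0} &= \bar{a}_1 \odot \lambda_1.
\end{aligned}
```

With $\gamma = 0$, an encoder trained only through its last state would receive no gradient at all.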

This change could open the path to encoder-decoder Mamba architectures, and to encoder-only, BERT-like ones.
The RNN-style analogue would be: a Mamba encoder runs over the input sequence and its outputs are ignored; its last hidden state is then used to initialise the decoder, which takes an input token and unrolls the state recurrently.
For the encoder to work, the last hidden state has to be differentiable. The same change would enable encoder-only models for classification, embedding problems, etc.
For the decoder to work, the Mamba block needs to accept a hidden state at initialisation.
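A minimal pure-Python sketch of the workflow described above, for a single channel. It assumes a recurrence of the form h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * u_t with diagonal A; `selective_scan_with_state`, its `h0` argument, and `grads_of_last_state_sum` are hypothetical illustrations, not the mamba_ssm API. It shows an encoder whose last state initialises a decoder through `h0`, checks that this handoff reproduces a single pass exactly, and computes the gradient of a loss on the encoder's last state (the term the note says is dropped), verified by finite differences.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Rows = Sequence[Sequence[float]]


def selective_scan_with_state(
    u: Sequence[float], delta: Sequence[float], A: Sequence[float],
    B: Rows, C: Rows, D: float = 0.0, h0: Optional[Sequence[float]] = None,
) -> Tuple[List[float], List[float]]:
    """Hypothetical one-channel sequential scan with an optional initial state h0.

    h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * u_t   (A diagonal, elementwise)
    y_t = <C_t, h_t> + D * u_t
    Returns (y, last_state); h0 plays the role of the requested ssm_state.
    """
    n = len(A)
    h = list(h0) if h0 is not None else [0.0] * n
    y: List[float] = []
    for t in range(len(u)):
        for i in range(n):
            h[i] = math.exp(delta[t] * A[i]) * h[i] + delta[t] * B[t][i] * u[t]
        y.append(sum(C[t][i] * h[i] for i in range(n)) + D * u[t])
    return y, h


def grads_of_last_state_sum(
    delta: Sequence[float], A: Sequence[float], B: Rows
) -> Tuple[List[float], List[float]]:
    """Gradients of loss = sum(h_L) w.r.t. each u_t and h0 (adjoint recursion)."""
    n, length = len(A), len(delta)
    lam = [1.0] * n  # d loss / d h_L: the last-state gradient
    grad_u = [0.0] * length
    for t in range(length - 1, -1, -1):
        grad_u[t] = sum(delta[t] * B[t][i] * lam[i] for i in range(n))
        lam = [math.exp(delta[t] * A[i]) * lam[i] for i in range(n)]
    return grad_u, lam  # lam is now d loss / d h0


def random_inputs(length: int, n: int, rng: random.Random):
    """Toy per-step inputs: u, positive step sizes delta, and rows of B and C."""
    u = [rng.uniform(-1, 1) for _ in range(length)]
    delta = [rng.uniform(0.05, 0.5) for _ in range(length)]
    B = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(length)]
    C = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(length)]
    return u, delta, B, C


rng = random.Random(0)
N, D = 4, 0.5
A = [-rng.uniform(0.5, 2.0) for _ in range(N)]  # negative diagonal: decaying state

# Encoder: scan the source, ignore its outputs, keep only the last state.
src_u, src_delta, src_B, src_C = random_inputs(6, N, rng)
_, enc_state = selective_scan_with_state(src_u, src_delta, A, src_B, src_C, D)

# Decoder: start from the encoder's last state instead of zeros.
tgt_u, tgt_delta, tgt_B, tgt_C = random_inputs(3, N, rng)
dec_y, dec_state = selective_scan_with_state(
    tgt_u, tgt_delta, A, tgt_B, tgt_C, D, h0=enc_state)

# With shared parameters, the handoff must match one pass over source + target.
full_y, full_state = selective_scan_with_state(
    src_u + tgt_u, src_delta + tgt_delta, A, src_B + tgt_B, src_C + tgt_C, D)
max_err = max(abs(a - b) for a, b in zip(full_y[len(src_u):], dec_y))

# Gradient of a loss on the encoder's last state, checked by central differences.
grad_u, grad_h0 = grads_of_last_state_sum(src_delta, A, src_B)
eps = 1e-6
fd_grad_u = []
for t in range(len(src_u)):
    up, down = list(src_u), list(src_u)
    up[t] += eps
    down[t] -= eps
    f_up = sum(selective_scan_with_state(up, src_delta, A, src_B, src_C, D)[1])
    f_down = sum(selective_scan_with_state(down, src_delta, A, src_B, src_C, D)[1])
    fd_grad_u.append((f_up - f_down) / (2 * eps))

print("max handoff error:", max_err)
print("max gradient error:", max(abs(a - b) for a, b in zip(grad_u, fd_grad_u)))
```

The same parameters are shared between "encoder" and "decoder" here only so the handoff can be checked against a single pass; a real encoder-decoder would use separate blocks.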

Related issues: #233, #101

PS: Excellent work! Very impressive (especially the CUDA part)!

@LechengKong

Upvoting this issue and agreeing with all the points @govorunov raised. Being able to manipulate differentiable hidden states, or attach learnable modules to them, opens up many new ways to use Mamba.

@Xudangliatiger

+1

@brightonm

+1

@XiangPiIi

+1

@peterukk

Yes please, this is pretty much a must-have for my weather and climate application too.

@retepViolet

Hi, here is my implementation, based on mambapy and Hugging Face:

InitMamba

Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants