The speed problem #3
Note that the code accelerates inference at the cost of some extra training effort. After we learn which ISS components can be removed, we just need to throw those zeros away and initialize a smaller LSTM with the learned nonzero weights for inference. It makes no sense to keep the zeros, and this is the advantage of the method.
Could you please tell me how to throw away those zeros? Thanks a lot!
When one ISS component is all zeros, it means the hidden size of the LSTM is reduced by one. You just need to create a new LSTM with a smaller hidden size and initialize its weights with those nonzero values.
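The step described above can be sketched in NumPy. This is a minimal illustration, not code from this repository: it assumes a common LSTM weight layout with the four gate blocks (input, forget, cell, output) stacked along the rows, and the helper name `shrink_lstm_weights` is hypothetical. It finds the ISS components (hidden units) whose weights are entirely zero and returns compacted matrices for a smaller LSTM.

```python
import numpy as np

def shrink_lstm_weights(W_ih, W_hh, b, hidden_size):
    """Hypothetical helper: given group-lasso-pruned LSTM weights
    (assumed layout: 4 gate blocks of `hidden_size` rows each),
    drop the all-zero ISS components and return smaller matrices."""
    D = W_ih.shape[1]
    gates_ih = W_ih.reshape(4, hidden_size, D)
    gates_hh = W_hh.reshape(4, hidden_size, hidden_size)
    gates_b = b.reshape(4, hidden_size)

    # A hidden unit is removable when all of its rows (in every gate)
    # and its recurrent input column are zero.
    row_zero = (np.abs(gates_ih).sum(axis=(0, 2)) +
                np.abs(gates_hh).sum(axis=(0, 2)) +
                np.abs(gates_b).sum(axis=0)) == 0
    col_zero = np.abs(gates_hh).sum(axis=(0, 1)) == 0
    keep = np.where(~(row_zero & col_zero))[0]
    k = len(keep)

    # Slice out the surviving rows (per gate) and recurrent columns,
    # then restack into the same 4-gate layout at the smaller size.
    W_ih_small = gates_ih[:, keep, :].reshape(4 * k, D)
    W_hh_small = gates_hh[:, keep][:, :, keep].reshape(4 * k, k)
    b_small = gates_b[:, keep].reshape(-1)
    return W_ih_small, W_hh_small, b_small, k
```

The returned matrices can then be copied into a freshly constructed LSTM of hidden size `k`; the forward pass of that small dense model is mathematically identical to the pruned large one, which is why keeping the zeros around buys nothing.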
Hello wenwei:
sort of |
I am quite confused because after reading your paper, I thought the speedup was for the training phase, and that you use the trained model itself for inference. But from your response to this issue, you mean you take the non-zero part of the weights and use them as a pre-trained model to initialize a smaller model? And your code doesn't show how to save and load this pre-trained model, right?
Here is the command I used to train the `structure_grouplasso` model and the `from_scratch` model. The speed of the `from_scratch` model is 392 wps, but the speed of the `structure_grouplasso` model is 2 wps.

And I also have a question about the paper: why does the `ISS` method have a speed similar to the `direct design` method? The `ISS` method creates a large sparse model where the zero weights also consume CPU. Thanks a lot!