The speed problem #3

Open
zuowang opened this issue Nov 24, 2017 · 6 comments
@zuowang

zuowang commented Nov 24, 2017

Here are the commands I used to train the structure_grouplasso model and the from_scratch model.
The speed of the from_scratch model is 392 wps, but the speed of the structure_grouplasso model is only 2 wps.
I also have a question about the paper: why does the ISS method have a similar speed to the direct design method? The ISS method produces a sparse large model in which the zero weights still consume CPU.

Thanks a lot!

python ptb_word_lm.py --model sparselarge --data_path simple-examples/data/ --config_file structure_grouplasso.json

python ptb_word_lm_heter.py --model large --data_path simple-examples/data/ --hidden_size1 373 --hidden_size2 315 --config_file from_scratch.json
python ptb_word_lm.py --model validtestlarge --data_path simple-examples/data/ --display_weights True --config_file structure_grouplasso.json --restore_path /tmp/2017-11-24___01-55-55

python ptb_word_lm_heter.py --model validtestlarge --data_path simple-examples/data/ --display_weights True --hidden_size1 373 --hidden_size2 315 --config_file from_scratch.json --restore_path /tmp/2017-11-23___10-33-44
@wenwei202
Owner

Note that the code targets inference acceleration at the cost of some extra training effort.

After we learn which ISS components can be removed, we just throw those zeros away and initialize a small LSTM with the learned nonzero weights for inference. It makes no sense to keep the zeros, and that is the advantage of the method.
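
Below is a minimal sketch (not part of this repo) of how one might inspect a trained checkpoint to find which ISS components are all zero. The checkpoint directory and the variable name are assumptions based on the commands above; printing reader.get_variable_to_shape_map() shows the actual names in your graph. For simplicity it only checks the cell's own kernel, using the BasicLSTMCell layout [input_dim + hidden, 4 * hidden] with gate blocks ordered i, j, f, o.

import tensorflow as tf

# Hypothetical checkpoint directory and variable name -- adjust to your own run.
ckpt = tf.train.latest_checkpoint("/tmp/2017-11-24___01-55-55")
reader = tf.train.NewCheckpointReader(ckpt)
kernel = reader.get_tensor("Model/RNN/multi_rnn_cell/cell_0/basic_lstm_cell/kernel")

hidden = kernel.shape[1] // 4          # kernel is [input_dim + hidden, 4 * hidden]
input_dim = kernel.shape[0] - hidden

zero_units = []
for h in range(hidden):
    # columns of unit h inside each of the four gate blocks (i, j, f, o)
    gate_cols = kernel[:, [h, hidden + h, 2 * hidden + h, 3 * hidden + h]]
    # recurrent row of unit h (its outgoing weights back into this cell)
    rec_row = kernel[input_dim + h, :]
    if not gate_cols.any() and not rec_row.any():
        zero_units.append(h)

print("removable ISS components:", zero_units)
print("reduced hidden size:", hidden - len(zero_units))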

@zuowang
Author

zuowang commented Nov 25, 2017

Could you please tell me how to throw away those zeros? Thanks a lot!

@wenwei202
Owner

When one ISS component is all zeros, it means the hidden size of the LSTM is reduced by one. You just need to create a new LSTM with a smaller size and initialize its weights with those nonzero values.
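
To make that concrete, here is a hedged sketch of packing the nonzero weights into smaller arrays for the new LSTM, assuming a single-layer BasicLSTMCell whose kernel and bias were read from the checkpoint as in the snippet above (the bias variable name would be the matching .../basic_lstm_cell/bias).

import numpy as np

def shrink_lstm(kernel, bias, keep):
    # kernel: [input_dim + H, 4 * H], bias: [4 * H], keep: indices of surviving units
    H = kernel.shape[1] // 4
    input_dim = kernel.shape[0] - H
    keep = np.asarray(keep)
    # same surviving unit index inside each of the four gate blocks
    col_idx = np.concatenate([g * H + keep for g in range(4)])
    # every input row, plus the recurrent rows of the surviving units
    row_idx = np.concatenate([np.arange(input_dim), input_dim + keep])
    small_kernel = kernel[np.ix_(row_idx, col_idx)]   # [input_dim + len(keep), 4 * len(keep)]
    small_bias = bias[col_idx]                        # [4 * len(keep)]
    return small_kernel, small_bias

# keep = [h for h in range(hidden) if h not in zero_units]
# small_kernel, small_bias = shrink_lstm(kernel, bias, keep)

The resulting arrays can then be assigned (e.g. with tf.assign or via an initializer) to a freshly built cell of size len(keep). For stacked layers, the input rows of the next layer and the corresponding rows of the softmax weight matrix would have to be sliced with the same unit indices.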

@RyanTang1

Hello wenwei,
So this means that when we finish training the first time, we have to inspect the parameters manually to see whether each ISS component is all zeros, and then build a new, smaller LSTM based on what we observed. Is this correct?

@wenwei202
Owner

sort of

@ShangwuYao

ShangwuYao commented Dec 19, 2017

I am quite confused. After reading your paper, I thought the speedup was for the training phase and that the trained model itself was used for inference. But from your response to this issue, it sounds like you take the nonzero part of the weights and use them as a pre-trained model to initialize a smaller model? And your code doesn't show how to save and load this pre-trained model, right?
Also, how do you handle the newly initialized model? Did you fine-tune it? If so, with what hyperparameters?
Thanks a lot.
