The speed problem #3
Note that the code accelerates inference at the cost of some extra training effort. After we learn which ISS components can be removed, we just need to throw those zeros away and initialize a smaller LSTM with the learned nonzero weights for inference. It makes no sense to keep the zeros, and this is the advantage of the method.
Could you please tell me how to throw away those zeros? Thanks a lot!
When one ISS component is all zeros, it means the hidden size of the LSTM is reduced by one. You just need to create a new LSTM with a smaller hidden size and initialize its weights with those nonzero values.
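The step described above can be sketched in NumPy. This is a minimal illustration, not code from this repository: it assumes a common LSTM weight layout with the four gate blocks (input, forget, cell, output) stacked along the rows, and the helper name `shrink_lstm_weights` is hypothetical. It finds the ISS components (hidden units) whose weights are entirely zero and returns compacted matrices for a smaller LSTM.

```python
import numpy as np

def shrink_lstm_weights(W_ih, W_hh, b, hidden_size):
    """Hypothetical helper: given group-lasso-pruned LSTM weights
    (assumed layout: 4 gate blocks of `hidden_size` rows each),
    drop the all-zero ISS components and return smaller matrices."""
    D = W_ih.shape[1]
    gates_ih = W_ih.reshape(4, hidden_size, D)
    gates_hh = W_hh.reshape(4, hidden_size, hidden_size)
    gates_b = b.reshape(4, hidden_size)

    # A hidden unit is removable when all of its rows (in every gate)
    # and its recurrent input column are zero.
    row_zero = (np.abs(gates_ih).sum(axis=(0, 2)) +
                np.abs(gates_hh).sum(axis=(0, 2)) +
                np.abs(gates_b).sum(axis=0)) == 0
    col_zero = np.abs(gates_hh).sum(axis=(0, 1)) == 0
    keep = np.where(~(row_zero & col_zero))[0]
    k = len(keep)

    # Slice out the surviving rows (per gate) and recurrent columns,
    # then restack into the same 4-gate layout at the smaller size.
    W_ih_small = gates_ih[:, keep, :].reshape(4 * k, D)
    W_hh_small = gates_hh[:, keep][:, :, keep].reshape(4 * k, k)
    b_small = gates_b[:, keep].reshape(-1)
    return W_ih_small, W_hh_small, b_small, k
```

The returned matrices can then be copied into a freshly constructed LSTM of hidden size `k`; the forward pass of that small dense model is mathematically identical to the pruned large one, which is why keeping the zeros around buys nothing.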
Hello wenwei:
sort of |
I am quite confused because after reading your paper, I thought the speedup was for the training phase, and that you use the trained model itself for inference. But from your response to this issue, you mean you take the non-zero part of the weights and use them as a pre-trained model to initialize a smaller model? And your code doesn't show how to save and load this pre-trained model, right?
Here is the command I used to train the `structure_grouplasso` model and the `from_scratch` model. The speed of the `from_scratch` model is 392 wps, but the speed of the `structure_grouplasso` model is 2 wps.

And I also have a question about the paper: why does the `ISS` method have a speed similar to the `direct design` method? The `ISS` method creates a large sparse model where the zero weights also consume CPU. Thanks a lot!