Hi, nice work! I am a bit confused about Gumbel softmax. You mention in your paper that Gumbel softmax is used during training. I wonder if it can be replaced by pure softmax (i.e. torch.softmax)? Could you please give more explanation on this design choice? Thanks!
Hi @btwbtm, thanks for your interest in our work. Softmax is also used in several network quantization or pruning methods to soften one-hot distributions. In my opinion, softmax may also work in our SMSR, but I have not tried it. In our experiments, Gumbel softmax is adopted since its output is theoretically identical to a one-hot distribution, while the softmax output is not.
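To illustrate the difference, here is a minimal PyTorch sketch using torch.nn.functional.gumbel_softmax (a toy example with hypothetical 2-way gating logits, not our SMSR code):

```python
import torch
import torch.nn.functional as F

# Hypothetical gating logits for a 2-way "skip / compute" decision at 4 locations.
logits = torch.randn(4, 2, requires_grad=True)

# Plain softmax: always a soft mixture, never exactly one-hot.
soft_mask = torch.softmax(logits, dim=-1)

# Gumbel softmax with hard=True (straight-through): the forward pass emits an
# exact one-hot mask, while gradients flow through the soft relaxation, so the
# training-time masks have the same one-hot form as those used at inference.
hard_mask = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=-1)

print(soft_mask)  # rows sum to 1 but entries stay fractional
print(hard_mask)  # rows are exact one-hot vectors
```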
I found that the implementation of Gumbel softmax in your code differs from the original paper ("Categorical Reparameterization with Gumbel-Softmax"). Why did you modify it? Which one is better?
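For reference, my understanding of the sampling procedure in the original paper is roughly the following sketch (my own paraphrase, not your code):

```python
import torch

def gumbel_softmax_original(logits, tau=1.0):
    # Sketch of the formulation in "Categorical Reparameterization with
    # Gumbel-Softmax": add i.i.d. Gumbel(0, 1) noise to the (unnormalized)
    # log-probabilities, then apply a temperature-scaled softmax.
    u = torch.rand_like(logits)
    gumbel_noise = -torch.log(-torch.log(u + 1e-20) + 1e-20)
    return torch.softmax((logits + gumbel_noise) / tau, dim=-1)
```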