TensorFlow implementation of the mixture of softmaxes (MoS) technique described in the paper Breaking the Softmax Bottleneck: A High-Rank RNN Language Model (Yang et al., 2017).
See https://github.com/zihangdai/mos for an implementation using PyTorch.
In natural language processing, how well a network can approximate the true probability distribution over appropriate responses depends on how expressively it can represent those probabilities.
The problem with using a single softmax is that, when applied to the logits (the raw outputs of the network), it can only represent a limited family of distributions. This limitation shows up as a low rank of the matrix of log-probabilities one constructs from the logits (one row per context, one column per output token): its rank is bounded by the hidden-state dimension, which is far smaller than the vocabulary. This low rank encourages the network to fit generic responses to each input.
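As a rough sketch of the rank argument (the symbols N, d, and V below are chosen here for illustration and are not from this repository): if H stacks the N context hidden states of dimension d and W is the output embedding of a V-word vocabulary, a single softmax writes the log-probabilities, up to a per-row normalization constant, as

```latex
\underbrace{\log P}_{N \times V} \;\approx\; H W^{\top},
\qquad H \in \mathbb{R}^{N \times d},\; W \in \mathbb{R}^{V \times d}
\;\Rightarrow\; \operatorname{rank}(\log P) \lesssim d \ll V
```

so no matter how rich the data is, the model can only express log-probability matrices whose rank is roughly the hidden dimension (the row-wise normalization can raise it by at most one).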
Ideally this matrix should be high-rank, which gives the model the expressiveness to condition its responses more finely on the input. The mixture of softmaxes accomplishes this by replacing the single softmax with a weighted mixture of K softmaxes, each computed from its own projection of the hidden state, so the resulting log-probability matrix is no longer rank-bounded by the hidden dimension.
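Below is a minimal TensorFlow 2 sketch of such a mixture-of-softmaxes output layer. It is illustrative only: the class name MixtureOfSoftmaxes and the parameters vocab_size and n_components are assumptions made here, not names taken from this repository or from the paper's code.

```python
import tensorflow as tf


class MixtureOfSoftmaxes(tf.keras.layers.Layer):
    """Illustrative mixture-of-softmaxes output layer (not this repo's exact code)."""

    def __init__(self, vocab_size, n_components=5, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.n_components = n_components

    def build(self, input_shape):
        d = int(input_shape[-1])
        # Mixture weights pi_k: one softmax over K logits per context.
        self.prior = tf.keras.layers.Dense(self.n_components)
        # Each of the K components gets its own tanh projection of the hidden state.
        self.latent = tf.keras.layers.Dense(self.n_components * d, activation="tanh")
        # A single vocabulary projection shared across all components.
        self.decoder = tf.keras.layers.Dense(self.vocab_size)

    def call(self, hidden):
        # hidden: [batch, d], the final hidden state for each context.
        batch = tf.shape(hidden)[0]
        # [batch, K] mixture weights that sum to one per example.
        pi = tf.nn.softmax(self.prior(hidden), axis=-1)
        # [batch, K, d] component-specific hidden states.
        h_k = tf.reshape(self.latent(hidden), [batch, self.n_components, -1])
        # [batch, K, V]: an independent softmax for every component.
        component_probs = tf.nn.softmax(self.decoder(h_k), axis=-1)
        # [batch, V]: weighted sum of the K softmaxes, taken in probability space,
        # so the result is no longer a single low-rank softmax.
        return tf.reduce_sum(pi[:, :, tf.newaxis] * component_probs, axis=1)


# Hypothetical usage: probabilities over a 10k-word vocabulary from a 256-dim state.
mos = MixtureOfSoftmaxes(vocab_size=10000, n_components=5)
probs = mos(tf.random.normal([32, 256]))  # shape [32, 10000]; each row sums to 1
```

Since the mixture is formed in probability space rather than logit space, training would take the log of these mixture probabilities for the cross-entropy loss instead of using a logits-based loss.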