Releases: tatp22/linformer-pytorch
Latest working version
Added intermediate dim change
Added intermediate ff dimension
The model dimension can now differ in the intermediate layers. This change applies to the feed-forward (ff) module, and only in the encoder. If the `ff_intermediate` flag is not `None`, the layers will look like this:
channels -> ff_dim -> ff_intermediate (For layer 1)
ff_intermediate -> ff_dim -> ff_intermediate (For layers 2 to depth-1)
ff_intermediate -> ff_dim -> channels (For layer depth)
As opposed to
channels -> ff_dim -> channels (For all layers)
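The dimension flow above can be sketched as a small plain-Python helper. Note that `ff_intermediate` comes from this release, but the helper itself (its name and signature) is illustrative, not part of the library's API:

```python
def layer_dims(channels, ff_dim, ff_intermediate, depth):
    """Return the (input, hidden, output) widths of each encoder layer's
    feed-forward block when ff_intermediate is set. Only the first layer
    reads the original channel width, and only the last writes it back;
    every layer in between stays at the intermediate width."""
    dims = []
    for layer in range(1, depth + 1):
        in_dim = channels if layer == 1 else ff_intermediate
        out_dim = channels if layer == depth else ff_intermediate
        dims.append((in_dim, ff_dim, out_dim))
    return dims

# For example, channels=64, ff_dim=128, ff_intermediate=96, depth=3 gives:
# [(64, 128, 96), (96, 128, 96), (96, 128, 64)]
```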
Able to use convolutional nets instead of linear
The Linformer now supports convolution as a way to downsample the input, instead of relying on linear layers. This may reduce the number of parameters required.
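To see where the parameter savings could come from, compare the two downsampling schemes for a single channel. A linear projection from sequence length `n` to `k` needs a full `n × k` matrix, while a 1-D convolution with kernel size and stride `n // k` reuses one small kernel across all output positions. This is a back-of-the-envelope sketch under that assumption, not the library's exact implementation:

```python
def downsample_params(n, k):
    """Weights needed to project a length-n sequence down to length k,
    per channel: a dense (n x k) matrix for the linear projection versus
    a single shared kernel of size n // k for the strided convolution."""
    linear = n * k          # one weight per (input, output) position pair
    conv = n // k           # one kernel, shared across all k output positions
    return linear, conv

# n=512, k=128: the linear projection needs 65536 weights, the conv only 4
```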
Encoder Decoder finished, Causal attention
Finished the encoder and decoder modules. Causal attention also works when the `causal=True`
flag is set. Will update the README shortly...
Added Masking
Added masking to the Linformer. However, this is still a WIP: masking cannot be done in the traditional sense, as in the "Attention Is All You Need" paper, because that would require the overhead of another `(n, n)`
matrix, which is infeasible.
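The infeasibility is easy to quantify: Linformer projects keys and values from length `n` down to `k`, so attention scores have shape `(n, k)`, and a traditional `(n, n)` mask would reintroduce exactly the quadratic cost the model avoids. A minimal sketch of the entry counts (the function is illustrative, not from the repo):

```python
def mask_entries(n, k=None):
    """Entries in a full (n, n) attention mask versus the (n, k) score
    matrix Linformer actually materializes after projecting keys/values."""
    return n * n if k is None else n * k

n = 16384
full = mask_entries(n)          # 268435456 entries: back to O(n^2) memory
projected = mask_entries(n, 256)  # 4194304 entries: the O(n*k) shape kept
```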
Started Encoder/Decoder work
The repo now supports an encoder and a decoder.
TODO: Masking
Bug fixed
LM model
Rebase, added option to plot MHAttention heads
Rebased the code for readability, and added the option to plot the attention heads of the
`MHAttention` module as well as the Linformer module.
No weight matrices in `LinearAttentionHead`
Check out pull request #7 to see the changes