Releases: tatp22/linformer-pytorch

Full attention option

27 Jun 17:44

Added an option to the Linformer to compare it against full attention. Watch out: this now takes O(n^2) time and space, where n is the sequence length.
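A minimal usage sketch, assuming the option is exposed as a `full_attention` flag on the model constructor (the parameter names below follow my reading of the README and may differ in your version):

```python
import torch
from linformer_pytorch import Linformer

# Sketch only: parameter names are assumptions and may differ in your version.
model = Linformer(
    input_size=512,       # sequence length n
    channels=64,
    dim_k=128,
    full_attention=True,  # assumed flag: fall back to standard O(n^2) attention
)
x = torch.randn(1, 512, 64)
y = model(x)              # quadratic in the sequence length with this flag set
```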

Added option to save visualization

23 Jun 16:12

Added the option to save the visualization to a file
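A rough sketch of the intended usage; the `visualize`, `plot_all_heads`, and `save_file` names are assumptions about the API, not confirmed signatures:

```python
import torch
from linformer_pytorch import Linformer, Visualizer

# Sketch only: argument and method names here are assumptions.
model = Linformer(input_size=256, channels=16, dim_k=64, visualize=True)
vis = Visualizer(model)

x = torch.randn(1, 256, 16)
model(x)                                     # forward pass populates the attention maps
vis.plot_all_heads(save_file="./heads.png")  # new option: write the plot to disk
```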

Added Visualizer, fixed bug

22 Jun 22:35

Added the visualizer class, which lets you see all of the attention heads.

Also fixed a bug in how the E and F matrices were calculated: they were being created with shape (n, d), but they should have been (n, k). This has since been fixed.
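For context, a minimal shape sketch (independent of the library's actual module layout) of why the projections must be (n, k): they compress the sequence-length dimension of the keys and values, not the head dimension.

```python
import torch

n, d, k = 512, 64, 128          # sequence length, head dim, projected length

K = torch.randn(n, d)           # keys
V = torch.randn(n, d)           # values
E = torch.randn(n, k)           # projection for K: shape (n, k), not (n, d)
F = torch.randn(n, k)           # projection for V

K_proj = E.transpose(0, 1) @ K  # (k, d): sequence length reduced from n to k
V_proj = F.transpose(0, 1) @ V  # (k, d)

Q = torch.randn(n, d)
scores = Q @ K_proj.transpose(0, 1) / d ** 0.5   # (n, k) instead of (n, n)
out = torch.softmax(scores, dim=-1) @ V_proj     # (n, d)
```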

0.7.0

21 Jun 14:52

As well as updating the README, I updated the default behavior for calculating the inner head dimension. Instead of requiring the value to be given explicitly, it now works as in the "Attention Is All You Need" paper: the number of channels is divided by the number of heads, and the resulting dimension goes into each attention head.
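In other words, a sketch of the arithmetic (not the library's exact code):

```python
# Default inner head dimension, mirroring "Attention Is All You Need":
# split the channel dimension evenly across the heads.
channels, nhead = 64, 8
assert channels % nhead == 0
dim_head = channels // nhead   # 8: each head works on an 8-dimensional slice
```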

Added activation to MHAttention

20 Jun 16:14

Added both the ReLU and GELU activation function options to the multihead attention block.
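Roughly, the option boils down to a switch like the following; the actual flag name and wiring inside MHAttention may differ:

```python
import torch.nn as nn

def get_activation(name="gelu"):
    # Illustrative switch between the two supported activations;
    # the library's real parameter name is an assumption here.
    return nn.GELU() if name == "gelu" else nn.ReLU()
```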

Can decrease k by layer

17 Jun 21:53

Added the k_reduce_by_layer flag, which reduces the value of dim_k at each successive layer. This was alluded to in Figure 1 of the paper, where the normalized cumulative eigenvalue index increases with depth, meaning that we can potentially get away with lower projection dimensions at deeper layers.
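A sketch of the per-layer reduction this flag implies (variable names are illustrative, not the library's internals):

```python
# Each layer projects keys/values down to a smaller k than the layer before.
dim_k, k_reduce_by_layer, depth = 128, 16, 6
for layer in range(depth):
    k_layer = max(dim_k - layer * k_reduce_by_layer, 1)
    print(f"layer {layer}: projecting keys/values down to k={k_layer}")
```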

Added weight sharing options and pos enc

17 Jun 21:45

Added the none, headwise, kv, and layerwise parameter sharing options. Also added positional encodings.
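An illustrative sketch of what the sharing modes mean, following the Linformer paper's description; the library's exact wiring may differ:

```python
import torch.nn as nn

def make_EF(n, k, num_heads, mode="layerwise"):
    # Illustrative only: how the four sharing modes could allocate the
    # E/F projections for one layer.
    proj = lambda: nn.Linear(n, k, bias=False)
    if mode == "none":              # a separate E and F for every head
        return [(proj(), proj()) for _ in range(num_heads)]
    if mode == "headwise":          # one (E, F) pair shared by all heads
        E, F = proj(), proj()
        return [(E, F)] * num_heads
    # "kv" and "layerwise": a single projection shared by keys and values
    # ("layerwise" additionally reuses it across every layer).
    E = proj()
    return [(E, E)] * num_heads
```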

E, F matrix calculation changed

17 Jun 21:44

The way that the E and F matrices are calculated was changed. Before, they were identity matrices; with this release, they are created the way the paper's authors recommend: as linear layers with Xavier initialization.
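A sketch of what that looks like (the helper name here is illustrative):

```python
import torch.nn as nn

def get_EF(input_size, dim_k):
    # Illustrative helper: project the sequence-length dimension (input_size)
    # down to dim_k with a Xavier-initialized linear layer, replacing the
    # earlier identity-matrix behavior.
    lin = nn.Linear(input_size, dim_k, bias=False)
    nn.init.xavier_normal_(lin.weight)
    return lin
```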