Releases: TransformerLensOrg/TransformerLens
v2.5.0
Nice little release! This one adds a new parameter, first_n_layers, that lets you specify how many layers of a model to load.
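A minimal sketch of the new parameter, assuming it is passed straight through HookedTransformer.from_pretrained (the model name here is only an illustration):

```python
from transformer_lens import HookedTransformer

# Load only the first two transformer blocks of GPT-2 small.
# first_n_layers is the new parameter from this release; everything else is
# standard from_pretrained usage.
model = HookedTransformer.from_pretrained("gpt2", first_n_layers=2)

print(model.cfg.n_layers)  # expected to report 2
```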
What's Changed
- Fix typo in bug issue template by @JasonGross in #715
- HookedTransformerConfig docs string: weight_init_mode => init_mode by @JasonGross in #716
- Allow loading only first n layers. by @joelburget in #717
Full Changelog: v2.4.1...v2.5.0
v2.4.1
A small change to usage, but a huge improvement in memory consumption! TransformerLens now needs roughly half the memory it previously did to load a model, thanks to a change in how model weights are loaded.
What's Changed
- removed einsum causing error when use_attn_result is enabled by @oliveradk in #660
- revised loading to recycle state dict by @bryce13950 in #706
New Contributors
- @oliveradk made their first contribution in #660
Full Changelog: v2.4.0...v2.4.1
v2.4.0
Nice little update! This gives users a bit more control over attention masks and adds a new demo.
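A hedged sketch of driving the masking explicitly from user code, assuming the forward pass accepts an attention_mask argument (the exact behavior improved in #699 may differ from this sketch):

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# Tokenize two prompts separately, left-pad the shorter one by hand so the
# batch lines up, and pass an explicit mask (1 = real token, 0 = padding).
long = model.to_tokens("The quick brown fox jumps")  # shape [1, s_long]
short = model.to_tokens("Hello")                     # shape [1, s_short]

pad_len = long.shape[1] - short.shape[1]
pad = torch.full((1, pad_len), model.tokenizer.eos_token_id, device=short.device)
tokens = torch.cat([long, torch.cat([pad, short], dim=1)], dim=0)

attention_mask = torch.ones_like(tokens)
attention_mask[1, :pad_len] = 0  # ignore the padding positions in the second row

logits = model(tokens, attention_mask=attention_mask)
print(logits.shape)  # [batch, seq_len, d_vocab]
```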
What's Changed
- Improve attention masking by @UFO-101 in #699
- add a demo for Patchscopes and Generation with Patching by @HenryCai11 in #692
New Contributors
- @HenryCai11 made their first contribution in #692
Full Changelog: v2.3.1...v2.4.0
v2.3.1
Nice little bug fix!
What's Changed
- Update Gemma2 attention scale by @mntss in #694
- Release v2.3.1 by @bryce13950 in #701
Full Changelog: v2.3.0...v2.3.1
v2.3.0
New models! This release adds support for Gemma 2 2B as well as Qwen2. It also removes official support for Python 3.8. Python 3.8 should continue to work for a while, but it is likely to become unstable after this release; if you need Python 3.8, pin to this release or an earlier one.
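A minimal sketch of loading the newly supported models; the model-name strings below are assumptions based on the HuggingFace checkpoint IDs, so check the names registered in transformer_lens.loading_from_pretrained if they do not resolve:

```python
from transformer_lens import HookedTransformer

# Model names are assumed from the HuggingFace hub IDs; the registered
# TransformerLens aliases may differ slightly.
gemma = HookedTransformer.from_pretrained("google/gemma-2-2b")
qwen = HookedTransformer.from_pretrained("Qwen/Qwen2-0.5B")

print(gemma.cfg.n_layers, qwen.cfg.n_layers)
```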
What's Changed
- Fix typo in embed.py docs by @ArthurConmy in #677
- Move the HookedSAE / HookedSAETransformer warning to a less prominent… by @ArthurConmy in #676
- NamesFilter can be a string by @jettjaniak in #679
- Adding RMSNorm to apply_ln_to_stack by @gaabrielfranco in #663
- added arena content as a notebook by @bryce13950 in #674
- Test arena cleanup by @bryce13950 in #681
- docs: update Main_Demo.ipynb by @eltociear in #658
- Add support for Qwen2 models by @g-w1 in #662
- Added gemma-2 2b by @curt-tigges in #687
- Python 3.8 removal by @bryce13950 in #690
- 2.3.0 by @bryce13950 in #688
New Contributors
- @gaabrielfranco made their first contribution in #663
- @eltociear made their first contribution in #658
- @g-w1 made their first contribution in #662
- @curt-tigges made their first contribution in #687
Full Changelog: v2.2.2...v2.3.0
v2.2.2
Quick little bug fix!
What's Changed
- Fix attention result projection by @callummcdougall in #666
- fix: fixing broken backward hooks change by @chanind in #673
Full Changelog: v2.2.1...v2.2.2
v2.2.1
Quick little bug fix to a shape in the AbstractAttention component.
What's Changed
- Fix attention result projection by @callummcdougall in #666
Full Changelog: v2.2.0...v2.2.1
v2.2.0
Here's an important one! This release adds Gemma-2 and greatly improves model accuracy across the board. It is highly recommended that everyone update to this version to take advantage of these accuracy improvements.
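Since the headline here is accuracy (the GPT-2 and MLP implementations now match HuggingFace exactly), here is a rough sketch of the kind of check the new comparing-to-huggingface.ipynb demo is about, comparing log-probs between the two implementations; the prompt is illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformer_lens import HookedTransformer

hf_tokenizer = AutoTokenizer.from_pretrained("gpt2")
hf_model = AutoModelForCausalLM.from_pretrained("gpt2")
tl_model = HookedTransformer.from_pretrained("gpt2")

prompt = "The capital of France is"
hf_tokens = hf_tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    hf_logits = hf_model(hf_tokens).logits
    # prepend_bos=False keeps the token sequence identical to the HuggingFace one.
    tl_logits = tl_model(tl_model.to_tokens(prompt, prepend_bos=False))

# Compare log-probs rather than raw logits, since TransformerLens's default
# weight processing (e.g. centering the unembed) shifts logits by a constant.
diff = (hf_logits.log_softmax(-1) - tl_logits.log_softmax(-1)).abs().max()
print(diff)  # expected to be very small after this release
```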
What's Changed
- Fix typo in Main_Demo.ipynb by @ianand in #636
- Add comparing-to-huggingface.ipynb. by @joelburget in #637
- Add tests for gated mlp by @anthonyduong9 in #638
- Match Huggingface MLP implementation exactly. by @joelburget in #641
- Add tests for ActivationCache by @FlyingPumba in #643
- Moved mixtral weights to another module by @bryce13950 in #646
- Fixed weight conversion by @bryce13950 in #648
- Move out pretrained weight conversions by @richardkronick in #647
- Match Huggingface GPT2 implementation exactly by @joelburget in #645
- Fix Out bias not being summed in attention component when using 4 bit precision by @FlyingPumba in #654
- Mlp cleanup by @bryce13950 in #652
- Added support for Gemma-2 by @neelnanda-io in #650
- add tests for Attention by @anthonyduong9 in #639
- Release 2.2 by @bryce13950 in #656
New Contributors
- @ianand made their first contribution in #636
- @FlyingPumba made their first contribution in #643
Full Changelog: v2.1.0...v2.2.0
v2.1.0
New model support, plus a handful of documentation and dependency fixes!
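The headline item is encoder-decoder (T5) support. A minimal, hedged sketch of loading such a model; the HookedEncoderDecoder entry point is what the T5 PR targets, but the forward signature sketched here is an assumption and may differ from the released API:

```python
import torch
from transformer_lens import HookedEncoderDecoder

model = HookedEncoderDecoder.from_pretrained("t5-small")

prompt = "translate English to German: Hello, how are you?"
tokens = model.to_tokens(prompt)

# T5 decoding conventionally starts from the pad token as the decoder start token;
# passing the decoder input positionally is an assumption about the forward signature.
decoder_input = torch.tensor([[model.tokenizer.pad_token_id]], device=tokens.device)

logits = model(tokens, decoder_input)
print(logits.shape)  # [batch, decoder_seq_len, d_vocab]
```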
What's Changed
- Encoder-Decoder (T5) support by @somvy in #605
- Update README links to ARENA mech interp tutorials by @gileshd in #630
- Lock datasets version by @courtney-sims in #632
New Contributors
- @somvy made their first contribution in #605
- @gileshd made their first contribution in #630
- @courtney-sims made their first contribution in #632
Full Changelog: v2.0.1...v2.1.0
v2.0.1
Minor fix to the demos: a few broken URLs have been corrected. Test coverage has also been increased in this release.
What's Changed
- Fix demos pip install packages from unfound repos by @anthonyduong9 in #625
- Unit tests loading from pretrained fill missing keys by @richardkronick in #623
New Contributors
- @richardkronick made their first contribution in #623
Full Changelog: v2.0.0...v2.0.1