Version 1.0.14, update README & changelog
rwightman committed Jan 19, 2025
1 parent c6b74eb commit 5d535d7
Showing 3 changed files with 151 additions and 49 deletions.
57 changes: 9 additions & 48 deletions README.md
@@ -12,6 +12,15 @@

## What's New

## Jan 19, 2025
* Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated
* Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k/1k pretrain/ft
* `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - 86.7% top-1
* `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - 87.4% top-1
* `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k`
* Misc typing, typo, etc. cleanup
* 1.0.14 release to get above LeViT fix out

## Jan 9, 2025
* Add support to train and validate in pure `bfloat16` or `float16`
* `wandb` project name arg added by https://github.com/caojiaolong, use arg.experiment for name
@@ -116,7 +125,6 @@ Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weight
* [mobilenetv3_large_150d.ra4_e3600_r256_in1k](http://hf.co/timm/mobilenetv3_large_150d.ra4_e3600_r256_in1k) - 81.81 @ 320, 80.94 @ 256
* [mobilenetv3_large_100.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv3_large_100.ra4_e3600_r224_in1k) - 77.16 @ 256, 76.31 @ 224


### Aug 21, 2024
* Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models

@@ -319,53 +327,6 @@ torch.Size([2, 768, 32, 32])
* Min supported Python version increased to 3.8
* Release 0.9.16

### Jan 8, 2024
Datasets & transform refactoring
* HuggingFace streaming (iterable) dataset support (`--dataset hfids:org/dataset`)
* Webdataset wrapper tweaks for improved split info fetching, can auto fetch splits from supported HF hub webdataset
* Tested HF `datasets` and webdataset wrapper streaming from HF hub with recent `timm` ImageNet uploads to https://huggingface.co/timm
* Make input & target column/field keys consistent across datasets and pass via args
* Full monochrome support when using e.g. `--input-size 1 224 224` or `--in-chans 1`, sets PIL image conversion appropriately in dataset
* Improved several alternate crop & resize transforms (ResizeKeepRatio, RandomCropOrPad, etc) for use in PixParse document AI project
* Add SimCLR style color jitter prob along with grayscale and gaussian blur options to augmentations and args
* Allow train without validation set (`--val-split ''`) in train script
* Add `--bce-sum` (sum over class dim) and `--bce-pos-weight` (positive weighting) args for training as they're common BCE loss tweaks I was often hard coding

### Nov 23, 2023
* Added EfficientViT-Large models, thanks [SeeFun](https://github.com/seefun)
* Fix Python 3.7 compat, will be dropping support for it soon
* Other misc fixes
* Release 0.9.12

### Nov 20, 2023
* Added significant flexibility for Hugging Face Hub based timm models via `model_args` config entry. `model_args` will be passed as kwargs through to models on creation.
* See example at https://huggingface.co/gaunernst/vit_base_patch16_1024_128.audiomae_as2m_ft_as20k/blob/main/config.json
* Usage: https://github.com/huggingface/pytorch-image-models/discussions/2035
* Updated imagenet eval and test set csv files with latest models
* `vision_transformer.py` typing and doc cleanup by [Laureηt](https://github.com/Laurent2916)
* 0.9.11 release

### Nov 3, 2023
* [DFN (Data Filtering Networks)](https://huggingface.co/papers/2309.17425) and [MetaCLIP](https://huggingface.co/papers/2309.16671) ViT weights added
* DINOv2 'register' ViT model weights added (https://huggingface.co/papers/2309.16588, https://huggingface.co/papers/2304.07193)
* Add `quickgelu` ViT variants for OpenAI, DFN, MetaCLIP weights that use it (less efficient)
* Improved typing added to ResNet, MobileNet-v3 thanks to [Aryan](https://github.com/a-r-r-o-w)
* ImageNet-12k fine-tuned (from LAION-2B CLIP) `convnext_xxlarge`
* 0.9.9 release

### Oct 20, 2023
* [SigLIP](https://huggingface.co/papers/2303.15343) image tower weights supported in `vision_transformer.py`.
* Great potential for fine-tune and downstream feature use.
* Experimental 'register' support in vit models as per [Vision Transformers Need Registers](https://huggingface.co/papers/2309.16588)
* Updated RepViT with new weight release. Thanks [wangao](https://github.com/jameslahm)
* Add patch resizing support (on pretrained weight load) to Swin models
* 0.9.8 release pending

### Sep 1, 2023
* TinyViT added by [SeeFun](https://github.com/seefun)
* Fix EfficientViT (MIT) to use torch.autocast so it works back to PT 1.10
* 0.9.7 release

## Introduction

Py**T**orch **Im**age **M**odels (`timm`) is a collection of image models, layers, utilities, optimizers, schedulers, data-loaders / augmentations, and reference training / validation scripts that aim to pull together a wide variety of SOTA models with ability to reproduce ImageNet training results.
141 changes: 141 additions & 0 deletions hfdocs/source/changes.mdx
@@ -1,5 +1,146 @@
# Changelog

## Jan 19, 2025
* Fix loading of LeViT safetensor weights, remove conversion code which should have been deactivated
* Add 'SO150M' ViT weights trained with SBB recipes, decent results, but not optimal shape for ImageNet-12k/1k pretrain/ft
* `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k_ft_in1k` - 86.7% top-1
* `vit_so150m_patch16_reg4_gap_384.sbb_e250_in12k_ft_in1k` - 87.4% top-1
* `vit_so150m_patch16_reg4_gap_256.sbb_e250_in12k`
* Misc typing, typo, etc. cleanup
* 1.0.14 release to get above LeViT fix out

## Jan 9, 2025
* Add support to train and validate in pure `bfloat16` or `float16`
* `wandb` project name arg added by https://github.com/caojiaolong, use arg.experiment for name
* Fix old issue w/ checkpoint saving not working on filesystem w/o hard-link support (e.g. FUSE fs mounts)
* 1.0.13 release

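The hard-link fix in the Jan 9 entry can be pictured as a link-then-copy fallback. The following is only a minimal sketch of that general pattern, not the code `timm` actually uses; the helper name `link_or_copy` and the checkpoint file names are hypothetical.

```python
import os
import shutil
import tempfile


def link_or_copy(src: str, dst: str) -> str:
    """Create dst as a hard link to src, copying instead if linking fails.

    Returns "link" or "copy" so the caller can see which path was taken.
    """
    if os.path.exists(dst):
        os.remove(dst)  # os.link refuses to overwrite an existing file
    try:
        os.link(src, dst)
        return "link"
    except (OSError, NotImplementedError):
        # Filesystems without hard-link support (e.g. some FUSE mounts) land here.
        shutil.copy2(src, dst)
        return "copy"


def demo() -> bool:
    """Write a fake checkpoint, alias it, and confirm the alias has the same bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "checkpoint-3.pth.tar")  # hypothetical file name
        dst = os.path.join(tmp, "last.pth.tar")  # hypothetical file name
        with open(src, "wb") as f:
            f.write(b"weights")
        how = link_or_copy(src, dst)
        with open(dst, "rb") as f:
            return how in ("link", "copy") and f.read() == b"weights"
```

Copying costs extra disk space and time compared with a link, which is presumably why linking is tried first.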
## Jan 6, 2025
* Add `torch.utils.checkpoint.checkpoint()` wrapper in `timm.models` that defaults `use_reentrant=False`, unless `TIMM_REENTRANT_CKPT=1` is set in env.

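A rough sketch of how such a wrapper could choose its default follows. It assumes only the literal value `1` re-enables reentrant mode, and it takes the underlying checkpoint function as a parameter so the example runs without PyTorch; in practice that argument would be `torch.utils.checkpoint.checkpoint`. The names `reentrant_default` and `fake_checkpoint` are hypothetical, and the real wrapper in `timm.models` may differ.

```python
import os
from typing import Any, Callable, Optional


def reentrant_default() -> bool:
    # Assumption: only the literal value "1" opts back into reentrant mode.
    return os.environ.get("TIMM_REENTRANT_CKPT", "0") == "1"


def checkpoint(
    checkpoint_fn: Callable[..., Any],
    function: Callable[..., Any],
    *args: Any,
    use_reentrant: Optional[bool] = None,
    **kwargs: Any,
) -> Any:
    """Call checkpoint_fn, filling in use_reentrant when the caller left it unset."""
    if use_reentrant is None:
        use_reentrant = reentrant_default()
    return checkpoint_fn(function, *args, use_reentrant=use_reentrant, **kwargs)


def fake_checkpoint(function, *args, use_reentrant, **kwargs):
    # Stand-in for torch.utils.checkpoint.checkpoint; returns the flag for inspection.
    return function(*args, **kwargs), use_reentrant
```

An explicit `use_reentrant=` argument still wins over the environment variable in this sketch, so individual call sites can override the default.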
## Dec 31, 2024
* `convnext_nano` 384x384 ImageNet-12k pretrain & fine-tune. https://huggingface.co/models?search=convnext_nano%20r384
* Add AIM-v2 encoders from https://github.com/apple/ml-aim, see on Hub: https://huggingface.co/models?search=timm%20aimv2
* Add PaliGemma2 encoders from https://github.com/google-research/big_vision to existing PaliGemma, see on Hub: https://huggingface.co/models?search=timm%20pali2
* Add missing L/14 DFN2B 39B CLIP ViT, `vit_large_patch14_clip_224.dfn2b_s39b`
* Fix existing `RmsNorm` layer & fn to match standard formulation, use PT 2.5 impl when possible. Move old impl to `SimpleNorm` layer, it's LN w/o centering or bias. There were only two `timm` models using it, and they have been updated.
* Allow override of `cache_dir` arg for model creation
* Pass through `trust_remote_code` for HF datasets wrapper
* `inception_next_atto` model added by creator
* Adan optimizer caution, and Lamb decoupled weight decay options
* Some feature_info metadata fixed by https://github.com/brianhou0208
* All OpenCLIP and JAX (CLIP, SigLIP, Pali, etc) model weights that used load time remapping were given their own HF Hub instances so that they work with `hf-hub:` based loading, and thus will work with new Transformers `TimmWrapperModel`

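To make the `RmsNorm` / `SimpleNorm` distinction in the Dec 31 entry concrete, here is a plain-Python sketch of the two formulas as I read that note. It omits the learnable weight and assumes the "LN w/o centering" variant divides by the population standard deviation; both function names are illustrative and this is not the `timm` implementation.

```python
import math
from typing import List


def rms_norm(x: List[float], eps: float = 1e-6) -> List[float]:
    """Standard RMSNorm: divide by the root mean square; no mean is subtracted."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def simple_norm(x: List[float], eps: float = 1e-6) -> List[float]:
    """LayerNorm-style scaling by 1/sqrt(var + eps), but x itself is not centered."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # assumption: population variance
    scale = 1.0 / math.sqrt(var + eps)
    return [v * scale for v in x]
```

The two agree on zero-mean inputs, where the variance equals the mean square, and diverge as the mean moves away from zero.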
## Nov 28, 2024
* More optimizers
* Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
* Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
* Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
* Cleanup some docstrings and type annotations re optimizers and factory
* Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384
* https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
* https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k
* https://huggingface.co/timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
* https://huggingface.co/timm/mobilenetv4_conv_medium.e180_r384_in12k
* Add small cs3darknet, quite good for the speed
* https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k

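The masking idea from 'Cautious Optimizers' listed above can be sketched in a few lines. This is my reading of the technique on plain Python lists, not the code added to any `timm` optimizer; the function name `cautious_update` and the `eps` floor of `1e-3` are assumptions.

```python
from typing import List


def cautious_update(update: List[float], grad: List[float], eps: float = 1e-3) -> List[float]:
    """Zero update components whose sign disagrees with the current gradient.

    Surviving components are rescaled by 1 / max(mean(mask), eps) so the
    overall update magnitude is roughly preserved.
    """
    mask = [1.0 if u * g > 0 else 0.0 for u, g in zip(update, grad)]
    scale = 1.0 / max(sum(mask) / len(mask), eps)
    return [u * m * scale for u, m in zip(update, mask)]
```

For example, with `update = [0.5, -0.2, 0.1, 0.4]` and `grad = [1.0, 0.3, -0.2, 0.1]`, the middle two components disagree in sign and are dropped, and the remaining two are doubled because half of the mask survived.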
## Nov 12, 2024
* Optimizer factory refactor
* New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
* Add `list_optimizers`, `get_optimizer_class`, `get_optimizer_info` to reworked `create_optimizer_v2` fn to explore optimizers, get info or class
* deprecate `optim.optim_factory`, move fns to `optim/_optim_factory.py` and `optim/_param_groups.py` and encourage import via `timm.optim`
* Add Adopt (https://github.com/iShohei220/adopt) optimizer
* Add 'Big Vision' variant of Adafactor (https://github.com/google-research/big_vision/blob/main/big_vision/optax.py) optimizer
* Fix original Adafactor to pick better factorization dims for convolutions
* Tweak LAMB optimizer with some improvements in torch.where functionality since original, refactor clipping a bit
* dynamic img size support in vit, deit, eva improved to support resize from non-square patch grids, thanks https://github.com/wojtke
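
The registry idea behind the optimizer factory refactor above can be illustrated with a toy version. Everything here is a hypothetical sketch: the `OptimInfo` fields, the `OptimizerRegistry` class, and `ToySGD` are invented for illustration, and only the three lookup names mirror the functions mentioned in the entry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class OptimInfo:
    """Metadata for one registered optimizer (fields are illustrative)."""
    name: str
    opt_class: Callable[..., object]
    description: str = ""
    has_momentum: bool = False


class OptimizerRegistry:
    def __init__(self) -> None:
        self._optimizers: Dict[str, OptimInfo] = {}

    def register(self, info: OptimInfo) -> None:
        # Names are stored lower-cased so lookups are case-insensitive.
        self._optimizers[info.name.lower()] = info

    def list_optimizers(self) -> List[str]:
        return sorted(self._optimizers)

    def get_optimizer_info(self, name: str) -> OptimInfo:
        return self._optimizers[name.lower()]

    def get_optimizer_class(self, name: str) -> Callable[..., object]:
        return self.get_optimizer_info(name).opt_class


class ToySGD:
    """Hypothetical stand-in for a real optimizer class."""

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr
```

Keeping the traits in a dataclass means a factory can inspect them (for instance, whether momentum applies) before constructing the optimizer, rather than hard-coding a branch per optimizer name.
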
## Oct 31, 2024
Add a set of new very well trained ResNet & ResNet-V2 18/34 (basic block) weights. See https://huggingface.co/blog/rwightman/resnet-trick-or-treat

## Oct 19, 2024
* Cleanup torch amp usage to avoid cuda specific calls, merge support for Ascend (NPU) devices from [MengqingCao](https://github.com/MengqingCao) that should work now in PyTorch 2.5 w/ new device extension autoloading feature. Tested Intel Arc (XPU) in PyTorch 2.5 too and it (mostly) worked.

## Oct 16, 2024
* Fix error on importing from deprecated path `timm.models.registry`, increased priority of existing deprecation warnings to be visible
* Port weights of InternViT-300M (https://huggingface.co/OpenGVLab/InternViT-300M-448px) to `timm` as `vit_intern300m_patch14_448`

### Oct 14, 2024
* Pre-activation (ResNetV2) version of 18/18d/34/34d ResNet model defs added by request (weights pending)
* Release 1.0.10

### Oct 11, 2024
* MambaOut (https://github.com/yuweihao/MambaOut) model & weights added. A cheeky take on SSM vision models w/o the SSM (essentially ConvNeXt w/ gating). A mix of original weights + custom variations & weights.

|model |img_size|top1 |top5 |param_count|
|---------------------------------------------------------------------------------------------------------------------|--------|------|------|-----------|
|[mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k](http://huggingface.co/timm/mambaout_base_plus_rw.sw_e150_r384_in12k_ft_in1k)|384 |87.506|98.428|101.66 |
|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](http://huggingface.co/timm/mambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|288 |86.912|98.236|101.66 |
|[mambaout_base_plus_rw.sw_e150_in12k_ft_in1k](http://huggingface.co/timm/mambaout_base_plus_rw.sw_e150_in12k_ft_in1k)|224 |86.632|98.156|101.66 |
|[mambaout_base_tall_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_tall_rw.sw_e500_in1k) |288 |84.974|97.332|86.48 |
|[mambaout_base_wide_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_wide_rw.sw_e500_in1k) |288 |84.962|97.208|94.45 |
|[mambaout_base_short_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_short_rw.sw_e500_in1k) |288 |84.832|97.27 |88.83 |
|[mambaout_base.in1k](http://huggingface.co/timm/mambaout_base.in1k) |288 |84.72 |96.93 |84.81 |
|[mambaout_small_rw.sw_e450_in1k](http://huggingface.co/timm/mambaout_small_rw.sw_e450_in1k) |288 |84.598|97.098|48.5 |
|[mambaout_small.in1k](http://huggingface.co/timm/mambaout_small.in1k) |288 |84.5 |96.974|48.49 |
|[mambaout_base_wide_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_wide_rw.sw_e500_in1k) |224 |84.454|96.864|94.45 |
|[mambaout_base_tall_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_tall_rw.sw_e500_in1k) |224 |84.434|96.958|86.48 |
|[mambaout_base_short_rw.sw_e500_in1k](http://huggingface.co/timm/mambaout_base_short_rw.sw_e500_in1k) |224 |84.362|96.952|88.83 |
|[mambaout_base.in1k](http://huggingface.co/timm/mambaout_base.in1k) |224 |84.168|96.68 |84.81 |
|[mambaout_small.in1k](http://huggingface.co/timm/mambaout_small.in1k) |224 |84.086|96.63 |48.49 |
|[mambaout_small_rw.sw_e450_in1k](http://huggingface.co/timm/mambaout_small_rw.sw_e450_in1k) |224 |84.024|96.752|48.5 |
|[mambaout_tiny.in1k](http://huggingface.co/timm/mambaout_tiny.in1k) |288 |83.448|96.538|26.55 |
|[mambaout_tiny.in1k](http://huggingface.co/timm/mambaout_tiny.in1k) |224 |82.736|96.1 |26.55 |
|[mambaout_kobe.in1k](http://huggingface.co/timm/mambaout_kobe.in1k) |288 |81.054|95.718|9.14 |
|[mambaout_kobe.in1k](http://huggingface.co/timm/mambaout_kobe.in1k) |224 |79.986|94.986|9.14 |
|[mambaout_femto.in1k](http://huggingface.co/timm/mambaout_femto.in1k) |288 |79.848|95.14 |7.3 |
|[mambaout_femto.in1k](http://huggingface.co/timm/mambaout_femto.in1k) |224 |78.87 |94.408|7.3 |

* SigLIP SO400M ViT fine-tunes on ImageNet-1k @ 378x378, added 378x378 option for existing SigLIP 384x384 models
* [vit_so400m_patch14_siglip_378.webli_ft_in1k](https://huggingface.co/timm/vit_so400m_patch14_siglip_378.webli_ft_in1k) - 89.42 top-1
* [vit_so400m_patch14_siglip_gap_378.webli_ft_in1k](https://huggingface.co/timm/vit_so400m_patch14_siglip_gap_378.webli_ft_in1k) - 89.03
* SigLIP SO400M ViT encoder from recent multi-lingual (i18n) variant, patch16 @ 256x256 (https://huggingface.co/timm/ViT-SO400M-16-SigLIP-i18n-256). OpenCLIP update pending.
* Add two ConvNeXt 'Zepto' models & weights (one w/ overlapped stem and one w/ patch stem). Uses RMSNorm, smaller than previous 'Atto', 2.2M params.
* [convnext_zepto_rms_ols.ra4_e3600_r224_in1k](https://huggingface.co/timm/convnext_zepto_rms_ols.ra4_e3600_r224_in1k) - 73.20 top-1 @ 224
* [convnext_zepto_rms.ra4_e3600_r224_in1k](https://huggingface.co/timm/convnext_zepto_rms.ra4_e3600_r224_in1k) - 72.81 @ 224

### Sept 2024
* Add a suite of tiny test models for improved unit tests and niche low-resource applications (https://huggingface.co/blog/rwightman/timm-tiny-test)
* Add MobileNetV4-Conv-Small (0.5x) model (https://huggingface.co/posts/rwightman/793053396198664)
* [mobilenetv4_conv_small_050.e3000_r224_in1k](http://hf.co/timm/mobilenetv4_conv_small_050.e3000_r224_in1k) - 65.81 top-1 @ 256, 64.76 @ 224
* Add MobileNetV3-Large variants trained with MNV4 Small recipe
* [mobilenetv3_large_150d.ra4_e3600_r256_in1k](http://hf.co/timm/mobilenetv3_large_150d.ra4_e3600_r256_in1k) - 81.81 @ 320, 80.94 @ 256
* [mobilenetv3_large_100.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv3_large_100.ra4_e3600_r224_in1k) - 77.16 @ 256, 76.31 @ 224

### Aug 21, 2024
* Updated SBB ViT models trained on ImageNet-12k and fine-tuned on ImageNet-1k, challenging quite a number of much larger, slower models

| model | top1 | top5 | param_count | img_size |
| -------------------------------------------------- | ------ | ------ | ----------- | -------- |
| [vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k) | 87.438 | 98.256 | 64.11 | 384 |
| [vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_mediumd_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k) | 86.608 | 97.934 | 64.11 | 256 |
| [vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_384.sbb2_e200_in12k_ft_in1k) | 86.594 | 98.02 | 60.4 | 384 |
| [vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k](https://huggingface.co/timm/vit_betwixt_patch16_reg4_gap_256.sbb2_e200_in12k_ft_in1k) | 85.734 | 97.61 | 60.4 | 256 |
* MobileNet-V1 1.25, EfficientNet-B1, & ResNet50-D weights w/ MNV4 baseline challenge recipe

| model | top1 | top5 | param_count | img_size |
|--------------------------------------------------------------------------------------------------------------------------|--------|--------|-------------|----------|
| [resnet50d.ra4_e3600_r224_in1k](http://hf.co/timm/resnet50d.ra4_e3600_r224_in1k) | 81.838 | 95.922 | 25.58 | 288 |
| [efficientnet_b1.ra4_e3600_r240_in1k](http://hf.co/timm/efficientnet_b1.ra4_e3600_r240_in1k) | 81.440 | 95.700 | 7.79 | 288 |
| [resnet50d.ra4_e3600_r224_in1k](http://hf.co/timm/resnet50d.ra4_e3600_r224_in1k) | 80.952 | 95.384 | 25.58 | 224 |
| [efficientnet_b1.ra4_e3600_r240_in1k](http://hf.co/timm/efficientnet_b1.ra4_e3600_r240_in1k) | 80.406 | 95.152 | 7.79 | 240 |
| [mobilenetv1_125.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_125.ra4_e3600_r224_in1k) | 77.600 | 93.804 | 6.27 | 256 |
| [mobilenetv1_125.ra4_e3600_r224_in1k](http://hf.co/timm/mobilenetv1_125.ra4_e3600_r224_in1k) | 76.924 | 93.234 | 6.27 | 224 |

* Add SAM2 (HieraDet) backbone arch & weight loading support
* Add Hiera Small weights trained w/ abswin pos embed on in12k & fine-tuned on 1k

|model |top1 |top5 |param_count|
|---------------------------------|------|------|-----------|
|hiera_small_abswin_256.sbb2_e200_in12k_ft_in1k |84.912|97.260|35.01 |
|hiera_small_abswin_256.sbb2_pd_e200_in12k_ft_in1k |84.560|97.106|35.01 |

### Aug 8, 2024
* Add RDNet ('DenseNets Reloaded', https://arxiv.org/abs/2403.19588), thanks [Donghyun Kim](https://github.com/dhkim0225)

2 changes: 1 addition & 1 deletion timm/version.py
@@ -1 +1 @@
-__version__ = '1.0.14.dev0'
+__version__ = '1.0.14'
