The first non-pre-release since Oct 2022, with a long list of changes from the 0.6.x releases...
May 12, 2023
- Fix Python 3.7 import error related to the `Final[]` typing annotation
May 11, 2023
- `timm` 0.9 released, transition from 0.8.xdev releases
May 10, 2023
- Hugging Face Hub downloading is now the default: 1132 models on https://huggingface.co/timm, 1163 weights in `timm`
- DINOv2 vit feature backbone weights added thanks to Leng Yue
- FB MAE vit feature backbone weights added
- OpenCLIP DataComp-XL L/14 feat backbone weights added
- MetaFormer (poolformer-v2, caformer, convformer, updated poolformer (v1)) w/ weights added by Fredo Guan
- Experimental `get_intermediate_layers` function on vit/deit models for grabbing hidden states (inspired by DINO impl). This is WIP and may change significantly... feedback welcome.
- Model creation throws an error if `pretrained=True` and no weights exist (instead of continuing with random initialization)
- Fix regression with inception / nasnet TF sourced weights with 1001 classes in original classifiers
- bitsandbytes (https://github.com/TimDettmers/bitsandbytes) optimizers added to the factory; use the `bnb` prefix, i.e. `bnbadam8bit`
- Misc cleanup and fixes
- Final testing before switching to 0.9 and bringing `timm` out of pre-release state
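The experimental `get_intermediate_layers` entry above collects hidden states from the later blocks of a ViT. The idea can be sketched without any framework, assuming the blocks are plain callables applied in sequence. The helper name `collect_last_n` and its signature are hypothetical, not timm's API.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def collect_last_n(x: T, blocks: Sequence[Callable[[T], T]], n: int = 1) -> List[T]:
    """Run x through every block in order and return the outputs of the last n blocks.

    Hypothetical helper illustrating the DINO-style "intermediate layers" idea;
    not the timm implementation.
    """
    if not 1 <= n <= len(blocks):
        raise ValueError(f"n must be in [1, {len(blocks)}], got {n}")
    first_kept = len(blocks) - n  # index of the first block whose output is kept
    outputs: List[T] = []
    for i, block in enumerate(blocks):
        x = block(x)
        if i >= first_kept:
            outputs.append(x)
    return outputs
```

For example, four blocks that each add 1, starting from 0 with `n=2`, return `[3, 4]`: the hidden states after the third and fourth blocks.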
April 27, 2023
- 97% of `timm` models uploaded to HF Hub and almost all updated to support multi-weight pretrained configs
- Minor cleanup and refactoring of another batch of models as multi-weight added. More fused_attn (F.sdpa) and features_only support, and torchscript fixes.
April 21, 2023
- Gradient accumulation support added to train script and tested (`--grad-accum-steps`), thanks Taeksang Kim
- More weights on HF Hub (cspnet, cait, volo, xcit, tresnet, hardcorenas, densenet, dpn, vovnet, xception_aligned)
- Added `--head-init-scale` and `--head-init-bias` to train.py to scale the classifier head and set a fixed bias for fine-tuning
- Remove all InplaceABN (`inplace_abn`) use; replaced its use in tresnet with standard BatchNorm (modified weights accordingly).
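Gradient accumulation (`--grad-accum-steps`) sums gradients from several micro-batches before a single optimizer step, trading extra passes for lower memory. The pure-Python sketch below assumes a one-parameter least-squares model and scales each micro-batch gradient by `1 / accum_steps`, a common convention (timm's train script may differ in detail). It checks that, with equal-sized micro-batches, the accumulated gradient matches the full-batch gradient.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]


def grad_mse(w: float, batch: Batch) -> float:
    """Gradient of mean((w * x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: Batch, accum_steps: int) -> float:
    """Split data into accum_steps equal micro-batches and sum their scaled gradients."""
    if len(data) % accum_steps:
        raise ValueError("data length must be divisible by accum_steps")
    micro = len(data) // accum_steps
    total = 0.0
    for k in range(accum_steps):
        chunk = data[k * micro:(k + 1) * micro]
        # Scaling by 1/accum_steps keeps the sum equal to the full-batch mean gradient.
        total += grad_mse(w, chunk) / accum_steps
    return total


data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
print(grad_mse(0.5, data), accumulated_grad(0.5, data, accum_steps=2))
```

Both values come out at about -22.3. The same arithmetic gives the effective batch size: a per-device batch of 64 with `--grad-accum-steps 4` means each optimizer step sees 256 samples per device.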
April 12, 2023
- Add ONNX export script, validate script, and helpers that I've had kicking around for a long time. Tweak 'same' padding for better export w/ recent ONNX + PyTorch.
- Refactor dropout args for vit and vit-like models, separating drop_rate into `drop_rate` (classifier dropout), `proj_drop_rate` (block mlp / out projections), `pos_drop_rate` (position embedding drop), and `attn_drop_rate` (attention dropout). Also add patch dropout (FLIP) to vit and eva models.
- Add fused `F.scaled_dot_product_attention` support to more vit models, with an env var (`TIMM_FUSED_ATTN`) to control it and a config interface to enable/disable
- Add EVA-CLIP backbones w/ image tower weights, all the way up to the 4B param 'enormous' model, plus the 336x336 OpenAI ViT model that was missed.
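`F.scaled_dot_product_attention` fuses the standard attention computation, softmax(QK^T / sqrt(d)) V, into a single kernel. The unfused math it replaces can be written in pure Python as follows. This is a reference for *what* is computed, assuming a single head with no mask and no dropout. It does not reflect how PyTorch or timm implement it.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def sdpa(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(q @ k^T / sqrt(d)) @ v for a single head, no mask, no dropout."""
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for qi in q:
        # similarity of this query to every key, scaled by 1/sqrt(d)
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        # output row is the attention-weighted sum of the value rows
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

If every key scores equally, the weights are uniform, so the output is the mean of the value rows. The fused kernel computes the same result with far less memory traffic.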
April 5, 2023
- ALL ResNet models pushed to Hugging Face Hub with multi-weight support
- New ImageNet-12k + ImageNet-1k fine-tunes available for a few anti-aliased ResNet models
  - `resnetaa50d.sw_in12k_ft_in1k` - 81.7 @ 224, 82.6 @ 288
  - `resnetaa101d.sw_in12k_ft_in1k` - 83.5 @ 224, 84.1 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k` - 86.0 @ 224, 86.5 @ 288
  - `seresnextaa101d_32x8d.sw_in12k_ft_in1k_288` - 86.5 @ 288, 86.7 @ 320
March 31, 2023
- Add first ConvNext-XXLarge CLIP -> IN-1k fine-tune and IN-12k intermediate fine-tunes for convnext-base/large CLIP models.
| model | top1 | top5 | img_size | param_count | gmacs | macts |
|---|---|---|---|---|---|---|
| convnext_xxlarge.clip_laion2b_soup_ft_in1k | 88.612 | 98.704 | 256 | 846.47 | 198.09 | 124.45 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_384 | 88.312 | 98.578 | 384 | 200.13 | 101.11 | 126.74 |
| convnext_large_mlp.clip_laion2b_soup_ft_in12k_in1k_320 | 87.968 | 98.47 | 320 | 200.13 | 70.21 | 88.02 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k_384 | 87.138 | 98.212 | 384 | 88.59 | 45.21 | 84.49 |
| convnext_base.clip_laion2b_augreg_ft_in12k_in1k | 86.344 | 97.97 | 256 | 88.59 | 20.09 | 37.55 |
- Add EVA-02 MIM pretrained and fine-tuned weights, push to HF hub and update model cards for all EVA models. First model over 90% top-1 (99% top-5)! Check out the original code & weights at https://github.com/baaivision/EVA for more details on their work blending MIM, CLIP w/ many model, dataset, and train recipe tweaks.
| model | top1 | top5 | param_count | img_size |
|---|---|---|---|---|
| eva02_large_patch14_448.mim_m38m_ft_in22k_in1k | 90.054 | 99.042 | 305.08 | 448 |
| eva02_large_patch14_448.mim_in22k_ft_in22k_in1k | 89.946 | 99.01 | 305.08 | 448 |
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.792 | 98.992 | 1014.45 | 560 |
| eva02_large_patch14_448.mim_in22k_ft_in1k | 89.626 | 98.954 | 305.08 | 448 |
| eva02_large_patch14_448.mim_m38m_ft_in1k | 89.57 | 98.918 | 305.08 | 448 |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.56 | 98.956 | 1013.01 | 336 |
| eva_giant_patch14_336.clip_ft_in1k | 89.466 | 98.82 | 1013.01 | 336 |
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.214 | 98.854 | 304.53 | 336 |
| eva_giant_patch14_224.clip_ft_in1k | 88.882 | 98.678 | 1012.56 | 224 |
| eva02_base_patch14_448.mim_in22k_ft_in22k_in1k | 88.692 | 98.722 | 87.12 | 448 |
| eva_large_patch14_336.in22k_ft_in1k | 88.652 | 98.722 | 304.53 | 336 |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.592 | 98.656 | 304.14 | 196 |
| eva02_base_patch14_448.mim_in22k_ft_in1k | 88.23 | 98.564 | 87.12 | 448 |
| eva_large_patch14_196.in22k_ft_in1k | 87.934 | 98.504 | 304.14 | 196 |
| eva02_small_patch14_336.mim_in22k_ft_in1k | 85.74 | 97.614 | 22.13 | 336 |
| eva02_tiny_patch14_336.mim_in22k_ft_in1k | 80.658 | 95.524 | 5.76 | 336 |
- Multi-weight and HF hub for DeiT and MLP-Mixer based models
March 22, 2023
- More weights pushed to HF hub along with multi-weight support, including: `regnet.py`, `rexnet.py`, `byobnet.py`, `resnetv2.py`, `swin_transformer.py`, `swin_transformer_v2.py`, `swin_transformer_v2_cr.py`
- Swin Transformer models support feature extraction (NCHW feat maps for `swinv2_cr_*`, and NHWC for all others) and spatial embedding outputs.
- FocalNet (from https://github.com/microsoft/FocalNet) models and weights added with significant refactoring, feature extraction, no fixed resolution / sizing constraint
- RegNet weights increased with HF hub push, SWAG, SEER, and torchvision v2 weights. SEER is pretty poor with respect to performance for its model size, but possibly useful.
- More ImageNet-12k pretrained and 1k fine-tuned `timm` weights:
  - `rexnetr_200.sw_in12k_ft_in1k` - 82.6 @ 224, 83.2 @ 288
  - `rexnetr_300.sw_in12k_ft_in1k` - 84.0 @ 224, 84.5 @ 288
  - `regnety_120.sw_in12k_ft_in1k` - 85.0 @ 224, 85.4 @ 288
  - `regnety_160.lion_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288
  - `regnety_160.sw_in12k_ft_in1k` - 85.6 @ 224, 86.0 @ 288 (compared to SWAG PT + 1k FT this is the same, BUT at much lower res; it blows SEER FT away)
- Model name deprecation + remapping functionality added (a milestone for bringing 0.8.x out of pre-release). Mappings being added...
- Minor bug fixes and improvements.
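Multi-weight model names take the form `model_arch.pretrained_tag`, and deprecated names are remapped to their replacements. The resolution step might look like the sketch below, which assumes a plain dict of deprecated-to-current names. The function names and the `mynet_*` mapping entry are hypothetical illustrations, not timm's registry API.

```python
import warnings
from typing import Dict, Tuple

# Hypothetical deprecation table: old flat name -> current "arch.tag" name.
DEPRECATED: Dict[str, str] = {
    "mynet_base_in21k": "mynet_base.in21k",
}


def split_name_tag(name: str) -> Tuple[str, str]:
    """Split 'arch.tag' on the first '.'; the tag is '' when none is given."""
    arch, _, tag = name.partition(".")
    return arch, tag


def resolve_model_name(name: str) -> Tuple[str, str]:
    """Remap a deprecated name (with a warning), then split it into (arch, tag)."""
    if name in DEPRECATED:
        new_name = DEPRECATED[name]
        warnings.warn(f"{name} is deprecated, use {new_name}", DeprecationWarning)
        name = new_name
    return split_name_tag(name)
```

Splitting only on the first `.` keeps the architecture name unambiguous. An empty tag signals "use the default pretrained weights for this architecture".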
Feb 26, 2023
- Add ConvNeXt-XXLarge CLIP pretrained image tower weights for fine-tune & features (fine-tuning TBD) -- see model card
- Update `convnext_xxlarge` default LayerNorm eps to 1e-5 (for CLIP weights, improved stability)
- 0.8.15dev0
Feb 20, 2023
- Add 320x320 `convnext_large_mlp.clip_laion2b_ft_320` and `convnext_large_mlp.clip_laion2b_ft_soup_320` CLIP image tower weights for features & fine-tune
- 0.8.13dev0 pypi release for latest changes w/ move to huggingface org
Feb 16, 2023
- `safetensor` checkpoint support added
- Add ideas from 'Scaling Vision Transformers to 22 Billion Parameters' (https://arxiv.org/abs/2302.05442): qk norm, RmsNorm, parallel block
- Add `F.scaled_dot_product_attention` support (PyTorch 2.0 only) to `vit_*`, `vit_relpos*`, `coatnet` / `maxxvit` (to start)
- Lion optimizer (w/ multi-tensor option) added (https://arxiv.org/abs/2302.06675)
- Gradient checkpointing works with `features_only=True`
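The Lion optimizer listed above steps each parameter by the *sign* of an interpolated momentum, so it keeps only one state buffer per parameter. A single Lion step over lists of scalar parameters can be sketched in pure Python, following the update rule in the paper (https://arxiv.org/abs/2302.06675). The default hyperparameters shown are assumptions, and this is not timm's multi-tensor implementation.

```python
from typing import List, Tuple


def sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 (sign(0) is 0, matching torch.sign)."""
    return float((x > 0) - (x < 0))


def lion_step(
    params: List[float],
    grads: List[float],
    exp_avg: List[float],
    lr: float = 1e-4,
    betas: Tuple[float, float] = (0.9, 0.99),
    weight_decay: float = 0.0,
) -> None:
    """One in-place Lion update on parallel lists of params, grads, and momentum."""
    beta1, beta2 = betas
    for i, (p, g, m) in enumerate(zip(params, grads, exp_avg)):
        # Update direction: sign of an interpolation between momentum and gradient.
        c = beta1 * m + (1.0 - beta1) * g
        # Decoupled weight decay plus a fixed-magnitude signed step.
        params[i] = p - lr * (sign(c) + weight_decay * p)
        # The momentum buffer itself is tracked with the second beta.
        exp_avg[i] = beta2 * m + (1.0 - beta2) * g
```

Because every update has magnitude `lr` per element, Lion is usually run with a smaller learning rate than AdamW.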
Feb 7, 2023
- New inference benchmark numbers added in results folder.
- Add convnext LAION CLIP trained weights and initial set of in1k fine-tunes
  - `convnext_base.clip_laion2b_augreg_ft_in1k` - 86.2% @ 256x256
  - `convnext_base.clip_laiona_augreg_ft_in1k_384` - 86.5% @ 384x384
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k` - 87.3% @ 256x256
  - `convnext_large_mlp.clip_laion2b_augreg_ft_in1k_384` - 87.9% @ 384x384
- Add DaViT models. Supports `features_only=True`. Adapted from https://github.com/dingmyu/davit by Fredo.
- Use a common NormMlpClassifierHead across MaxViT, ConvNeXt, DaViT
- Add EfficientFormer-V2 model, update EfficientFormer, and refactor LeViT (closely related architectures). Weights on HF hub.
  - New EfficientFormer-V2 arch, significant refactor from the original (https://github.com/snap-research/EfficientFormer). Supports `features_only=True`.
  - Minor updates to EfficientFormer.
  - Refactor LeViT models to stages, add `features_only=True` support to new `conv` variants, weight remap required.
- Move ImageNet meta-data (synsets, indices) from `/results` to `timm/data/_info`.
- Add ImageNetInfo / DatasetInfo classes to provide labelling for various ImageNet classifier layouts in `timm`
  - Update `inference.py` to use them; try: `python inference.py /folder/to/images --model convnext_small.in12k --label-type detail --topk 5`
- Ready for 0.8.10 pypi pre-release (final testing).
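The `--topk` option above reports the k most probable classes with their probabilities. That post-processing step can be sketched in pure Python, assuming raw logits and a plain list of label strings. `topk_labels` is a hypothetical helper, not the inference script's code.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def topk_labels(
    logits: Sequence[float], labels: Sequence[str], k: int = 5
) -> List[Tuple[int, str, float]]:
    """Return (class index, label, probability) for the k highest-probability classes."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    top = heapq.nlargest(k, range(len(probs)), key=probs.__getitem__)
    return [(i, labels[i], probs[i]) for i in top]
```

`heapq.nlargest` avoids fully sorting all 1000 (or 11821) class scores when only a handful are needed.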
Jan 20, 2023
- Add two convnext 12k -> 1k fine-tunes at 384x384
  - `convnext_tiny.in12k_ft_in1k_384` - 85.1 @ 384
  - `convnext_small.in12k_ft_in1k_384` - 86.2 @ 384
- Push all MaxxViT weights to HF hub, and add new ImageNet-12k -> 1k fine-tunes for `rw` base MaxViT and CoAtNet 1/2 models
Jan 11, 2023
- Update ConvNeXt ImageNet-12k pretrain series w/ two new fine-tuned weights (and pre FT `.in12k` tags)
  - `convnext_nano.in12k_ft_in1k` - 82.3 @ 224, 82.9 @ 288 (previously released)
  - `convnext_tiny.in12k_ft_in1k` - 84.2 @ 224, 84.5 @ 288
  - `convnext_small.in12k_ft_in1k` - 85.2 @ 224, 85.3 @ 288
Jan 6, 2023
- Finally got around to adding `--model-kwargs` and `--opt-kwargs` to scripts to pass rare args directly through to model classes (or the optimizer) from the cmd line
  - `train.py /imagenet --model resnet50 --amp --model-kwargs output_stride=16 act_layer=silu`
  - `train.py /imagenet --model vit_base_patch16_clip_224 --img-size 240 --amp --model-kwargs img_size=240 patch_size=12`
- Cleanup some popular models to better support arg passthrough / merge with model configs, more to go.
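The `--model-kwargs` / `--opt-kwargs` flags take space-separated `key=value` pairs. The argparse sketch below shows how such pairs might be collected into a kwargs dict. It assumes values are parsed as Python literals and fall back to plain strings, so `silu` needs no quoting. The `ParseKwargs` class here is an illustrative assumption, not necessarily timm's exact code.

```python
import argparse
import ast
from typing import Any, Dict


class ParseKwargs(argparse.Action):
    """argparse action that collects key=value tokens into a dict."""

    def __call__(self, parser, namespace, values, option_string=None):
        kwargs: Dict[str, Any] = {}
        for item in values:
            key, sep, raw = item.partition("=")
            if not sep:
                parser.error(f"expected key=value, got {item!r}")
            try:
                kwargs[key] = ast.literal_eval(raw)  # "16" -> 16, "0.1" -> 0.1, "True" -> True
            except (ValueError, SyntaxError):
                kwargs[key] = raw  # bare words such as silu stay strings
        setattr(namespace, self.dest, kwargs)


parser = argparse.ArgumentParser()
parser.add_argument("--model-kwargs", nargs="*", default={}, action=ParseKwargs)
args = parser.parse_args(["--model-kwargs", "output_stride=16", "act_layer=silu"])
print(args.model_kwargs)  # {'output_stride': 16, 'act_layer': 'silu'}
```

The resulting dict can then be splatted into the model constructor, e.g. `create_fn(**args.model_kwargs)`, where `create_fn` stands in for whatever factory the script uses.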
Jan 5, 2023
- ConvNeXt-V2 models and weights added to existing `convnext.py`
Dec 23, 2022 🎄☃
- Add FlexiViT models and weights from https://github.com/google-research/big_vision (check out paper at https://arxiv.org/abs/2212.08013)
- NOTE currently resizing is static on model creation, on-the-fly dynamic / train patch size sampling is a WIP
- Many more models updated to multi-weight and downloadable via HF hub now (convnext, efficientnet, mobilenet, vision_transformer*, beit)
- More model pretrained tags and adjustments; some model names changed (working on deprecation translations; consider the main branch a DEV branch right now, use 0.6.x for stable use)
- More ImageNet-12k (subset of 22k) pretrain models popping up:
  - `efficientnet_b5.in12k_ft_in1k` - 85.9 @ 448x448
  - `vit_medium_patch16_gap_384.in12k_ft_in1k` - 85.5 @ 384x384
  - `vit_medium_patch16_gap_256.in12k_ft_in1k` - 84.5 @ 256x256
  - `convnext_nano.in12k_ft_in1k` - 82.9 @ 288x288
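Changing the patch size at model creation, as with the FlexiViT weights above, changes the token sequence length quadratically for a fixed image size. That is the main compute consideration when picking a patch size. A small arithmetic sketch, assuming a square image divisible by the patch size and, by default, a single class token (set `num_prefix_tokens=0` for models without one). `num_patch_tokens` is a hypothetical helper.

```python
def num_patch_tokens(img_size: int, patch_size: int, num_prefix_tokens: int = 1) -> int:
    """Sequence length of a ViT: (img_size / patch_size)^2 patches plus prefix tokens."""
    if img_size % patch_size:
        raise ValueError(f"img_size {img_size} not divisible by patch_size {patch_size}")
    grid = img_size // patch_size  # patches per side
    return grid * grid + num_prefix_tokens
```

For instance, a 224px image with 16px patches gives a 14x14 grid (197 tokens with the class token). The Jan 6 `--model-kwargs img_size=240 patch_size=12` example gives a 20x20 grid (401 tokens), roughly twice the sequence length.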
Dec 8, 2022
- Add 'EVA l' to `vision_transformer.py`, MAE style ViT-L/14 MIM pretrain w/ EVA-CLIP targets, FT on ImageNet-1k (w/ ImageNet-22k intermediate for some)
| model | top1 | param_count | gmac | macts | hub |
|---|---|---|---|---|---|
| eva_large_patch14_336.in22k_ft_in22k_in1k | 89.2 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_336.in22k_ft_in1k | 88.7 | 304.5 | 191.1 | 270.2 | link |
| eva_large_patch14_196.in22k_ft_in22k_in1k | 88.6 | 304.1 | 61.6 | 63.5 | link |
| eva_large_patch14_196.in22k_ft_in1k | 87.9 | 304.1 | 61.6 | 63.5 | link |
Dec 6, 2022
- Add 'EVA g', BEiT style ViT-g/14 model weights w/ both MIM pretrain and CLIP pretrain to `beit.py`.
| model | top1 | param_count | gmac | macts | hub |
|---|---|---|---|---|---|
| eva_giant_patch14_560.m30m_ft_in22k_in1k | 89.8 | 1014.4 | 1906.8 | 2577.2 | link |
| eva_giant_patch14_336.m30m_ft_in22k_in1k | 89.6 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_336.clip_ft_in1k | 89.4 | 1013 | 620.6 | 550.7 | link |
| eva_giant_patch14_224.clip_ft_in1k | 89.1 | 1012.6 | 267.2 | 192.6 | link |
Dec 5, 2022
- Pre-release (`0.8.0dev0`) of multi-weight support (`model_arch.pretrained_tag`). Install with `pip install --pre timm`
- vision_transformer, maxvit, convnext are the first three model impl w/ support
- model names are changing with this (previous _21k, etc. fn will merge), still sorting out deprecation handling
- bugs are likely, but I need feedback so please try it out
- if stability is needed, please use 0.6.x pypi releases or clone from 0.6.x branch
- Support for PyTorch 2.0 compile is added in train/validate/inference/benchmark; use the `--torchcompile` argument
- Inference script allows more control over output: select k for top-class index + prob, with json, csv or parquet output
- Add a full set of fine-tuned CLIP image tower weights from both LAION-2B and original OpenAI CLIP models
| model | top1 | param_count | gmac | macts | hub |
|---|---|---|---|---|---|
| vit_huge_patch14_clip_336.laion2b_ft_in12k_in1k | 88.6 | 632.5 | 391 | 407.5 | link |
| vit_large_patch14_clip_336.openai_ft_in12k_in1k | 88.3 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in12k_in1k | 88.2 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_336.laion2b_ft_in12k_in1k | 88.2 | 304.5 | 191.1 | 270.2 | link |
| vit_large_patch14_clip_224.openai_ft_in12k_in1k | 88.2 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.laion2b_ft_in12k_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_224.openai_ft_in1k | 87.9 | 304.2 | 81.1 | 88.8 | link |
| vit_large_patch14_clip_336.laion2b_ft_in1k | 87.9 | 304.5 | 191.1 | 270.2 | link |
| vit_huge_patch14_clip_224.laion2b_ft_in1k | 87.6 | 632 | 167.4 | 139.4 | link |
| vit_large_patch14_clip_224.laion2b_ft_in1k | 87.3 | 304.2 | 81.1 | 88.8 | link |
| vit_base_patch16_clip_384.laion2b_ft_in12k_in1k | 87.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in12k_in1k | 87 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.laion2b_ft_in1k | 86.6 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_384.openai_ft_in1k | 86.2 | 86.9 | 55.5 | 101.6 | link |
| vit_base_patch16_clip_224.laion2b_ft_in12k_in1k | 86.2 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch16_clip_224.openai_ft_in12k_in1k | 85.9 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_448.laion2b_ft_in12k_in1k | 85.8 | 88.3 | 17.9 | 23.9 | link |
| vit_base_patch16_clip_224.laion2b_ft_in1k | 85.5 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.laion2b_ft_in12k_in1k | 85.4 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch16_clip_224.openai_ft_in1k | 85.3 | 86.6 | 17.6 | 23.9 | link |
| vit_base_patch32_clip_384.openai_ft_in12k_in1k | 85.2 | 88.3 | 13.1 | 16.5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in12k_in1k | 83.3 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.laion2b_ft_in1k | 82.6 | 88.2 | 4.4 | 5 | link |
| vit_base_patch32_clip_224.openai_ft_in1k | 81.9 | 88.2 | 4.4 | 5 | link |
- Port of MaxViT Tensorflow Weights from official impl at https://github.com/google-research/maxvit
- There were larger than expected drops for the upscaled 384/512 in21k fine-tune weights (possibly a missing detail), but the 21k FT did seem sensitive to small preprocessing differences
| model | top1 | param_count | gmac | macts | hub |
|---|---|---|---|---|---|
| maxvit_xlarge_tf_512.in21k_ft_in1k | 88.5 | 475.8 | 534.1 | 1413.2 | link |
| maxvit_xlarge_tf_384.in21k_ft_in1k | 88.3 | 475.3 | 292.8 | 668.8 | link |
| maxvit_base_tf_512.in21k_ft_in1k | 88.2 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in21k_ft_in1k | 88 | 212.3 | 244.8 | 942.2 | link |
| maxvit_large_tf_384.in21k_ft_in1k | 88 | 212 | 132.6 | 445.8 | link |
| maxvit_base_tf_384.in21k_ft_in1k | 87.9 | 119.6 | 73.8 | 332.9 | link |
| maxvit_base_tf_512.in1k | 86.6 | 119.9 | 138 | 704 | link |
| maxvit_large_tf_512.in1k | 86.5 | 212.3 | 244.8 | 942.2 | link |
| maxvit_base_tf_384.in1k | 86.3 | 119.6 | 73.8 | 332.9 | link |
| maxvit_large_tf_384.in1k | 86.2 | 212 | 132.6 | 445.8 | link |
| maxvit_small_tf_512.in1k | 86.1 | 69.1 | 67.3 | 383.8 | link |
| maxvit_tiny_tf_512.in1k | 85.7 | 31 | 33.5 | 257.6 | link |
| maxvit_small_tf_384.in1k | 85.5 | 69 | 35.9 | 183.6 | link |
| maxvit_tiny_tf_384.in1k | 85.1 | 31 | 17.5 | 123.4 | link |
| maxvit_large_tf_224.in1k | 84.9 | 211.8 | 43.7 | 127.4 | link |
| maxvit_base_tf_224.in1k | 84.9 | 119.5 | 24 | 95 | link |
| maxvit_small_tf_224.in1k | 84.4 | 68.9 | 11.7 | 53.2 | link |
| maxvit_tiny_tf_224.in1k | 83.4 | 30.9 | 5.6 | 35.8 | link |
Oct 15, 2022
- Train and validation script enhancements
- Non-GPU (i.e. CPU) device support
- SLURM compatibility for train script
- HF datasets support (via ReaderHfds)
- TFDS/WDS dataloading improvements (sample padding/wrap for distributed use fixed wrt sample count estimate)
- in_chans !=3 support for scripts / loader
- Adan optimizer
- Can enable per-step LR scheduling via args
- Dataset 'parsers' renamed to 'readers', more descriptive of purpose
- AMP args changed: APEX via `--amp-impl apex`, bfloat16 supported via `--amp-dtype bfloat16`
- main branch switched to 0.7.x version, 0.6.x forked for stable release of weight-only adds
- master -> main branch rename
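Per-step LR scheduling, mentioned in the Oct 15 entry, evaluates the schedule at a fractional epoch on every optimizer update instead of once per epoch. The contrast can be sketched with a cosine decay, assuming no warmup and a floor `lr_min`. `cosine_lr` is a hypothetical helper, not timm's scheduler API.

```python
import math


def cosine_lr(t: float, t_total: float, lr_max: float, lr_min: float = 0.0) -> float:
    """Cosine decay from lr_max at t=0 to lr_min at t=t_total (t may be fractional)."""
    t = min(max(t, 0.0), t_total)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / t_total))


epochs, steps_per_epoch = 10, 100
total_steps = epochs * steps_per_epoch
# Per-epoch: the LR only changes at epoch boundaries (integer epoch index).
per_epoch = [cosine_lr(step // steps_per_epoch, epochs, 0.1) for step in range(total_steps)]
# Per-step: the LR changes on every update, using the fractional epoch.
per_step = [cosine_lr(step / steps_per_epoch, epochs, 0.1) for step in range(total_steps)]
```

Per-step updates give a smoother curve. This matters most for short schedules or large datasets, where a single epoch covers many updates.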