SynapseAI v1.19
FLUX
- FLUX with diffusers 0.31.0 (see the sketch below) #1450 @dsocek
- FLUX Fine-Tuning for Gaudi #1482 @dsocek
- Flux Image-To-Image pipeline #1524 @dsocek
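As a quick illustration of the new FLUX support, here is a minimal text-to-image sketch. The `GaudiFluxPipeline` name, its kwargs, and the `Habana/stable-diffusion` Gaudi config are assumptions based on optimum-habana's existing `Gaudi*Pipeline` conventions, not a verbatim recipe:

```python
import torch
from optimum.habana.diffusers import GaudiFluxPipeline  # assumed class name

pipe = GaudiFluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
    use_habana=True,                         # run on HPU
    use_hpu_graphs=True,                     # capture HPU graphs to cut host overhead
    gaudi_config="Habana/stable-diffusion",  # assumed applicable Gaudi config
)
image = pipe(prompt="A cat holding a sign that says hello world").images[0]
image.save("flux.png")
```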
New models
- Optimized inference of Cohere model on HPU #1329 @XinyuYe-Intel
- Idefics2 #1270 @sywangyi
- Optimized inference of XGLM model on HPU #1323 @XinyuYe-Intel
- Add mllama support #1419 @sywangyi
- Enable PaliGemma model for image-to-text example #1407 @kaixuanliu
- Enable Gemma2 Inference on Gaudi #1504 @Luca-Calabria
- Enable MiniCPM #1342 @pi314ever
- Enable Falcon-mamba #1480 @yuanwu2017
- Add support for Baichuan2 (used in the sketch below) #1479 @xhaihao
- Enable DeepSeek-V2 #1475 @yao-matrix
- Add ChatGLM #1478 @mengker33
- Falcon Model Support #1612 @alekseyfa
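Most of the models above plug into the same flow: patch transformers with Gaudi-optimized implementations, then load and generate as usual. A minimal sketch with Baichuan2 (model choice and generation arguments are illustrative only):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # swap in Gaudi-optimized model classes

name = "baichuan-inc/Baichuan2-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("hpu")

inputs = tokenizer("What does Gaudi accelerate?", return_tensors="pt").to("hpu")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```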
Various model optimizations
- Enable flash attention for Gemma #1454 @atakaha
- Support loading 4-bit Qwen2 #1476 @mengniwang95
- Fix Gemma FP8 flash_attention lower-throughput issue #1510 @kplau1128
- Disable default sdpa in Albert (#22) #1517 @astachowiczhabana
- Implement fused sdpa for wav2vec2 (#18) #1520 @astachowiczhabana
- Memory optimization for gpt_bitcode #1513 @astachowiczhabana
- Support beam search with reuse_cache and bucket_internal (see the sketch after this list) #1472 @Wei-Lin-Intel
- Add mixtral trl sft #1349 @lkk12014402
- Enable tiiuae/falcon-11B-vlm in image_to_text example #1490 @sywangyi
- Enable fusedsdpa kernel for vision part of mllama #1531 @sywangyi
- Enable dynamic compile for MPI (training) #1509 @chaojun-zhang
- Add DynamicMoE support for Mixtral #1511 @kwisniewski98
- Implemented fusedSDPA for stable diffusion (#36) #1545 @astachowiczhabana
- Fix Accuracy Calculation Issue in GPT-NeoX #1591 @yafshar
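Continuing the Baichuan2 sketch above, the beam-search item (#1472) combines num_beams with optimum-habana's cache extensions. The reuse_cache, bucket_internal, and bucket_size kwargs are assumed from the PR title and the text-generation example flags:

```python
# `model` and `inputs` as in the previous sketch.
outputs = model.generate(
    **inputs,
    num_beams=4,           # beam search
    max_new_tokens=128,
    reuse_cache=True,      # preallocate the KV cache once and reuse it across steps
    bucket_internal=True,  # grow the KV cache in fixed-size buckets to limit recompiles
    bucket_size=128,       # bucket granularity, in tokens (illustrative value)
)
```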
Sentence Transformers
- Update sentence transformer to v3.2.1 #1470 @ZhengHongming888
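With sentence-transformers v3.2.1 in place, embedding models can target the HPU through the library's standard device argument. A minimal sketch (model name illustrative; selecting "hpu" assumes habana_frameworks is installed):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="hpu")
embeddings = model.encode(["Gaudi accelerates embeddings", "Hello world"])
print(embeddings.shape)  # e.g. (2, 384) for this model
```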
Textual Inversion XL
TIMM
- Enable PyTorch Image Models (TIMM) with HPUs #1459 @ZhengHongming888
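A minimal sketch of what the TIMM enablement (#1459) makes possible; any timm model name should work the same way, and the habana_frameworks import is the standard way to register the HPU backend:

```python
import timm
import torch
import habana_frameworks.torch.core as htcore  # registers the HPU device with PyTorch

model = timm.create_model("resnet50", pretrained=True).eval().to("hpu")
x = torch.randn(1, 3, 224, 224).to("hpu")
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 1000])
```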
Context Parallelism
- Add support for Context Parallelism using DeepSpeed's DistributedAttention (see the sketch below) #1501 @bhargaveede
- Move parallel_state.py to the distributed folder a6ee7c2044e6ddf7d19ae3ad663149e51d6f89e7 @regisss
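DeepSpeed's DistributedAttention (DeepSpeed-Ulysses) wraps a local attention module and exchanges activations across a sequence-parallel process group, so each rank holds only a slice of the sequence. A schematic sketch; CoreAttention and the group setup are hypothetical stand-ins:

```python
import torch
import torch.distributed as dist
from deepspeed.sequence.layer import DistributedAttention

class CoreAttention(torch.nn.Module):
    """Hypothetical stand-in for the model's local attention."""
    def forward(self, q, k, v, *args):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)

# Assumes dist.init_process_group(...) has already run (schematic).
sp_group = dist.new_group()  # sequence-parallel process group
dist_attn = DistributedAttention(CoreAttention(), sp_group)
# q, k, v arrive sharded along the sequence dimension; DistributedAttention
# all-to-alls them into head-sharded layout, applies CoreAttention, and
# all-to-alls the output back:
# out = dist_attn(q, k, v)
```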
CI improvements
- Tests for text gen output text #1411 @vidyasiv
- Add split runners to CI (2 devices per runner for fast tests) 72df37df46d1d2a2665c5d1be43b13704b7c8ada @regisss
- Fix fast CI to work with split runners #1534 @regisss
- Add Llama 3.1 ft to CI #1529 @MohitIntel
Documentation
Other
- Fix facebook/hf-seamless-m4t-medium crash #1433 @sywangyi
- Fix bias update in scoped all reduce #1456 @skavulya
- fea(pytests): Added skip for unsupported tests for mistral/mixtral #1462 @imangohari1
- Remove deprecated mixed-precision flags #1471 @vivekgoe
- Readme: replace tabs with spaces #1485 @mgonchar
- Move fast tests to Gaudi2 #1498 @regisss
- Remove torch req from LM example #1491 @astachowiczhabana
- Remove keep_input_mutations #1492 @astachowiczhabana
- Fix trust_remote_code #1493 @astachowiczhabana
- Upgrade ViT README with torch.compile #1494 @astachowiczhabana
- Corrected throughput measure for GaudiDDPMPipeline #1460 @deepak-gowda-narayana
- [SW-196761] Add G3 in T5-L README #1523 @astachowiczhabana
- Fix tuple object error #1354 @SupreetSinghPalne
- Add warmup time and compile time log for the eval/prediction. #1489 @jiminha
- Add support for MLPERF optimized pipeline from example #1465 @ANSHUMAN87
- Add check_neural_compressor_min_version for 4 bit behavior #1500 @xin3he
- Pass "lazy_mode" arg to GaudiLlamaModel GaudiTrainer #1515 @astachowiczhabana
- Removed workaround for NaN bug causing graph break. #1516 @astachowiczhabana
- text_generation: improve parameters check #1527 @mgonchar
- transformers: fixed some typos #1528 @mgonchar
- Make the profiler's with_stack option configurable #1497 @ranzhejiang
- Fix dtype issue with valid sequence length in torch.compile bs=1 #1532 @wszczurekhabana
- Migrate OH CLIP (roberta-clip) training to torch.compile #1507 @chaojun-zhang
- test_text_generation: fix non-Gaudi2 case #1530 @mgonchar
- text-generation: improve output printing #1486 @mgonchar
- Text-generation model set-up: apply torch.compile based on model attributes instead of model types #1452 @dsmertin
- Fix bridgetower example #1481 @astachowiczhabana
- Migrate OH Wav2Vec2-AC training to torch.compile - README update #1537 @astachowiczhabana
- Migrate OH T5-large training to torch.compile #1506 @chaojun-zhang
- trainer: fixed spelling #1538 @mgonchar
- Create CI Eager/Lazy for Language Modeling #1448 @Luca-Calabria
- Fixes for llava-next test failures in 1.19 #1535 @tthakkal
- Refactor Qwen2 Family #1541 @Wei-Lin-Intel
- Add support for optimized SDXL pipeline #1519 @sushildubey171
- Add the checkpoint parameters of falcon-mamba pytest #1540 @yuanwu2017
- Avoid negative values in eval metrics #1533 @deepak-gowda-narayana
- Fix lm_eval script for starcoder and gemma #1463 @skavulya
- Add option to use bf16 in PT sdp (#5) #1514 @astachowiczhabana
- Fix tests.test_peft_inference failure #1543 @sywangyi
- Update lm_eval version #1473 @alexey-belyakov
- Fix bad import in Baichuan code #1547 @regisss
- Restore performance in generate #1546 @ugolowic
- Fix for llava models not generating text with test failures in 1.19 #1548 @tthakkal
- Refactor KV cache and RoPE, reduce common code #1148 @abhilash1910
- Adjust Qwen2-7B test case #1551 @Wei-Lin-Intel
- [run_lm_eval.py] Fix excessive printing of JSON dump info #1553 @FocusLuo
- Fix for single_card llama7b and falcon40b CI errors #1549 @MohitIntel
- Apply --sdp_on_bf16 to image-to-text examples (see the sketch at the end of this list) #1557 @schoi-habana
- Fix accuracy regression in Gemma #1556 @skavulya
- Fix FusedSDPA wrapper from TransformerEngine #1562 @pbielak
- Run albert-xxlarge-v1 CI as torch.compile mode #1563 @yeonsily
- Update README commands for the models to use --sdp_on_bf16 #1566 @yeonsily
- MiniCPM patch #1567 @pi314ever
- Updated gemma_2b_it CI #1561 @Luca-Calabria
- Fixed AdaLoRA test for OH 1.15 #1564 @npiroozan
- Fixed LORACP Test for OH 1.15 #1568 @npiroozan
- Fix prefix llama ci failure #1570 @sywangyi
- Fix mllama test #1569 @sywangyi
- Fix lazy_mode assignment #1558 @vidyasiv
- Generation utils update (minor) #1468 @yafshar
- Style: removed tabs #1577 @mgonchar
- Enable num_return_sequences in beam search #1536 @mengker33
- gpt_bigcode: added internal bucketing fix #1526 @mgonchar
- Update the Gaudi trainer with transformers 4.45.2 #1398 @yafshar
- Revert "add check_neural_compressor_min_version for 4 bit behavior" #1578 @xin3he
- Revert PR #1473 #1582 @regisss
- Fixed spelling #1576 @mgonchar
- Update docs for baichuan2 training #1586 @xhaihao
- Add workaround flag for falcon-180b to resolve text-gen critical reset error during tests #1590 @hchauhan123
- Update transformers tests generation util v4.45.2 #1441 @malkomes
- Limit position embeddings in inference #1598 @bhargaveede
- Verify model output is provided when check_output is enabled #1597 @vidyasiv
- Update README.md #1595 @skaulintel
- Fix scikit-learn to 1.5.2 to fix f1 evaluation crash in 1.6.0 #1596 @sywangyi
- Update language-modeling README file #1599 @vivekgoe
- Revert common KVCache not to check token_idx #1594 @jiminha
- Revert LlamaKVCache due to memory increase #1605 @jiminha
- Replace the UNET custom attention processors #1608 @yafshar
- Fix run_generation test commands for TRL output usage example #1621 @shepark
- Update sdp_on_bf16 option for ST example #1615 @ZhengHongming888
- Update save lora weights for diffusers with text_encoder_2 layers #1626 @skavulya
- Fix save_lora_weights in pipeline_utils.py #1643 @regisss
- Check rope_scaling attr #1609 @jiminha
- Skip certain tests for G1 with empty param list #1613 @hsubramony
- Revert "Update transformers tests generation util v4.45.2 (#1441)" #1614 @yeonsily
- Audio classification readme update #1604 @hsubramony
- Fix readme cmds for clip-roberta #1603 @hsubramony
- Add arbitrary scales #1625 @jiminha
- Modify Qwen2 TRL command to avoid OOM. #1630 @jiminha
- Fix distributed issue for ST Trainer #1649 @ZhengHongming888
- Fix distributed issue for timm #1653 @ZhengHongming888
- Refactor Mixtral MoE block #1635 @lkk12014402
- Speech-recognition: downgrade datasets version #1646 @hsubramony
- Add sdp_on_bf16 to controlnet #1631 @skaulintel
- Quick fix for quantization/custom op list loading #1657 @dsocek
- Fix bug in GaudiMixtralAttentionLongSequence forward #1650 @kaixuanliu
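Several items above add or document the --sdp_on_bf16 example flag. Based on how the example scripts use it, it appears to toggle PyTorch's math SDPA backend to allow bf16 accumulation; the exact call below is an assumption, not a documented API:

```python
import torch

# Assumed effect of --sdp_on_bf16 in the example scripts: let the math
# scaled-dot-product-attention backend reduce in bf16 instead of fp32.
torch._C._set_math_sdp_allow_fp16_bf16_reduction(True)
```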