SynapseAI v1.19
FLUX
- FLUX with diffusers 0.31.0 (see the sketch below) #1450 @dsocek
- FLUX Fine-Tuning for Gaudi #1482 @dsocek
- Flux Image-To-Image pipeline #1524 @dsocek
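As a quick illustration of the new FLUX support, here is a minimal text-to-image sketch. The `GaudiFluxPipeline` name, its kwargs, and the `Habana/stable-diffusion` Gaudi config are assumptions based on optimum-habana's existing `Gaudi*Pipeline` conventions, not a verbatim recipe:

```python
import torch
from optimum.habana.diffusers import GaudiFluxPipeline  # assumed class name

pipe = GaudiFluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    torch_dtype=torch.bfloat16,
    use_habana=True,                         # run on HPU
    use_hpu_graphs=True,                     # capture HPU graphs to cut host overhead
    gaudi_config="Habana/stable-diffusion",  # assumed applicable Gaudi config
)
image = pipe(prompt="A cat holding a sign that says hello world").images[0]
image.save("flux.png")
```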
New models
- Optimized inference of Cohere model on HPU #1329 @XinyuYe-Intel
- Idefics2 #1270 @sywangyi
- Optimized inference of XGLM model on HPU #1323 @XinyuYe-Intel
- Add mllama support #1419 @sywangyi
- Enable PaliGemma model for image-to-text example #1407 @kaixuanliu
- Enable Gemma2 Inference on Gaudi #1504 @Luca-Calabria
- Enable MiniCPM #1342 @pi314ever
- Enable Falcon-mamba #1480 @yuanwu2017
- Add support for Baichuan2 (used in the sketch below) #1479 @xhaihao
- Enable DeepSeek-V2 #1475 @yao-matrix
- Add ChatGLM #1478 @mengker33
- Falcon Model Support #1612 @alekseyfa
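Most of the models above plug into the same flow: patch transformers with Gaudi-optimized implementations, then load and generate as usual. A minimal sketch with Baichuan2 (model choice and generation arguments are illustrative only):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # swap in Gaudi-optimized model classes

name = "baichuan-inc/Baichuan2-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, trust_remote_code=True
).to("hpu")

inputs = tokenizer("What does Gaudi accelerate?", return_tensors="pt").to("hpu")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```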
Various model optimizations
- Enable flash attention for Gemma #1454 @atakaha
- Support loading 4-bit Qwen2 #1476 @mengniwang95
- Fix Gemma FP8 flash_attention lower-throughput issue #1510 @kplau1128
- Disable default sdpa in Albert (#22) #1517 @astachowiczhabana
- Implement fused sdpa for wav2vec2 (#18) #1520 @astachowiczhabana
- Memory optimization for gpt_bitcode #1513 @astachowiczhabana
- Support beam search with reuse_cache and bucket_internal (see the sketch after this list) #1472 @Wei-Lin-Intel
- Add mixtral trl sft #1349 @lkk12014402
- Enable tiiuae/falcon-11B-vlm in image_to_text example #1490 @sywangyi
- Enable fusedsdpa kernel for vision part of mllama #1531 @sywangyi
- Enable dynamic compile for MPI (training) #1509 @chaojun-zhang
- Add DynamicMoE support for Mixtral #1511 @kwisniewski98
- Implemented fusedSDPA for stable diffusion (#36) #1545 @astachowiczhabana
- Fix Accuracy Calculation Issue in GPT-NeoX #1591 @yafshar
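Continuing the Baichuan2 sketch above, the beam-search item (#1472) combines num_beams with optimum-habana's cache extensions. The reuse_cache, bucket_internal, and bucket_size kwargs are assumed from the PR title and the text-generation example flags:

```python
# `model` and `inputs` as in the previous sketch.
outputs = model.generate(
    **inputs,
    num_beams=4,           # beam search
    max_new_tokens=128,
    reuse_cache=True,      # preallocate the KV cache once and reuse it across steps
    bucket_internal=True,  # grow the KV cache in fixed-size buckets to limit recompiles
    bucket_size=128,       # bucket granularity, in tokens (illustrative value)
)
```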
Sentence Transformers
- Update sentence transformer to v3.2.1 #1470 @ZhengHongming888
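With sentence-transformers v3.2.1 in place, embedding models can target the HPU through the library's standard device argument. A minimal sketch (model name illustrative; selecting "hpu" assumes habana_frameworks is installed):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device="hpu")
embeddings = model.encode(["Gaudi accelerates embeddings", "Hello world"])
print(embeddings.shape)  # e.g. (2, 384) for this model
```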
Textual Inversion XL
TIMM
- Enable PyTorch Image Models (TIMM) with HPUs #1459 @ZhengHongming888
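A minimal sketch of what the TIMM enablement (#1459) makes possible; any timm model name should work the same way, and the habana_frameworks import is the standard way to register the HPU backend:

```python
import timm
import torch
import habana_frameworks.torch.core as htcore  # registers the HPU device with PyTorch

model = timm.create_model("resnet50", pretrained=True).eval().to("hpu")
x = torch.randn(1, 3, 224, 224).to("hpu")
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # torch.Size([1, 1000])
```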
Context Parallelism
- Add support for Context Parallelism using DeepSpeed's DistributedAttention (see the sketch below) #1501 @bhargaveede
- Move parallel_state.py to the distributed folder a6ee7c2044e6ddf7d19ae3ad663149e51d6f89e7 @regisss
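DeepSpeed's DistributedAttention (DeepSpeed-Ulysses) wraps a local attention module and exchanges activations across a sequence-parallel process group, so each rank holds only a slice of the sequence. A schematic sketch; CoreAttention and the group setup are hypothetical stand-ins:

```python
import torch
import torch.distributed as dist
from deepspeed.sequence.layer import DistributedAttention

class CoreAttention(torch.nn.Module):
    """Hypothetical stand-in for the model's local attention."""
    def forward(self, q, k, v, *args):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)

# Assumes dist.init_process_group(...) has already run (schematic).
sp_group = dist.new_group()  # sequence-parallel process group
dist_attn = DistributedAttention(CoreAttention(), sp_group)
# q, k, v arrive sharded along the sequence dimension; DistributedAttention
# all-to-alls them into head-sharded layout, applies CoreAttention, and
# all-to-alls the output back:
# out = dist_attn(q, k, v)
```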
CI improvements
- Tests for text gen output text #1411 @vidyasiv
- Add split runners to CI (2 devices per runner for fast tests) 72df37df46d1d2a2665c5d1be43b13704b7c8ada @regisss
- Fix fast CI to work with split runners #1534 @regisss
- Add Llama 3.1 ft to CI #1529 @MohitIntel
Documentation
Other
- Fix facebook/hf-seamless-m4t-medium crash #1433 @sywangyi
- Fix bias update in scoped all reduce #1456 @skavulya
- fea(pytests): Added skip for unsupported tests for mistral/mixtral #1462 @imangohari1
- Remove deprecated mixed-precision flags #1471 @vivekgoe
- Readme: replace tabs with spaces #1485 @mgonchar
- Move fast tests to Gaudi2 #1498 @regisss
- Remove torch req from LM example #1491 @astachowiczhabana
- Remove keep_input_mutations #1492 @astachowiczhabana
- Fix trust_remote_code #1493 @astachowiczhabana
- Upgrade ViT README with torch.compile #1494 @astachowiczhabana
- Corrected throughput measure for GaudiDDPMPipeline #1460 @deepak-gowda-narayana
- [SW-196761] Add G3 in T5-L README #1523 @astachowiczhabana
- Fix tuple object error #1354 @SupreetSinghPalne
- Add warmup time and compile time log for the eval/prediction. #1489 @jiminha
- Add support for MLPERF optimized pipeline from example #1465 @ANSHUMAN87
- Add check_neural_compressor_min_version for 4 bit behavior #1500 @xin3he
- Pass "lazy_mode" arg to GaudiLlamaModel GaudiTrainer #1515 @astachowiczhabana
- Removed workaround for NaN bug causing graph break. #1516 @astachowiczhabana
- text_generation: improve parameters check #1527 @mgonchar
- transformers: fixed some typos #1528 @mgonchar
- Make the profiler's with_stack option configurable #1497 @ranzhejiang
- Fix dtype issue with valid sequence length in torch.compile bs=1 #1532 @wszczurekhabana
- Migrate OH CLIP (roberta-clip) training to torch.compile #1507 @chaojun-zhang
- test_text_generation: fix non-Gaudi2 case #1530 @mgonchar
- text-generation: improve output printing #1486 @mgonchar
- Text-generation model set-up: apply torch.compile based on model attributes instead of model types #1452 @dsmertin
- Fix bridgetower example #1481 @astachowiczhabana
- Migrate OH Wav2Vec2-AC training to torch.compile - README update #1537 @astachowiczhabana
- Migrate OH T5-large training to torch.compile #1506 @chaojun-zhang
- trainer: fixed spelling #1538 @mgonchar
- Create CI Eager/Lazy for Language Modeling #1448 @Luca-Calabria
- Fixes for llava-next test failures in 1.19 #1535 @tthakkal
- Refactor Qwen2 Family #1541 @Wei-Lin-Intel
- Add support for optimized SDXL pipeline #1519 @sushildubey171
- Add the checkpoint parameters of falcon-mamba pytest #1540 @yuanwu2017
- Avoid negative values in eval metrics #1533 @deepak-gowda-narayana
- Fix lm_eval script for starcoder and gemma #1463 @skavulya
- Add option to use bf16 in PT sdp (#5) #1514 @astachowiczhabana
- Fix tests.test_peft_inference failure #1543 @sywangyi
- Update lm_eval version #1473 @alexey-belyakov
- Fix bad import in Baichuan code #1547 @regisss
- Restore performance in generate #1546 @ugolowic
- Fix for llava models not generating text with test failures in 1.19 #1548 @tthakkal
- Refactor KV cache and RoPE, reduce common code #1148 @abhilash1910
- Adjust Qwen2-7B test case #1551 @Wei-Lin-Intel
- [run_lm_eval.py] Fix excessive printing of JSON dump info #1553 @FocusLuo
- Fix for single_card llama7b and falcon40b CI errors #1549 @MohitIntel
- Apply --sdp_on_bf16 to image-to-text examples (see the sketch at the end of this list) #1557 @schoi-habana
- Fix accuracy regression in Gemma #1556 @skavulya
- Fix FusedSDPA wrapper from TransformerEngine #1562 @pbielak
- Run albert-xxlarge-v1 CI as torch.compile mode #1563 @yeonsily
- Update README commands for the models to use --sdp_on_bf16 #1566 @yeonsily
- MiniCPM patch #1567 @pi314ever
- Updated gemma_2b_it CI #1561 @Luca-Calabria
- Fixed AdaLoRA test for OH 1.15 #1564 @npiroozan
- Fixed LORACP Test for OH 1.15 #1568 @npiroozan
- Fix prefix llama ci failure #1570 @sywangyi
- Fix mllama test #1569 @sywangyi
- Fix lazy_mode assignment #1558 @vidyasiv
- Generation utils update (minor) #1468 @yafshar
- Style: removed tabs #1577 @mgonchar
- Enable num_return_sequences in beam search #1536 @mengker33
- gpt_bigcode: added internal bucketing fix #1526 @mgonchar
- Update the Gaudi trainer with transformers 4.45.2 #1398 @yafshar
- Revert "add check_neural_compressor_min_version for 4 bit behavior" #1578 @xin3he
- Revert PR #1473 #1582 @regisss
- Fixed spelling #1576 @mgonchar
- Update docs for baichuan2 training #1586 @xhaihao
- Add workaround flag for falcon-180b to resolve text-gen critical reset error during tests #1590 @hchauhan123
- Update transformers tests generation util v4.45.2 #1441 @malkomes
- Limit position embeddings in inference #1598 @bhargaveede
- Verify model output is provided when check_output is enabled #1597 @vidyasiv
- Update README.md #1595 @skaulintel
- Fix scikit-learn to 1.5.2 to fix f1 evaluation crash in 1.6.0 #1596 @sywangyi
- Update language-modeling README file #1599 @vivekgoe
- Revert common KVCache not to check token_idx #1594 @jiminha
- Revert LlamaKVCache due to memory increase #1605 @jiminha
- Replace the UNET custom attention processors #1608 @yafshar
- Fix run_generation test commands for TRL output usage example #1621 @shepark
- Update sdp_on_bf16 option for ST example #1615 @ZhengHongming888
- Update save lora weights for diffusers with text_encoder_2 layers #1626 @skavulya
- Fix save_lora_weights in pipeline_utils.py #1643 @regisss
- Check rope_scaling attr #1609 @jiminha
- Skip certain tests for G1 with empty param list #1613 @hsubramony
- Revert "Update transformers tests generation util v4.45.2 (#1441)" #1614 @yeonsily
- Audio classification readme update #1604 @hsubramony
- Fix readme cmds for clip-roberta #1603 @hsubramony
- Add arbitrary scales #1625 @jiminha
- Modify Qwen2 TRL command to avoid OOM. #1630 @jiminha
- Fix distributed issue for ST Trainer #1649 @ZhengHongming888
- Fix distributed issue for timm #1653 @ZhengHongming888
- Refactor Mixtral MoE block #1635 @lkk12014402
- Speech-recognition: downgrade datasets version #1646 @hsubramony
- Add sdp_on_bf16 to controlnet #1631 @skaulintel
- Quick fix for quantization/custom op list loading #1657 @dsocek
- Fix bug in GaudiMixtralAttentionLongSequence forward #1650 @kaixuanliu
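Several items above add or document the --sdp_on_bf16 example flag. Based on how the example scripts use it, it appears to toggle PyTorch's math SDPA backend to allow bf16 accumulation; the exact call below is an assumption, not a documented API:

```python
import torch

# Assumed effect of --sdp_on_bf16 in the example scripts: let the math
# scaled-dot-product-attention backend reduce in bf16 instead of fp32.
torch._C._set_math_sdp_allow_fp16_bf16_reduction(True)
```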