Version v0.1.11rc1 Released Today!
What's Changed
Hotfix
- [hotfix] resharding cost issue (#1742) by YuliangLiu0306
- [hotfix] solver bug caused by dict type comm cost (#1686) by YuliangLiu0306
- [hotfix] fix wrong type name in profiler (#1678) by Boyuan Yao
- [hotfix] unit test (#1670) by YuliangLiu0306
- [hotfix] add recompile after graph manipulation (#1621) by YuliangLiu0306
- [hotfix] got sliced types (#1614) by YuliangLiu0306
Release
Doc
- [doc] update recommendation system catalogue (#1732) by binmakeswell
- [doc] update recommendation system urls (#1725) by Jiarui Fang
Zero
- [zero] add chunk init function for users (#1729) by HELSON
- [zero] add constant placement policy (#1705) by HELSON
Pre-commit
Autoparallel
- [autoparallel] runtime_backward_apply (#1720) by YuliangLiu0306
- [autoparallel] moved tests to test_tensor_shard (#1713) by Frank Lee
- [autoparallel] resnet block runtime apply (#1709) by YuliangLiu0306
- [autoparallel] fixed broken node handler tests (#1708) by Frank Lee
- [autoparallel] refactored the autoparallel module for organization (#1706) by Frank Lee
- [autoparallel] adapt runtime passes (#1703) by YuliangLiu0306
- [autoparallel] collated all deprecated files (#1700) by Frank Lee
- [autoparallel] init new folder structure (#1696) by Frank Lee
- [autoparallel] adapt solver and CostGraph with new handler (#1695) by YuliangLiu0306
- [autoparallel] add output handler and placeholder handler (#1694) by YuliangLiu0306
- [autoparallel] add pooling handler (#1690) by YuliangLiu0306
- [autoparallel] where_handler_v2 (#1688) by YuliangLiu0306
- [autoparallel] fix C version rotor inconsistency (#1691) by Boyuan Yao
- [autoparallel] added sharding spec conversion for linear handler (#1687) by Frank Lee
- [autoparallel] add reshape handler v2 and fix some previous bug (#1683) by YuliangLiu0306
- [autoparallel] add unary element wise handler v2 (#1674) by YuliangLiu0306
- [autoparallel] add following node generator (#1673) by YuliangLiu0306
- [autoparallel] add layer norm handler v2 (#1671) by YuliangLiu0306
- [autoparallel] fix insecure subprocess (#1680) by Boyuan Yao
- [autoparallel] add rotor C version (#1658) by Boyuan Yao
- [autoparallel] added utils for broadcast operation (#1665) by Frank Lee
- [autoparallel] update CommSpec (#1667) by YuliangLiu0306
- [autoparallel] added bias comm spec to matmul strategy (#1664) by Frank Lee
- [autoparallel] add batch norm handler v2 (#1666) by YuliangLiu0306
- [autoparallel] remove no strategy nodes (#1652) by YuliangLiu0306
- [autoparallel] added compute resharding costs for node handler (#1662) by Frank Lee
- [autoparallel] added new strategy constructor template (#1661) by Frank Lee
- [autoparallel] added node handler for bmm (#1655) by Frank Lee
- [autoparallel] add conv handler v2 (#1663) by YuliangLiu0306
- [autoparallel] adapt solver with gpt (#1653) by YuliangLiu0306
- [autoparallel] implemented all matmul strategy generator (#1650) by Frank Lee
- [autoparallel] change the following nodes strategies generation logic (#1636) by YuliangLiu0306
- [autoparallel] where handler (#1651) by YuliangLiu0306
- [autoparallel] implemented linear projection strategy generator (#1639) by Frank Lee
- [autoparallel] adapt solver with mlp (#1638) by YuliangLiu0306
- [autoparallel] Add pofo sequence annotation (#1637) by Boyuan Yao
- [autoparallel] add elementwise handler (#1622) by YuliangLiu0306
- [autoparallel] add embedding handler (#1620) by YuliangLiu0306
- [autoparallel] protect bcast handler from invalid strategies (#1631) by YuliangLiu0306
- [autoparallel] add layernorm handler (#1629) by YuliangLiu0306
- [autoparallel] recover the merged node strategy index (#1613) by YuliangLiu0306
- [autoparallel] added new linear module handler (#1616) by Frank Lee
- [autoparallel] added new node handler (#1612) by Frank Lee
- [autoparallel] add bcast matmul strategies (#1605) by YuliangLiu0306
- [autoparallel] refactored the data structure for sharding strategy (#1610) by Frank Lee
- [autoparallel] add bcast op handler (#1600) by YuliangLiu0306
- [autoparallel] added all non-bcast matmul strategies (#1603) by Frank Lee
- [autoparallel] added strategy generator and bmm strategies (#1602) by Frank Lee
- [autoparallel] add reshape handler (#1594) by YuliangLiu0306
- [autoparallel] refactored shape consistency to remove redundancy (#1591) by Frank Lee
- [autoparallel] add resnet autoparallel unit test and add backward weight communication cost (#1589) by YuliangLiu0306
- [autoparallel] added generate_sharding_spec to utils (#1590) by Frank Lee
- [autoparallel] added solver option dataclass (#1588) by Frank Lee
- [autoparallel] adapt solver with resnet (#1583) by YuliangLiu0306
Fx/meta/rpc
- [fx/meta/rpc] move _meta_registration.py to fx folder / register fx functions with compatibility checks / remove color debug (#1710) by Super Daniel
Embeddings
- [embeddings] add doc in readme (#1711) by Jiarui Fang
- [embeddings] more detailed timer (#1692) by Jiarui Fang
- [embeddings] cache option (#1635) by Jiarui Fang
- [embeddings] use cache_ratio instead of cuda_row_num (#1611) by Jiarui Fang
- [embeddings] add already_split_along_rank flag for tablewise mode (#1584) by CsRic
Unittest
- [unittest] added doc for the pytest wrapper (#1704) by Frank Lee
- [unittest] supported conditional testing based on env var (#1701) by Frank Lee
Embedding
- [embedding] rename FreqAwareEmbedding -> CachedEmbedding (#1699) by Jiarui Fang
- [embedding] polish async copy (#1657) by Jiarui Fang
- [embedding] add more detail profiling (#1656) by Jiarui Fang
- [embedding] print profiling results (#1654) by Jiarui Fang
- [embedding] non-blocking cpu-gpu copy (#1647) by Jiarui Fang
- [embedding] isolate cache_op from forward (#1645) by CsRic
- [embedding] rollback for better FAW performance (#1625) by Jiarui Fang
- [embedding] updates some default parameters by Jiarui Fang
Fx/profiler
- [fx/profiler] assigned UUID to each unrecorded tensor/ improved performance on GPT-2 (#1679) by Super Daniel
- [fx/profiler] provide a table of summary. (#1634) by Super Daniel
- [fx/profiler] tuned the calculation of memory estimation (#1619) by Super Daniel
Pipeline/fix-bug
- [pipeline/fix-bug] num_microbatches support any integer | stable chimera | launch tool for rpc pp framework (#1684) by Kirigaya Kazuto
Pipeline/rank_recorder
- [pipeline/rank_recorder] fix bug when process data before backward | add a tool for multiple ranks debug (#1681) by Kirigaya Kazuto
Feature
- [feature] A new ZeRO implementation (#1644) by HELSON
- Revert "[feature] new zero implementation (#1623)" (#1643) by Jiarui Fang
- [feature] new zero implementation (#1623) by HELSON
Fx
- [fx] Add concrete info prop (#1677) by Boyuan Yao
- [fx] refactor code for profiler / enable fake tensor movement. (#1646) by Super Daniel
- [fx] fix offload codegen test (#1648) by Boyuan Yao
- [fx] Modify offload codegen (#1618) by Boyuan Yao
- [fx] PoC of runtime shape consistency application (#1607) by YuliangLiu0306
- [fx] Add pofo solver (#1608) by Boyuan Yao
- [fx] Add offload codegen (#1598) by Boyuan Yao
- [fx] provide an accurate estimation of memory. (#1587) by Super Daniel
- [fx] Improve linearize and rotor solver (#1586) by Boyuan Yao
- [fx] Add nested checkpoint in activation checkpoint codegen (#1585) by Boyuan Yao
Pipeline/pytree
- [pipeline/pytree] add pytree to process args and kwargs | provide data_process_func to process args and kwargs after forward (#1642) by Kirigaya Kazuto
Fix
Moe
- [moe] initialize MoE groups by ProcessGroup (#1640) by HELSON
- [moe] fix moe bugs (#1633) by HELSON
- [moe] fix MoE bugs (#1628) by HELSON
Tensor
- [tensor] use communication autograd func (#1617) by YuliangLiu0306
Pipeline/chimera
- [pipeline/chimera] test chimera | fix bug of initializing (#1615) by Kirigaya Kazuto
- [pipeline/chimera] reconstruct PipelineBase and Worker to support more feasible custom schedule | finish Chimera (#1595) by Kirigaya Kazuto
Workflow
Fx/tuning
- [fx/tuning] tune performance on rotor with meta info. (#1599) by Super Daniel
Hotfix/rotor
- [hotfix/rotor] fix variable names (#1597) by Super Daniel
Nfc
- [NFC] add OPT serving (#1581) by binmakeswell
- [NFC] polish ./colossalai/trainer/hooks/_lr_scheduler_hook.py code style (#1576) by Boyuan Yao
- [NFC] polish colossalai/zero/sharded_model/reduce_scatter.py code style (#1554) by Fazzie-Maqianli
- [NFC] polish utils/tensor_detector/__init__.py code style (#1573) by CsRic
- [NFC] polish colossalai/nn/lr_scheduler/multistep.py code style (#1572) by Sze-qq
- [NFC] polish colossalai/nn/lr_scheduler/torch.py code style (#1571) by superhao1995
- [NFC] polish colossalai/nn/parallel/data_parallel.py code style (#1570) by Jiatong Han
- [NFC] polish colossalai/pipeline/utils.py code style (#1562) by Zirui Zhu
- [NFC] polish colossalai/fx/tracer/meta_patch/patched_module/convolution.py code style (#1563) by Xue Fuzhao
- [NFC] polish colossalai/gemini/update/chunkv2.py code style (#1565) by Zangwei Zheng
- [NFC] polish colossalai/nn/layer/colossalai_layer/dropout.py code style (#1568) by DouJS
- [NFC] polish colossalai/utils/tensor_detector/tensor_detector.py code style (#1566) by LuGY
- [NFC] polish colossalai/nn/_ops/embedding.py code style (#1561) by BigOneLiXiaoMing
- [NFC] polish colossalai/builder/__init__.py code style (#1560) by Ziheng Qin
- [NFC] polish colossalai/testing/comparison.py code style. (#1558) by Super Daniel
- [NFC] polish colossalai/nn/layer/colossalai_layer/linear.py (#1556) by Ofey Chan
- [NFC] polish code colossalai/gemini/update/search_utils.py (#1557) by Kai Wang (Victor Kai)
- [NFC] polish colossalai/nn/_ops/layernorm.py code style (#1555) by yuxuan-lou
- [NFC] polish colossalai/nn/loss/loss_2p5d.py code style (#1553) by shenggan
- [NFC] polish colossalai/nn/_ops/embedding_bag.py code style (#1552) by Maruyama_Aya
- [NFC] polish colossalai/nn/lr_scheduler/cosine.py code style by binmakeswell
- [NFC] polish colossalai/utils/multi_tensor_apply/multi_tensor_apply.py code style (#1559) by Kirigaya Kazuto
Full Changelog: v0.1.11rc1...v0.1.10