Version v0.1.8 Released Today!
What's Changed
Hotfix
- [hotfix] torchvision fx unittests missing pytest import (#1277) by Jiarui Fang
- [hotfix] fix an assertion bug in base schedule (#1250) by YuliangLiu0306
- [hotfix] fix sharded optim step and clip_grad_norm (#1226) by ver217
- [hotfix] fx get comm size bugs (#1233) by Jiarui Fang
- [hotfix] fx shard 1d pass bug fixing (#1220) by Jiarui Fang
- [hotfix] fixed p2p process send getting stuck (#1181) by YuliangLiu0306
- [hotfix] different overflow statuses lead to communication getting stuck (#1175) by YuliangLiu0306
- [hotfix] fix some bugs caused by refactored schedule (#1148) by YuliangLiu0306
Tensor
- [tensor] distributed checkpointing for parameters (#1240) by Jiarui Fang
- [tensor] redistribute among different process groups (#1247) by Jiarui Fang
- [tensor] a shorter shard and replicate spec (#1245) by Jiarui Fang
- [tensor] redirect .data.get to a tensor instance (#1239) by HELSON
- [tensor] add zero_like colo op, important for Optimizer (#1236) by Jiarui Fang
- [tensor] fix some unittests (#1234) by Jiarui Fang
- [tensor] fix an assertion in colo_tensor cross_entropy (#1232) by HELSON
- [tensor] add unit test for colo_tensor 1DTP cross_entropy (#1230) by HELSON
- [tensor] torch function return colotensor (#1229) by Jiarui Fang
- [tensor] improve robustness of class 'ProcessGroup' (#1223) by HELSON
- [tensor] sharded global process group (#1219) by Jiarui Fang
- [Tensor] add cpu group to ddp (#1200) by Jiarui Fang
- [tensor] remove gpc in tensor tests (#1186) by Jiarui Fang
- [tensor] revert local view back (#1178) by Jiarui Fang
- [Tensor] rename some APIs in TensorSpec and Polish view unittest (#1176) by Jiarui Fang
- [Tensor] rename parallel_action (#1174) by Ziyue Jiang
- [Tensor] distributed view supports inter-process hybrid parallel (#1169) by Jiarui Fang
- [Tensor] remove ParallelAction, use ComputeSpec instead (#1166) by Jiarui Fang
- [tensor] add embedding bag op (#1156) by ver217
- [tensor] add more element-wise ops (#1155) by ver217
- [tensor] fixed non-serializable colo parameter during model checkpointing (#1153) by Frank Lee
- [tensor] dist spec s2s uses all-to-all (#1136) by ver217
- [tensor] added repr to spec (#1147) by Frank Lee
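Several of the tensor changes above revolve around distribution specs: a tensor is either sharded across the ranks of a process group or replicated on each of them, and `redistribute` (renamed from `convert_to_dist` in #1243) converts between layouts, with shard-to-shard conversion done via an all-to-all (#1136). A minimal pure-Python sketch of the idea, with illustrative names rather than the actual ColoTensor API:

```python
# Toy model of shard/replicate specs and redistribution across a
# process group. Purely illustrative -- not the ColoTensor API.

def shard(data, world_size):
    """Split a flat list into one contiguous shard per rank."""
    n = len(data) // world_size
    return [data[r * n:(r + 1) * n] for r in range(world_size)]

def replicate(shards):
    """Gather every shard so each rank holds the full tensor."""
    full = [x for s in shards for x in s]
    return [list(full) for _ in shards]

def redistribute(shards, new_world_size):
    """Shard-to-shard: regroup elements for a different group size
    (the real implementation exchanges slices with an all-to-all)."""
    full = [x for s in shards for x in s]
    return shard(full, new_world_size)

data = list(range(8))
shards = shard(data, 4)             # [[0, 1], [2, 3], [4, 5], [6, 7]]
reshards = redistribute(shards, 2)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
replicas = replicate(reshards)      # each "rank" now holds 0..7
```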
Fx
- [fx] added ndim property to proxy (#1253) by Frank Lee
- [fx] fixed tracing with apex-based T5 model (#1252) by Frank Lee
- [fx] refactored the file structure of patched function and module (#1238) by Frank Lee
- [fx] methods to get fx graph properties (#1246) by YuliangLiu0306
- [fx] add split module pass and unit test from pipeline passes (#1242) by YuliangLiu0306
- [fx] fixed huggingface OPT and T5 results misalignment (#1227) by Frank Lee
- [fx] get communication size between partitions (#1224) by YuliangLiu0306
- [fx] added patches for tracing swin transformer (#1228) by Frank Lee
- [fx] fixed timm tracing result misalignment (#1225) by Frank Lee
- [fx] added timm model tracing testing (#1221) by Frank Lee
- [fx] added torchvision model tracing testing (#1216) by Frank Lee
- [fx] temporarily used (#1215) by XYE
- [fx] added testing for all albert variants (#1211) by Frank Lee
- [fx] added testing for all gpt variants (#1210) by Frank Lee
- [fx] add uniform policy (#1208) by YuliangLiu0306
- [fx] added testing for all bert variants (#1207) by Frank Lee
- [fx] supported model tracing for huggingface bert (#1201) by Frank Lee
- [fx] added module patch for pooling layers (#1197) by Frank Lee
- [fx] patched conv and normalization (#1188) by Frank Lee
- [fx] supported data-dependent control flow in model tracing (#1185) by Frank Lee
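The FX entries above center on symbolic tracing: the model is executed once with proxy objects that record every operation into a graph instead of computing values, which is also why data-dependent control flow (#1185) and some library modules need patching. A toy tracer sketch that captures the mechanism; hypothetical names, not torch.fx itself:

```python
# Minimal sketch of symbolic tracing: a Proxy records each operation
# into a graph instead of computing it. Toy code, not torch.fx.

class Proxy:
    def __init__(self, name, graph):
        self.name, self.graph = name, graph

    def _record(self, op, other):
        node = f"v{len(self.graph)}"
        rhs = other.name if isinstance(other, Proxy) else repr(other)
        self.graph.append((node, op, self.name, rhs))
        return Proxy(node, self.graph)

    def __add__(self, other):
        return self._record("add", other)

    def __mul__(self, other):
        return self._record("mul", other)

def symbolic_trace(fn):
    """Run fn once on a proxy input and return the recorded graph."""
    graph = []
    fn(Proxy("x", graph))
    return graph

def model(x):
    return x * 2 + 1

graph = symbolic_trace(model)
# graph == [('v0', 'mul', 'x', '2'), ('v1', 'add', 'v0', '1')]
```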
Rename
- [rename] convert_to_dist -> redistribute (#1243) by Jiarui Fang
Checkpoint
- [checkpoint] save sharded optimizer states (#1237) by Jiarui Fang
- [checkpoint] support generalized scheduler (#1222) by Yi Zhao
- [checkpoint] make unit test faster (#1217) by Jiarui Fang
- [checkpoint] checkpoint for ColoTensor Model (#1196) by Jiarui Fang
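The sharded checkpointing work above (#1237, #1240) follows a common pattern: each rank persists only the slice of parameters/optimizer state it owns, and loading reassembles the shards. A toy sketch of that pattern, not the actual Colossal-AI checkpoint API:

```python
# Toy sketch of sharded checkpointing: each rank saves only its own
# slice of state, tagged with enough metadata to reassemble it later.
# Illustrative only -- not the actual Colossal-AI checkpoint API.

def save_sharded(state, rank, world_size):
    """Return this rank's shard of a flat state list."""
    n = (len(state) + world_size - 1) // world_size
    return {"rank": rank, "shard": state[rank * n:(rank + 1) * n]}

def load_sharded(checkpoints):
    """Reassemble the full state from per-rank checkpoint shards."""
    ordered = sorted(checkpoints, key=lambda c: c["rank"])
    return [x for c in ordered for x in c["shard"]]

state = [0.1, 0.2, 0.3, 0.4]
ckpts = [save_sharded(state, r, 2) for r in range(2)]
assert load_sharded(ckpts) == state
```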
Refactor
- [refactor] move process group from _DistSpec to ColoTensor. (#1203) by Jiarui Fang
- [refactor] remove gpc dependency in colotensor's _ops (#1189) by Jiarui Fang
- [refactor] move chunk and chunkmgr to directory gemini (#1182) by Jiarui Fang
Context
- [context] support arbitrary module materialization (#1193) by YuliangLiu0306
- [context] use meta tensor to init model lazily (#1187) by YuliangLiu0306
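The lazy-init changes above rely on meta tensors: the model is first built with shape-only placeholders that allocate no storage, and each module's parameters are materialized only when actually needed. A toy sketch of the idea with illustrative names, not the Colossal-AI context manager:

```python
# Toy sketch of lazy materialization: a parameter starts as a
# shape-only placeholder (a "meta tensor") and gets real storage
# only on first use. Illustrative names, not the Colossal-AI API.

class LazyParam:
    def __init__(self, shape):
        self.shape = shape      # metadata only, no storage yet
        self.data = None

    def materialize(self):
        """Allocate storage (zeros here) on first access."""
        if self.data is None:
            size = 1
            for d in self.shape:
                size *= d
            self.data = [0.0] * size
        return self.data

w = LazyParam((4, 8))
assert w.data is None           # nothing allocated at "init" time
buf = w.materialize()           # allocated only when needed
assert len(buf) == 32
```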
Ddp
- [ddp] ColoDDP uses bucket all-reduce (#1177) by ver217
- [ddp] refactor ColoDDP and ZeroDDP (#1146) by ver217
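Bucket all-reduce (#1177) amortizes communication latency: instead of launching one all-reduce per gradient tensor, gradients are packed into fixed-size buckets and each bucket is reduced in a single call. A single-process simulation of the bucketing idea, not the ColoDDP implementation:

```python
# Toy sketch of bucketed gradient all-reduce: flat gradients are
# packed into fixed-size buckets and each bucket is reduced in one
# call. Single-process simulation -- not DDP or ColoDDP itself.

def make_buckets(grads, bucket_size):
    """Greedily pack flat gradient lists into buckets."""
    buckets, cur = [], []
    for g in grads:
        cur.extend(g)
        if len(cur) >= bucket_size:
            buckets.append(cur)
            cur = []
    if cur:
        buckets.append(cur)
    return buckets

def allreduce_sum(bucket_per_rank):
    """Element-wise sum of one bucket across simulated ranks."""
    return [sum(vals) for vals in zip(*bucket_per_rank)]

# Two simulated ranks with an identical bucket layout.
rank0 = make_buckets([[1.0, 2.0], [3.0], [4.0, 5.0]], bucket_size=3)
rank1 = make_buckets([[10.0, 20.0], [30.0], [40.0, 50.0]], bucket_size=3)
reduced = [allreduce_sum(pair) for pair in zip(rank0, rank1)]
# reduced == [[11.0, 22.0, 33.0], [44.0, 55.0]]
```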
ColoTensor
- [ColoTensor] add independent process group (#1179) by Jiarui Fang
- [ColoTensor] rename APIs and add output_replicate to ComputeSpec (#1168) by Jiarui Fang
- [ColoTensor] improves init functions (#1150) by Jiarui Fang
Zero
- [zero] sharded optim supports loading local state dict (#1170) by ver217
- [zero] zero optim supports loading local state dict (#1171) by ver217
Workflow
- [workflow] polish readme and dockerfile (#1165) by Frank Lee
- [workflow] auto-publish docker image upon release (#1164) by Frank Lee
- [workflow] fixed release post workflow (#1154) by Frank Lee
- [workflow] fixed format error in yaml file (#1145) by Frank Lee
- [workflow] added workflow to auto draft the release post (#1144) by Frank Lee
Pipeline
- [pipeline] add customized policy (#1139) by YuliangLiu0306
- [pipeline] support more flexible pipeline (#1138) by YuliangLiu0306
Full Changelog: v0.1.7...v0.1.8