Hidet v0.2.3
What's Changed
- [Version] Bump version to v0.2.3.dev by @yaoyaoding in #144
- [Workflow] Update workflow to use the stable version of pytorch by @yaoyaoding in #145
- [Operator] Resolve matmul to batch_matmul when lower than sm80 by @hjjq in #146
- [Dynamo] non-linear operator support + tests by @AndreSlavescu in #143
- Remove tutorial msg by @LDY1998 in #149
- [BUG] Conversion compile issue by @xinli-git in #150
- [Dynamo] Fix dynamo tests and dump graph IR by @xinli-git in #153
- [CI] Benchmark periodically by @yaoyaoding in #155
- [CI] Update bench script by @yaoyaoding in #156
- [CI] Add more env information to benchmark script by @yaoyaoding in #157
- [CI] Remove benchmark workflow, but run it in dedicated server by @yaoyaoding in #159
- [CI] Update benchmark script by @yaoyaoding in #160
- [CI] Change the search space in benchmark script from 0 to 2 by @yaoyaoding in #161
- [CI] Update benchmark script by @yaoyaoding in #162
- [CI] Update benchmark scripts by @yaoyaoding in #163
- [IR][Pass] Refactor the fusion implementation by @yaoyaoding in #164
- [Dynamo] Add operator support to run UNet2DConditionModel from diffusers by @xinli-git in #151
- [IR][Dynamic Shape] Enhance the Tensor Program IR to support dynamic shape by @yaoyaoding in #165
- [Operator] Allow matmul_f16 fuse epilogue by @yaoyaoding in #167
- [CI] Update benchmark script by @yaoyaoding in #168
- [CUDA] Lazy initializing cuda context by @yaoyaoding in #169
- [Fixbug] Allow one backend fail in benchmark script by @yaoyaoding in #170
- [Fixbug] Use auto-scheduler for fp64 reduction by @yaoyaoding in #171
- [Operator] Add `gather` operator and `torch.zeros`, `torch.neg` mappings by @yaoyaoding in #174 (see the sketch after this list)
- [CI] Update benchmark script by @yaoyaoding in #179
- [Fixbug] Add `_stacklevel` to pytorch softmax mapping by @yaoyaoding in #178
- [IR] Add unroll pragma for loop statement by @yaoyaoding in #180
- [Operator] Flash Attention by @hjjq in #175
- [Fixbug] Fix a bug in the mapping from device to its memory pool by @yaoyaoding in #181
- [Dynamo] Small enhancements for graph dump IR and task arguments by @xinli-git in #172
- [Docs] Update install instruction by @hjjq in #182
- Change norm to use smaller inputs to reduce running time by @xinli-git in #185
- [IR] Add explicit unroll by @yaoyaoding in #184
- [Runtime] Allow passing torch tensors to `PackedFunc` directly by @yaoyaoding in #183
- Refactor codegen to separate GPU/CPU code generation by @LDY1998 in #176
- [Pass] Support inline function by @yaoyaoding in #186
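Several entries above extend the PyTorch dynamo integration with new operator mappings (for example `torch.zeros`, `torch.neg`, and `gather` in #174). The snippet below is a minimal illustrative sketch of exercising such mappings through `torch.compile` with the `hidet` backend; the toy module and tensor shapes are assumptions for illustration, not code from this release.

```python
import torch
import hidet  # assumed: importing hidet registers the 'hidet' torch.compile backend

# Toy module (illustrative only) that uses the newly mapped operators:
# torch.zeros, torch.neg, and gather.
class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor, index: torch.Tensor) -> torch.Tensor:
        pad = torch.zeros(x.shape[0], 1, device=x.device, dtype=x.dtype)  # torch.zeros mapping
        y = torch.cat([torch.neg(x), pad], dim=1)                         # torch.neg mapping
        return torch.gather(y, dim=1, index=index)                        # gather operator

model = Toy().cuda().eval()
x = torch.randn(4, 8, device='cuda')
index = torch.zeros(4, 3, dtype=torch.long, device='cuda')

compiled = torch.compile(model, backend='hidet')
with torch.no_grad():
    out = compiled(x, index)
print(out.shape)  # torch.Size([4, 3])
```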
New Contributors
Full Changelog: v0.2.2...v0.2.3