Hidet v0.2.3
What's Changed
- [Version] Bump version to v0.2.3.dev by @yaoyaoding in #144
- [Workflow] Update workflow to use the stable version of pytorch by @yaoyaoding in #145
- [Operator] Resolve matmul to batch_matmul when lower than sm80 by @hjjq in #146
- [Dynamo] non-linear operator support + tests by @AndreSlavescu in #143
- Remove tutorial msg by @LDY1998 in #149
- [BUG] Conversion compile issue by @xinli-git in #150
- [Dynamo] Fix dynamo tests and dump graph IR by @xinli-git in #153
- [CI] Benchmark periodically by @yaoyaoding in #155
- [CI] Update bench script by @yaoyaoding in #156
- [CI] Add more env information to benchmark script by @yaoyaoding in #157
- [CI] Remove benchmark workflow, but run it in dedicated server by @yaoyaoding in #159
- [CI] Update benchmark script by @yaoyaoding in #160
- [CI] Change the search space in benchmark script from 0 to 2 by @yaoyaoding in #161
- [CI] Update benchmark script by @yaoyaoding in #162
- [CI] Update benchmark scripts by @yaoyaoding in #163
- [IR][Pass] Refactor the fusion implementation by @yaoyaoding in #164
- [Dynamo] Add operator support to run UNet2DConditionModel from diffusers by @xinli-git in #151
- [IR][Dynamic Shape] Enhance the Tensor Program IR to support dynamic shape by @yaoyaoding in #165
- [Operator] Allow matmul_f16 fuse epilogue by @yaoyaoding in #167
- [CI] Update benchmark script by @yaoyaoding in #168
- [CUDA] Lazy initializing cuda context by @yaoyaoding in #169
- [Fixbug] Allow one backend fail in benchmark script by @yaoyaoding in #170
- [Fixbug] Use auto-scheduler for fp64 reduction by @yaoyaoding in #171
- [Operator] Add `gather` operator and `torch.zeros`, `torch.neg` mappings by @yaoyaoding in #174 (see the sketch after this list)
- [CI] Update benchmark script by @yaoyaoding in #179
- [Fixbug] Add `_stacklevel` to pytorch softmax mapping by @yaoyaoding in #178
- [IR] Add unroll pragma for loop statement by @yaoyaoding in #180
- [Operator] Flash Attention by @hjjq in #175
- [Fixbug] Fix a bug in the mapping from device to its memory pool by @yaoyaoding in #181
- [Dynamo] Small enhancements for graph dump IR and task arguments by @xinli-git in #172
- [Docs] Update install instruction by @hjjq in #182
- Change norm to use smaller inputs to reduce running time by @xinli-git in #185
- [IR] Add explicit unroll by @yaoyaoding in #184
- [Runtime] Allow passing torch tensors to `PackedFunc` directly by @yaoyaoding in #183
- Refactor codegen to separate GPU/CPU code generation by @LDY1998 in #176
- [Pass] Support inline function by @yaoyaoding in #186
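Several entries above extend the PyTorch dynamo integration with new operator mappings (for example `torch.zeros`, `torch.neg`, and `gather` in #174). The snippet below is a minimal illustrative sketch of exercising such mappings through `torch.compile` with the `hidet` backend; the toy module and tensor shapes are assumptions for illustration, not code from this release.

```python
import torch
import hidet  # assumed: importing hidet registers the 'hidet' torch.compile backend

# Toy module (illustrative only) that uses the newly mapped operators:
# torch.zeros, torch.neg, and gather.
class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor, index: torch.Tensor) -> torch.Tensor:
        pad = torch.zeros(x.shape[0], 1, device=x.device, dtype=x.dtype)  # torch.zeros mapping
        y = torch.cat([torch.neg(x), pad], dim=1)                         # torch.neg mapping
        return torch.gather(y, dim=1, index=index)                        # gather operator

model = Toy().cuda().eval()
x = torch.randn(4, 8, device='cuda')
index = torch.zeros(4, 3, dtype=torch.long, device='cuda')

compiled = torch.compile(model, backend='hidet')
with torch.no_grad():
    out = compiled(x, index)
print(out.shape)  # torch.Size([4, 3])
```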
New Contributors
Full Changelog: v0.2.2...v0.2.3