Hidet v0.2.4
What's Changed
- [Version] Bump version to v0.2.4.dev by @yaoyaoding in #188
- [Dynamo] module tests + operator support by @AndreSlavescu in #148 (see the usage sketch after this list)
- Refactor compilation workflow to support CPU without CUDA by @LDY1998 in #189
- [Stack] Allow the ulimit stack size to be less than expected by @yaoyaoding in #195
- [Readme] Add platform requirements by @yaoyaoding in #196
- [DataType] Add complex64 and complex128 data type by @yaoyaoding in #200
- [Example] Add an example of running GPT-2 model by @yaoyaoding in #203
- [Fusion] Use inline pass in fusion to allow template call functions with kernel params by @yaoyaoding in #197
- [Frontend][Operator] Add missing operators for dinov2 by @yaoyaoding in #206
- [Backend] Add openmp support by @yaoyaoding in #208
- [Operator] Update batch_matmul to use Hidet Script by @hjjq in #207
- [Cache] Add cache management command line interface by @yaoyaoding in #212
- [IR] Creation-time constant fold for constant expressions by @yaoyaoding in #209
- [Torch][Operator] Allow changing torch tensor device when possible by @yaoyaoding in #214
- [Torch][Operator] Add op mapping for torch.min/max/minimum/maximum by @yaoyaoding in #216
- [Typo] Fix a typo in resnext.py by @eltociear in #210
- [Operator] Adding missing operators for llama by @yaoyaoding in #219
- [IR] Adding more support for dynamic shape on Task and FlowGraph level by @yaoyaoding in #220
- [Torch] Add mapping for `torch.ops.aten.add` and `torch.ops.aten.cos` by @yaoyaoding in #223
- [Operator][Backend] Add nvcc flags for faster math and update Attention schedule by @hjjq in #221
- [CI] Always clear the cache before tests by @yaoyaoding in #224
- Fix batch_matmul for invalid mma config for sm < 80 by @xinli-git in #227
- [Dynamic Shape] Adding more dynamic shape support by @yaoyaoding in #228
- [CI] Add `importlib_metadata` to `requirements-dev.txt` by @yaoyaoding in #233
- [Script] Add list comprehension support in hidet script by @yaoyaoding in #235
- [Refactor][Dynamic Shape] Introduce SymbolVar to implement dynamic shape by @yaoyaoding in #236 (see the dynamic-shape sketch after this list)
- [Script] Add pointer arithmetic by @yaoyaoding in #237
- [Operator][Torch] Add causal fmha and torch sdpa mapping by @hjjq in #238
- [Fixbug][Pass] Fix a bug in the `inline_let_stmt` pass by @yaoyaoding in #240
- [Options] Add option for controlling parallel build with number of jobs or memory reserved for each job by @xinli-git in #230
- [Typo] Fix a typo by @BolinSNLHM in #245
- [Typo] Fix minor spelling mistake by @Aalanli in #246
- [Fixbug] Fix a bug in StmtRewriter which discard declare scope information by @yaoyaoding in #248
- [Refactor] Adding support for compiled model by @yaoyaoding in #247
- [Operator] batch_matmul: Remove duplicate smem declaration by @hjjq in #249
- [Operator] Adding CPU support for matrix multiplication by @BolinSNLHM in #251
- [Hidet Script] Allow `bind_tuple` argument in `mapping.on(...)` and `grid(...)` by @yaoyaoding in #254
- [Hidet Script] Add `in` and `not in` expressions in hidet script by @yaoyaoding in #255
- [Codegen] Include header files as needed by @yaoyaoding in #256
- [Operator] Add new operator "normalize" that makes a group of layers (layer norm, group norm and instance norm) faster using hidet script by @xinli-git in #257
- [Testing][Models] Add gpt2 module in testing models by @yaoyaoding in #252
- [Fixbug] Fix test warnings and the incompatibility of two recent PRs by @yaoyaoding in #258
- [Operator] Add sm75 support for attention by @hjjq in #259
- [Operator] batch_matmul: Remove unroll and reduce tuning space by @hjjq in #260
- [Fixbug] Fix a bug when fused operator has no input by @yaoyaoding in #263
- [Graph] Translate softmax and reduce to hidet script by @Aalanli in #242
- [Fixbug] batch_matmul: move cc checking inside schedule by @hjjq in #264
- [Refactor] Refactor building system and adding compiled products by @yaoyaoding in #261
- [Fixbug] Reduce the default unroll factor to 4 by @yaoyaoding in #266
- [Torch] Add some torch frontend mappings for roberta-base by @hjjq in #267
- [Refactor] Remove `schedules` submodule under `hidet.graph.ops` by @yaoyaoding in #269
- [Device] Add support for mixed cpu and cuda kernels in the same flow graph by @yaoyaoding in #270
- [Dynamic Shape] Adding dynamic shape support for reduce by @Aalanli in #268
- [Complex Type] Add more support for complex data type by @yaoyaoding in #271
- [Tools] Model translator by @Aalanli in #273
- [Model] Llama model implementation in hidet by @Aalanli in #243
- [Operator] Add support for cross attention by @hjjq in #275
- [Operator] Add dynamic shape support and tests for Operators. by @Aalanli in #274
- [Fusion] Enhance the prologue epilogue fusion by @yaoyaoding in #277
- [Drivers] Suppress OSError by @hjjq in #278
- [Dynamic Shape] More correctness guards by @Aalanli in #276
- [Operator] Make Convolution gemms fusible by resolving to batch_matmul by @hjjq in #279
- [External Tasks] Move task build into method call for external kernel support by @xinli-git in #282
- [Distributed] add nccl primitives by @soodoshll in #280
- [Operators] Conv2d fp16 implicit gemm kernel by @Aalanli in #283
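Several of the items above extend the PyTorch frontend behind the Dynamo backend (#148, #214, #216, #223, #267). A minimal usage sketch, assuming a CUDA build of hidet and a PyTorch version that provides `torch.compile`; the toy model and tensor sizes are illustrative only:

```python
import torch
import hidet  # importing hidet registers the 'hidet' backend with torch dynamo

# Illustrative toy model; any module traceable by dynamo works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 16),
    torch.nn.ReLU(),
).cuda().eval()

x = torch.randn(1, 16, device='cuda')

# Compile through the hidet backend and call the optimized module as usual.
model_opt = torch.compile(model, backend='hidet')
y = model_opt(x)
```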
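The dynamic-shape work (#220, #228, #236, #268, #274) is built around symbolic tensors whose dimensions are `SymbolVar`s. A minimal sketch, assuming the post-#236 API where `hidet.symbol` accepts a named dimension; the names `x`, `y`, and the `'n'` dimension are illustrative:

```python
import hidet

# Symbolic input whose first dimension 'n' stays dynamic until run time.
x = hidet.symbol(['n', 8], dtype='float32', device='cuda')
y = hidet.ops.relu(x)

# Trace into a FlowGraph, then run it with a concrete input (here n = 4).
graph = hidet.trace_from(y, inputs=[x])
out = graph(hidet.randn([4, 8], device='cuda'))
```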
New Contributors
- @eltociear made their first contribution in #210
- @BolinSNLHM made their first contribution in #245
- @Aalanli made their first contribution in #246
Full Changelog: v0.2.3...v0.2.4