Hidet v0.4.0
What's Changed
- [Fix] Fixing an error triggered by the operator `any` (#369) by Bolin Sun 6a4c2e5
- [Fix] added torch.t for mobilebert-uncased model (#353) by zhumakhan 95d95a4
- [CI] Use same image for tests and publishing test execution (#463) by c-fteixeira 49fd332
- [BUG] fix bug in disallow in graph (#464) by Vadim Gimpelson d84f2c5
- [CI] Move Publish workflow to internal ARC runners (#461) by c-fteixeira b5d6aaf
- [CI] Update container for CI (#460) by Vadim Gimpelson b973591
- [Bug] Rename test_arithmetic.py -> test_arithmetic2.py (#459) by Vadim Gimpelson 6aa6cf8
- Update requirements-dev.txt to use pytorch version >= 2.3.0 (#458) by Vadim Gimpelson 6b32295
- [CI] Repeat start_instance (#361) by vadiklyutiy cf5cadd
- [Operators] Adding `leaky_relu` support (#360) by Bolin Sun 7401ccc
- [Fix] Fixing an error triggered while compiling the `torch.nn.Upsample` module with `align_corners=True` (#344) by Bolin Sun 2c34cfc
- [PERF] Remote workaround for loops in `add_hints_pass` (#356) by vadiklyutiy 3195be5
- [Operators] Registering tensor methods whose PyTorch function equivalents are supported by Hidet (#347) by Bolin Sun 44ab5ad
- [PERF] Introduce add_hint_pass (#355) by vadiklyutiy c014dab
- [CI] Promote nvidia docker container to version 24.4 (#354) by vadiklyutiy cb809b9
- [Fix] type casting for attention mask from fp32 -> f16 (#323) by zhumakhan 9a10dc0
- [Fix] Added missing torch.multiply and torch.nn.functional.unfold ops for conv-bert-base model (#351) by zhumakhan 18842ee
- [Fix] Fixing a bug in `register_methods` (#331) by Bolin Sun c87c515
- [Fix] Handling special cases in `setitem` regarding dtype and device (#332) by Bolin Sun ff9445e
- [BUG] Fixed search_space bug in `bench_op.py` (#348) by vadiklyutiy 29e4c0e
- [OPS] Dissallow in fxgraph not supported functions (#317) by vadiklyutiy 984cf75
- [OPTIONS] Remove dynamo_config['search_space'] (#342) by vadiklyutiy 0814bd8
- [Operator] Adding support for `torch.Tensor.view_as` (#334) by Bolin Sun 5f19dd0
- [Operators] Adding support for `torch.nn.TransformerEncoder` (#327) by Bolin Sun d625146
- [OPTIONS] Inherit `options` from `torch.compile()` (#260) by vadiklyutiy 3638a0b
- [Operator] Adding `__ge__` method for the `Tensor` class (#330) by Bolin Sun ed5feff
- [Fix] Fixing an error triggered by `ClampOp` (#329) by Bolin Sun 05984cb
- [Fix] Handling hidet errors caused by device difference in `getitem` (#322) by Bolin Sun 5a90820
- [Fix] Fixing a RuntimeError triggered by `tensor_reshape` function in `register_functions.py` (#328) by Bolin Sun 0cd2f83
- [Operators] Adding PyTorch operators encountered while compiling `DALLE2_pytorch` (#319) by Bolin Sun ecb99b1
- [Fix] Fix the bug in `tensor_expand` caused by attempting to modify `immutable_list` (#320) by Bolin Sun bb89e22
- [Chore] replace copyrights with citations (#315) by xiaocenxiaocen 3fba091
- [Operator] Extending the functionality support for `einsum` (#312) by Bolin Sun 703e92a
- Handle dtype and device in hidet.ones_like op (#316) by zhumakhan f031eb3
- [PERF] Reduce fixed overhead for model run (#310) by vadiklyutiy fadf67d
- Increase batch size for bert to decrease fluctuations (#236) by vadiklyutiy a8db40c
- Setitem with tensor values. And Boolean type promotion (#290) by zhumakhan 60e75ca
- [BUG] when device is None, device_from_torch returns 'cpu' by default. Fixed (#311) by zhumakhan d047440
- [Graph][Ops] fp32 accumulation for cute matmul (#292) by xiaocenxiaocen a813605
- [Perf] support vectorized epilogue fusion (#220) by xiaocenxiaocen ddacf36
- Removing constant tensors that are not needed after subgraph rewrite pass (#252) by zhumakhan db49f68
- [Fix] Handling `Tensor.to(..., device=....)` on symbolic tensors (#284) by Bolin Sun 6357880
- [Operator] torch.any (#287) by zhumakhan 8a42a65
- [Graph][Ops] fp32 accumulation for matmul_f16 (#268) by xiaocenxiaocen 5bf255a
- adding support for torch.any (#277) by zhumakhan 2c4c672
- fix: handles race condition on parallel config directory creation (#285) by c-fteixeira b465dd3
- [SCRIPTS] Adopt our scripts to use `mode` from `torch.compile` (#274) by vadiklyutiy 0f825b3
- [Fix] Handling `getitem` special case (#281) by Bolin Sun 564561e
- [Operator] Added advanced tensor indexing (#251) by zhumakhan 018ca2c
- [Operator] Adding support to `repeat_interleave` and more (#270) by Bolin Sun b52bc88
- [PERF] Increase accuracy of pick up the best candidate (#269) by vadiklyutiy 3834643
- [Operator] Registering `torch.Tensor.copy_` (#259) by Bolin Sun af5c893
- [OPTIONS] Use Attention by default (#261) by vadiklyutiy 33ad85b
- [Operator] Registering torch.sigmoid_ (#258) by Bolin Sun c9fb801
- [Operator] Adding support for `torch.Tensor.div` (#249) by Bolin Sun c8d4663
- [Operator] Adding `torch.Tensor.expand_as` support (#250) by Bolin Sun 923f078
- [Operator] Adding support to operators `torch.Tensor.max` and `torch.Tensor.new_full` (#238) by Bolin Sun c5912a4
- Delete options `use_fp16` and `use_fp16_reduction` (#239) by vadiklyutiy e7fe23b
- Inherit `mode` argument from `torch.compile` and set corresponding options (#237) by vadiklyutiy 91f666e
- [Operators] Registering `torch.as_tensor` (#235) by Bolin Sun 540367b
- [Operator] Registering `torch.Tensor.argmax` (#234) by Bolin Sun bdd7acd
- [Ir][CuTE] lower cute dialect (#109) (#230) by xiaocenxiaocen 783a549
- Xiaocenxiaocen/expose more ldst instructions (#216) by xiaocenxiaocen 8f03f9e
- steal_weight option fixes && fixes for mistral model (#209) by zhumakhan 9728c21
- Fix issues related to mistral model (#213) by zhumakhan 68e801b
- [BENCHs] Refactor transformers tests. Add llama2, mistral, gemma, gpt2 to script (#210) by vadiklyutiy 59028d8
- [BUGFIX] Init cuda info before run forks for IR generation (#208) by vadiklyutiy 3012546
- [Ir] add utilities for CuTe (#107) by xiaocenxiaocen 423e112
- [BUG] Clear `_job_queue` in `parallel_imap` for tests (#204) by vadiklyutiy bf39bd6
- [OPTIONS] Don't create hidet config if it's not exist (#203) by vadiklyutiy 294d261
- feat: parallel job execution for tests (#147) by c-fteixeira db588f9
- __getitem__ with N dimensional index tensor (#185) by zhumakhan f46a184
- [Fix] Remove YOLOv7 from tests/benchmarks/run_configs.json (#187) by Bolin Sun 5fc4271
- [Operator] Adding meshgrid operator support (#183) by Bolin Sun d8158a9
- [Bug] Fix number of groups under certain case (#181) by Max Hu 8a6cbfd
- [COMPTIME] Reduce the number of `fork` in `multithreading.Pool` (#180) by vadiklyutiy 9e576dc
- [COMPTIME] Add `chunksize` arg to `pool.imap` (#178) by vadiklyutiy 7c50af6
- optimize grouping method (#174) by Max Hu 9b9a22b
- [App] SyncLLM + AsyncLLM interface (#166) by Jack Lee e51f0c0
- [Ir][Primitives] add hopper instructions (#83) by xiaocenxiaocen 4225298
- [OPS] Add `torch.Tensor.sin`, `torch.Tensor.cos` and `torch._C._nn.pad` (#175) by vadiklyutiy 90a6231
- [App] ResNet Compiled App (2/2) - Pipeline (#165) by Kevin Tong d308f8f
- Revive dynamic shape support with `torch.compile` (#162) by vadiklyutiy cf343ab
- [Models] Gemma implementation (#132) by Jack Lee 3a84820
- Support Transpose2D (#77) by zhiwei-fang dd2e9d2
- [App] Cleanup SD Implementation (#143) by Kevin Tong 359763e
- [Fixbug] Set _is_exiting correctly (#163) by Jack Lee 1c8b31f
- [App] Fix LLM app tracing (#158) by Jack Lee f618977
- [Operator] triu + tril operators (#146) by Jack Lee 70894fa
- Gemma+torch.compile fixes(autocast, rtruediv) (#159) by vadiklyutiy 710ac50
- [IR] [Primitives] Add thread cluster on sm_90 (#145) by Kevin Tong ccc28d6
- [App] Minor bugfixes for LLM app (#157) by Jack Lee 179f058
- [COMPTIME] Specialize `Constant._binary()` for compilation speedup (#148) by vadiklyutiy 8a1eab4
- [Operator] Fix symbolic broadcasting (#131) by Jack Lee 1252220
- [Operator] Register missing math primitives (#134) by Jack Lee 61b0052
- [Ir][Primitives] fix __shfl_xor_sync (#155) by xiaocenxiaocen 37c75a6
- [COMPTIME] Parallelize `apply_prologue_epilog` (fusion) and IR generation (`implement*`) (#127) by vadiklyutiy 9e96c45
- [Graph] Enhance forward debug instrument (#130) by Jack Lee 4267686
- Stable Diffusion App Infra (#103) by Kevin Tong 8f03f9e
- [LLM App] LLM Application initial support (#121) by Yaoyao Ding fc61f48
- [Models] Support for tokenizers in C++ runtime (#69) by Jack Lee c14de4e
- [Graph] Add major UNet building components (#97) by Kevin Tong 364ba9c
- [CI] Add clang-format script/action (#120) by Jack Lee cdff99a
- [Graph] Stable Diffusion Rope Module (#95) by Kevin Tong 6fa5803
- [App] Complete UNet Definition (#99) by Kevin Tong 805620e
- [FFI] Refactor CompiledFunction interface with ctypes (#79) by Jack Lee a8c9d94
- [STYLE] Format cpp/h files (#454) by vadiklyutiy 1f1b011
- [cuDNN] Add cudnn conv2d (#453) by vadiklyutiy bc5a6df
Contributors
- @yaoyaoding
- @xiaocenxiaocen
- @vadiklyutiy
- @maxyanghu
- @BolinSNLHM
- @zhumakhan
- @c-fteixeira
- @jacklee1792
- @KTong821
- @zhiwei-fang
Full Changelog: v0.3.1...v0.4.0