Improvements & a bug #77
Comments
Thank you @dzaima for your advice!
The default for an intrinsic is to be tail-agnostic, e.g. In my intrinsics viewer you can click on the options under The QEMU options mentioned change its behavior on those elements as the RVV spec allows implementation behavior to vary - by default QEMU makes tail-agnostic and tail-undisturbed behave the same, but the options I mentioned make them set all-bits- Your
Thank you so much for the explanation. And your intrinsics viewer is an awesome work!!
Some notes of better possible implementations from me scrolling through this for a bit:
blendv uses e.g. `__riscv_vmsne_vx_i64m1_b64(__riscv_vsra_vx_i64m1(x, 63, 2), 0, 2)` where `__riscv_vmslt_vx_i64m1_b64(x, 0, 2)` would do.
Expanding vbool to a mask: in e.g. `__riscv_vmerge_vvm_i64m1(__riscv_vmv_v_x_i64m1(0, 2), __riscv_vmv_v_x_i64m1(UINT64_MAX, 2), mask, 2)`, a `_vxm_` version can be used, giving `__riscv_vmerge_vxm_i64m1(__riscv_vmv_v_x_i64m1(0, 2), -1, mask, 2)`. This compiles to a `vmerge.vim`, with the `-1` as an immediate.
`_sd`/`_ss` functions: a tail-undisturbed op can be used to preserve the top element(s), e.g.
Your current definitions for them don't always behave correctly as your `__riscv_vslideup_vx`s need a `_tu`; you can observe tests failing with `rvv_ta_all_1s=on,rvv_ma_all_1s=on` added to QEMU's `-cpu`
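To make the tail-policy point concrete, here is a rough scalar model in plain C (not RVV intrinsics) of why a low-lane-only op needs tail-undisturbed behaviour. Everything in it is an assumption for illustration: the names `model_vadd` and `model_add_low_lane` are hypothetical, the register is assumed to hold two 64-bit lanes, integer addition stands in for whatever the real `_sd`/`_ss` op does, and the tail-agnostic branch models only the "fill with all 1s" outcome that the spec permits and that the QEMU options above enable.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed lane count: one 128-bit register holding 64-bit elements. */
#define VLMAX 2

/* Scalar model of a vector add that processes `vl` active elements.
 * tail_undisturbed != 0: lanes at index >= vl keep the old value of dst.
 * tail_undisturbed == 0: tail-agnostic; this model overwrites the tail
 *   with all 1s, one of the outcomes the RVV spec allows. */
void model_vadd(uint64_t dst[VLMAX], const uint64_t a[VLMAX],
                const uint64_t b[VLMAX], size_t vl, int tail_undisturbed)
{
    for (size_t i = 0; i < VLMAX; i++) {
        if (i < vl) {
            dst[i] = a[i] + b[i];
        } else if (!tail_undisturbed) {
            dst[i] = UINT64_MAX; /* tail clobbered */
        }
        /* else: tail lane left exactly as it was in dst */
    }
}

/* Low-lane-only add in the style of an `_sd` function:
 * lane 0 = a[0] + b[0], upper lane is expected to be a copy of a. */
void model_add_low_lane(uint64_t out[VLMAX], const uint64_t a[VLMAX],
                        const uint64_t b[VLMAX], int tail_undisturbed)
{
    for (size_t i = 0; i < VLMAX; i++) {
        out[i] = a[i]; /* destination starts out as a */
    }
    model_vadd(out, a, b, 1, tail_undisturbed); /* vl = 1: only lane 0 is active */
}
```

With `tail_undisturbed` set, the upper lane of `out` still equals `a[1]`; with it cleared, the upper lane comes back as all 1s, which is the kind of failure the all-1s QEMU settings are meant to expose.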
Widening ops should LMUL-truncate the input, not output, to avoid overly large temporary registers. And `_vf4`/`_vf8` can be used too:
`_mm_mulhi_epu16` & `_mm_mulhi_epi16` have exact RVV equivalents without any temporary widening - `__riscv_vmulhu_vv_u16m1` & `__riscv_vmulh_vv_i16m1`.
`_mm_mullo_epi16` & `_mm_mullo_epi32` are just `__riscv_vmul_vv_i16m1` & `__riscv_vmul_vv_i32m1`.
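For reference, the per-lane arithmetic behind the mulhi/mullo mappings can be sketched in scalar C. This is only a model of what one 16-bit lane computes, not RVV or SSE code; the function names are hypothetical, and the signed variant assumes that right-shifting a negative `int32_t` is an arithmetic shift, as it is on GCC and Clang.

```c
#include <stdint.h>

/* High 16 bits of the unsigned 16x16 -> 32-bit product
 * (one lane of an unsigned multiply-high). */
uint16_t mulhi_u16(uint16_t a, uint16_t b)
{
    return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* High 16 bits of the signed 16x16 -> 32-bit product
 * (one lane of a signed multiply-high).
 * Assumes >> on a negative int32_t is an arithmetic shift. */
int16_t mulhi_i16(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(product >> 16);
}

/* Low 16 bits of the product; the same bits result for signed and
 * unsigned inputs, so one function covers both. The casts to uint32_t
 * avoid signed overflow from integer promotion. */
uint16_t mullo_16(uint16_t a, uint16_t b)
{
    return (uint16_t)((uint32_t)a * (uint32_t)b);
}
```

Since the result is already 16 bits wide, a multiply-high instruction can produce it directly, which is why no widened temporary is needed.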
(additionally, clang gives (harmless) warnings on `__riscv_vmv_v_x_i16m1(UINT16_MAX, 8)` (and for UINT8 too), as that's passing an unsigned value to a signed parameter. `__riscv_vmv_v_x_i16m1(-1, 8)` is both shorter and avoids the warning (but most places with these should be using `vmerge_vxm` or similar anyway); to cross-compile with clang all you need to do is add `--target=riscv64-linux-gnu` to `clang++` invocations)
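A small scalar check of why that warning is harmless in practice: converting `UINT16_MAX` to `int16_t` ends up with the same bit pattern as `-1`. The helper name is hypothetical, and the conversion is implementation-defined before C23; the sketch assumes the wrapping behaviour that GCC and Clang provide.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if UINT16_MAX, once converted to int16_t, has the same
 * bits as -1. The conversion is implementation-defined before C23;
 * GCC and Clang wrap modulo 2^16, which yields -1. */
int uint16_max_matches_minus_one(void)
{
    uint16_t all_ones = UINT16_MAX;
    int16_t from_unsigned = (int16_t)all_ones; /* explicit cast, so no warning here */
    int16_t minus_one = -1;
    return memcmp(&from_unsigned, &minus_one, sizeof minus_one) == 0;
}
```

So both spellings broadcast all-ones lanes; writing `-1` just states that directly in the parameter's own signed type.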