Add stream_sp_mem kernel #651

Merged: 2 commits, Nov 27, 2024

fairydreaming

This PR adds the stream_sp_mem kernel. It is based on the stream_sp kernel; the only difference is that the movss store instructions are replaced by a combination of:

  • unpcklps instructions that unpack scalar values from FPR1-FPR4 into FPR1 and values from FPR5-FPR8 into FPR5
  • movntps instructions that store values from FPR1 and FPR5 into memory
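
To make the packing step concrete, here is a minimal lane-level sketch in Python of how three unpcklps operations can gather the scalar in lane 0 of four registers into one packed register. It only models the instruction's documented interleave semantics; the helper names (`unpcklps`, `pack_four_scalars`) and the operand order are assumptions for illustration and may not match the exact sequence used in the kernel.

```python
# Lane-level model of SSE unpcklps; a register is a list of four lanes.
def unpcklps(dst, src):
    """Interleave the two low lanes of dst and src: [d0, s0, d1, s1]."""
    return [dst[0], src[0], dst[1], src[1]]


def pack_four_scalars(r1, r2, r3, r4):
    """Gather lane 0 of four registers into one register (hypothetical order)."""
    even = unpcklps(r1, r3)      # [r1[0], r3[0], r1[1], r3[1]]
    odd = unpcklps(r2, r4)       # [r2[0], r4[0], r2[1], r4[1]]
    return unpcklps(even, odd)   # [r1[0], r2[0], r3[0], r4[0]]


# Scalar results live in lane 0; the upper lanes hold unrelated data (None here).
fpr = [[float(i), None, None, None] for i in range(1, 9)]

packed_lo = pack_four_scalars(*fpr[0:4])  # stands in for FPR1 after packing
packed_hi = pack_four_scalars(*fpr[4:8])  # stands in for FPR5 after packing
print(packed_lo, packed_hi)
```

Each packed register would then be written with a single movntps, which stores all four lanes at once and requires a 16-byte aligned destination address.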

Please verify the number of UOPS, as I'm not 100% sure about this.

@TomTheBear
Member

UOPS should be 32+8+2 = 42

@fairydreaming
Author

> UOPS should be 32+8+2 = 42

Fixed.

@TomTheBear TomTheBear merged commit 20a8e84 into RRZE-HPC:master Nov 27, 2024
2 checks passed