feat: E4M3fnuz FP8 format added #281
@maktukmak thank you for this pull-request.
@dacorvo, I excluded CUDA for
This PR is stale because it has been open 15 days with no activity. Remove stale label or comment or this will be closed in 5 days.
@dacorvo, I fixed the style, so it should pass the checks now.
Thanks for this pull request!
Rebased and merged as #310
Currently, the implemented E4M3 FP8 format follows the ARM-Intel-Nvidia style. However, there is another style, IEEE 754 (named float8_e4m3fnuz in torch), which has a different bit configuration and different min/max values. This pull request aims to add that style. The unit tests currently pass on CPU because it supports both styles, but they will fail when run on other devices. I need guidance on how to design the tests so that they run only the style a given device supports. Once I have this information, I can complete this PR.
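To make the "different bit configuration and min/max values" concrete, here is a minimal pure-Python sketch that decodes every 8-bit pattern under both layouts and reports the finite range. It is not code from this PR. The helper names `decode_e4m3` and `finite_range` are hypothetical. It assumes the commonly documented encodings: float8_e4m3fn uses exponent bias 7 with NaN at `S.1111.111`, while float8_e4m3fnuz uses exponent bias 8, has no negative zero, and reserves `0x80` as its only NaN.

```python
import math


def decode_e4m3(byte: int, fnuz: bool) -> float:
    """Decode one 8-bit E4M3 pattern (1 sign, 4 exponent, 3 mantissa bits).

    Hypothetical helper for illustration; not part of the library.
    """
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    mant = byte & 0x7
    if fnuz:
        bias = 8
        # Assumption: in the fnuz layout, 0x80 (the "negative zero" pattern) is the only NaN.
        if byte == 0x80:
            return math.nan
    else:
        bias = 7
        # Assumption: in the fn layout, exponent and mantissa all ones encode NaN.
        if exp == 0xF and mant == 0x7:
            return math.nan
    if exp == 0:
        # Subnormal numbers: no implicit leading one.
        return sign * (mant / 8.0) * 2.0 ** (1 - bias)
    return sign * (1.0 + mant / 8.0) * 2.0 ** (exp - bias)


def finite_range(fnuz: bool) -> tuple[float, float]:
    """Return (min, max) over all non-NaN values of the chosen layout."""
    values = [decode_e4m3(b, fnuz) for b in range(256)]
    finite = [v for v in values if not math.isnan(v)]
    return min(finite), max(finite)


if __name__ == "__main__":
    print("e4m3fn   range:", finite_range(fnuz=False))
    print("e4m3fnuz range:", finite_range(fnuz=True))
```

Under these assumptions the two layouts map the same byte to different values, so a test that hard-codes expected values for one style cannot pass on a device that only implements the other.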