Native BLAS operations slow? #854
Comments
That's surprising to me... I'd hope @luhenry would be open to adding an environment variable or something similar to disable native code. Is it true even for very large matrices? I'd have thought there was a size above which native would win. If there's a threshold, I'm happy to put one into Breeze (which we already do for dot product).
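A minimal sketch of the size-threshold idea, assuming row-major `Array[Double]` storage. `ThresholdGemm`, `jvmGemm`, `gemm`, and the `nativeGemm` callback are hypothetical names for illustration only, not Breeze or netlib APIs. The threshold is expressed as a count of multiply-adds (m·k·n), and its value would have to be measured on each machine rather than hard-coded.

```scala
// Hypothetical sketch: route a matrix multiply either to a pure-JVM loop or to a
// native BLAS binding depending on problem size. `nativeGemm` is a stand-in
// callback for whatever native routine is available; it is NOT a Breeze API.
object ThresholdGemm {
  // Row-major dense multiply: C (m x n) = A (m x k) * B (k x n), i-p-j loop order.
  def jvmGemm(a: Array[Double], b: Array[Double], m: Int, k: Int, n: Int): Array[Double] = {
    val c = new Array[Double](m * n)
    var i = 0
    while (i < m) {
      var p = 0
      while (p < k) {
        val aip = a(i * k + p)
        var j = 0
        while (j < n) {
          c(i * n + j) += aip * b(p * n + j)
          j += 1
        }
        p += 1
      }
      i += 1
    }
    c
  }

  // Use the JVM loop below `threshold` multiply-adds, the native callback otherwise.
  def gemm(
      a: Array[Double], b: Array[Double], m: Int, k: Int, n: Int,
      threshold: Long,
      nativeGemm: (Array[Double], Array[Double], Int, Int, Int) => Array[Double]
  ): Array[Double] = {
    val work = m.toLong * k * n
    if (work < threshold) jvmGemm(a, b, m, k, n)
    else nativeGemm(a, b, m, k, n)
  }
}
```

Passing the native routine in as a function keeps the dispatch logic testable without a native library present; in a real library the callback would be replaced by the actual BLAS call.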
@kwalcock, would you have a reproducing case? Generally, calling into native code is slower for very small matrices (mostly because of the call overhead). That would only be visible if you are doing many, many operations. Happy to look into any case you share! :)
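For a reproducing case of the "many tiny operations" effect, a crude timing helper like the one below could run the same workload once with the native library loadable and once without it. `Bench.meanNanos` is a hypothetical helper, not part of Breeze or netlib. It only warms up the JIT and averages wall-clock time; for trustworthy numbers, OpenJDK's JMH (Java Microbenchmark Harness) handles warm-up, forking, and dead-code elimination far more carefully.

```scala
// Hypothetical crude timing helper (not a Breeze/netlib API).
object Bench {
  // Run `body` `warmup` times so the JIT can compile it, then return the mean
  // wall-clock nanoseconds per call over `reps` further calls.
  def meanNanos(warmup: Int, reps: Int)(body: => Unit): Double = {
    require(reps > 0, "reps must be positive")
    var i = 0
    while (i < warmup) { body; i += 1 }
    val start = System.nanoTime()
    i = 0
    while (i < reps) { body; i += 1 }
    (System.nanoTime() - start).toDouble / reps
  }
}
```

Timing a small matrix product this way with and without the native `.so` files available would show whether per-call overhead dominates at that size.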
I'll add that "lots of tiny matmuls/matvecs" is, IMHO, a valid use case to optimize for, either at the netlib level or at the Breeze level.
I suspect the threshold would be different for everyone. Here we could run our program twice, measure, and then pick native on or off. In the data I was working with, a typical problem has four multiplications of a 57×768 matrix by a 768×768 matrix, followed by 10,000 multiplications of a 1×1536 vector by a 1536×1536 matrix. That pattern then repeats over an essentially unlimited number of problems. I don't know whether that counts as small, medium, or large.
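To put those shapes in perspective, here is a rough work estimate. It assumes the usual count of 2mkn floating-point operations for an (m×k)·(k×n) product and 8-byte doubles; the thread does not say whether the data is Float or Double.

```latex
\begin{aligned}
W_{\text{mm}} &= 4 \times (2 \cdot 57 \cdot 768 \cdot 768) = 268{,}959{,}744 \approx 2.7 \times 10^{8}\ \text{FLOP} \\
W_{\text{mv}} &= 10{,}000 \times (2 \cdot 1536 \cdot 1536) = 47{,}185{,}920{,}000 \approx 4.7 \times 10^{10}\ \text{FLOP} \\
\frac{W_{\text{mv}}}{W_{\text{mm}} + W_{\text{mv}}} &\approx 0.994 \\
\text{intensity}_{\text{mv}} &= \frac{2 \cdot 1536^{2}\ \text{FLOP}}{1536^{2} \cdot 8\ \text{B}} = \frac{4{,}718{,}592}{18{,}874{,}368} = 0.25\ \text{FLOP/B}
\end{aligned}
```

Under those assumptions, the 10,000 vector-matrix products account for about 99.4% of the arithmetic. Each one reads the whole 1536×1536 matrix (about 18.9 MB) for only 0.25 FLOP per byte, so that part of the workload is likely memory-bound rather than compute-bound.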
That's definitely big enough that I would have thought native would win out.
(In reply to Keith Alcock, Fri, Nov 10, 2023, 4:10 PM)
So far, every computer that can use the native BLAS support supplied by the netlib transitive dependency (i.e., Linux-aarch64 or Linux-amd64) runs my app about 3x slower than with the Scala/Java implementation. Do others notice the same? I only have some fairly simple matrix multiplications and vector additions. Is there any way to simply disable the native code? The Scala interface is really nice and I'd like to keep using it. To compare performance, I removed the .so files from the blas-3.0.1.jar file so that the native code fails to load, but I can't ship my project with that hack being necessary.