Generalize matrix multiply to support C = f.(A * B + x) #56

Open
Tracked by #70
chriselrod opened this issue Jan 21, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@chriselrod
Collaborator

This is for the sake of implementing dense layers for neural networks.

The reverse pass also needs g.(Cbar) * B' and A' * g.(Cbar), and since g.(Cbar) appears in both products, we should probably evaluate g just once per element of Cbar.

However, perhaps somewhat related to #40, we should add support for batching matrix operations of different sizes as well -- in particular, the reverse pass of

gCbar = g.(Cbar)
Abar = gCbar * B'
Bbar = A' * gCbar

should perhaps be evaluated with one function call that can minimize the (already low) threading and synchronization overhead.

Ideally, we'd have an API for doing this a little more generically, but for allocating threads it would help to know that many of the array dimensions here are the same.
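
As a purely illustrative sketch of such a combined call (the function name below is hypothetical, not an existing Octavian API), it might look something like:

# Hypothetical signature, for illustration only: evaluate g once per element
# of Cbar and compute both Abar = g.(Cbar) * B' and Bbar = A' * g.(Cbar)
# inside a single thread launch, since the shared dimensions are known up front.
reverse_dense!(Abar, Bbar, Cbar, A, B, g)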

@DilumAluthge DilumAluthge added the enhancement New feature or request label Jan 21, 2021
@DilumAluthge
Member

I'm assuming that here g is the adjoint of f?

@chriselrod
Collaborator Author

Yes. In the example, if f === tanh then g(x) = x * muladd(-x, x, 1).

Although of course f and g could be arbitrary (so long as they're defined on numbers), the primary use case and motivation would involve g = f'.

I think this is a compelling use case for Octavian. While our performance is close to MKL's at smallish sizes, I think we'd have a nice advantage over the current dense-layer implementation, which looks like this on the forward pass (x is the bias vector, normally called b):

C = A * B
@. C = f(C .+ x)

Note that the broadcast is single threaded. I imagine that, at the sizes where people would actually consider CPU training, this fusion and threading of the entire operation could give us a nice performance advantage over MKL, and definitely over the default OpenBLAS.
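
A minimal sketch of what that fusion could look like today, using Octavian's matmul! plus a vectorized loop for the bias add and activation (the function name dense_forward! is made up for illustration, and the activation is hard-coded to tanh since @avx needs a function it can vectorize); the eventual goal would be to fold this into the kernel itself rather than running it as a second pass:

using Octavian, LoopVectorization

function dense_forward!(C, A, B, x)
    matmul!(C, A, B)                       # C = A * B, multithreaded via Octavian
    @avx for j in axes(C, 2), i in axes(C, 1)
        C[i, j] = tanh(C[i, j] + x[i])     # fused bias add + activation
    end
    return C
end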

@DilumAluthge
Member

I think this is a compelling use case for Octavian. [...]

This is really exciting!

@DilumAluthge
Member

Note that the broadcast is single threaded.

Two questions:

  1. Can we vectorize the broadcast by using @avx?
  2. Would it make sense to try to multi-thread the broadcast?

@chriselrod
Collaborator Author

chriselrod commented Jan 24, 2021

Yes and yes.

The goal would be to launch threads just once per pass (i.e., once for the forward pass, and once for the backward pass).
This would reduce threading overhead, and would also let us ensure good locality and minimize how much data has to be moved between cores.
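
For reference, a rough sketch of threading the broadcast itself with Base threads plus @avx (again with tanh hard-coded and the name bias_act! invented for illustration); the single-launch design described above would instead reuse the threads already running the matmul:

using LoopVectorization

function bias_act!(C, x)
    Threads.@threads for j in axes(C, 2)   # thread across columns
        @avx for i in axes(C, 1)           # vectorize within each column
            C[i, j] = tanh(C[i, j] + x[i])
        end
    end
    return C
end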
