Tensor layout should use strides #518

Open

ianmccul opened this issue Nov 24, 2024 · 1 comment

@ianmccul
Contributor

I don't think I'll get it done right now, so I'm putting this here to track progress.

Currently, the shape of a tensor is controlled by _shape, a vector of the dimension of each rank, and _mapper, which I assume is a permutation of the indices. There is also _invmapper, which I assume is the inverse permutation of _mapper.

This design has several disadvantages, including:

  • Getting the memory location of some element (i,j,k,...) of a tensor is quite difficult.
  • It doesn't support strided storage. If you have a tensor T(i,j,k) and you want to slice it by fixing one of the dimensions, say taking T(i,5,k) as a rank-2 tensor, this cannot be done with the current design (unless the 2nd index is the leading dimension of the storage, and even then the start of the tensor is no longer at offset zero).
  • Currently, although permute() is a fast operation, many operations that you might do afterwards require contiguous storage, so they end up copying and rearranging memory anyway, even in cases where in principle they don't need to (such as transposing a matrix and then calling a BLAS or LAPACK function). This could be avoided in the current design in some cases, but the calculation of whether the layout is compatible with BLAS is too difficult.

The representation of the layout should change to store the dimension and stride of each rank. Something like (exposition only, an actual implementation wouldn't use two separate vectors):

vector<int> dim;     // extent of each rank
vector<int> stride;  // step (in elements) between consecutive values of that index
size_type base;      // offset of element (0,0,...) from the start of storage

The location of some element (i,j,k,...) is then Tensor->Mem + base + (stride[0]*i + stride[1]*j + stride[2]*k + ...).
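As a minimal sketch of that offset computation (exposition only; TensorLayout and offset_of are illustrative names, not part of the existing code):

#include <cstddef>
#include <vector>

struct TensorLayout {
  std::vector<std::size_t> dim;     // extent of each rank
  std::vector<std::size_t> stride;  // step (in elements) for each rank
  std::size_t base = 0;             // offset of element (0,0,...) from Tensor->Mem
};

// Linear offset of element (idx[0], idx[1], ...) from the start of storage.
std::size_t offset_of(const TensorLayout& L, const std::vector<std::size_t>& idx) {
  std::size_t off = L.base;
  for (std::size_t r = 0; r < idx.size(); ++r)
    off += L.stride[r] * idx[r];
  return off;  // the element lives at Tensor->Mem + off
}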

Permuting indices is still fast: just apply the permutation to the dim and stride arrays. Slicing a tensor is also a simple operation. For example, if you want a rank-2 tensor that represents T(i,5,k), the layout simply removes that dimension and updates the base index to base += 5 * stride[1]. Similarly, projecting onto some range along a dimension is also fairly simple (like Python's Array[:, 1:3]).
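Continuing the hypothetical sketch above, both operations become pure layout manipulations (permute and fix_index are made-up names for illustration):

// Reorder the layout without touching any data.
TensorLayout permute(const TensorLayout& L, const std::vector<std::size_t>& perm) {
  TensorLayout out;
  out.base = L.base;
  for (std::size_t p : perm) {
    out.dim.push_back(L.dim[p]);
    out.stride.push_back(L.stride[p]);
  }
  return out;
}

// Fix dimension `axis` to index `value`, e.g. T(i,5,k) -> a rank-2 view.
TensorLayout fix_index(const TensorLayout& L, std::size_t axis, std::size_t value) {
  TensorLayout out = L;
  out.base += value * L.stride[axis];   // shift the starting offset
  out.dim.erase(out.dim.begin() + static_cast<std::ptrdiff_t>(axis));
  out.stride.erase(out.stride.begin() + static_cast<std::ptrdiff_t>(axis));
  return out;
}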

Determining whether the layout is compatible with a BLAS or LAPACK call is fairly easy: the rank must be 2, and one of the strides must be 1. If stride[0] == 1 then it is a column-major matrix (the BLAS trans parameter is N), and if stride[1] == 1 then it is row-major (the BLAS trans parameter is T). The leading dimension is the larger of the two strides.
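Again as a sketch on the hypothetical TensorLayout above (BlasView and as_blas_matrix are illustrative names), the check might look like:

struct BlasView {
  char trans;       // 'N' for column-major, 'T' for row-major
  std::size_t ld;   // leading dimension = the larger of the two strides
};

// True if the rank-2 layout can be handed to BLAS/LAPACK without copying.
bool as_blas_matrix(const TensorLayout& L, BlasView& out) {
  if (L.dim.size() != 2) return false;                              // must be rank 2
  if (L.stride[0] == 1) { out = {'N', L.stride[1]}; return true; }  // column-major
  if (L.stride[1] == 1) { out = {'T', L.stride[0]}; return true; }  // row-major
  return false;                                                     // neither stride is 1: a copy is needed
}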

@ianmccul
Contributor Author

> the calculation of whether the layout is compatible with BLAS is too difficult.

Actually this cannot be true: in the current design the only possible _mapper values for a rank-2 tensor are {0,1} and {1,0}, which map onto BLAS normal and transpose, respectively (or the other way around, for column-major storage). Why isn't this done in the existing BLAS/LAPACK wrappers?
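As a rough illustration of this point (trans_from_mapper is a hypothetical helper; whether {0,1} corresponds to 'N' or 'T' depends on the storage order, as noted above):

#include <vector>

// For a rank-2 tensor, the only two possible _mapper values fix the BLAS trans flag.
char trans_from_mapper(const std::vector<int>& mapper) {
  return (mapper[0] == 0 && mapper[1] == 1) ? 'N' : 'T';
}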
