linalg::Svd() outputs NaN when the input tensor contains only zeros #505

Open
IvanaGyro opened this issue Nov 9, 2024 · 3 comments
Labels
bug Something isn't working question Further information is requested

Comments

@IvanaGyro
Collaborator

IvanaGyro commented Nov 9, 2024

linalg::Svd() outputs NaN when the input tensor contains only zeros. The issue occurs only on GPU and does not occur when the data type is float. This bug is the cause of the failing Svd.gpu_U1_zeros_test test case.

Below is the minimal test case to reproduce the bug.

#include <vector>

#include "cuda_runtime_api.h"
#include "gtest/gtest.h"

#include "Generator.hpp"
#include "linalg.hpp"
#include "Tensor.hpp"
#include "Type.hpp"
#include "Device.hpp"

TEST(Svd, GpuZeros) {
  cudaSetDevice(Device.cuda);
  Tensor all_zeros = zeros(64, Type.Double, Device.cuda);
  all_zeros.reshape_(16, 4);
  std::vector<Tensor> svd_output = linalg::Svd(all_zeros, /* is_UvT */ true);
  cudaDeviceSynchronize();
  ASSERT_EQ(svd_output.size(), 3);
  Tensor& singular_values = svd_output[0];
  ASSERT_EQ(singular_values.shape().size(), 1);
  ASSERT_EQ(singular_values.shape()[0], 4);
  EXPECT_EQ(singular_values.at({0}), 0);
  EXPECT_EQ(singular_values.at({1}), 0);
  EXPECT_EQ(singular_values.at({2}), 0);
  EXPECT_EQ(singular_values.at({3}), 0);
}

current output:

: Failure
Expected equality of these values:
  singular_values.at({0})
    Which is: < -nan > dtype: [Double (Float64)]
  0

: Failure
Expected equality of these values:
  singular_values.at({1})
    Which is: < -nan > dtype: [Double (Float64)]
  0

: Failure
Expected equality of these values:
  singular_values.at({2})
    Which is: < -nan > dtype: [Double (Float64)]
  0

: Failure
Expected equality of these values:
  singular_values.at({3})
    Which is: < -nan > dtype: [Double (Float64)]
  0
@IvanaGyro IvanaGyro added the bug Something isn't working label Nov 10, 2024
@IvanaGyro IvanaGyro self-assigned this Nov 14, 2024
@IvanaGyro
Collaborator Author

IvanaGyro commented Nov 14, 2024

The output singular values contain NaN only when the number of singular values, min(m, n), is 4, 23, 28, 33, 48, 51, 53, 58, or 60 (with m and n each tested from 1 to 60). Below is the code used to find these numbers, followed by its output. I am not sure why these particular sizes cause the problem. However, I have confirmed that the NaN comes from cusolverDnXgesvdp(), a cuSOLVER function.

#include <iostream>
#include <utility>
#include <vector>

#include "cuda_runtime_api.h"
#include "gtest/gtest.h"

#include "backend/Storage.hpp"
#include "Generator.hpp"
#include "linalg.hpp"
#include "Tensor.hpp"
#include "Type.hpp"
#include "Device.hpp"

TEST(Svd, GetZerosWhenInputtingZeros) {
  cudaSetDevice(Device.cuda);
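  // Returns true if any element differs from zero. NaN != 0 evaluates to
  // true, so NaN entries are flagged as well.
  // Assumption: storage.data() is readable from the host for a GPU tensor
  // (e.g. unified memory).
  auto any_non_zero = [](const Tensor& tensor) -> bool {
    Storage storage = tensor.storage();
    size_t size = storage.size();
    double* ptr = reinterpret_cast<double*>(storage.data());
    for (size_t i = 0; i < size; ++i) {
      if (*ptr++ != 0) {
        return true;
      }
    }
    return false;
  };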
  std::vector<std::pair<size_t, size_t>> shapes_get_nan;
  for (size_t m = 1; m < 61; ++m) {
    for (size_t n = 1; n < 61; ++n) {
      Tensor all_zeros = zeros(m * n, Type.Double, Device.cuda);
      all_zeros.reshape_(m, n);
      std::vector<Tensor> svd_output = linalg::Svd(all_zeros, /* is_UvT */ true);
      Tensor& singular_values = svd_output[0];
      if (any_non_zero(singular_values)) {
        shapes_get_nan.emplace_back(m, n);
      }
    }
  }
  std::cout << "{";
  for (const std::pair<size_t, size_t>& shape : shapes_get_nan) {
    std::cout << std::dec << "{" << shape.first << ", " << shape.second << "}, ";
  }
  std::cout << "}" << std::endl;
}

output:

{{4, 4}, {4, 5}, {4, 6}, {4, 7}, {4, 8}, {4, 9}, {4, 10}, {4, 11}, {4, 12}, {4, 13}, {4, 14}, {4, 15}, {4, 16}, {4, 17}, {4, 18}, {4, 19}, {4, 20}, 
{4, 21}, {4, 22}, {4, 23}, {4, 24}, {4, 25}, {4, 26}, {4, 27}, {4, 28}, {4, 29}, {4, 30}, {4, 31}, {4, 32}, {4, 33}, {4, 34}, {4, 35}, {4, 36}, 
{4, 37}, {4, 38}, {4, 39}, {4, 40}, {4, 41}, {4, 42}, {4, 43}, {4, 44}, {4, 45}, {4, 46}, {4, 47}, {4, 48}, {4, 49}, {4, 50}, {4, 51}, {4, 52}, 
{4, 53}, {4, 54}, {4, 55}, {4, 56}, {4, 57}, {4, 58}, {4, 59}, {4, 60}, {5, 4}, {6, 4}, {7, 4}, {8, 4}, {9, 4}, {10, 4}, {11, 4}, {12, 4}, {13, 4}, 
{14, 4}, {15, 4}, {16, 4}, {17, 4}, {18, 4}, {19, 4}, {20, 4}, {21, 4}, {22, 4}, {23, 4}, {23, 23}, {23, 24}, {23, 25}, {23, 26}, {23, 27},
{23, 28}, {23, 29}, {23, 30}, {23, 31}, {23, 32}, {23, 33}, {23, 34}, {23, 35}, {23, 36}, {23, 37}, {23, 38}, {23, 39}, {23, 40}, {23, 41}, 
{23, 42}, {23, 43}, {23, 44}, {23, 45}, {23, 46}, {23, 47}, {23, 48}, {23, 49}, {23, 50}, {23, 51}, {23, 52}, {23, 53}, {23, 54}, {23, 55}, 
{23, 56}, {23, 57}, {23, 58}, {23, 59}, {23, 60}, {24, 4}, {24, 23}, {25, 4}, {25, 23}, {26, 4}, {26, 23}, {27, 4}, {27, 23}, {28, 4}, {28, 23}, 
{28, 28}, {28, 29}, {28, 30}, {28, 31}, {28, 32}, {28, 33}, {28, 34}, {28, 35}, {28, 36}, {28, 37}, {28, 38}, {28, 39}, {28, 40}, {28, 41}, 
{28, 42}, {28, 43}, {28, 44}, {28, 45}, {28, 46}, {28, 47}, {28, 48}, {28, 49}, {28, 50}, {28, 51}, {28, 52}, {28, 53}, {28, 54}, {28, 55}, 
{28, 56}, {28, 57}, {28, 58}, {28, 59}, {28, 60}, {29, 4}, {29, 23}, {29, 28}, {30, 4}, {30, 23}, {30, 28}, {31, 4}, {31, 23}, {31, 28}, 
{32, 4}, {32, 23}, {32, 28}, {33, 4}, {33, 23}, {33, 28}, {33, 33}, {33, 34}, {33, 35}, {33, 36}, {33, 37}, {33, 38}, {33, 39}, {33, 40}, 
{33, 41}, {33, 42}, {33, 43}, {33, 44}, {33, 45}, {33, 46}, {33, 47}, {33, 48}, {33, 49}, {33, 50}, {33, 51}, {33, 52}, {33, 53}, {33, 54}, 
{33, 55}, {33, 56}, {33, 57}, {33, 58}, {33, 59}, {33, 60}, {34, 4}, {34, 23}, {34, 28}, {34, 33}, {35, 4}, {35, 23}, {35, 28}, {35, 33}, 
{36, 4}, {36, 23}, {36, 28}, {36, 33}, {37, 4}, {37, 23}, {37, 28}, {37, 33}, {38, 4}, {38, 23}, {38, 28}, {38, 33}, {39, 4}, {39, 23}, {39, 28},
 {39, 33}, {40, 4}, {40, 23}, {40, 28}, {40, 33}, {41, 4}, {41, 23}, {41, 28}, {41, 33}, {42, 4}, {42, 23}, {42, 28}, {42, 33}, {43, 4}, {43, 23}, 
{43, 28}, {43, 33}, {44, 4}, {44, 23}, {44, 28}, {44, 33}, {45, 4}, {45, 23}, {45, 28}, {45, 33}, {46, 4}, {46, 23}, {46, 28}, {46, 33}, {47, 4}, 
{47, 23}, {47, 28}, {47, 33}, {48, 4}, {48, 23}, {48, 28}, {48, 33}, {48, 48}, {48, 49}, {48, 50}, {48, 51}, {48, 52}, {48, 53}, {48, 54}, 
{48, 55}, {48, 56}, {48, 57}, {48, 58}, {48, 59}, {48, 60}, {49, 4}, {49, 23}, {49, 28}, {49, 33}, {49, 48}, {50, 4}, {50, 23}, {50, 28}, 
{50, 33}, {50, 48}, {51, 4}, {51, 23}, {51, 28}, {51, 33}, {51, 48}, {51, 51}, {51, 52}, {51, 53}, {51, 54}, {51, 55}, {51, 56}, {51, 57}, 
{51, 58}, {51, 59}, {51, 60}, {52, 4}, {52, 23}, {52, 28}, {52, 33}, {52, 48}, {52, 51}, {53, 4}, {53, 23}, {53, 28}, {53, 33}, {53, 48}, 
{53, 51}, {53, 53}, {53, 54}, {53, 55}, {53, 56}, {53, 57}, {53, 58}, {53, 59}, {53, 60}, {54, 4}, {54, 23}, {54, 28}, {54, 33}, {54, 48}, 
{54, 51}, {54, 53}, {55, 4}, {55, 23}, {55, 28}, {55, 33}, {55, 48}, {55, 51}, {55, 53}, {56, 4}, {56, 23}, {56, 28}, {56, 33}, {56, 48}, 
{56, 51}, {56, 53}, {57, 4}, {57, 23}, {57, 28}, {57, 33}, {57, 48}, {57, 51}, {57, 53}, {58, 4}, {58, 23}, {58, 28}, {58, 33}, {58, 48}, 
{58, 51}, {58, 53}, {58, 58}, {58, 59}, {58, 60}, {59, 4}, {59, 23}, {59, 28}, {59, 33}, {59, 48}, {59, 51}, {59, 53}, {59, 58}, {60, 4}, 
{60, 23}, {60, 28}, {60, 33}, {60, 48}, {60, 51}, {60, 53}, {60, 58}, {60, 60}, }
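
One way to read the list above is to reduce each shape to its smaller dimension, which equals the number of singular values. The helper below is a minimal sketch in plain C++ that does this reduction; `DistinctMinDims` is a hypothetical name introduced here for illustration and is not part of Cytnx.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Collects the distinct values of min(m, n) over a list of (m, n) shapes.
// Hypothetical helper for illustration; not a Cytnx function.
std::set<std::size_t> DistinctMinDims(
    const std::vector<std::pair<std::size_t, std::size_t>>& shapes) {
  std::set<std::size_t> min_dims;
  for (const auto& shape : shapes) {
    min_dims.insert(std::min(shape.first, shape.second));
  }
  return min_dims;
}
```

Passing `shapes_get_nan` from the test above to this helper would give the set of affected singular-value counts directly, instead of reading them off the printed pairs.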

@IvanaGyro
Collaborator Author

Should we force the SVD function to output all zeros when the input values are all zeros?

@IvanaGyro IvanaGyro added the question Further information is requested label Nov 14, 2024
@ianmccul
Contributor

I have posted so many bug reports for cusolver SVD ... it is very disappointing that 6 years later there are still lots of bugs in it. Their test infrastructure must be pathetic.

It is actually useful in some situations if the SVD of a matrix of zeros gives random unitaries for the U and V matrices. The reason is that if you need to select from a vector that has a zero singular value, then you should get a random choice, rather than (1,0,0,0,..), which is a rather biased selection! But probably not something that you want to incorporate into the SVD unconditionally ...
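
For the real-valued case, one common way to get a random orthogonal matrix is to orthonormalize a matrix of Gaussian samples. The sketch below does this with modified Gram-Schmidt in plain C++. `RandomOrthogonal` is a hypothetical name, not something Cytnx or cuSOLVER provides, and the code is only meant to illustrate the idea of a randomly chosen U or V.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Returns an n x n matrix (row-major) whose columns are orthonormal, built by
// applying modified Gram-Schmidt to a matrix of standard normal samples.
// Hypothetical helper for illustration only.
std::vector<double> RandomOrthogonal(std::size_t n, unsigned seed) {
  std::mt19937 rng(seed);
  std::normal_distribution<double> normal(0.0, 1.0);
  std::vector<double> q(n * n);
  for (double& x : q) x = normal(rng);
  for (std::size_t j = 0; j < n; ++j) {
    // Remove the components of column j along the already finished columns.
    for (std::size_t k = 0; k < j; ++k) {
      double dot = 0.0;
      for (std::size_t i = 0; i < n; ++i) dot += q[i * n + k] * q[i * n + j];
      for (std::size_t i = 0; i < n; ++i) q[i * n + j] -= dot * q[i * n + k];
    }
    // Normalize column j to unit length.
    double norm = 0.0;
    for (std::size_t i = 0; i < n; ++i) norm += q[i * n + j] * q[i * n + j];
    norm = std::sqrt(norm);
    for (std::size_t i = 0; i < n; ++i) q[i * n + j] /= norm;
  }
  return q;
}
```

A caller that wants this behavior for an all-zero input could fill U and vT with two such matrices and set the singular values to zero, leaving the default SVD path unchanged.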
