Add support for org.apache.spark.sql.catalyst.expressions.Bin #2760
base: branch-25.02
Conversation
Signed-off-by: Yan Feng <[email protected]>
LGTM, would love to see some Java tests.
```cpp
#include <cudf/column/column_device_view.cuh>
#include <cudf/column/column_factories.hpp>
#include <cudf/detail/null_mask.hpp>
#include <cudf/detail/nvtx/ranges.hpp>
```
```cpp
TEST_F(LongToBinaryStringTests, FromLongToBinary)
{
  auto const longs = cudf::test::fixed_width_column_wrapper<int64_t>{0L, 1L, 10L, -0L, -1L};
```
Could we add some edge cases like null, LONG_MAX, and LONG_MIN?
Do we have a test case at the plugin level to make sure Bin(13.3) returns 1101?
> Do we have a test case at the plugin level to make sure Bin(13.3) returns 1101?

I tested it locally, and Bin(13.3) indeed returns 1101. I will soon submit the plugin PR.
This is really a Spark test, not a good test for binary ops. Spark only accepts a long as the input to bin, so any floating-point input gets a cast to long inserted before bin is called. That test is really a test that Spark is doing the right thing. Even then, it would probably be something we would want to put in the integration tests if we tested it at all.
Note that you can also pass in a string as an input and Spark will still try to cast it to a long before calling bin.
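For context, Spark's cast-then-bin semantics can be mimicked on the host. A minimal sketch, assuming Java's Long.toBinaryString-style two's-complement formatting (`spark_bin` is a hypothetical helper name, not the plugin's API):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Host-side sketch of Spark's bin() semantics: any numeric input is first
// cast to a 64-bit long, then formatted as the unsigned two's-complement
// binary string of that long.
std::string spark_bin(double input) {
  auto const value = static_cast<int64_t>(input);  // the cast Spark inserts
  auto bits        = static_cast<uint64_t>(value);
  if (bits == 0) { return "0"; }
  std::string out;
  while (bits != 0) {
    out.insert(out.begin(), (bits & 1) ? '1' : '0');
    bits >>= 1;
  }
  return out;
}
```

With this model, `spark_bin(13.3)` yields `"1101"` (13.3 is truncated to 13 before formatting), and a negative input such as -1 yields 64 ones.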
```cpp
{
  auto const size = 64 - __clzll(value);
  // If the value is 0, the size should be 1
  return size > 0 ? size : 1;
```
nit: we can first check if the value == 0 to save a `__clzll` call in this case.
But wouldn't that introduce thread divergence? I understand how that can be an advantage on a CPU, but I don't really see it on a GPU.
The PTX translated from the code is as follows:

```cpp
auto const size = 64 - __clzll(value);
return size > 0 ? size : 1;
```

```
// PTX
clz.b64  %r1, %rd1;      // %r1: number of leading zeros in %rd1 (value)
mov.u32  %r2, 64;        // %r2: constant value 64
sub.s32  %r3, %r2, %r1;  // %r3: size = 64 - %r1
max.s32  %r4, %r3, 1;    // %r4: return max(size, 1)
```

versus:

```cpp
if (!value) return 1;
return 64 - __clzll(value);
```

```
// PTX
setp.eq.s64 %p1, %rd1, 0;  // %p1: true if %rd1 (value) is zero
clz.b64  %r1, %rd1;        // %r1: number of leading zeros in %rd1 (value)
mov.u32  %r2, 64;          // %r2: constant value 64
sub.s32  %r3, %r2, %r1;    // %r3: size = 64 - %r1
selp.b32 %r4, 1, %r3, %p1; // %r4: if %p1 is true, return 1; otherwise %r3
```
> But wouldn't that introduce thread divergence? I understand how that can be an advantage on a CPU, but I don't really see it on a GPU.

Yes, I was thinking there must be an if/else to check the size == 0 case, so we could put it earlier than clz to save some calls without introducing a new branch. But it looks like the compiler will optimize `size > 0 ? size : 1;` into a `max.s32`, so it's branchless and the original approach looks better anyway.
I have now changed it to this: `return max(64 - __clzll(value), 1);`
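The `max(64 - __clzll(value), 1)` length computation can be checked on the host; a quick sketch, using GCC/Clang's `__builtin_clzll` as a stand-in for CUDA's `__clzll` (helper names here are illustrative, not the PR's code):

```cpp
#include <cassert>
#include <cstdint>
#include <algorithm>

// Host stand-in for CUDA's __clzll. On the device, __clzll(0) is defined and
// returns 64; __builtin_clzll(0) is undefined on the host, so special-case it.
int clzll(int64_t value) {
  auto const bits = static_cast<uint64_t>(value);
  return bits == 0 ? 64 : __builtin_clzll(bits);
}

// The branch-free string length used in the kernel: a value of 0 still
// needs one character ("0"), hence the max with 1.
int binary_string_size(int64_t value) {
  return std::max(64 - clzll(value), 1);
}
```

For example, 10 (binary 1010) needs 4 characters, 0 needs 1, and any negative value needs all 64.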
```cpp
auto results = spark_rapids_jni::long_to_binary_string(longs, cudf::get_default_stream());

auto const expected = cudf::test::strings_column_wrapper{
  "0", "1", "1010", "0", "1111111111111111111111111111111111111111111111111111111111111111"};
```
The main thing for me is going to be performance. Do you have any numbers comparing the CPU to the GPU? If you want to wait for a plugin patch that is fine. But we need to do it somewhere.
```cpp
auto const value = d_longs.element<LongType>(idx);
char* d_buffer   = d_chars + d_offsets[idx];
for (auto i = d_sizes[idx] - 1; i >= 0; --i) {
  *d_buffer++ = value & (1LL << i) ? '1' : '0';
```
```cpp
*d_buffer++ = '0' + ((value & (1LL << i)) >> i);
```

Perhaps this approach is more efficient since it avoids branching, which might degrade performance on GPUs due to warp divergence.
Maybe `*d_buffer++ = '0' + ((value & (1LL << i)) != 0);`? It will be (very slightly) cheaper and easier to read.
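Both formulations are branch-free and produce the same characters; a host-side sketch with hypothetical helper names (note the cast to unsigned, which the shift variant needs at bit 63 to avoid a sign-extending arithmetic right shift):

```cpp
#include <cassert>
#include <cstdint>

// Shift variant: move the masked bit back down to 0/1. With signed
// arithmetic, (value & (1LL << 63)) >> 63 would sign-extend to -1, so
// work on the unsigned reinterpretation instead.
char digit_shift(int64_t value, int i) {
  auto const bits = static_cast<uint64_t>(value);
  return static_cast<char>('0' + ((bits & (1ULL << i)) >> i));
}

// Comparison variant: != yields exactly 0 or 1, no shift needed.
char digit_compare(int64_t value, int i) {
  auto const bits = static_cast<uint64_t>(value);
  return static_cast<char>('0' + ((bits & (1ULL << i)) != 0));
}
```

Both compile down to branchless code; the comparison form sidesteps the bit-63 sign-extension pitfall entirely.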
@thirtiseven Yes, this one is also a branch-free expression, since the compiler shall use a `setne` instruction, which avoids branching by directly setting a register based on the zero flag (ZF):

```
cmp   rax, 0
setne al
and   al, 1
add   eax, 48
```

The corresponding code for my alternative would be translated into:

```
sar rax, cl
add rax, 48
```
|
||
__device__ void operator()(cudf::size_type idx) | ||
{ | ||
if (d_longs.is_null(idx)) { |
NIT: just a nice-to-have improvement: use `if constexpr` instead of `if` and add an extra template parameter `nullable` for this functor, because we already know whether the column_view is nullable or not.
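That suggestion could look roughly like this. A host-compilable sketch under C++17, where `Column` is a stand-in for `cudf::column_device_view` and `bin_size_fn` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for cudf::column_device_view, just enough for the sketch.
struct Column {
  std::vector<int64_t> values;
  std::vector<bool> valid;  // empty when the column has no null mask
  bool is_null(int idx) const { return !valid.empty() && !valid[idx]; }
};

// Functor templated on nullability: for non-nullable columns the null
// check is compiled away entirely by if constexpr.
template <bool nullable>
struct bin_size_fn {
  Column const& d_longs;
  int operator()(int idx) const {
    if constexpr (nullable) {
      if (d_longs.is_null(idx)) { return 0; }  // null rows produce size 0
    }
    auto bits = static_cast<uint64_t>(d_longs.values[idx]);
    int size  = 0;
    while (bits != 0) { ++size; bits >>= 1; }
    return size > 0 ? size : 1;  // value 0 still needs one character
  }
};
```

The dispatch site would then pick `bin_size_fn<true>` or `bin_size_fn<false>` based on `column_view::nullable()`, paying for the null check only when a null mask exists.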
About the performance, we may try the following approach. Instead of

```cpp
for (auto i = d_sizes[idx] - 1; i >= 0; --i) {
  *d_buffer++ = value & (1LL << i) ? '1' : '0';
```

create a buffer on the stack:

```cpp
char buf[64];
// 64 such lines in total:
buf[0] = value & (1LL << 0) ? '1' : '0';
buf[1] = value & (1LL << 1) ? '1' : '0';
buf[2] = value & (1LL << 2) ? '1' : '0';
...
buf[62] = value & (1LL << 62) ? '1' : '0';
buf[63] = value & (1LL << 63) ? '1' : '0';
// Some of the above lines are unnecessary if the long values have leading
// zeros, but keep them to avoid GPU divergence.

// Write out 4 chars at a time:
num_chars = x;
while (num_chars > 4) {
  int* d_int_buf = (int*)(d_buffer);
  int* int_buf   = (int*)buf;
  d_int_buf[i]   = int_buf[i];
  num_chars -= 4;
}
// write out the remaining chars

// Or use memcpy to copy multiple chars at one time:
memcpy(dst, src, size);
```
I found an example, maybe it's useful:

```cpp
void intToBin(uint64_t i, char* out) {
  assert((reinterpret_cast<uintptr_t>(out) % sizeof(uint64_t)) == 0);  // check alignment
  uint64_t zeros   = 0x3030303030303030ULL;  // "00000000"
  uint64_t mask    = 0x0101010101010101ULL;
  uint64_t* out_64 = reinterpret_cast<uint64_t*>(out);
  unsigned long bit_index = 0;
  _BitScanForward64(&bit_index, i);
  i = i << bit_index;
  out_64[0] = zeros | _pdep_u64(i >> 56, mask);
  out_64[1] = zeros | _pdep_u64(i >> 48, mask);
  out_64[2] = zeros | _pdep_u64(i >> 40, mask);
  out_64[3] = zeros | _pdep_u64(i >> 32, mask);
  out_64[4] = zeros | _pdep_u64(i >> 24, mask);
  out_64[5] = zeros | _pdep_u64(i >> 16, mask);
  out_64[6] = zeros | _pdep_u64(i >> 8, mask);
  out_64[7] = zeros | _pdep_u64(i >> 0, mask);
  _BitScanReverse64(&bit_index, i);
  out[64 - bit_index] = '\0';
}
```

Sorry, I did not find a CUDA version of `_pdep_u64`.
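There is indeed no `_pdep_u64` intrinsic in CUDA, but for this particular mask (one bit per byte) the deposit can be emulated with a multiply. A host sketch of the idea, assuming a little-endian byte order and using hypothetical helper names; unlike plain pdep, this variant also reverses bit order so the output reads MSB-first:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Emulates _pdep_u64(b, 0x0101010101010101) for one input byte, with bit
// order reversed so byte 0 of the result holds bit 7 of b. The magic
// multiplier places each source bit at a distinct byte's top bit (no two
// partial products overlap, so there are no carries); masking and shifting
// then turns each lane into 0x00 or 0x01.
uint64_t spread_bits_msb_first(uint8_t b) {
  return ((static_cast<uint64_t>(b) * 0x8040201008040201ULL) &
          0x8080808080808080ULL) >> 7;
}

// Writes 8 ASCII digits ('0'/'1') for one byte, most significant bit first.
void byte_to_digits(uint8_t b, char* out) {
  uint64_t const digits = 0x3030303030303030ULL | spread_bits_msb_first(b);
  std::memcpy(out, &digits, 8);  // little-endian: lane 0 lands first
}
```

For example, 0x0D (binary 00001101) expands to the eight characters "00001101". Whether this beats the simple per-character loop on a GPU would still need benchmarking.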
Contributes to NVIDIA/spark-rapids#11648

This PR adds support for org.apache.spark.sql.catalyst.expressions.Bin. What it does is take a `long` and output its binary representation.