[FEA] Reduce disparity between nested and non-nested column handling in lexicographic comparator #11667
Labels: 0 - Backlog (In queue waiting for assignment); feature request (New feature or request); improvement (Improvement / enhancement to an existing function); libcudf (Affects libcudf (C++/CUDA) code.)
Is your feature request related to a problem? Please describe.
As part of #11129 the row lexicographic comparator became templated on whether or not the tables being compared contain any nested columns. This templating was necessary to guarantee that nested and non-nested code paths were compiled separately. That, in turn, was required because the compiler fails to completely optimize the non-nested code path due to the complexity of the nested code path, which slows down code even when no nested data is present. Unfortunately, this now means that calling code must dispatch to separate paths depending on whether the table being operated on has nested data. In addition to being cumbersome in and of itself, this requirement makes code using the lexicographic comparator different from code using the equality comparator.
Describe the solution you'd like
We should consider alternative APIs for the comparator that abstract the nested column dispatch away from the call-site. One option that I considered is to define a wrapper function that accepts a callable to apply (e.g. thrust::sort) that requires the comparator as an argument. This wrapper function could instantiate the appropriate comparator and then call the provided callable. @jrhemstad suggested that this could be accomplished using a visitor pattern. We should prototype this approach.
Describe alternatives you've considered
I have not yet come up with any alternative solutions, but it would be nice to try to find an even less invasive approach if possible since the visitor pattern does add an extra level of indirection that would be nice to avoid.
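For discussion, here is a rough host-only sketch of what the wrapper idea could look like. It is not libcudf code: `table`, `less_comparator`, `with_less_comparator`, and `sorted_order` are hypothetical names invented for illustration, a `bool` flag stands in for the "does this table have nested columns" check, and `std::sort` stands in for thrust::sort. The point is only that the nested/non-nested branch lives in one place and the call-site passes a generic callable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a table: each row is a list of integer keys.
using row = std::vector<int>;
struct table {
  std::vector<row> rows;
  bool has_nested;  // assumption: stand-in for a real nested-column check
};

// Hypothetical comparator templated on nestedness, mirroring the split
// described in this issue. Both instantiations do the same work here; in a
// real implementation only HasNested == true would compile the nested path.
template <bool HasNested>
struct less_comparator {
  table const* tbl;
  bool operator()(std::size_t lhs, std::size_t rhs) const {
    assert(lhs < tbl->rows.size() && rhs < tbl->rows.size());
    return tbl->rows[lhs] < tbl->rows[rhs];  // lexicographic row compare
  }
};

// The wrapper: chooses the comparator instantiation once, then hands it to
// the caller's callable. The callable must be generic (e.g. a generic
// lambda), since it receives a different comparator type on each branch.
template <typename Callable>
decltype(auto) with_less_comparator(table const& t, Callable&& f) {
  if (t.has_nested) { return f(less_comparator<true>{&t}); }
  return f(less_comparator<false>{&t});
}

// Example call-site: no nested/non-nested branching is visible here.
std::vector<std::size_t> sorted_order(table const& t) {
  std::vector<std::size_t> order(t.rows.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  with_less_comparator(t, [&](auto comp) {
    std::sort(order.begin(), order.end(), comp);
  });
  return order;
}
```

The extra level of indirection mentioned above is visible in `sorted_order`: the algorithm call has to be wrapped in a generic lambda rather than written directly, which is the cost a less invasive approach would try to avoid.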
Additional context
N/A