Replace usages of thrust::optional with std::optional #15091

Merged 10 commits on Aug 20, 2024

miscco
Contributor

@miscco miscco commented Feb 20, 2024

We want to get rid of thrust types at API boundaries, so this PR replaces them with the better-suited std types.

@miscco miscco requested a review from a team as a code owner February 20, 2024 16:35
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Feb 20, 2024
@miscco miscco force-pushed the replace_thrust_optional branch from eadda39 to 6544074 Compare February 20, 2024 16:46
@miscco miscco requested review from a team as code owners March 10, 2024 19:16
@github-actions github-actions bot added the Python Affects Python cuDF API. label Mar 10, 2024
@bdice bdice added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed Python Affects Python cuDF API. labels Mar 10, 2024
@wence-
Contributor

wence- commented Mar 11, 2024

@miscco could you revert the changes to the three pyproject.toml files (I don't think you modified them yourself, maybe a pre-commit hook did?)? Perhaps after merging trunk. Those changes are currently blocking the check-style check.

@miscco
Contributor Author

miscco commented Mar 11, 2024

@wence- Done

@wence- wence- removed request for a team, mroeschke and galipremsagar March 11, 2024 14:54
@bdice
Contributor

bdice commented Mar 18, 2024

I discussed with @miscco and we decided:

  • We'll change host-only uses of thrust::optional to std::optional in this PR
  • Device uses of thrust::optional can be replaced with cuda::std::optional in a future PR, once we upgrade to CCCL 2.3 or newer
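
As an illustration of the host-only half of this plan, here is a minimal sketch of a host function that takes a std::optional parameter. The function name describe_row_limit and its parameter are hypothetical and do not come from libcudf; the sketch only shows that a host-only signature needs nothing beyond the standard <optional> header.

```cpp
#include <optional>
#include <string>

// Hypothetical host-only helper (not a libcudf API): an absent value means
// "no limit", so callers can simply omit the argument.
std::string describe_row_limit(std::optional<int> max_rows = std::nullopt)
{
  if (!max_rows.has_value()) { return "no limit"; }
  return "limit=" + std::to_string(*max_rows);
}
```

Calling describe_row_limit() takes the empty branch, while describe_row_limit(10) takes the populated one. Device code would need cuda::std::optional instead, which is why those uses were deferred.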

@vyasr
Contributor

vyasr commented May 21, 2024

Is there any reason that this didn't get merged? Should we get this updated and pushed through? Apologies if it simply fell through the cracks.

@miscco
Contributor Author

miscco commented May 22, 2024

I believe we decided to push this back until RAPIDS moves to CCCL 2.3.

@vyasr
Contributor

vyasr commented May 23, 2024

Got it, thanks!

We'll revisit once #15327 is addressed then.

@bdice
Contributor

bdice commented Jul 25, 2024

/ok to test

@bdice
Contributor

bdice commented Jul 26, 2024

@miscco Looks like there are legitimate test failures:

[ RUN      ] TransformTest.IsNull
/opt/conda/conda-bld/work/cpp/tests/utilities/column_utilities.cu:262: Failure
Expected equality of these values:
  lhs.null_count()
    Which is: 0
  rhs.null_count()
    Which is: 4
Google Test trace:
/opt/conda/conda-bld/work/cpp/tests/ast/transform_tests.cpp:134:  <--  line of failure

/opt/conda/conda-bld/work/cpp/tests/utilities/column_utilities.cu:262: Failure
Expected equality of these values:
  lhs.null_count()
    Which is: 0
  rhs.null_count()
    Which is: 4
Google Test trace:
/opt/conda/conda-bld/work/cpp/tests/ast/transform_tests.cpp:140:  <--  line of failure

[  FAILED  ] TransformTest.IsNull (2 ms)

@vyasr
Contributor

vyasr commented Aug 16, 2024

OK, we have a few runs now here showing the same error. It's a single Python test that's failing in both wheel and conda CI in the same way, and it's happened after merging in the latest a few times, so I'm guessing it's not a fluke. It's a test of the DataFrame.eval method, which would be affected by changes to the evaluator. Plus, there are the earlier C++ failures that @bdice pointed out above. Those aren't failing anymore, so maybe this is a flaky test? It could be accessing garbage memory via an optional, perhaps, and therefore not reproduce consistently? It might be meaningful that it's consistently failing on CUDA 11.8, but that could be a red herring too.

I'll try one last merge of the latest 24.10 to get one more data point, just to be sure.

@miscco
Contributor Author

miscco commented Aug 19, 2024

Yeah, sorry about that. I do not have the capacity to investigate this currently.

@vyasr
Contributor

vyasr commented Aug 19, 2024

No problem. I pushed on this a little since the original blocker was resolved and I was hoping to help you wrap it up, but since there is more actual work to be done we can put a pin in it for the moment. Not urgent.

@davidwendt
Contributor

I was able to recreate this with CUDA 11.8. The cuda::std::optional change in cpp/include/cudf/ast/detail/operators.hpp seems to be the cause. I used the following to show that eval seems to return random garbage when using cuda::std::optional instead of thrust::optional:

>>> import cudf
>>> df = cudf.DataFrame({"a": [1, 2, 3, None, 5]})
>>> df.eval("isnull(a)")
0    True
1    <NA>
2    <NA>
3    <NA>
4    <NA>
dtype: bool
>>> df.eval("isnull(a)")
0     True
1    False
2    False
3     <NA>
4    False
dtype: bool

The above shows there are very different results when calling eval twice on the same input.
The correct output should be:

>>> df.eval("isnull(a)")
0    False
1    False
2    False
3     True
4    False
dtype: bool

So there is some subtle difference between thrust::optional and cuda::std::optional.
Or perhaps there is an error in the expression evaluator that cuda::std::optional happens to expose, given that the AST_TEST was failing in earlier CI builds as well.
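
For reference, the expected isnull semantics and the "evaluate twice, compare" check from the reproduction can be modeled on the host. This is only a sketch under stated assumptions: is_null and is_deterministic are hypothetical helpers, a std::vector of std::optional<int> stands in for a nullable column, and none of it reflects how the cudf AST evaluator is implemented.

```cpp
#include <optional>
#include <vector>

// Host-side model of isnull: every output row is a plain bool (never null),
// and it is true exactly where the input row has no value.
std::vector<bool> is_null(std::vector<std::optional<int>> const& column)
{
  std::vector<bool> result;
  result.reserve(column.size());
  for (auto const& value : column) {
    result.push_back(!value.has_value());
  }
  return result;
}

// Hypothetical check mirroring the repro: evaluating the same input several
// times must produce identical results each time.
template <typename F, typename Input>
bool is_deterministic(F&& evaluate, Input const& input, int runs = 2)
{
  auto const first = evaluate(input);
  for (int i = 1; i < runs; ++i) {
    if (evaluate(input) != first) { return false; }
  }
  return true;
}
```

For the column {1, 2, 3, null, 5}, this model yields false for every row except the fourth, matching the correct output listed above. A result that changes between runs, as in the failing case, usually points at a read of uninitialized or stale memory rather than at a logic error.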

@bdice
Contributor

bdice commented Aug 19, 2024

Thanks for the analysis @davidwendt.

Should we consider merging all the changes except those in cpp/include/cudf/ast/detail/operators.hpp for now? Or do we think this error is a showstopper for adoption of cuda::std::optional more generally?

@davidwendt
Contributor

> Should we consider merging all the changes except those in cpp/include/cudf/ast/detail/operators.hpp for now? Or do we think this error is a showstopper for adoption of cuda::std::optional more generally?

I have a hard time believing there is an issue with cuda::std::optional, so I'm inclined to undo the changes to operators.hpp and open an issue to figure out what is going on with AST, in order to merge this PR sooner rather than later.

@@ -278,7 +278,7 @@ struct expression_evaluator {
   detail::device_data_reference const& input_reference,
   IntermediateDataType<has_nulls>* thread_intermediate_storage,
   cudf::size_type left_row_index,
-  thrust::optional<cudf::size_type> right_row_index = {}) const
+  cuda::std::optional<cudf::size_type> right_row_index = {}) const
Contributor

@bdice Do you think we could remove the optional here? There is a comment below that says in some cases right_row_index is ignored. I think just making this default to {} is enough to make this parameter optional without making it a formal cuda::std::optional.
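
The suggestion above can be sketched on the host as follows. This is a simplified illustration, not the evaluator's code: size_type and table_reference are stand-ins declared locally, and resolve_row_optional and resolve_row_plain are hypothetical names. The assumption is that the right row index is only read when the reference points at the right table, so a value-initialized default is never observed in the left-table case.

```cpp
#include <cstdint>
#include <optional>

using size_type = std::int32_t;  // local stand-in for cudf::size_type

enum class table_reference { LEFT, RIGHT };  // simplified stand-in

// Before: the right row index is wrapped in an optional and unwrapped on use.
size_type resolve_row_optional(table_reference source,
                               size_type left_row_index,
                               std::optional<size_type> right_row_index = {})
{
  return source == table_reference::LEFT ? left_row_index : right_row_index.value();
}

// After: a plain defaulted parameter. The default (0) is ignored whenever
// the source is the left table, so no optional wrapper is required.
size_type resolve_row_plain(table_reference source,
                            size_type left_row_index,
                            size_type right_row_index = {})
{
  return source == table_reference::LEFT ? left_row_index : right_row_index;
}
```

Both versions return the same row for valid calls. The trade-off is that the plain version cannot distinguish "not provided" from row 0, which is acceptable only if callers always pass a right index when the right table is referenced.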

Contributor

Exploring this in #16604.

#include <thrust/binary_search.h>
#include <thrust/gather.h>
#include <thrust/host_vector.h>
#include <thrust/optional.h>
Contributor

Looks like this include is not needed and so can be removed instead of replaced.

@bdice
Contributor

bdice commented Aug 19, 2024

Since @miscco said he doesn't have bandwidth to finish this PR right now, I am planning to push some changes (reverting changes in AST code paths) that will let us complete this. I will address @davidwendt's feedback to remove the use of an optional in AST code paths in a follow-up PR.

@bdice
Contributor

bdice commented Aug 19, 2024

@davidwendt Tests are passing for me locally as of 0418c91. I opened #16604 to explore the possibility of removing an optional from the AST code paths. I will also use that PR to investigate whether changing to cuda::std::optional in the AST operators shows this problem -- or if it's just failing in the optional right row index.

@davidwendt
Contributor

> @davidwendt Tests are passing for me locally as of 0418c91. I opened #16604 to explore the possibility of removing an optional from the AST code paths. I will also use that PR to investigate whether changing to cuda::std::optional in the AST operators shows this problem -- or if it's just failing in the optional right row index.

Sounds good. I did this as part of my investigation. Removing the optional right-row-index had no effect on the AST failure/success on my local machine.

@vyasr
Contributor

vyasr commented Aug 19, 2024

Thanks both of you! I appreciate you taking point on getting this PR finished.

@bdice
Contributor

bdice commented Aug 20, 2024

@davidwendt Could you review again? Or can I dismiss your request for changes? I think this will be ready to go once we rerun the Java CI (it failed, but there are other GPU jobs running so I'll retrigger CI later).

@bdice
Contributor

bdice commented Aug 20, 2024

/merge

@rapids-bot rapids-bot bot merged commit 1cccf3e into rapidsai:branch-24.10 Aug 20, 2024
87 checks passed
@bdice
Contributor

bdice commented Aug 20, 2024

Thanks @miscco and reviewers!

rapids-bot bot pushed a commit that referenced this pull request Aug 20, 2024
This PR follows up on a request from @davidwendt in #15091 (comment).

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)
  - David Wendt (https://github.com/davidwendt)

URL: #16604
@miscco miscco deleted the replace_thrust_optional branch August 20, 2024 20:15
Labels
ci conda improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
7 participants