-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial cut for a cuVS Java API #450
base: branch-25.02
Are you sure you want to change the base?
Conversation
Co-authored-by: Vivek Narang <[email protected]>
FYI @cjnolet ^ |
/ok to test |
2cfab29
to
1597158
Compare
@naramgvivek10 Let's move CuVSResources to the cuvs package instead of common? That way we can abstract out the internals of Panama out of sight of the users. |
@@ -0,0 +1,354 @@ | |||
/* | |||
* Copyright (c) 2025, NVIDIA CORPORATION. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small nitpick: I notice this file is prefixed with "CuVS" while most of the other files are prefixed with "Cuvs". I personally prefer "CuVS" (or even "cuVS" if that doesn't break convention too badly) but I would at least make it consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this note. I will fix this soon.
private MethodHandle searchMethodHandle; | ||
private MethodHandle destroyIndexMethodHandle; | ||
private MethodHandle serializeMethodHandle; | ||
private MethodHandle deserializeMethodHandle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make these MethodHandles static final - since they optimise much nicer by the JVM? I'm happy to rework some of the code flow here, if possible (and it that would be helpful) - though I'm not overly familiar with the development environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your valuable feedback. I will try to incorporate this change. Please don't hesitate to suggest and commit to this branch. @chatman please send the invitation. Thanks!
The dev environment is as follows:
- Nvidia GPU RTX 2080 or above
- Ubuntu 22.04
- JDK 22+
- Install CUDA 12.6
- Install cmake 3.26+
- apt install libnccl-dev ninja-build
- clone this repository and this branch
./build.sh libcuvs # To build the libcuvs C library
cd java && ./build.sh # To build the java API (cuvs-java)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chatman @narangvivek10 I'm getting compile errors. Before trying to describe them, lemme explain a little the environment that I'm using. I thought it might be most straightforward to use an AWS cloud instance, g4dn.xlarge, which has Tesla T4, then install the relevant dev tools. Should that instance be suitable to use, or is there another cloud instance or similar that would be more appropriate?
compile errors: https://gist.github.com/ChrisHegarty/35ab79b3ebb8ce80f0861ec9c58b0975
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any advise on emulation, setup, environment, configuration, etc, would be gratefully appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use local workstations with NVIDIA GPUs. One thing we realized is that gcc 11 works out of the box for compiling libcuvs (./build.sh libcuvs). Since it comes out of the box with Ubuntu 22.04 LTS, it is our development environment of choice. I got things to work on Fedora 39 as well, but with a lot of trickery to make CUDA pick up an older version of GCC (11), since it comes with GCC 14, which doesn't work in my experience. @cjnolet FYI.
In your case, I don't see which GCC is being used. However, there's a step where HNSWLib patch is applied, which is failing for some reason. Can you try to "build.sh libcuvs" on branch-25.02? If that doesn't work at the same step (patch apply for HNSWLib), there's maybe something broken in the build.
* update ci release script to update the version in build.sh * class name updates --------- Co-authored-by: Vivek Narang <[email protected]>
Co-authored-by: Vivek Narang <[email protected]>
/ok to test |
/ok to test |
…s it is not compiling without it
* Update CI script to update version in pom.xml files * Add and update examples * Add GPU info API methods --------- Co-authored-by: Vivek Narang <[email protected]>
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I skimmed the packaging details at @cjnolet's request. There are some merge conflicts with #593. I will push a couple small packaging-related changes to this PR to help it move along. I do not have power to push to this fork, so I can't help here. Please merge in branch-25.02
and run pre-commit
.
The biggest work item needed on the packaging side is to clarify what versioning should look like. I think a patch version (25.02.0
) is important to add here, as we often publish patch releases. Currently the update-version.sh
script uses a full tag like 25.02.00
but that doesn't correspond to what the version strings look like right now. Also, we should probably normalize it to 25.02.0
(drop the extra leading zero), as we do for several packaging systems (all Python wheels are normalized, as is cuDF's Java layer). I would look at how cuDF versions its Java layer for more inspiration, which I linked in a comment.
ci/release/update-version.sh
Outdated
# Update Java API version | ||
sed_runner "s/VERSION=\".*\"/VERSION=\"${NEXT_FULL_TAG}\"/g" java/build.sh | ||
for FILE in java/*/pom.xml; do | ||
sed_runner "/<!--CUVS_JAVA#VERSION_UPDATE_MARKER_START-->.*<!--CUVS_JAVA#VERSION_UPDATE_MARKER_END-->/s//<!--CUVS_JAVA#VERSION_UPDATE_MARKER_START--><version>${NEXT_FULL_TAG}<\/version><!--CUVS_JAVA#VERSION_UPDATE_MARKER_END-->/g" "${FILE}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is currently inconsistent with our normal use of the script. We usually run ./ci/release/update-version.sh 25.02.00
which would assign a value of 25.02.00
. However, the pom.xml
files use 25.02
without the patch version. I would guess that we do want to have a patch version here, but perhaps we want to normalize it to 25.02.0
. That's close to what we do in cuDF's Java code.
Please look at https://github.com/rapidsai/cudf/blob/133e0c869531af94474e0bbb66cb22c5f8ba80f2/ci/release/update-version.sh#L87-L91 and https://github.com/rapidsai/cudf/blob/133e0c869531af94474e0bbb66cb22c5f8ba80f2/java/pom.xml#L24 and see whether we should adopt the same patterns here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Bradley, I've updated the version number format to include a patch number also. This should be better for Java artifacts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @bdice I will update the versioning following your suggestion above and push this change soon.
java/README.md
Outdated
Building | ||
-------- | ||
|
||
`./build.sh` will generate the libcuvs_java.so file in internal/ directory, and then build the final jar file for the cuVS Java API in cuvs-java/ directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`./build.sh` will generate the libcuvs_java.so file in internal/ directory, and then build the final jar file for the cuVS Java API in cuvs-java/ directory. | |
`./build.sh` will generate the `libcuvs_java.so` file in the `internal/` directory, and then build the final jar file for the cuVS Java API in the `cuvs-java/` directory. |
… making version include a patch number, ability for .so to be loaded off env var
I updated the branch by merging latest branch-25.02 changes. Needed to add a filter param to cagra search (#452). FYI @narangvivek10, please take a look. |
(FYI @ChrisHegarty, @narangvivek10), Last working commit in this branch is:
|
/ok to test |
1 similar comment
/ok to test |
build.sh
Outdated
# Build the cuvs Java bindings | ||
if (( ${NUMARGS} == 0 )) || hasArg java; then | ||
# build libcuvs first as the Java API depends on it | ||
./$0 libcuvs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it is good practice to call this script recursively to build the C++ code. It seems like it will lead to unexpected behavior. For example, how do you ensure that the CLI arguments such as --allgpuarch
or --no-mg
are being passed through?
I would recommend removing this line, and users must call ./build.sh libcuvs java
if the C++ has not yet been built. Perhaps a warning can be issued if hasArg libcuvs
is false. I believe this is how we handle the Python builds, which also depend on C++ libraries being built.
./$0 libcuvs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please revert any files with only copyright changes.
ci/release/update-version.sh
Outdated
# Update Java API version | ||
sed_runner "s/VERSION=\".*\"/VERSION=\"${NEXT_FULL_TAG}\"/g" java/build.sh | ||
for FILE in java/*/pom.xml; do | ||
sed_runner "/<!--CUVS_JAVA#VERSION_UPDATE_MARKER_START-->.*<!--CUVS_JAVA#VERSION_UPDATE_MARKER_END-->/s//<!--CUVS_JAVA#VERSION_UPDATE_MARKER_START--><version>${NEXT_FULL_TAG}<\/version><!--CUVS_JAVA#VERSION_UPDATE_MARKER_END-->/g" "${FILE}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Java API for cuVS for easy integration into Apache Lucene or other Java based projects.
Try:
For generating docs,
mvn javadoc:javadoc
Prerequisites:
Todo: