feat: cuvs acceleration for gpu k-means #2816

Merged

jacketsj merged 33 commits into main from jack/cuvs-accel on Sep 23, 2024

Conversation

jacketsj (Contributor) commented Aug 31, 2024

We currently have a pytorch-based k-means implementation for computing IVF centroids. This PR accelerates it with cuVS, trading slightly less score improvement per iteration for much faster iterations.

By default, this is off, since it's primarily useful for very large datasets where large centroid counts are applicable.

Benchmarking (classic k-means scoring):

  • k=16384 clusters
  • text2image-10M base set (10M image embeddings, 200 dimensions, float32, cosine distance)

Results: Slightly better score @ ~1.5x faster. Speedup gets better with more centroids.

Easy test script & outputs
import numpy as np
from lance.cuvs.kmeans import KMeans as KMeansVS
from lance.torch.kmeans import KMeans
import lance
import time
# Note: This kind of approach performs quite poorly on random data (see https://arxiv.org/abs/2405.18680), so it's only worth testing on a real dataset
ds = lance.dataset("path/to/text2image-dataset") # can also use other medium~large datasets
data = np.stack(ds.to_table()["vector"].to_numpy())
CLUSTERS = 16384  # number of clusters (k), matching the benchmark setup above
max_iters_base = 10
max_iters_cuvs = 12  # iterations using cuvs are much faster, but slightly less precise
metric = "cosine"

cuvs_start_time = time.time()
kmeans_cuvs = KMeansVS(
    CLUSTERS,
    metric=metric,
    max_iters=max_iters_cuvs,
    seed=0,
)
kmeans_cuvs.fit(data)
cuvs_end_time = time.time()

base_start_time = time.time()
kmeans = KMeans(
    CLUSTERS,
    metric=metric,
    max_iters=max_iters_base,
    seed=0,
)
kmeans.fit(data)
base_end_time = time.time()
print(f"score after {max_iters_cuvs} iters of kmeans_cuvs better than {max_iters_base} iters of kmeans by {kmeans.total_distance - kmeans_cuvs.total_distance}")
base_time = base_end_time-base_start_time
cuvs_time = cuvs_end_time-cuvs_start_time
print(f"time to run kmeans: {base_time}s. time to run kmeans_cuvs: {cuvs_time} (speedup: {base_time/cuvs_time}x)")

Output:

score after 12 iters of kmeans_cuvs better than 10 iters of kmeans by 5905.7138671875
time to run kmeans: 86.69116258621216s. time to run kmeans_cuvs: 56.66267776489258 (speedup: 1.5299517425899842x)

Additionally, a new "accelerator" choice has been added: "cuvs". It requires one of the new optional dependencies (cuvs-py3X, for X in {9, 10, 11}). It can replace the two routines for which we already have cuda acceleration: IVF model training (Lloyd's algorithm) and IVF assignments. At sufficiently large centroid counts this significantly accelerates both steps, resulting in better e2e index build time. See below:

[Plot: benchmark results on the sift1m dataset, k=10, comparing the "cuda"- and "cuvs"-accelerated builds]
Although these plots are near-identical, the "cuvs" accelerated variation took ~18.1s to build e2e, while the "cuda" accelerated variation took ~24.4s.
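
For context, here is a minimal sketch of how the new accelerator option might be passed when building an IVF_PQ index; the dataset path and index parameters below are illustrative, not taken from this PR:

import lance

ds = lance.dataset("path/to/text2image-dataset")  # hypothetical path
ds.create_index(
    "vector",
    index_type="IVF_PQ",
    metric="cosine",
    num_partitions=16384,  # large centroid counts are where cuvs helps most
    num_sub_vectors=25,    # 200-dim vectors -> 8 dims per sub-vector
    # New accelerator choice added in this PR; requires one of the optional
    # cuvs-py3X dependencies and a CUDA-capable GPU. "cuda" remains available.
    accelerator="cuvs",
)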

This speedup persists on larger datasets, although I was mistaken earlier: PQ assignments are the bigger bottleneck as the dataset gets larger (thanks to some improvements I had not seen), so this is not the bottleneck step. The next step after this PR will be to accelerate PQ with both cuda and cuvs.

github-actions bot added the enhancement (New feature or request) and python labels Aug 31, 2024
jacketsj marked this pull request as ready for review August 31, 2024 14:53
jacketsj merged commit ea78168 into main Sep 23, 2024
14 checks passed
jacketsj deleted the jack/cuvs-accel branch September 23, 2024 14:58