feat!: support multivector type #3190

BubbleCal · 2024-12-02T09:45:27Z

codecov-commenter · 2024-12-02T10:18:03Z

Codecov Report

Attention: Patch coverage is 70.27601% with 140 lines in your changes missing coverage. Please review.

Project coverage is 78.66%. Comparing base (c9bb25d) to head (ef5c3f0).

Files with missing lines	Patch %	Lines
rust/lance/src/dataset/scanner.rs	57.77%	41 Missing and 16 partials ⚠️
rust/lance/src/index/vector/utils.rs	55.31%	20 Missing and 1 partial ⚠️
rust/lance-linalg/src/distance.rs	66.00%	17 Missing ⚠️
rust/lance-index/src/vector/sq/storage.rs	31.25%	11 Missing ⚠️
rust/lance-index/src/vector/transform.rs	74.35%	8 Missing and 2 partials ⚠️
rust/lance-index/src/vector/flat.rs	52.94%	5 Missing and 3 partials ⚠️
rust/lance/src/index/vector/ivf/v2.rs	95.48%	4 Missing and 2 partials ⚠️
rust/lance-arrow/src/floats.rs	40.00%	3 Missing ⚠️
rust/lance/src/index/vector/ivf.rs	0.00%	0 Missing and 2 partials ⚠️
rust/lance/src/io/exec/knn.rs	0.00%	0 Missing and 2 partials ⚠️
... and 3 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3190      +/-   ##
==========================================
+ Coverage   78.58%   78.66%   +0.08%     
==========================================
  Files         250      250              
  Lines       89539    89836     +297     
  Branches    89539    89836     +297     
==========================================
+ Hits        70360    70668     +308     
+ Misses      16293    16256      -37     
- Partials     2886     2912      +26

Flag	Coverage Δ
unittests	`78.66% <70.27%> (+0.08%)`	⬆️


Signed-off-by: BubbleCal <[email protected]>

BubbleCal · 2024-12-16T05:56:53Z

rust/lance/src/dataset/scanner.rs


-            let mut knn_node = if q.refine_factor.is_some() {
+            let mut knn_node = if q.refine_factor.is_some() || is_multivec {


For multivector columns, refine is always required.

eddyxu commented Dec 18, 2024

I don't follow. Why is it?

BubbleCal commented Dec 19, 2024

This just follows the algorithm described in the ColBERT paper: refine is required to calculate the MaxSim distance. Without refine, the search only finds the nearest chunks and does not take the MaxSim metric into account.
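To make the reply above concrete, here is a minimal sketch of ColBERT-style late-interaction scoring, assuming similarity is a plain dot product. The helper names (`dot`, `maxsim`, `refine`) and the toy data are hypothetical, and this is not Lance's implementation. It only illustrates why a refine pass over whole rows is needed after a per-vector nearest-neighbor search.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: Sequence[Vector], doc: Sequence[Vector]) -> float:
    """Sum, over query vectors, of the best dot product against any doc vector."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def refine(query, docs: Dict[int, List[Vector]], candidate_ids, k: int) -> List[int]:
    """Re-rank candidate row ids by MaxSim (higher means more similar)."""
    scored = [(maxsim(query, docs[i]), i) for i in set(candidate_ids)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


# Toy data (assumption): each row holds several 2-d vectors.
docs = {
    0: [[1.0, 0.0], [0.9, 0.1]],  # very close to the first query vector only
    1: [[0.8, 0.0], [0.0, 0.8]],  # decent match for both query vectors
}
query = [[1.0, 0.0], [0.0, 1.0]]

# A per-vector search for query[0] alone would favor row 0,
# but MaxSim over the whole query prefers row 1.
ranking = refine(query, docs, candidate_ids=[0, 1], k=2)
```

In this toy case the single closest chunk belongs to row 0, yet row 1 wins once every query vector contributes to the score, which is the gap the refine step closes.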

Signed-off-by: BubbleCal <[email protected]>

wjones127 · 2024-12-23T20:54:05Z

python/python/tests/test_vector_index.py

+@pytest.fixture()
+def multivec_dataset(tmp_path):
+    tbl = create_multivec_table()
+    yield lance.write_dataset(tbl, tmp_path)


For new tests, let's create the dataset in memory (unless we actually need to access the individual files).

Suggested change

-@pytest.fixture()
-def multivec_dataset(tmp_path):
-    tbl = create_multivec_table()
-    yield lance.write_dataset(tbl, tmp_path)
+@pytest.fixture()
+def multivec_dataset():
+    tbl = create_multivec_table()
+    yield lance.write_dataset(tbl, "memory://")

wjones127 · 2024-12-23T20:54:35Z

python/python/tests/test_vector_index.py

+@pytest.fixture()
+def indexed_multivec_dataset(tmp_path):
+    tbl = create_multivec_table()
+    dataset = lance.write_dataset(tbl, tmp_path)
+    yield dataset.create_index(


Suggested change

-@pytest.fixture()
-def indexed_multivec_dataset(tmp_path):
-    tbl = create_multivec_table()
-    dataset = lance.write_dataset(tbl, tmp_path)
-    yield dataset.create_index(
+@pytest.fixture()
+def indexed_multivec_dataset(multivec_dataset):
+    yield multivec_dataset.create_index(

wjones127 · 2024-12-23T20:55:24Z

python/python/tests/test_vector_index.py

+def test_multivec_ann(indexed_multivec_dataset):
+    query = np.random.randn(5 * 128)
+    indexed_multivec_dataset.scanner(nearest={"column": "vector", "q": query, "k": 100})


Let's assert that the output has the expected structure.

Also, are there errors we need to test? For example, should we get a ValueError if we pass the wrong query type?

wjones127 · 2024-12-23T21:24:05Z

rust/lance/src/dataset/scanner.rs

+        let (_, element_type) = get_vector_type(self.dataset.schema(), column)?;
+        let dim = get_vector_dim(self.dataset.schema(), column)?;
+        // make sure the query is valid
+        if q.len() % dim != 0 {


Could you explain more how those two are different?
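The `q.len() % dim != 0` check in the hunk above suggests that a flat query is treated as one or more concatenated vectors of length `dim`. A hypothetical Python sketch of that validation follows. The function name `split_query` and the choice of `ValueError` are illustrative assumptions, not Lance's API.

```python
from typing import List, Sequence


def split_query(q: Sequence[float], dim: int) -> List[List[float]]:
    """Split a flat query into vectors of length `dim`, or raise ValueError."""
    if dim <= 0:
        raise ValueError(f"dim must be positive, got {dim}")
    if len(q) == 0 or len(q) % dim != 0:
        raise ValueError(
            f"query length {len(q)} is not a positive multiple of dim {dim}"
        )
    return [list(q[i : i + dim]) for i in range(0, len(q), dim)]


# Five query vectors of dimension 128 passed as one flat list of 640 floats,
# mirroring `np.random.randn(5 * 128)` in the Python test.
vectors = split_query([0.0] * (5 * 128), dim=128)
```

A query whose length is not a multiple of the column dimension, such as 100 floats against `dim=128`, is rejected instead of being silently truncated.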

wjones127 · 2024-12-23T21:24:42Z

rust/lance/src/dataset/scanner.rs

-            AggregateMode::Final,
+            AggregateMode::Single,


Why this change?

AggregateMode::Final is used for combining the partial results.
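As background for the reply above: in a two-phase aggregation, each partition first produces a partial state and a final step merges those states, whereas a single-phase aggregation consumes the raw rows directly. The sketch below illustrates that distinction with a per-row minimum distance. All names are hypothetical, and it mimics the concept only, not DataFusion's aggregate operator.

```python
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, float]  # (row_id, distance)


def min_by_row(rows: Iterable[Row]) -> Dict[int, float]:
    """Keep the smallest distance seen for each row id."""
    out: Dict[int, float] = {}
    for row_id, dist in rows:
        out[row_id] = min(dist, out.get(row_id, float("inf")))
    return out


def single_phase(rows: Iterable[Row]) -> Dict[int, float]:
    """One step: raw rows in, final result out."""
    return min_by_row(rows)


def two_phase(partitions: List[List[Row]]) -> Dict[int, float]:
    """Partial step per partition, then a final step that merges the partial states."""
    partials = [min_by_row(p) for p in partitions]  # "partial" phase
    merged = (item for state in partials for item in state.items())
    return min_by_row(merged)  # "final" phase


partitions = [[(1, 0.4), (2, 0.9)], [(1, 0.2), (3, 0.5)]]
all_rows = [row for part in partitions for row in part]
```

When the input is already one stream of raw rows, there are no partial states to merge, which is the situation a single-phase mode covers.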

Signed-off-by: BubbleCal <[email protected]>

wjones127

Looks good. Just have one small suggestion.

wjones127 · 2025-01-07T18:43:39Z

python/python/lance/dataset.py

            if pa.types.is_fixed_size_list(field.type):
                dimension = field.type.list_size
+            elif pa.types.is_list(field.type):


Shouldn't you also check that the child is a fixed-size list?

Suggested change

-            elif pa.types.is_list(field.type):
+            elif (pa.types.is_list(field.type) and
+                  pa.types.is_fixed_size_list(field.type.value_type)):

Signed-off-by: BubbleCal <[email protected]>

github-actions bot added the enhancement label Dec 2, 2024

BubbleCal added 2 commits December 3, 2024 14:07

build

82d32c6

Signed-off-by: BubbleCal <[email protected]>

Merge branch 'main' of https://github.com/lancedb/lance into issue-2951

a171aa6

BubbleCal force-pushed the issue-2951 branch from 55b0e08 to a171aa6 Compare December 11, 2024 03:25

BubbleCal added 10 commits December 11, 2024 15:18

support multivector on flat

3eb132a

support to index multivector

417d133

Signed-off-by: BubbleCal <[email protected]>

fix

edea676

Signed-off-by: BubbleCal <[email protected]>

clean

5b4ba98

Signed-off-by: BubbleCal <[email protected]>

fix

ff3cf64

Signed-off-by: BubbleCal <[email protected]>

support more index types on multivec

c46bf3c

Signed-off-by: BubbleCal <[email protected]>

more tests

9730d77

Signed-off-by: BubbleCal <[email protected]>

fix

40b2454

Signed-off-by: BubbleCal <[email protected]>

fmt

a8515e6

Signed-off-by: BubbleCal <[email protected]>

fix

e541026

Signed-off-by: BubbleCal <[email protected]>

BubbleCal added the feature label Dec 13, 2024

BubbleCal added 3 commits December 16, 2024 13:16

search multivec

2526724

Signed-off-by: BubbleCal <[email protected]>

fix

9f17b4c

Signed-off-by: BubbleCal <[email protected]>

optimize

092d6db

Signed-off-by: BubbleCal <[email protected]>

BubbleCal commented Dec 16, 2024


BubbleCal added 3 commits December 16, 2024 13:57

search with multivector

bd0ddc3

Signed-off-by: BubbleCal <[email protected]>

fix

31dd5e2

Signed-off-by: BubbleCal <[email protected]>

query from python

430c430

Signed-off-by: BubbleCal <[email protected]>

BubbleCal marked this pull request as ready for review December 16, 2024 08:38

BubbleCal requested a review from westonpace December 16, 2024 08:39

github-actions bot added the python label Dec 16, 2024

BubbleCal requested review from eddyxu, wjones127 and chebbyChefNEQ December 16, 2024 08:39

fix ut

77cab91

Signed-off-by: BubbleCal <[email protected]>

BubbleCal added 4 commits December 23, 2024 14:40

Merge branch 'main' of https://github.com/lancedb/lance into issue-2951

427eec8

Signed-off-by: BubbleCal <[email protected]>

fix

7501565

Signed-off-by: BubbleCal <[email protected]>

fix sq

69b5aa7

Signed-off-by: BubbleCal <[email protected]>

fix ut

8949d4a

Signed-off-by: BubbleCal <[email protected]>

wjones127 reviewed Dec 23, 2024


BubbleCal added 4 commits December 24, 2024 15:20

Merge branch 'main' of https://github.com/lancedb/lance into issue-2951

9bd65fd

more test

d624251

Signed-off-by: BubbleCal <[email protected]>

more tests

d726ad2

Signed-off-by: BubbleCal <[email protected]>

fix

a4057b6

Signed-off-by: BubbleCal <[email protected]>

BubbleCal requested a review from wjones127 December 24, 2024 11:18

fmt

31d62a9

Signed-off-by: BubbleCal <[email protected]>

BubbleCal changed the title ~~feat: support multivector type~~ feat!: support multivector type Dec 24, 2024

BubbleCal added the breaking-change label Dec 24, 2024

BubbleCal added 8 commits December 24, 2024 19:25

bump version

1dacf24

Signed-off-by: BubbleCal <[email protected]>

bump version

2cf35fe

Signed-off-by: BubbleCal <[email protected]>

update cargo.lock

5cedac5

Signed-off-by: BubbleCal <[email protected]>

fix

0593f4e

Signed-off-by: BubbleCal <[email protected]>

Merge branch 'main' of https://github.com/lancedb/lance into issue-2951

f5de6ca

Signed-off-by: BubbleCal <[email protected]>

fix ut

e3ca31b

Signed-off-by: BubbleCal <[email protected]>

fix

36dc7fc

Signed-off-by: BubbleCal <[email protected]>

fix

c7a8613

Signed-off-by: BubbleCal <[email protected]>

wjones127 approved these changes Jan 7, 2025


BubbleCal added 7 commits January 8, 2025 10:44

check

13c7470

Signed-off-by: BubbleCal <[email protected]>

Merge branch 'main' of https://github.com/lancedb/lance into issue-2951

a9136b7

fix

9ae7e48

Signed-off-by: BubbleCal <[email protected]>

Merge branch 'main' of https://github.com/lancedb/lance into issue-2951

e07f223

Signed-off-by: BubbleCal <[email protected]>

fix

5a44eda

Signed-off-by: BubbleCal <[email protected]>

fix

e7b2cd8

Signed-off-by: BubbleCal <[email protected]>

fix lint

ef5c3f0

Signed-off-by: BubbleCal <[email protected]>

BubbleCal merged commit 94e7bf9 into lancedb:main Jan 8, 2025
26 checks passed



