vector-search: avoid unaligned reads #1810

sivukhin · 2024-11-11T21:11:16Z

Currently, vector search can perform unaligned reads because node metadata is not aligned to a word boundary (it is 10 bytes long). If F32/F64 vectors are stored in the node, they are accessed at a half-word offset.

You can test the issue like this:

$> make CC=clang CFLAGS='-fsanitize=alignment' testfixture
$> ./testfixture test/libsql_vector*
...
vector-f32-compress-bf16...sqlite3.c:212025:39: runtime error: load of misaligned address 0x5d89bb2b0b1a for type 'float', which requires 4 byte alignment
0x5d89bb2b0b1a: note: pointer points here
...
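
The reported address can be checked by hand. A minimal sketch, assuming only the 10-byte metadata size and the address from the sanitizer output above; `misalignment` is a hypothetical helper, not a function from the libsql code base:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: remainder of `addr` modulo `align`.
 * `align` must be a non-zero power of two. A result of 0 means
 * the address (or offset) is suitably aligned. */
static unsigned misalignment(uint64_t addr, unsigned align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (unsigned)(addr & (uint64_t)(align - 1));
}
```

Both `misalignment(0x5d89bb2b0b1a, 4)` and `misalignment(10, 4)` evaluate to 2, which matches the half-word offset described above: a payload placed right after a 10-byte header sits two bytes past a 4-byte boundary.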

This can cause real trouble on some architectures. For example, the Cortex-A series of ARM CPUs appears to support unaligned reads, but this capability must be enabled explicitly (see https://developer.arm.com/documentation/den0013/d/Porting/Alignment):

For the Cortex-A series of processors, unaligned accesses are supported, although you must enable this by setting the U bit in the CP15:SCTL register, indicating that unaligned accesses are permitted.

The fix is complicated by the fact that the vector search code reuses memory from sqlite blobs a lot: quite often, instead of making an additional allocation, we just initialize a vector from blob memory (see vectorInitStatic).
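
For context, the usual way to read a float from a possibly misaligned address is to copy it into an aligned local with `memcpy`. The sketch below shows that alternative only for comparison. It is not what this PR does, and `load_f32` is a hypothetical helper, not part of the libsql code base.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: read a float from an address that may not be
 * 4-byte aligned. memcpy copies byte by byte as far as the language
 * is concerned, so no misaligned float load is performed. */
static float load_f32(const unsigned char *p) {
  float v;
  assert(p != NULL);
  memcpy(&v, p, sizeof v);
  return v;
}
```

Code that treats blob memory directly as a `float` array cannot use a helper like this without touching every access site, which is why a change to the format is the simpler route here.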

Because of this complication, this PR attacks the problem from the format side and aligns metadata to a double-word boundary, so that reads are properly aligned.
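
Aligning a header to a double-word boundary amounts to rounding its size up to a multiple of 8. A minimal sketch of that arithmetic follows; `align_up` is a hypothetical helper, and the actual V3 layout is defined by the code in this PR and may pad differently. It also assumes the blob buffer itself starts at an address that is at least 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round `size` up to the next multiple of `align`.
 * `align` must be a non-zero power of two. */
static size_t align_up(size_t size, size_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  return (size + align - 1) & ~(align - 1);
}
```

With this helper, `align_up(10, 8)` gives 16, so a vector placed after the padded metadata starts at an offset that is valid for both `float` and `double`.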

Note that this PR introduces the V3 on-disk binary format with the fix, so unaligned reads are fixed only for newly created indices (old ones will continue to work as before, but may still perform unaligned reads).

libsql-ffi/bundled/SQLite3MultipleCiphers/src/sqlite3.c

haaawk

LGTM modulo my misunderstanding of the comment

simonwh mentioned this pull request Nov 12, 2024

[Mobile] OpenCV assertion failure on MediaTek devices (SM-A137F): Missing CPU baseline features microsoft/onnxruntime#22780

Open

sivukhin added 4 commits November 13, 2024 00:40

make nodeMetadataSize/edgeMetadataSize dynamic

848b08e

introduce V3 on-disk vector index format with aligned metadata

abf43af

refine comment about format

c4f37ad

build bundles

b11c0c5

sivukhin force-pushed the vector-search-avoid-unaligned-reads branch from b6255ec to b11c0c5 Compare November 12, 2024 20:40

haaawk reviewed Nov 22, 2024

libsql-ffi/bundled/SQLite3MultipleCiphers/src/sqlite3.c

haaawk approved these changes Nov 22, 2024

sivukhin added 3 commits November 24, 2024 18:01

add backward compatibility test for on-disk format of vector index

963256b

fix comment in code

644a91e

add anti-hacker test

27b3fc4

sivukhin added this pull request to the merge queue Nov 24, 2024

Merged via the queue into main with commit a77f422 Nov 24, 2024
19 checks passed

sivukhin deleted the vector-search-avoid-unaligned-reads branch November 24, 2024 15:10
