vector-search: avoid unaligned reads #1810
Merged
Currently, vector search can produce unaligned reads because node metadata is not aligned to a word boundary (it is 10 bytes long), so if we store F32/F64 vectors in the node, they will be accessed at a half-word offset.
You can test the issue like this:
This can cause real trouble on some architectures. For example, it seems that the Cortex-A series of ARM CPUs supports unaligned reads, but this capability must be enabled explicitly (see https://developer.arm.com/documentation/den0013/d/Porting/Alignment).
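To make the offset arithmetic concrete, here is a minimal sketch (not code from this PR). `NODE_METADATA_SIZE`, `is_aligned` and `read_f32_unaligned` are hypothetical names introduced for illustration, and the sketch assumes the node blob itself starts at an aligned address. It shows that a payload placed right after 10 bytes of metadata is neither 4-byte nor 8-byte aligned, and how a `memcpy`-based read avoids dereferencing a misaligned pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: the 10-byte node metadata size mentioned above. */
#define NODE_METADATA_SIZE 10

/* Returns nonzero when `offset` is a multiple of `alignment`.
 * `alignment` must be a power of two. */
static int is_aligned(size_t offset, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (offset & (alignment - 1)) == 0;
}

/* Portable read of a float from a possibly unaligned position in a blob.
 * Copying the bytes avoids casting `blob + offset` to `float *`, which
 * would be an unaligned access when the offset is not a multiple of 4. */
static float read_f32_unaligned(const unsigned char *blob, size_t offset) {
  float value;
  memcpy(&value, blob + offset, sizeof(value));
  return value;
}
```

With these definitions, `is_aligned(NODE_METADATA_SIZE, 4)` and `is_aligned(NODE_METADATA_SIZE, 8)` are both false, which matches the half-word offset described above.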
The complication in the fix lies in the fact that the vector search code reuses memory from SQLite blobs a lot: quite often, instead of making an additional allocation, we just initialize a vector directly from blob memory (see `vectorInitStatic`). Because of this, the PR attacks the problem from the format side and aligns metadata to a double-word boundary, so that vector reads are properly aligned.
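The idea of padding metadata up to an alignment boundary can be sketched as follows. This is an illustration under stated assumptions, not the layout code from this PR: `align_up` is a hypothetical helper, and "double-word" is assumed to mean 8 bytes. The actual V3 layout is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rounds `size` up to the next multiple of `alignment`.
 * `alignment` must be a power of two (8 for the assumed double-word case). */
static size_t align_up(size_t size, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (size + alignment - 1) & ~(alignment - 1);
}
```

Under these assumptions, `align_up(10, 8)` gives 16, so a vector stored immediately after the padded metadata starts at an offset that is a multiple of 8, which suits both F32 and F64 elements as long as the blob itself is aligned.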
Note that this PR introduces a V3 on-disk binary format that contains the fix, so unaligned reads will be fixed only for newly created indices. Old indices will continue to work as they do now, but they will still have the potential unaligned reads issue.