- Prefix table: for each 20-bit prefix, store the corresponding range of the array.
- Interpolation: make one or more interpolation steps. We could store the maximum resulting error.
  - Drawback: the number of remaining iterations becomes unpredictable.
- Batching: process multiple (8-32) queries at the same time to hide memory latency.
- Query bucketing: given many more than 1M queries, partition them into 1M buckets and answer them bucket by bucket.
- Eytzinger layout
- B-tree layout
- Prefetching (either for the next Eytzinger iteration, or within the batch)
- Algorithmica: https://en.algorithmica.org/hpc/data-structures/
- [cite/t:@khuong-array-layouts]
- https://www.cai.sk/ojs/index.php/cai/article/view/2019_3_555
- github:RagnarGrootKoerkamp/suffix-array-searching
  - Some initial binary search and B-tree variants.
- github:RagnarGrootKoerkamp/cpu-benchmarks
  - Low-level CPU benchmarks to get upper bounds on potential performance.
  - Max random-access cache-line throughput (1 and many threads).
  - Also variants for fetching 2/3/4 consecutive cache lines.
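As a concrete reference point for the Eytzinger layout and prefetching ideas listed above, here is a minimal Rust sketch, loosely following the approach described in the Algorithmica chapter. It is not the code from the repositories above. The names `eytzinger`, `lower_bound`, and `prefetch` are hypothetical, and the sketch assumes `u32` keys and 64-byte cache lines, so that index `16 * k` is the first descendant of node `k` four levels down.

```rust
/// Issue a prefetch hint for `tree[i]`. The address may be out of bounds:
/// a prefetch never faults, and `wrapping_add` avoids pointer-offset UB.
#[cfg(target_arch = "x86_64")]
fn prefetch(tree: &[u32], i: usize) {
    use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    unsafe { _mm_prefetch::<_MM_HINT_T0>(tree.as_ptr().wrapping_add(i) as *const i8) };
}

/// Fallback for other architectures: do nothing.
#[cfg(not(target_arch = "x86_64"))]
fn prefetch(_tree: &[u32], _i: usize) {}

/// Build the Eytzinger (breadth-first) layout of a sorted slice.
/// The result is 1-indexed: slot 0 is unused, and the children of
/// node `k` are stored at `2 * k` and `2 * k + 1`.
fn eytzinger(sorted: &[u32]) -> Vec<u32> {
    // An in-order traversal of the implicit tree visits the nodes in sorted order.
    fn fill(sorted: &[u32], out: &mut [u32], next: &mut usize, k: usize) {
        if k < out.len() {
            fill(sorted, out, next, 2 * k);
            out[k] = sorted[*next];
            *next += 1;
            fill(sorted, out, next, 2 * k + 1);
        }
    }
    let mut out = vec![0; sorted.len() + 1];
    let mut next = 0;
    fill(sorted, &mut out, &mut next, 1);
    out
}

/// Return the smallest element `>= x`, or `None` if there is none.
/// `tree` must come from `eytzinger`, so it has at least one slot.
fn lower_bound(tree: &[u32], x: u32) -> Option<u32> {
    let n = tree.len() - 1;
    let mut k = 1;
    while k <= n {
        // Hint the cache line holding the descendants four levels down.
        prefetch(tree, 16 * k);
        // Go right (2k + 1) if tree[k] < x, otherwise left (2k).
        k = 2 * k + usize::from(tree[k] < x);
    }
    // Undo the trailing right turns and the last left turn; the node where
    // we last went left holds the answer, and k == 0 means we never did.
    k >>= k.trailing_ones() + 1;
    if k == 0 { None } else { Some(tree[k]) }
}
```

For example, `eytzinger(&[1, 2, 3, 4, 5, 6, 7])` stores the root `4` at index 1, followed by `2, 6` and then `1, 3, 5, 7`. Whether prefetching four levels ahead is the best choice depends on the key size and the hardware, so treat the constant `16` as a starting point for benchmarking.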
Suppose our task is to find an integer
Here, I’d like to compare the memory efficiency of the B-tree and Eytzinger layouts. That is: which method puts the least pressure on the memory system, and can thus potentially achieve higher throughput?
Let’s say we are searching an array consisting of
A cache-line has 64 bytes. Set