[WIP] Faster binary search

1 High level ideas

Prefix table: for each 20-bit prefix, store the corresponding range of the array.
Interpolation: Make one or more interpolation steps. Could store max resulting error.
- Drawback: can cause an unpredictable number of resulting iterations.
Batching: process multiple (8-32) queries at the same time, hiding memory latency
Query bucketing: given >>1M of queries, partition them into 1M buckets and answer bucket by bucket.
Eytzinger layout
B-tree layout
prefetching (either next Eytzinger iteration, or in the batch)

1.1 Resources

Algorithmica: https://en.algorithmica.org/hpc/data-structures/
[cite/t:@khuong-array-layouts]
https://www.cai.sk/ojs/index.php/cai/article/view/2019_3_555

1.2 Code

github:RagnarGrootKoerkamp/suffix-array-searching
- Some initial binary search and Btree variants.
github:RagnarGrootKoerkamp/cpu-benchmarks
- low-level CPU benchmarks to get upper bounds on potential performance

2 To measure

Max random access cacheline throughput (1 and many threads)
- Also variants for fetching 2/3/4 consecutive cachelines.

3 Memory efficiency

Suppose our task is to find an integer $q$ in a sorted list $A$ of length $n$. One option is to use binary search, but using a B-tree or the Eytzinger layout turns out faster when $n$ is large. See the excellent paper [cite/t:@khuong-array-layouts] for background and detailed comparisons.

Here, I’d like to compare the memory efficiency of the B-tree and Eytzinger layout. That is: which method puts the least pressure on the memory system, and can thus get higher potential throughput

Let’s say we are searching an array consisting of $n=2^k$ $b$ byte elements.

3.1 B-tree

A cache-line has 64 bytes. Set $B = 64/b$. Each node stores $B-1$ values and a gap, and corresponds to $\ell=lg_2(B)$ levels of the tree. $\ell$ values of each cache-line are used.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fast-binary-search.org

fast-binary-search.org

[WIP] Faster binary search

1 High level ideas

1.1 Resources

1.2 Code

2 To measure

3 Memory efficiency

3.1 B-tree

Files

fast-binary-search.org

Latest commit

History

fast-binary-search.org

File metadata and controls

[WIP] Faster binary search

1 High level ideas

1.1 Resources

1.2 Code

2 To measure

3 Memory efficiency

3.1 B-tree