[WIP] Faster binary search

1 High level ideas

  • Prefix table: for each 20-bit prefix, store the corresponding range of the array.
  • Interpolation: make one or more interpolation steps. Could store the maximum resulting error.
    • Drawback: can cause an unpredictable number of resulting iterations.
  • Batching: process multiple (8-32) queries at the same time, hiding memory latency.
  • Query bucketing: given >>1M queries, partition them into 1M buckets and answer them bucket by bucket.
  • Eytzinger layout
  • B-tree layout
  • Prefetching (either the next Eytzinger iteration, or across the queries in a batch); see the sketch after this list.
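
As a concrete starting point, here is a minimal sketch (mine, not an implementation from this note) combining two of the ideas above: a branchless lower-bound search over an Eytzinger-layout array that prefetches the cacheline a few levels ahead. All names and constants are illustrative.

#+begin_src rust
// Fill ez[1..] with the Eytzinger (BFS) order of `sorted`; ez[0] is unused padding
// so that the children of node i are 2i and 2i+1.
fn build_eytzinger(sorted: &[u32], ez: &mut [u32], i: usize, pos: &mut usize) {
    if i < ez.len() {
        build_eytzinger(sorted, ez, 2 * i, pos);
        ez[i] = sorted[*pos];
        *pos += 1;
        build_eytzinger(sorted, ez, 2 * i + 1, pos);
    }
}

// Branchless lower bound over the Eytzinger array, prefetching a few levels ahead.
// Returns the Eytzinger index of the smallest element >= q, or 0 if there is none.
fn eytzinger_lower_bound(ez: &[u32], q: u32) -> usize {
    let mut i = 1;
    while i < ez.len() {
        // Prefetch the cacheline holding the descendants 4 levels down (16 = 2^4),
        // hiding the memory latency of the next few iterations.
        #[cfg(target_arch = "x86_64")]
        unsafe {
            use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
            if 16 * i < ez.len() {
                _mm_prefetch::<_MM_HINT_T0>(ez.as_ptr().add(16 * i) as *const i8);
            }
        }
        // Go left (2i) if ez[i] >= q, right (2i+1) otherwise.
        i = 2 * i + usize::from(ez[i] < q);
    }
    // Cancel the trailing right-turns of the path to recover the answer.
    i >> (i.trailing_ones() + 1)
}

fn main() {
    let sorted: Vec<u32> = (0..15).map(|x| 2 * x).collect(); // 0, 2, ..., 28
    let mut ez = vec![0u32; sorted.len() + 1];
    build_eytzinger(&sorted, &mut ez, 1, &mut 0);
    let idx = eytzinger_lower_bound(&ez, 7);
    assert_eq!(ez[idx], 8); // smallest element >= 7
}
#+end_src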

1.1 Resources

1.2 Code

2 To measure

  • Max random access cacheline throughput (1 and many threads)
    • Also variants for fetching 2/3/4 consecutive cachelines.
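
A minimal single-threaded sketch of the first measurement (the buffer size, access count, and LCG constants are arbitrary choices of mine, not prescribed by anything above); the many-thread and 2/3/4-consecutive-cacheline variants would wrap the same inner loop.

#+begin_src rust
use std::time::Instant;

fn main() {
    const CACHELINE: usize = 64;
    const BUF_BYTES: usize = 1 << 30; // 1 GiB, far beyond L3
    // Non-zero fill so the OS actually commits the pages instead of mapping a shared zero page.
    let buf = vec![1u8; BUF_BYTES];
    let lines = BUF_BYTES / CACHELINE; // 2^24 cachelines

    const N: usize = 1 << 24; // number of random accesses
    let mut sum = 0u64;
    let mut x = 0x9e3779b97f4a7c15u64;
    let start = Instant::now();
    for _ in 0..N {
        // Knuth's MMIX LCG; take the high bits, which are better distributed than the low bits.
        x = x.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        let idx = ((x >> 40) as usize % lines) * CACHELINE;
        // Independent (not pointer-chasing) loads, so this measures throughput, not latency.
        // Summing the bytes keeps the loads from being optimized away.
        sum += buf[idx] as u64;
    }
    let dt = start.elapsed();
    println!(
        "{} random cacheline reads in {:?}: {:.1} M lines/s (checksum {})",
        N,
        dt,
        N as f64 / dt.as_secs_f64() / 1e6,
        sum
    );
}
#+end_src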

3 Memory efficiency

Suppose our task is to find an integer $q$ in a sorted list $A$ of length $n$. One option is to use binary search, but using a B-tree or the Eytzinger layout turns out to be faster when $n$ is large. See the excellent paper [cite/t:@khuong-array-layouts] for background and detailed comparisons.

Here, I’d like to compare the memory efficiency of the B-tree and the Eytzinger layout. That is: which method puts the least pressure on the memory system, and can thus reach the higher potential throughput?

Let’s say we are searching an array of $n=2^k$ elements of $b$ bytes each.

3.1 B-tree

A cacheline has 64 bytes. Set $B = 64/b$. Each node stores $B-1$ values and a gap, and corresponds to $\ell = \log_2(B)$ levels of the tree. Only $\ell$ of the values in each cacheline are used per query.
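
Under these definitions, a rough count of the memory traffic per query (my own arithmetic from the quantities above, not a result stated here):

\[
\text{cachelines touched per query} \approx \frac{k}{\ell} = \frac{\log_2 n}{\log_2(64/b)},
\qquad
\text{bytes fetched} \approx 64 \cdot \frac{\log_2 n}{\log_2(64/b)}.
\]

For example, with $b=4$ (so $B=16$ and $\ell=4$) and $n=2^{30}$, this is about $30/4 \approx 8$ cachelines, i.e. roughly $512$ bytes per query.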