This guide explains how to:
- Replicate the experiments presented in our publication
- Benchmark with your own data
You can download the datasets used in our publication here (complete_binaries.zip). These files are in binary format (64-bit doubles) and were used throughout the benchmarks in our paper.
Set the environment variable ALP_DATASET_DIR_PATH to the path of the directory containing the binary datasets that you downloaded in the previous step.
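Before launching a long benchmark run, it can help to confirm that the variable is set and points to a real directory. The following is a minimal sketch of such a check; the helper name `resolve_dataset_dir` is hypothetical and is not part of the repository.

```python
import os
from pathlib import Path


def resolve_dataset_dir(env=os.environ):
    """Return the directory named by ALP_DATASET_DIR_PATH.

    Hypothetical helper: raises a descriptive error if the variable is
    unset or does not point to an existing directory.
    """
    raw = env.get("ALP_DATASET_DIR_PATH")
    if not raw:
        raise RuntimeError("ALP_DATASET_DIR_PATH is not set")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"ALP_DATASET_DIR_PATH is not a directory: {path}")
    return path
```

Calling `resolve_dataset_dir()` with no arguments reads the current process environment; passing a plain dictionary makes the check easy to exercise in isolation.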
Additionally, data/datasets_transformer.ipynb is a Jupyter notebook with guidelines for downloading the datasets from their original sources and code for converting them to binary format (64-bit doubles). Some datasets may require heavy pre-processing.
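As a rough illustration of what "binary format (64-bit doubles)" can look like, the sketch below writes a sequence of numbers as raw doubles and reads them back. It assumes a headerless file in the machine's native byte order; the notebook mentioned above remains the authoritative reference for the exact layout, and the function names here are hypothetical.

```python
from array import array


def write_doubles(values, path):
    """Write an iterable of numbers as raw 64-bit doubles (native byte order).

    Assumption: no header or metadata, just the values back to back.
    Returns the number of values written.
    """
    data = array("d", values)
    with open(path, "wb") as f:
        data.tofile(f)
    return len(data)


def read_doubles(path):
    """Read a raw file of 64-bit doubles back into a list of floats."""
    data = array("d")
    with open(path, "rb") as f:
        data.frombytes(f.read())
    return data.tolist()
```

A file written this way occupies exactly 8 bytes per value, which gives a quick sanity check on a converted dataset.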
There are two toolchain files in the toolchain directory. Adjust them based on the clang and clang++ versions you are using.
Run the master_script.sh on the following architectures:
- Intel Ice Lake (x86_64, AVX512): M6i and C6i instances
- AMD Zen3 (x86_64, AVX2): M6a and C6a instances
- AWS Graviton2 (ARM64, NEON): M6g, C6g, R6g, and T4g instances
- AWS Graviton3 (ARM64, NEON): M7g, C7g, and R7g instances
- Apple M1: Note that on M1, you should run the following script with sudo permissions.
./publication/master_script/master_script.sh
./publication/source_code/bench_compression_ratio/publication_bench_alp_compression_ratio
This target compresses an entire binary file and writes the resulting (estimated) compression ratios (in bits/value) for the datasets in double_columns.hpp to the publication directory. One CSV file is created for the datasets that use the ALP scheme and another for those that use the ALP_RD scheme. Note that this is a dry compression (the compressed data is not stored).
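Because the results are reported in bits/value rather than as a ratio, a small conversion can help when comparing against other tools. The sketch below relies only on the fact that the uncompressed values are 64-bit doubles; the function names are hypothetical.

```python
def compression_ratio(bits_per_value, uncompressed_bits=64):
    """Convert bits/value into an 'N times smaller' ratio for 64-bit doubles."""
    if bits_per_value <= 0:
        raise ValueError("bits_per_value must be positive")
    return uncompressed_bits / bits_per_value


def estimated_size_bytes(n_values, bits_per_value):
    """Estimated compressed size in bytes for n_values at the given bits/value."""
    return n_values * bits_per_value / 8
```

For instance, a result of 16 bits/value corresponds to a ratio of 4.0, and 1024 values at that rate would occupy an estimated 2048 bytes.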
The target ./publication/source_code/bench_compression_ratio/bench_{algorithm}_compression_ratio, in which algorithm can be chimp|chimp128|gorillas|patas|zstd, creates a CSV file for each encoding and each dataset in the publication directory. Note that this is a dry compression (the compressed data is not stored). For PDE and ELF, we used their own code to compute compression ratios.
All of these tests read the locations of the CSV sample files from the dataset array. Therefore, to test with your own data, add your dataset to this array. Note that these experiments are performed on 1024 values; see Section 4 of the publication for the rationale.
Encoding comprises the encode, analyze_ffor, and ffor primitives. Benchmark it by running ./publication/source_code/bench_speed/bench_alp_encode. Results are located in publication/results/.
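For readers unfamiliar with frame-of-reference (FOR) encoding, the sketch below shows the generic technique on a list of integers: pick the minimum as a base, store each value as an offset from it, and record how many bits the largest offset needs. This is an illustration of the general idea only. How it maps onto the repository's analyze_ffor and ffor primitives is an assumption, actual bit-packing is omitted, and the function names are hypothetical.

```python
def analyze_for(values):
    """Return (base, bit_width) for a non-empty list of integers.

    base is the minimum value; bit_width is the number of bits needed
    to represent the largest offset from that base.
    """
    base = min(values)
    bit_width = (max(values) - base).bit_length()
    return base, bit_width


def for_encode(values, base):
    """Replace each value by its non-negative offset from base."""
    return [v - base for v in values]


def for_decode(offsets, base):
    """Invert for_encode by adding the base back."""
    return [o + base for o in offsets]
```

With the values 1000, 1003, 1001, and 1007, the base is 1000 and the offsets 0, 3, 1, and 7 fit in 3 bits each, instead of the 64 bits of the original integers.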
Fused decoding comprises the falp and patch_exceptions primitives. Unfused decoding comprises the unffor, decode, and patch_exceptions primitives. Benchmark both fused and unfused decoding at the same time on different implementations and architectures/ISAs by running the commands below. Results are located in publication/results/.
Implementation | Command |
---|---|
Scalar | ./publication/source_code/generated/fallback/scalar_nav_uf1/fallback_scalar_nav_1024_uf1_falp_bench |
SIMD | ./publication/source_code/generated/{Arch}/{Arch}_{extension}_intrinsic_uf1/{Arch}_{extension}_intrinsic_1024_uf1_falp_bench |
Auto-Vectorized | ./publication/source_code/generated/fallback/scalar_aav_uf1/fallback_scalar_aav_1024_uf1_falp_bench |
Correctness can be tested by running:
Implementation | Command |
---|---|
Scalar | ./publication/source_code/generated/fallback/scalar_nav_uf1/fallback_scalar_nav_1024_uf1_falp_test |
SIMD | ./publication/source_code/generated/{Arch}/{Arch}_{extension}_intrinsic_uf1/{Arch}_{extension}_intrinsic_1024_uf1_falp_test |
Auto-Vectorized | ./publication/source_code/generated/fallback/scalar_aav_uf1/fallback_scalar_aav_1024_uf1_falp_test |
The source files of the falp primitive (fused ALP+FOR+Bitpack, generated by FastLanes) for each implementation are at:
Implementation | Source File |
---|---|
Scalar | /publication/source_code/generated/fallback/scalar_nav_uf1/fallback_scalar_nav_1024_uf1_falp_src.cpp |
SIMD | /publication/source_code/generated/{Arch}/{Arch}_{extension}_intrinsic_uf1/{Arch}_{extension}_intrinsic_1024_uf1_falp_src.cpp |
Auto-Vectorized | /publication/source_code/generated/fallback/scalar_aav_uf1/fallback_scalar_aav_1024_uf1_falp_src.cpp |
Architectures and ISAs:
Architecture {Arch} | ISA {extension} |
---|---|
arm64v8 | neon |
arm64v8 | sve |
wasm | simd128 |
x86_64 | sse |
x86_64 | avx2 |
x86_64 | avx512bw |
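The {Arch} and {extension} placeholders in the SIMD rows above can be filled in mechanically from this table. The sketch below does so for the benchmark and test commands; the helper name `simd_target` is hypothetical, and whether every combination is actually built on a given machine is not something this guide states.

```python
# Path pattern copied from the SIMD rows of the tables above.
TEMPLATE = (
    "./publication/source_code/generated/{arch}/"
    "{arch}_{ext}_intrinsic_uf1/{arch}_{ext}_intrinsic_1024_uf1_falp_{kind}"
)

# (architecture, ISA extension) pairs listed in the table.
TARGETS = [
    ("arm64v8", "neon"),
    ("arm64v8", "sve"),
    ("wasm", "simd128"),
    ("x86_64", "sse"),
    ("x86_64", "avx2"),
    ("x86_64", "avx512bw"),
]


def simd_target(arch, ext, kind="bench"):
    """Build the SIMD benchmark ('bench') or correctness ('test') path."""
    if (arch, ext) not in TARGETS:
        raise ValueError(f"unknown architecture/ISA pair: {arch}/{ext}")
    if kind not in ("bench", "test"):
        raise ValueError(f"unknown kind: {kind}")
    return TEMPLATE.format(arch=arch, ext=ext, kind=kind)
```

For example, `simd_target("x86_64", "avx2")` yields the AVX2 benchmark path, and passing `kind="test"` yields the matching correctness test.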
Encoding comprises rd_encode and two calls to ffor (one each for the left and right parts). Benchmark it by running ./publication/source_code/bench_speed/bench_alp_cutter_encode. Results are located in publication/results/.
Decoding comprises two calls to unffor (one each for the left and right parts) and the rd_decode primitive. Benchmark it by running ./publication/source_code/bench_speed/bench_alp_cutter_decode. Results are located in publication/results/.
Benchmark both decoding and encoding by running ./publication/source_code/bench_speed/bench_{algorithm}, in which algorithm can be chimp|chimp128|gorillas|patas|zstd. Results are located in publication/results/i4i.
We benchmarked PseudoDecimals within BtrBlocks. Results are located in publication/results/i4i.
We benchmarked Elf using their Java implementation.