This guide explains how to set up and benchmark your own dataset using ALP.
The dataset configuration is provided in a CSV file, where each row describes a column in your dataset.
The example below shows the header row and one entry; each parameter is explained after it:
id,column_name,data_type,path,file_type
0,CLOUDf48.bin.f32,float,/Users/azim/CLionProjects/ALP/100x500x500/CLOUDf48.bin.f32,binary
- id: A unique integer identifier for the column.
  - Example: 0
- column_name: A descriptive name for the column.
  - Example: CLOUDf48.bin.f32
- data_type: The type of data in the column.
  - Allowed values: float, double
  - Example: float
- path: The absolute path to the data file for the column.
  - Example: /Users/azim/CLionProjects/ALP/100x500x500/CLOUDf48.bin.f32
- file_type: The format of the data file.
  - Allowed values: binary, csv
  - Example: binary
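The rules in the parameter list can be checked mechanically before running a benchmark. The following is a minimal sketch, not part of ALP: `validate_config` is a hypothetical helper, and it assumes POSIX-style absolute paths like the one in the example.

```python
import csv
import io
from pathlib import PurePosixPath

EXPECTED_HEADER = ["id", "column_name", "data_type", "path", "file_type"]
ALLOWED_DATA_TYPES = {"float", "double"}
ALLOWED_FILE_TYPES = {"binary", "csv"}


def validate_config(text):
    """Return a list of problems found in the configuration text (empty if none)."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header must be: " + ",".join(EXPECTED_HEADER)]

    problems = []
    seen_ids = set()
    for row_no, row in enumerate(rows[1:], start=1):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"row {row_no}: expected 5 fields, got {len(row)}")
            continue
        col_id, _name, data_type, path, file_type = row
        # id must be a unique non-negative integer.
        if not (col_id.isascii() and col_id.isdigit()):
            problems.append(f"row {row_no}: id {col_id!r} is not an integer")
        else:
            if int(col_id) in seen_ids:
                problems.append(f"row {row_no}: id {col_id} is not unique")
            seen_ids.add(int(col_id))
        if data_type not in ALLOWED_DATA_TYPES:
            problems.append(f"row {row_no}: data_type {data_type!r} is not allowed")
        if file_type not in ALLOWED_FILE_TYPES:
            problems.append(f"row {row_no}: file_type {file_type!r} is not allowed")
        # Assumption: POSIX-style paths, as in the example configuration.
        if not PurePosixPath(path).is_absolute():
            problems.append(f"row {row_no}: path {path!r} is not absolute")
    return problems
```

Calling `validate_config(open("my_dataset.csv").read())` (the file name is a placeholder) returns an empty list when every row follows the format.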
Edit the CSV file to define your dataset using the format described above, for example:
id,column_name,data_type,path,file_type
0,AnotherDoubleColumn,double,/Users/azim/CLionProjects/ALP/another_double_column.csv,csv
1,AnotherFloatColumn,float,/Users/azim/CLionProjects/ALP/another_float_column.csv,csv
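If your data lives in memory rather than in files, a small script can produce both the data file and the configuration. This is a sketch under a stated assumption: it treats a binary data file as a headerless sequence of native-endian IEEE 754 values, which this guide does not confirm. The helper names `write_binary_column` and `write_config` are hypothetical.

```python
import csv
import os
from array import array

HEADER = ["id", "column_name", "data_type", "path", "file_type"]
TYPECODES = {"float": "f", "double": "d"}  # array typecodes for 4- and 8-byte floats


def write_binary_column(path, values, data_type):
    """Write values as raw packed floats or doubles (assumed layout, see above)."""
    with open(path, "wb") as f:
        array(TYPECODES[data_type], values).tofile(f)


def write_config(config_path, columns):
    """Write a dataset configuration; columns is a list of
    (column_name, data_type, path, file_type) tuples. Ids are assigned from 0."""
    with open(config_path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(HEADER)
        for idx, (name, data_type, path, file_type) in enumerate(columns):
            # The configuration expects absolute paths.
            writer.writerow([idx, name, data_type, os.path.abspath(path), file_type])
```

Assigning ids with `enumerate` keeps them unique, and `os.path.abspath` guards against accidentally writing a relative path.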
To enable benchmarking in ALP:
- Configure the build using CMake with the ALP_BUILD_BENCHMARKING option set to ON:
  cmake -DALP_BUILD_BENCHMARKING=ON -DALP_ENABLE_VERBOSE_OUTPUT=ON -DCMAKE_BUILD_TYPE=Release -S . -B build
- Build the project:
  cmake --build build
Run the benchmark executable:
cd build
./benchmarks/bench_your_dataset
The benchmark tool will save the results here.
The results include the following columns:
- idx: Column index.
- column: Column name.
- data_type: Data type (float, double).
- size: Number of bits used to encode this dataset per value.
- rowgroups_count: Number of row groups. A row group is composed of 100 vectors.
- vectors_count: Number of vectors. A vector always has 1024 values.
- decompression_speed: Decompression speed measured in cycles per value.
- compression_speed: Compression speed measured in cycles per value.
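A few derived figures make a result row easier to interpret. The sketch below uses only the definitions above (bits per value, 1024 values per vector) plus the standard widths of float (32 bits) and double (64 bits). The `summarize` helper and the sample numbers in the usage note are illustrative, not real benchmark output.

```python
VALUES_PER_VECTOR = 1024
UNCOMPRESSED_BITS = {"float": 32, "double": 64}


def summarize(data_type, size_bits, vectors_count):
    """Derive value count, compression ratio, and compressed size from one result row."""
    values = vectors_count * VALUES_PER_VECTOR
    return {
        "values": values,
        # Ratio of the uncompressed width to the encoded bits per value.
        "compression_ratio": UNCOMPRESSED_BITS[data_type] / size_bits,
        # Approximate encoded size of the whole column in bytes.
        "compressed_bytes": values * size_bits / 8,
    }
```

For a hypothetical float column reported with `size` 16 and `vectors_count` 250, `summarize("float", 16, 250)` gives 256000 values, a compression ratio of 2.0, and about 512000 bytes of encoded data.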
- Ensure all file paths in your dataset configuration are valid and accessible.
- Verify that the data_type and file_type values in the CSV match the format of your data files.
- If benchmarking fails, check the logs for errors such as missing files or unsupported formats.
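The first two checks can be run as a quick pre-flight step before benchmarking. This is a rough sketch rather than an ALP tool: `check_data_file` is a hypothetical helper, and the size check assumes a binary file is a headerless sequence of 4-byte floats or 8-byte doubles, which this guide does not state explicitly.

```python
import os

ELEMENT_SIZE = {"float": 4, "double": 8}  # bytes per value


def check_data_file(path, data_type, file_type):
    """Return a list of problems for one configured data file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path}: file not found"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append(f"{path}: file is not readable")
    size = os.path.getsize(path)
    if size == 0:
        problems.append(f"{path}: file is empty")
    elif file_type == "binary" and size % ELEMENT_SIZE[data_type] != 0:
        # Assumed layout: raw packed values, so the size must be a whole
        # number of elements.
        problems.append(
            f"{path}: size {size} is not a multiple of {ELEMENT_SIZE[data_type]} bytes"
        )
    return problems
```

A size mismatch usually points to a wrong data_type or file_type entry rather than a corrupt file, so it is worth rechecking the configuration row first.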
By following these steps, you can configure and benchmark your own datasets in ALP, allowing you to evaluate ALP's performance with your data.
We would love to hear about your data and results, so please share them with us. Your feedback can help improve ALP further.