Using Your Own Data in ALP Benchmarks

This guide explains how to set up and benchmark your own dataset using ALP.

Step 1: Understand the Dataset Configuration Format

The dataset configuration is provided in a CSV file, where each row describes a column in your dataset.

An example configuration is shown first, followed by an explanation of each parameter.

Example:

id,column_name,data_type,path,file_type
0,CLOUDf48.bin.f32,float,/Users/azim/CLionProjects/ALP/100x500x500/CLOUDf48.bin.f32,binary

Parameters:

id:
- A unique integer identifier for the column.
- Example: 0
column_name:
- A descriptive name for the column.
- Example: CLOUDf48.bin.f32
data_type:
- The type of data in the column.
- Allowed values: float, double
- Example: float
path:
- The absolute path to the data file for the column.
- Example: /Users/azim/CLionProjects/ALP/100x500x500/CLOUDf48.bin.f32
file_type:
- The format of the data file.
- Allowed values: binary, csv
- Example: binary
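
The rules above can be checked before running a benchmark. The following is a minimal sketch, not part of ALP: the function name `validate_config` is hypothetical, and it assumes POSIX-style absolute paths (starting with `/`) and a header row exactly as shown in the example.

```python
import csv
import io
import posixpath

HEADER = ["id", "column_name", "data_type", "path", "file_type"]
ALLOWED_DATA_TYPES = {"float", "double"}
ALLOWED_FILE_TYPES = {"binary", "csv"}


def validate_config(text):
    """Return a list of human-readable problems found in the config text.

    Hypothetical helper: it only checks the rules listed in this guide.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    if not rows or rows[0] != HEADER:
        return ["first row must be: " + ",".join(HEADER)]

    problems = []
    seen_ids = set()
    # Row numbers count the header as row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(HEADER):
            problems.append(f"row {row_no}: expected {len(HEADER)} fields, got {len(row)}")
            continue
        col_id, _name, data_type, path, file_type = row
        if not col_id.isdigit():
            problems.append(f"row {row_no}: id '{col_id}' is not an integer")
        elif col_id in seen_ids:
            problems.append(f"row {row_no}: id '{col_id}' is not unique")
        seen_ids.add(col_id)
        if data_type not in ALLOWED_DATA_TYPES:
            problems.append(f"row {row_no}: data_type '{data_type}' must be float or double")
        # Assumption: paths are POSIX-style, as in the examples in this guide.
        if not posixpath.isabs(path):
            problems.append(f"row {row_no}: path '{path}' is not absolute")
        if file_type not in ALLOWED_FILE_TYPES:
            problems.append(f"row {row_no}: file_type '{file_type}' must be binary or csv")
    return problems
```

An empty list means the configuration satisfies the format rules; it does not confirm that the data files exist or match the declared types.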

Step 2: Create Your Dataset Configuration File

Edit the dataset configuration CSV file, adding one row per column of your dataset in the format described above.

Example:

id,column_name,data_type,path,file_type
0,AnotherDoubleColumn,double,/Users/azim/CLionProjects/ALP/another_double_column.csv,csv
1,AnotherFloatColumn,float,/Users/azim/CLionProjects/ALP/another_float_column.csv,csv
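
For datasets with many columns, a configuration like the one above can be generated rather than typed by hand. This is a sketch under stated assumptions: `build_config` is a hypothetical helper, not an ALP tool, and numbering ids from 0 simply mirrors the examples in this guide (the format only requires that ids be unique integers).

```python
import csv
import io
import os


def build_config(columns):
    """Build the configuration text for a list of columns.

    columns: iterable of (column_name, data_type, path, file_type) tuples.
    Hypothetical helper; ids are assigned in order, starting at 0.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "column_name", "data_type", "path", "file_type"])
    for col_id, (name, data_type, path, file_type) in enumerate(columns):
        # The format asks for absolute paths, so relative ones are resolved here.
        writer.writerow([col_id, name, data_type, os.path.abspath(path), file_type])
    return out.getvalue()
```

The returned string can be written to whichever configuration file your build of the benchmark reads.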

Step 3: Build ALP with Benchmarking Enabled

To enable benchmarking in ALP:

Configure the build using CMake with the ALP_BUILD_BENCHMARKING option set to ON:

cmake -DALP_BUILD_BENCHMARKING=ON -DALP_ENABLE_VERBOSE_OUTPUT=ON -DCMAKE_BUILD_TYPE=Release -S . -B build

Build the project:
```
cmake --build build
```

Step 4: Run the Benchmark

Run the benchmark executable:

cd build
./benchmarks/bench_your_dataset

Step 5: Analyze the Results

The benchmark tool will save the results here.

The results include the following columns:

idx: Column index.
column: Column name.
data_type: Data type (float, double).
size: Number of bits per value used to encode this column.
rowgroups_count: Number of row groups. A row group is composed of 100 vectors.
vectors_count: Number of vectors. A vector always has 1024 values.
decompression_speed: Decompression speed measured in cycles per value.
compression_speed: Compression speed measured in cycles per value.
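
The columns above can be combined into more familiar figures such as a compression ratio. The sketch below is illustrative only: `summarize` is a hypothetical function, and it assumes that `float` values are 32 bits and `double` values are 64 bits uncompressed, that `size` is an average over all values in the column, and that every vector is full (1024 values).

```python
BITS_PER_VALUE = {"float": 32, "double": 64}  # assumed uncompressed widths
VALUES_PER_VECTOR = 1024  # stated in this guide


def summarize(data_type, size_bits, vectors_count):
    """Derive ratio and byte counts from one row of the results.

    Hypothetical helper; size_bits is the `size` column (bits per value).
    """
    raw_bits = BITS_PER_VALUE[data_type]
    values = vectors_count * VALUES_PER_VECTOR
    return {
        "compression_ratio": raw_bits / size_bits,
        "values": values,
        "compressed_bytes": values * size_bits / 8,
        "raw_bytes": values * raw_bits / 8,
    }
```

For instance, a `double` column reported at 16 bits per value would correspond to a compression ratio of 4 under these assumptions.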

Notes

Ensure all file paths in your dataset configuration are valid and accessible.
Verify that the data_type and file_type values in the CSV match the format of your data files.
If benchmarking fails, check the logs for errors such as missing files or unsupported formats.
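
The checks in these notes can be run ahead of time with a small script. This is a sketch rather than an ALP utility: `check_data_file` is a hypothetical name, and the size check assumes that a `binary` file is a headerless array of 4-byte (`float`) or 8-byte (`double`) values, which this guide does not state explicitly.

```python
import os

ELEMENT_SIZE = {"float": 4, "double": 8}  # assumed bytes per value in binary files


def check_data_file(path, data_type, file_type):
    """Return a list of problems for one configured data file.

    Hypothetical helper; the binary layout check is an assumption.
    """
    if not os.path.isfile(path):
        return [f"{path}: file not found"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append(f"{path}: file is not readable")
    size = os.path.getsize(path)
    if size == 0:
        problems.append(f"{path}: file is empty")
    elif file_type == "binary" and size % ELEMENT_SIZE[data_type] != 0:
        # A raw array of fixed-width values should divide evenly.
        problems.append(
            f"{path}: size {size} is not a multiple of "
            f"{ELEMENT_SIZE[data_type]} bytes for data_type {data_type}"
        )
    return problems
```

Running it over every row of the configuration before benchmarking can surface missing or mismatched files early.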

By following these steps, you can configure and benchmark your own datasets in ALP, allowing you to evaluate ALP's performance with your data.

We would love to hear about your data and results, so please share them with us. Your feedback can help improve ALP further.
