FastChwHwcConverter is a high-performance, multi-threaded, header-only C++ library for converting image data between the HWC (Height, Width, Channels) and CHW (Channels, Height, Width) formats. The library uses OpenMP parallel processing to spread the conversion work across CPU cores.
Similar conversion code found in other projects on GitHub will most likely only achieve performance close to that of single-threaded execution.
- Overview
- The difference between CHW and HWC
- Why Convert Between HWC and CHW Formats?
- Features
- Installation
- Requirements
- Let's Converter
- Benchmark Performance Timing Results
- Contact
Let's consider a 2x2 image with three channels (RGB).
- Example Image Data:
Pixel 1 (R, G, B) | Pixel 2 (R, G, B) |
Pixel 3 (R, G, B) | Pixel 4 (R, G, B) |
We can store this image data in two different formats: CHW (Channel-Height-Width) and HWC (Height-Width-Channel).
CHW Format: In this format, the data is stored channel by channel. First, all the red channel data, then all the green channel data, and finally all the blue channel data.
For example (2x2 RGB Image):
RRRRGGGGBBBB
Mapping to the actual pixel positions:
R1, R2, R3, R4, G1, G2, G3, G4, B1, B2, B3, B4
HWC Format: In this format, the data is stored by each pixel's channels in sequence. So, the RGB data for each pixel is stored together.
For example (2x2 RGB Image):
RGBRGBRGBRGB
Mapping to the actual pixel positions:
(R1, G1, B1), (R2, G2, B2), (R3, G3, B3), (R4, G4, B4)
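The two layouts differ only in how a (row, column, channel) triple maps to a flat offset. The following is a minimal sketch of that mapping; `hwc_index`, `chw_index`, and `naive_hwc_to_chw` are hypothetical helper names used for illustration and are not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat offset of element (row y, column x, channel k) in an HWC buffer.
inline std::size_t hwc_index(std::size_t y, std::size_t x, std::size_t k,
                             std::size_t w, std::size_t c) {
    return (y * w + x) * c + k;
}

// Flat offset of the same element in a CHW buffer.
inline std::size_t chw_index(std::size_t k, std::size_t y, std::size_t x,
                             std::size_t h, std::size_t w) {
    return (k * h + y) * w + x;
}

// Naive single-threaded reordering from HWC to CHW (illustrative only).
template <typename T>
std::vector<T> naive_hwc_to_chw(const std::vector<T>& src,
                                std::size_t h, std::size_t w, std::size_t c) {
    assert(src.size() == h * w * c);
    std::vector<T> dst(src.size());
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            for (std::size_t k = 0; k < c; ++k) {
                dst[chw_index(k, y, x, h, w)] = src[hwc_index(y, x, k, w, c)];
            }
        }
    }
    return dst;
}
```

For the 2x2 RGB image above, this turns the sequence R1, G1, B1, R2, G2, B2, ... into R1, R2, R3, R4, G1, ..., B4.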
The conversion between HWC (Height-Width-Channel) and CHW (Channel-Height-Width) formats is crucial for optimizing image processing tasks. Different machine learning frameworks and libraries have varying data format preferences. For instance, many deep learning frameworks, such as PyTorch, prefer the CHW format, while libraries like OpenCV often use the HWC format. By converting between these formats, we ensure compatibility and efficient data handling, enabling seamless transitions between different processing pipelines and maximizing performance for specific tasks. This flexibility enhances the overall efficiency and effectiveness of image processing and machine learning workflows.
- High-Performance: Uses OpenMP for parallel processing to make full use of multi-core CPUs.
- Header-Only: Include a single header file to integrate the library into your C++ project.
- Flexible: Supports scaling, clamping, and normalization of image data, for any data type.
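To illustrate how OpenMP can split such a conversion across cores, here is a minimal sketch. It is not the library's actual implementation: `parallel_hwc_to_chw` is a hypothetical function, and parallelizing over pixels is just one possible strategy. Each pixel writes to distinct destination offsets, so the loop iterations are independent. The pragma is guarded so the code also compiles (and runs single-threaded) without OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the library's API: reorders HWC to CHW and
// converts uint8_t to float, splitting the pixel loop across threads
// when the compiler has OpenMP enabled.
inline void parallel_hwc_to_chw(const std::uint8_t* src, float* dst,
                                std::size_t h, std::size_t w, std::size_t c) {
    assert(src != nullptr && dst != nullptr);
    const std::size_t plane = h * w;  // elements per channel plane
    // A signed loop counter keeps the loop valid for older OpenMP versions.
    const std::ptrdiff_t pixels = static_cast<std::ptrdiff_t>(plane);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (std::ptrdiff_t i = 0; i < pixels; ++i) {
        const std::size_t p = static_cast<std::size_t>(i);
        for (std::size_t k = 0; k < c; ++k) {
            dst[k * plane + p] = static_cast<float>(src[p * c + k]);
        }
    }
}
```

With GCC or Clang, OpenMP is typically enabled with the `-fopenmp` flag; without it the loop simply runs on one thread.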
Simply include the header file FastChwHwcConverter.hpp in your project:
#include "FastChwHwcConverter.hpp"
- C++11 or later
- OpenMP support (optional, set USE_OPENMP to ON for high performance)
- CMake v3.10 or later (optional)
- OpenCV v4.0 or later (optional, if BUILD_EXAMPLE_OPENCV is ON)
The hwc2chw function converts image data from HWC format to CHW format.
template <typename Stype, typename Dtype>
void hwc2chw(
const size_t h, const size_t w, const size_t c,
const Stype* src, Dtype* dst,
const Dtype alpha = 1,
const bool clamp = false, const Dtype min_v = 0.0, const Dtype max_v = 1.0,
const bool normalized_mean_stds = false,
const std::array<float, 3> mean = { 0.485, 0.456, 0.406 },
const std::array<float, 3> stds = { 0.229, 0.224, 0.225 }
);
Parameters:
- h: Height of the image.
- w: Width of the image.
- c: Number of channels.
- src: Pointer to the source data in HWC format.
- dst: Pointer to the destination data in CHW format.
- alpha: Scaling factor (default is 1).
- clamp: Whether to clamp the data values (default is false).
- min_v: Minimum value for clamping (default is 0.0).
- max_v: Maximum value for clamping (default is 1.0).
- normalized_mean_stds: Whether to use mean and standard deviation for normalization (default is false).
- mean: Array of mean values for normalization (default is {0.485, 0.456, 0.406}).
- stds: Array of standard deviation values for normalization (default is {0.229, 0.224, 0.225}).
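To make the roles of these parameters concrete, the following is a single-threaded sketch of what such a conversion might compute per element. The processing order used here (scale by alpha, then normalize with mean and standard deviation, then clamp) is an assumption and has not been checked against the library's source; `sketch_hwc2chw` is a hypothetical name, not the library function.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only. Assumed order: scale -> normalize -> clamp.
inline std::vector<float> sketch_hwc2chw(
    const std::vector<std::uint8_t>& src,
    std::size_t h, std::size_t w, std::size_t c,
    float alpha,
    bool clamp, float min_v, float max_v,
    bool normalize,
    const std::array<float, 3>& mean,
    const std::array<float, 3>& stds) {
    assert(src.size() == h * w * c);
    assert(!normalize || c <= 3);  // mean/stds hold three entries
    std::vector<float> dst(src.size());
    const std::size_t plane = h * w;
    for (std::size_t p = 0; p < plane; ++p) {
        for (std::size_t k = 0; k < c; ++k) {
            float v = static_cast<float>(src[p * c + k]) * alpha;
            if (normalize) v = (v - mean[k]) / stds[k];
            if (clamp) v = std::min(std::max(v, min_v), max_v);
            dst[k * plane + p] = v;  // channel-major destination
        }
    }
    return dst;
}
```

Under this assumption, passing alpha = 1/255 together with the default mean and stds values would scale 8-bit pixels to [0, 1] and then apply per-channel normalization.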
The chw2hwc function converts image data from CHW format to HWC format.
template <typename Stype, typename Dtype>
void chw2hwc(
const size_t c, const size_t h, const size_t w,
const Stype* src, Dtype* dst,
const Dtype alpha = 1,
const bool clamp = false, const Dtype min_v = 0, const Dtype max_v = 255
);
Parameters:
- c: Number of channels.
- h: Height of the image.
- w: Width of the image.
- src: Pointer to the source data in CHW format.
- dst: Pointer to the destination data in HWC format.
- alpha: Scaling factor (default is 1).
- clamp: Whether to clamp the data values (default is false).
- min_v: Minimum value for clamping (default is 0).
- max_v: Maximum value for clamping (default is 255).
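Clamping matters most when the destination type is narrower than the source: in C++, converting a float whose truncated value does not fit in the target integer type is undefined behavior. The sketch below shows one way a float CHW buffer could be converted to 8-bit HWC. It always clamps, and it truncates toward zero rather than rounding; both choices, and the name `sketch_chw2hwc_u8`, are assumptions for illustration rather than the library's documented behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative only: scale, clamp to [min_v, max_v], then truncate to uint8_t.
inline std::vector<std::uint8_t> sketch_chw2hwc_u8(
    const std::vector<float>& src,
    std::size_t c, std::size_t h, std::size_t w,
    float alpha, float min_v = 0.0f, float max_v = 255.0f) {
    assert(src.size() == c * h * w);
    assert(min_v >= 0.0f && max_v <= 255.0f && min_v <= max_v);
    std::vector<std::uint8_t> dst(src.size());
    const std::size_t plane = h * w;
    for (std::size_t p = 0; p < plane; ++p) {
        for (std::size_t k = 0; k < c; ++k) {
            float v = src[k * plane + p] * alpha;
            v = std::min(std::max(v, min_v), max_v);  // keep the cast in range
            dst[p * c + k] = static_cast<std::uint8_t>(v);  // truncates toward zero
        }
    }
    return dst;
}
```

A model that outputs values in [0, 1] would use alpha = 255 here so that the result covers the full 8-bit range.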
This example code (test/example.cpp) demonstrates how to use the FastChwHwcConverter library to convert image data from HWC format to CHW format, and then back to HWC format after AI inference.
#include "FastChwHwcConverter.hpp"
#include <cstddef>
#include <cstdint>
#include <vector>

int main() {
    const size_t c = 3;
    const size_t w = 1920;
    const size_t h = 1080;

    // step 1. Define the input and output buffers
    const size_t pixel_size = h * w * c;
    std::vector<uint8_t> src_uint8(pixel_size); // Source data (HWC)
    std::vector<float> src_float(pixel_size);   // Source data (CHW)
    std::vector<float> out_float(pixel_size);   // Inference output data (CHW)
    std::vector<uint8_t> out_uint8(pixel_size); // Inference output data (HWC)

    // step 2. Load image data into src_uint8 (8U3C)

    // step 3. Convert HWC (Height, Width, Channels) to CHW (Channels, Height, Width)
    whyb::hwc2chw<uint8_t, float>(h, w, c, src_uint8.data(), src_float.data());

    // step 4. Do AI inference
    // input: src_float ==infer==> output: out_float

    // step 5. Convert CHW (Channels, Height, Width) to HWC (Height, Width, Channels)
    whyb::chw2hwc<float, uint8_t>(c, h, w, out_float.data(), out_uint8.data());

    return 0;
}
If you are using OpenCV's cv::Mat, please refer to the test/example-opencv.cpp file.
The table below shows the benchmark performance timing for different image dimensions, channels, and processing configurations.
CPU: Intel(R) Core(TM) i7-13700K
RAM: DDR5 2400MHz 4x32-bit channels
Width | Height | Channel | hwc2chw (single-thread) | chw2hwc (single-thread) | hwc2chw (multi-thread) | chw2hwc (multi-thread) |
---|---|---|---|---|---|---|
426 | 240 | 1 | 0.097ms | 0.110ms | 0.113ms | 0.030ms |
426 | 240 | 3 | 0.331ms | 0.314ms | 0.061ms | 0.068ms |
426 | 240 | 4 | 0.439ms | 0.415ms | 0.082ms | 0.082ms |
640 | 360 | 1 | 0.217ms | 0.236ms | 0.048ms | 0.052ms |
640 | 360 | 3 | 0.743ms | 0.705ms | 0.147ms | 0.140ms |
640 | 360 | 4 | 0.881ms | 0.921ms | 0.219ms | 0.203ms |
854 | 480 | 1 | 0.393ms | 0.415ms | 0.094ms | 0.089ms |
854 | 480 | 3 | 1.328ms | 1.269ms | 0.250ms | 0.232ms |
854 | 480 | 4 | 1.717ms | 1.670ms | 0.263ms | 0.262ms |
1280 | 720 | 1 | 0.873ms | 0.937ms | 0.130ms | 0.180ms |
1280 | 720 | 3 | 2.877ms | 2.828ms | 0.449ms | 0.457ms |
1280 | 720 | 4 | 3.558ms | 3.848ms | 0.719ms | 0.616ms |
1920 | 1080 | 1 | 1.949ms | 2.136ms | 0.374ms | 0.342ms |
1920 | 1080 | 3 | 6.587ms | 6.469ms | 1.000ms | 0.672ms |
1920 | 1080 | 4 | 8.144ms | 8.615ms | 0.832ms | 0.914ms |
2560 | 1440 | 1 | 3.530ms | 3.800ms | 0.423ms | 0.476ms |
2560 | 1440 | 3 | 11.470ms | 11.611ms | 1.323ms | 1.169ms |
2560 | 1440 | 4 | 14.139ms | 15.273ms | 2.391ms | 2.567ms |
3840 | 2160 | 1 | 7.976ms | 8.494ms | 1.103ms | 1.387ms |
3840 | 2160 | 3 | 26.299ms | 25.824ms | 5.339ms | 4.438ms |
3840 | 2160 | 4 | 32.941ms | 34.718ms | 5.805ms | 4.514ms |
7680 | 4320 | 1 | 31.536ms | 34.100ms | 5.742ms | 4.976ms |
7680 | 4320 | 3 | 102.875ms | 102.419ms | 19.261ms | 17.294ms |
7680 | 4320 | 4 | 133.081ms | 136.308ms | 23.398ms | 18.445ms |
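Timings like these can be collected with a small harness built on `std::chrono`. The sketch below is not the author's benchmark code; `average_ms` is a hypothetical helper that runs a callable once as a warm-up and then reports the mean wall-clock time per call in milliseconds.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical timing helper: average wall-clock time of fn() in milliseconds.
template <typename F>
double average_ms(F&& fn, int iterations) {
    assert(iterations > 0);
    fn();  // warm-up run, excluded from the measurement
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

A conversion call can be wrapped in a lambda and passed as `fn`. Averaging over many iterations helps smooth out scheduling noise, which is most noticeable for the smallest images.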
For any questions or suggestions, please open an issue or contact me.