Add hardcoded filter3x3 function to benchmark current generic filter function against #644

theotherphil · 2024-05-26T03:14:25Z

No description provided.

ripytide · 2024-05-26T15:49:43Z

It seems #646 adds an implementation for 3x3, 5x5, 7x7, this might be closable then. Or is there something I'm missing with this issue as compared to #643?

theotherphil · 2024-05-26T16:14:32Z

This issue is to add an implementation of filter3x3 along the lines of what we had when this library was first written - a raw loop that averages the pixel values at 9 fixed offsets from the current coordinate - to check that we don’t accidentally end up with an implementation that’s slower than that for this use case.

cospectrum · 2024-05-27T08:45:51Z

The filter has become 4 times slower

cospectrum · 2024-05-27T08:49:40Z

malloc for each pixel 😮‍💨

ripytide · 2024-05-27T09:35:19Z

Not if the compiler optimizes it out, which I'm curious about seeing as it was the same speed as the previous implementation according my benchmarking. I'm hoping the Pixel trait will get compile-time length info at some point which would allow us to remove the heap allocations.

One method would be if image switches to rgb's 0.9 release which I'm working on atm.

theotherphil · 2024-05-27T10:39:09Z

@cospectrum which versions are you comparing?

cospectrum · 2024-05-27T10:39:35Z

@cospectrum which versions are you comparing?

master and 0.25

theotherphil · 2024-05-27T10:42:25Z

Ouch. I’ll take a look today. @ripytide has benchmarks showing no impact at #608 (comment) so something is odd here.

theotherphil · 2024-05-27T11:29:57Z

Comparing v0.25.0 and current master (5e00bd1) on a 2021 MacBook Pro with an M1 Max chip.

The function signatures and set of benchmarks have both changed across this range so I've

Created a filter5x5 and filter7x7 function in version 0.25 which are just copy-pasted from filter3x3 with a different kernel size
Added the following benchmarks

// For v0.25
    #[bench]
    fn bench_filter3x3_i32_filter(b: &mut Bencher) {
        let image = gray_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1,
            -2, 0, 2,
            -1, 0, 1
        ];

        b.iter(|| {
            let filtered: ImageBuffer<Luma<i16>, Vec<i16>> =
                filter3x3::<_, _, i16>(&image, &kernel);
            black_box(filtered);
        });
    }

    #[bench]
    fn bench_filter5x5_i32_filter(b: &mut Bencher) {
        let image = gray_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1, 2, 3,
            -2, 0, 2, 3, 5,
            -1, 0, 1, 6, 9,
            -1, 0, 1, 6, 9,
            -1, 0, 1, 6, 9
        ];

        b.iter(|| {
            let filtered: image::ImageBuffer<Luma<i16>, Vec<i16>> =
            filter5x5::<_, _, i16>(&image, &kernel);
            black_box(filtered);
        });
    }

    #[bench]
    fn bench_filter7x7_i32_filter(b: &mut Bencher) {
        let image = gray_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1, 2, 3, -2, 1,
            -2, 0, 2, 3, 5, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1
        ];

        b.iter(|| {
            let filtered: image::ImageBuffer<Luma<i16>, Vec<i16>> =
            filter7x7::<_, _, i16>(&image, &kernel);
            black_box(filtered);
        });
    }

// For master
    #[bench]
    fn bench_filter3x3_i32_filter(b: &mut Bencher) {
        let image = gray_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1,
            -2, 0, 2,
            -1, 0, 1
        ];
        let kernel = Kernel::new(&kernel, 3, 3);

        b.iter(|| {
            let filtered: image::ImageBuffer<Luma<i16>, Vec<i16>> =
            filter_clamped::<_, _, i16>(&image, kernel);
            black_box(filtered);
        });
    }

    #[bench]
    fn bench_filter5x5_i32_filter(b: &mut Bencher) {
        let image = gray_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1, 2, 3,
            -2, 0, 2, 3, 5,
            -1, 0, 1, 6, 9,
            -1, 0, 1, 6, 9,
            -1, 0, 1, 6, 9
        ];
        let kernel = Kernel::new(&kernel, 5, 5);

        b.iter(|| {
            let filtered: image::ImageBuffer<Luma<i16>, Vec<i16>> =
            filter_clamped::<_, _, i16>(&image, kernel);
            black_box(filtered);
        });
    }

    #[bench]
    fn bench_filter7x7_i32_filter(b: &mut Bencher) {
        let image = gray_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1, 2, 3, -2, 1,
            -2, 0, 2, 3, 5, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1
        ];
        let kernel = Kernel::new(&kernel, 7, 7);

        b.iter(|| {
            let filtered: image::ImageBuffer<Luma<i16>, Vec<i16>> =
            filter_clamped::<_, _, i16>(&image, kernel);
            black_box(filtered);
        });
    }

Results:

// Master
filter::benches::bench_filter3x3_i32_filter                                      ... bench:   2,113,838.52 ns/iter (+/- 146,344.38)
filter::benches::bench_filter5x5_i32_filter                                      ... bench:   5,102,450.00 ns/iter (+/- 1,087,624.48)
filter::benches::bench_filter7x7_i32_filter                                      ... bench:   9,306,050.00 ns/iter (+/- 1,542,055.76)

// v0.25
filter::benches::bench_filter3x3_i32_filter                                      ... bench:   1,800,742.73 ns/iter (+/- 97,078.87)
filter::benches::bench_filter5x5_i32_filter                                      ... bench:   2,032,997.93 ns/iter (+/- 444,828.73)
filter::benches::bench_filter7x7_i32_filter                                      ... bench:   4,564,652.15 ns/iter (+/- 1,259,277.64)

So that's 1.17x, 2.55x and 2.04x slowdowns for 3x3, 5x5 and 7x7 kernels on 500x500 grayscale images.

Master now has benchmarks for 3x3, 5x5 and 7x7 images as standard, although with the image size reduced to 300x300, so these comparisons should be easier in future.

ripytide · 2024-05-27T11:47:28Z

@theotherphil could you push your modified v0.25 branch so I can use that for my benchmark?

theotherphil · 2024-05-27T12:04:34Z

This branch has the new benchmarks plus filter5x5 and filter7x7: https://github.com/image-rs/imageproc/tree/v0.25.0-branch

theotherphil · 2024-05-27T12:07:39Z

I'll add some for RGB images too, in case the regressions are larger than than for grayscale.

theotherphil · 2024-05-27T12:15:10Z

RGB:

// v0.25
    #[bench]
    fn bench_filter3x3_i32_filter_rgb(b: &mut Bencher) {
        let image = rgb_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1,
            -2, 0, 2,
            -1, 0, 1
        ];

        b.iter(|| {
            let filtered: ImageBuffer<Rgb<i16>, Vec<i16>> =
                filter3x3::<_, _, i16>(&image, &kernel);
            black_box(filtered);
        });
    }

    #[bench]
    fn bench_filter5x5_i32_filter_rgb(b: &mut Bencher) {
        let image = rgb_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1, 2, 3,
            -2, 0, 2, 3, 5,
            -1, 0, 1, 6, 9,
            -1, 0, 1, 6, 9,
            -1, 0, 1, 6, 9
        ];

        b.iter(|| {
            let filtered: image::ImageBuffer<Rgb<i16>, Vec<i16>> =
            filter5x5::<_, _, i16>(&image, &kernel);
            black_box(filtered);
        });
    }

    #[bench]
    fn bench_filter7x7_i32_filter_rgb(b: &mut Bencher) {
        let image = rgb_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1, 2, 3, -2, 1,
            -2, 0, 2, 3, 5, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1
        ];

        b.iter(|| {
            let filtered: image::ImageBuffer<Rgb<i16>, Vec<i16>> =
            filter7x7::<_, _, i16>(&image, &kernel);
            black_box(filtered);
        });
    }

// Master
    #[bench]
    fn bench_filter3x3_i32_filter_rgb(b: &mut Bencher) {
        let image = rgb_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1,
            -2, 0, 2,
            -1, 0, 1
        ];
        let kernel = Kernel::new(&kernel, 3, 3);

        b.iter(|| {
            let filtered: image::ImageBuffer<Rgb<i16>, Vec<i16>> =
            filter_clamped::<_, _, i16>(&image, kernel);
            black_box(filtered);
        });
    }

    #[bench]
    fn bench_filter5x5_i32_filter_rgb(b: &mut Bencher) {
        let image = rgb_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1, 2, 3,
            -2, 0, 2, 3, 5,
            -1, 0, 1, 6, 9,
            -1, 0, 1, 6, 9,
            -1, 0, 1, 6, 9
        ];
        let kernel = Kernel::new(&kernel, 5, 5);

        b.iter(|| {
            let filtered: image::ImageBuffer<Rgb<i16>, Vec<i16>> =
            filter_clamped::<_, _, i16>(&image, kernel);
            black_box(filtered);
        });
    }

    #[bench]
    fn bench_filter7x7_i32_filter_rgb(b: &mut Bencher) {
        let image = rgb_bench_image(500, 500);
        #[rustfmt::skip]
        let kernel: Vec<i32> = vec![
            -1, 0, 1, 2, 3, -2, 1,
            -2, 0, 2, 3, 5, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1,
            -1, 0, 1, 6, 9, -2, 1
        ];
        let kernel = Kernel::new(&kernel, 7, 7);

        b.iter(|| {
            let filtered: image::ImageBuffer<Rgb<i16>, Vec<i16>> =
            filter_clamped::<_, _, i16>(&image, kernel);
            black_box(filtered);
        });
    }

Results:

// Master
filter::benches::bench_filter3x3_i32_filter_rgb                                  ... bench:  19,248,650.10 ns/iter (+/- 2,161,878.78)
filter::benches::bench_filter5x5_i32_filter_rgb                                  ... bench:  26,113,808.30 ns/iter (+/- 1,566,443.61)
filter::benches::bench_filter7x7_i32_filter_rgb                                  ... bench:  33,759,604.10 ns/iter (+/- 819,419.97)

// v0.25
filter::benches::bench_filter3x3_i32_filter_rgb                                  ... bench:   2,414,054.20 ns/iter (+/- 51,805.40)
filter::benches::bench_filter5x5_i32_filter_rgb                                  ... bench:   5,215,018.75 ns/iter (+/- 1,126,537.41)
filter::benches::bench_filter7x7_i32_filter_rgb                                  ... bench:  10,263,666.60 ns/iter (+/- 364,188.05)

The regressions are much larger for RGB images.

ripytide · 2024-05-27T12:23:36Z

0.25

test filter::benches::bench_bilateral_filter                                          ... bench:  75,295,576 ns/iter (+/- 211,489)
test filter::benches::bench_box_filter                                                ... bench:   2,070,778 ns/iter (+/- 12,442)
test filter::benches::bench_filter3x3_i32_filter                                      ... bench:   3,099,127 ns/iter (+/- 24,920)
test filter::benches::bench_filter3x3_i32_filter_rgb                                  ... bench:   3,375,630 ns/iter (+/- 131,017)
test filter::benches::bench_filter5x5_i32_filter                                      ... bench:   2,390,178 ns/iter (+/- 92,977)
test filter::benches::bench_filter5x5_i32_filter_rgb                                  ... bench:   9,743,942 ns/iter (+/- 164,547)
test filter::benches::bench_filter7x7_i32_filter                                      ... bench:   4,318,360 ns/iter (+/- 12,310)
test filter::benches::bench_filter7x7_i32_filter_rgb                                  ... bench:  16,803,183 ns/iter (+/- 749,961)
test filter::benches::bench_gaussian_f32_stdev_1                                      ... bench:     287,123 ns/iter (+/- 3,185)
test filter::benches::bench_gaussian_f32_stdev_10                                     ... bench:   1,384,883 ns/iter (+/- 10,495)
test filter::benches::bench_gaussian_f32_stdev_3                                      ... bench:     527,646 ns/iter (+/- 3,742)
test filter::benches::bench_horizontal_filter                                         ... bench:     904,917 ns/iter (+/- 8,832)
test filter::benches::bench_separable_filter                                          ... bench:     666,254 ns/iter (+/- 2,835)
test filter::benches::bench_vertical_filter                                           ... bench:     940,404 ns/iter (+/- 4,227)

Master

test filter::benches::bench_box_filter                                                ... bench:   2,064,973 ns/iter (+/- 3,673)
test filter::benches::bench_filter_clamped_gray_3x3                                   ... bench:   1,160,258 ns/iter (+/- 42,470)
test filter::benches::bench_filter_clamped_gray_5x5                                   ... bench:   2,610,301 ns/iter (+/- 5,058)
test filter::benches::bench_filter_clamped_gray_7x7                                   ... bench:   4,679,867 ns/iter (+/- 16,102)
test filter::benches::bench_filter_clamped_parallel_gray_3x3                          ... bench:     295,250 ns/iter (+/- 138,320)
test filter::benches::bench_filter_clamped_parallel_gray_5x5                          ... bench:     601,325 ns/iter (+/- 53,330)
test filter::benches::bench_filter_clamped_parallel_gray_7x7                          ... bench:   1,075,725 ns/iter (+/- 44,041)
test filter::benches::bench_gaussian_f32_stdev_1                                      ... bench:     295,152 ns/iter (+/- 1,174)
test filter::benches::bench_gaussian_f32_stdev_10                                     ... bench:   1,383,488 ns/iter (+/- 5,924)
test filter::benches::bench_gaussian_f32_stdev_3                                      ... bench:     527,800 ns/iter (+/- 2,275)
test filter::benches::bench_horizontal_filter                                         ... bench:     908,493 ns/iter (+/- 9,717)
test filter::benches::bench_separable_filter                                          ... bench:     670,450 ns/iter (+/- 17,449)
test filter::benches::bench_vertical_filter                                           ... bench:     942,569 ns/iter (+/- 6,466)

theotherphil · 2024-05-27T13:48:08Z

@ripytide are your Master benchmarks using a 300x300 image or 500x500?

ripytide · 2024-05-27T13:51:55Z

Whatever the master branch is currently using, which looks like 300x300

theotherphil · 2024-05-27T13:54:58Z

Ah, so your master and 0.25 benchmarks aren’t comparing like for like as I confusingly switched from 500x500 to 300x300 images when I added the 5x5 and 7x7 filter benchmarks.

theotherphil · 2024-05-27T15:58:44Z

After reverting the use of filter_pixel, the current implementation of filter_clamped is still 3x slower than a simple hardcoded implementation for grayscale images and 3x3 kernels: #658

theotherphil · 2024-06-07T09:08:08Z

Benchmarks added. Created #664 to fix the performance issues with the current filter implementation.

cospectrum mentioned this issue May 27, 2024

Replace filter with the old implementation #654

Merged

theotherphil mentioned this issue Jun 7, 2024

Fix filter performance vs simple hardcoded implementation #664

Open

theotherphil closed this as completed Jun 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add hardcoded filter3x3 function to benchmark current generic filter function against #644

Add hardcoded filter3x3 function to benchmark current generic filter function against #644

theotherphil commented May 26, 2024

ripytide commented May 26, 2024

theotherphil commented May 26, 2024

cospectrum commented May 27, 2024

cospectrum commented May 27, 2024 •

edited

Loading

ripytide commented May 27, 2024

theotherphil commented May 27, 2024

cospectrum commented May 27, 2024

theotherphil commented May 27, 2024

theotherphil commented May 27, 2024 •

edited

Loading

ripytide commented May 27, 2024

theotherphil commented May 27, 2024

theotherphil commented May 27, 2024

theotherphil commented May 27, 2024

ripytide commented May 27, 2024

theotherphil commented May 27, 2024

ripytide commented May 27, 2024

theotherphil commented May 27, 2024

theotherphil commented May 27, 2024 •

edited

Loading

theotherphil commented Jun 7, 2024

Add hardcoded filter3x3 function to benchmark current generic filter function against #644

Add hardcoded filter3x3 function to benchmark current generic filter function against #644

Comments

theotherphil commented May 26, 2024

ripytide commented May 26, 2024

theotherphil commented May 26, 2024

cospectrum commented May 27, 2024

cospectrum commented May 27, 2024 • edited Loading

ripytide commented May 27, 2024

theotherphil commented May 27, 2024

cospectrum commented May 27, 2024

theotherphil commented May 27, 2024

theotherphil commented May 27, 2024 • edited Loading

ripytide commented May 27, 2024

theotherphil commented May 27, 2024

theotherphil commented May 27, 2024

theotherphil commented May 27, 2024

ripytide commented May 27, 2024

0.25

Master

theotherphil commented May 27, 2024

ripytide commented May 27, 2024

theotherphil commented May 27, 2024

theotherphil commented May 27, 2024 • edited Loading

theotherphil commented Jun 7, 2024

cospectrum commented May 27, 2024 •

edited

Loading

theotherphil commented May 27, 2024 •

edited

Loading

theotherphil commented May 27, 2024 •

edited

Loading