Replace `filter` with the old implementation #654

cospectrum · 2024-05-27T10:57:54Z

Related #644

ripytide · 2024-05-27T11:20:13Z

Can you show a before/after benchmark comparison for this PR on your machine? I'll do the same on mine.

theotherphil · 2024-05-27T11:31:30Z

@ripytide I've just added a benchmark comparison on my machine on #644 (comment) (edit: for v0.25 to master).

ripytide · 2024-05-27T11:35:50Z

My Results on a Ryzen 5 4500U:

 name                                                     before1 ns/iter  after1 ns/iter  diff ns/iter   diff %  speedup 
 filter::benches::bench_box_filter                        2,069,851        2,067,948             -1,903   -0.09%   x 1.00 
 filter::benches::bench_filter_clamped_gray_3x3           1,235,373        1,187,812            -47,561   -3.85%   x 1.04 
 filter::benches::bench_filter_clamped_gray_5x5           2,860,594        2,790,122            -70,472   -2.46%   x 1.03 
 filter::benches::bench_filter_clamped_gray_7x7           4,821,814        5,087,034            265,220    5.50%   x 0.95 
 filter::benches::bench_filter_clamped_parallel_gray_3x3  780,587          459,600             -320,987  -41.12%   x 1.70 
 filter::benches::bench_filter_clamped_parallel_gray_5x5  962,982          927,658              -35,324   -3.67%   x 1.04 
 filter::benches::bench_filter_clamped_parallel_gray_7x7  1,402,118        1,595,752            193,634   13.81%   x 0.88 
 filter::benches::bench_gaussian_f32_stdev_1              286,052          295,040                8,988    3.14%   x 0.97 
 filter::benches::bench_gaussian_f32_stdev_10             1,377,759        1,384,059              6,300    0.46%   x 1.00 
 filter::benches::bench_gaussian_f32_stdev_3              524,585          527,970                3,385    0.65%   x 0.99 
 filter::benches::bench_horizontal_filter                 902,045          913,429               11,384    1.26%   x 0.99 
 filter::benches::bench_separable_filter                  663,439          705,558               42,119    6.35%   x 0.94 
 filter::benches::bench_vertical_filter                   934,897          941,840                6,943    0.74%   x 0.99

theotherphil · 2024-05-27T11:39:51Z

@ripytide can you also try with the benchmarks listed on #644 (comment), so we can get a consistent comparison between my benchmarking and yours?

ripytide · 2024-05-27T11:41:29Z

So on my machine the only large change is parallel_3x3, which is 1.7x faster; everything else is quite similar and goes either way. Definitely not 4x.

cospectrum · 2024-05-27T11:54:06Z

use std::{hint::black_box, time::Instant};
use imageproc::image::RgbImage;

const TRIES: usize = 20;

fn main() {
    const W: u32 = 2600;
    const H: u32 = W;
    
    let img = black_box(RgbImage::new(W, H));
    let kernel = black_box([0i32; 3 * 3]);

    let t = Instant::now();
    for _ in 0..TRIES {
        let _: RgbImage = black_box(imageproc::filter::filter3x3(&img, &kernel));
    }
    dbg!(t.elapsed());

    let ker = imgproc::kernel::Kernel::new(&kernel, 3, 3);
    let t_new = Instant::now();
    for _ in 0..TRIES {
        let _: RgbImage = black_box(imgproc::filter::filter_clamped(&img, ker));
    }
    dbg!(t_new.elapsed());
}

[dependencies]
imageproc = "0.25.0"
imgproc = { git = "https://github.com/image-rs/imageproc", branch = "master", package = "imageproc" }

cospectrum · 2024-05-27T11:54:45Z

[src/main.rs:17:5] t.elapsed() = 2.099322542s
[src/main.rs:24:5] t_new.elapsed() = 8.245356375s
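
For scale, the two timings above work out to roughly a 3.9x slowdown for `filter_clamped` on this RGB input. A minimal sketch of the arithmetic follows; `slowdown` and `per_call` are hypothetical helper names, not part of either crate version.

```rust
use std::time::Duration;

/// Hypothetical helper: ratio of the new elapsed time to the old one.
/// Values above 1.0 mean the new code is slower.
fn slowdown(old: Duration, new: Duration) -> f64 {
    new.as_secs_f64() / old.as_secs_f64()
}

/// Hypothetical helper: average time per call over `tries` iterations.
fn per_call(total: Duration, tries: u32) -> Duration {
    total / tries
}
```

With `TRIES = 20`, that is about 105 ms per call on 0.25.0 versus about 412 ms per call on master.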

ripytide · 2024-05-27T12:02:21Z

Interesting. I wonder what differs between that test and the 3x3 test in the benchmarks to produce such a big gap.

theotherphil · 2024-05-27T12:16:04Z

The regressions are much larger for RGB images: #644 (comment)

That might explain some of the difference between what your respective benchmarks are showing. I expect we're missing benchmarks on RGB images for quite a lot of functions in this crate!

cospectrum · 2024-05-27T12:26:45Z

@theotherphil By the way, what should happen if P::CHANNEL_COUNT != Q::CHANNEL_COUNT?

cospectrum · 2024-05-27T12:31:37Z

The zip will be trimmed to the minimum length.
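
A minimal sketch of that truncation, using plain slices as stand-ins for the channel slices of two pixels with different channel counts (the function name is illustrative only, not taken from the crate):

```rust
/// Adds `src` channels into `dst` channel by channel and returns how many
/// channels were actually visited. `zip` stops at the shorter input, so any
/// extra channels on the longer side are silently skipped.
fn add_channels(dst: &mut [i32], src: &[u8]) -> usize {
    let mut visited = 0;
    for (d, &s) in dst.iter_mut().zip(src) {
        *d += i32::from(s);
        visited += 1;
    }
    visited
}
```

Zipping a 3-channel accumulator with a 1-channel source touches only the first channel and reports no error, which is why an explicit check is worth considering.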

ripytide · 2024-05-27T12:33:12Z

It should probably panic in that scenario. Is there a compile-time assert function?

cospectrum · 2024-05-27T12:35:21Z

A compile-time assert is `const _: () = assert!(expr);`
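
A small sketch of that form with concrete types. The `Channels` trait and the `Gray`/`Rgb` types are hypothetical stand-ins, not the `image` crate's `Pixel` trait. Note that an item-level `const _` like this cannot name the type parameters of an enclosing generic function, so it only covers concrete types.

```rust
/// Hypothetical stand-in for a pixel trait that exposes a channel count.
trait Channels {
    const CHANNEL_COUNT: u8;
}

struct Gray;
struct Rgb;

impl Channels for Gray {
    const CHANNEL_COUNT: u8 = 1;
}

impl Channels for Rgb {
    const CHANNEL_COUNT: u8 = 3;
}

// Both asserts are evaluated during compilation; if either condition were
// false, the build would fail instead of panicking at runtime.
const _: () = assert!(Rgb::CHANNEL_COUNT == 3);
const _: () = assert!(Gray::CHANNEL_COUNT != Rgb::CHANNEL_COUNT);
```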

cospectrum · 2024-05-27T12:38:25Z

But we can't use this here, for the same reason we can't size a stack array by the number of channels.

cospectrum · 2024-05-27T12:39:37Z

The only way to do it in stable Rust is to add typenum to the Pixel trait, I think.

ripytide · 2024-05-27T12:53:15Z

I wonder if we could use the tinyvec crate to remove the heap allocation from the previous implementation, and what effect that would have on its performance. Or just use a 4-channel array and panic if the pixel type has more than 4 channels, which should be generic enough for most use cases. Or even `if P::CHANNEL_COUNT == 1 {[0; 1]} else if P::CHANNEL_COUNT == 2 {[0; 2]} ...` up to 4.
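
The fixed 4-channel array idea could look roughly like the sketch below. It is an illustration only: `weighted_sum`, `MAX_CHANNELS`, and the interleaved `&[u8]` input are assumptions made for the example, not the crate's `filter_pixel` code.

```rust
/// Assumed upper bound on channels per pixel for this sketch.
const MAX_CHANNELS: usize = 4;

/// Accumulates a weighted sum per channel into a stack array, avoiding any
/// heap allocation. `data` holds interleaved channels, one pixel per weight.
/// Only the first `channels` entries of the result are meaningful.
fn weighted_sum(data: &[u8], channels: usize, weights: &[i32]) -> [i32; MAX_CHANNELS] {
    // Panic up front if the pixel type does not fit the fixed-size buffer.
    assert!(
        channels >= 1 && channels <= MAX_CHANNELS,
        "unsupported channel count"
    );
    assert_eq!(data.len(), weights.len() * channels, "one weight per pixel");

    let mut acc = [0i32; MAX_CHANNELS];
    for (pixel, &w) in data.chunks_exact(channels).zip(weights) {
        // `pixel` has exactly `channels` entries, so unused slots stay zero.
        for (a, &c) in acc.iter_mut().zip(pixel) {
            *a += w * i32::from(c);
        }
    }
    acc
}
```

The trade-off is a hard limit on channel count in exchange for a buffer whose size is known at compile time.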

theotherphil · 2024-05-27T13:46:54Z

@cospectrum I’d go with a runtime assert to check that channel counts match. Seems like a less likely user error than many of the places we already use panics to check preconditions.

cospectrum · 2024-05-27T14:42:00Z

> @cospectrum I’d go with a runtime assert to check that channel counts match. Seems like a less likely user error than many of the places we already use panics to check preconditions.

Done
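
A sketch of what such a runtime check might look like. The `Channels` trait, the marker types, and `assert_same_channels` are hypothetical names for illustration; the actual assert added in this PR may differ.

```rust
/// Hypothetical stand-in for a pixel trait that exposes a channel count.
trait Channels {
    const CHANNEL_COUNT: u8;
}

struct Gray;
struct Rgb;

impl Channels for Gray {
    const CHANNEL_COUNT: u8 = 1;
}

impl Channels for Rgb {
    const CHANNEL_COUNT: u8 = 3;
}

/// Panics when the input and output pixel types disagree on channel count,
/// instead of letting a later `zip` silently drop the extra channels.
fn assert_same_channels<P: Channels, Q: Channels>() {
    assert_eq!(
        P::CHANNEL_COUNT,
        Q::CHANNEL_COUNT,
        "input and output pixels must have the same number of channels"
    );
}
```

Unlike the item-level `const _` form, this works inside a generic function because the check runs after the types are known.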

theotherphil · 2024-05-27T15:14:51Z

@cospectrum, @ripytide: as both the function signatures and the set of benchmarks have changed since 0.25, comparisons across the two versions are a bit of a chore.

So I'll merge this, check manually that performance now matches 0.25, and we can then use this commit as the baseline for future changes.

ripytide · 2024-05-27T15:39:48Z

src/kernel.rs

+where
+    K: Copy,
+{


Having bounds in struct definitions is usually not best practice; bounds should appear where they are used.
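
To illustrate the suggestion, here is a minimal sketch with the bound moved from the struct to the `impl` block that needs it. This `Kernel` is a simplified, hypothetical version written for the example and is not the definition in src/kernel.rs.

```rust
/// Hypothetical kernel view: no trait bounds on the struct itself.
pub struct Kernel<'a, K> {
    data: &'a [K],
    width: u32,
    height: u32,
}

// Methods that work for any `K` live in an unbounded impl.
impl<'a, K> Kernel<'a, K> {
    pub fn new(data: &'a [K], width: u32, height: u32) -> Self {
        assert_eq!(data.len(), (width * height) as usize, "size mismatch");
        Kernel { data, width, height }
    }

    /// Returns a reference to the element at `(x, y)`; panics if out of bounds.
    pub fn at(&self, x: u32, y: u32) -> &K {
        assert!(x < self.width && y < self.height, "out of bounds");
        &self.data[(y * self.width + x) as usize]
    }
}

// Only the method that copies the element out requires `K: Copy`.
impl<'a, K: Copy> Kernel<'a, K> {
    pub fn at_copied(&self, x: u32, y: u32) -> K {
        *self.at(x, y)
    }
}
```

With this layout, code that only stores or passes a `Kernel<K>` around does not need to repeat `K: Copy` in its own signatures.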

ripytide · 2024-05-27T15:41:14Z

src/kernel.rs

    #[inline]
-    pub fn at(&self, x: u32, y: u32) -> &K {
-        &self.data[(y * self.width + x) as usize]
+    pub fn get(&self, x: u32, y: u32) -> K {


Why did you rename `at()` to `get()`? This was discussed earlier as not being a good idea, since `get()` normally returns an `Option`.
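
The naming convention in question can be sketched as follows. `Grid` is a hypothetical type used only to contrast the two accessor styles; it is not the crate's `Kernel`.

```rust
/// Hypothetical row-major grid used only to contrast the two accessor styles.
pub struct Grid {
    data: Vec<i32>,
    width: u32,
    height: u32,
}

impl Grid {
    pub fn new(data: Vec<i32>, width: u32, height: u32) -> Self {
        assert_eq!(data.len(), (width * height) as usize, "size mismatch");
        Grid { data, width, height }
    }

    /// `get` follows the standard-library convention (for example
    /// `slice::get`): out-of-bounds access yields `None`.
    pub fn get(&self, x: u32, y: u32) -> Option<i32> {
        if x < self.width && y < self.height {
            self.data.get((y * self.width + x) as usize).copied()
        } else {
            None
        }
    }

    /// `at` indexes directly and panics when the coordinates are out of bounds.
    pub fn at(&self, x: u32, y: u32) -> i32 {
        assert!(x < self.width && y < self.height, "out of bounds");
        self.data[(y * self.width + x) as usize]
    }
}
```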

cospectrum added 3 commits May 27, 2024 13:54

replace filter with old impl

dcd27f2

merge master

f1aa33c

inline S::clamp instead of lambda

786f1cb

get unchecked

98fcdca

cospectrum added 2 commits May 27, 2024 14:33

add safety

a50bc1d

wrap comment

2d5a13b

ripytide mentioned this pull request May 27, 2024

Removed allocation from filter_pixel() #656

Closed

add assert for channels

8b57027

theotherphil merged commit 1cbe8d6 into image-rs:master May 27, 2024
15 checks passed

cospectrum deleted the filter branch May 27, 2024 15:35

ripytide reviewed May 27, 2024

View reviewed changes

ripytide added a commit to ripytide/imageproc that referenced this pull request May 27, 2024

Revert "Revert use of filter_pixel in filter functions (image-rs#654)"

This reverts commit 1cbe8d6.

36444ae
