Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace filter with the old implementation #654

Merged
merged 7 commits into from
May 27, 2024

Conversation

cospectrum
Copy link
Contributor

@cospectrum cospectrum commented May 27, 2024

Related #644

@ripytide
Copy link
Member

Can you show a benchmark comparison on your machine with a before/after this pr? I'll do the same on my machine as well.

@theotherphil
Copy link
Contributor

theotherphil commented May 27, 2024

@ripytide I've just added a benchmark comparison on my machine on #644 (comment) (edit: for v0.25 to master).

@ripytide
Copy link
Member

My Results on a Ryzen 5 4500U:

 name                                                     before1 ns/iter  after1 ns/iter  diff ns/iter   diff %  speedup 
 filter::benches::bench_box_filter                        2,069,851        2,067,948             -1,903   -0.09%   x 1.00 
 filter::benches::bench_filter_clamped_gray_3x3           1,235,373        1,187,812            -47,561   -3.85%   x 1.04 
 filter::benches::bench_filter_clamped_gray_5x5           2,860,594        2,790,122            -70,472   -2.46%   x 1.03 
 filter::benches::bench_filter_clamped_gray_7x7           4,821,814        5,087,034            265,220    5.50%   x 0.95 
 filter::benches::bench_filter_clamped_parallel_gray_3x3  780,587          459,600             -320,987  -41.12%   x 1.70 
 filter::benches::bench_filter_clamped_parallel_gray_5x5  962,982          927,658              -35,324   -3.67%   x 1.04 
 filter::benches::bench_filter_clamped_parallel_gray_7x7  1,402,118        1,595,752            193,634   13.81%   x 0.88 
 filter::benches::bench_gaussian_f32_stdev_1              286,052          295,040                8,988    3.14%   x 0.97 
 filter::benches::bench_gaussian_f32_stdev_10             1,377,759        1,384,059              6,300    0.46%   x 1.00 
 filter::benches::bench_gaussian_f32_stdev_3              524,585          527,970                3,385    0.65%   x 0.99 
 filter::benches::bench_horizontal_filter                 902,045          913,429               11,384    1.26%   x 0.99 
 filter::benches::bench_separable_filter                  663,439          705,558               42,119    6.35%   x 0.94 
 filter::benches::bench_vertical_filter                   934,897          941,840                6,943    0.74%   x 0.99 

@theotherphil
Copy link
Contributor

@ripytide can you also try with the benchmarks listed on #644 (comment) so we can get a consistent comparison between my benchmarking and yours.

@ripytide
Copy link
Member

So on my machine the only massive change is the parallel_3x3 which is 1.7x but everything else is quite similar and goes either way. But definitely not 4x.

@cospectrum
Copy link
Contributor Author

cospectrum commented May 27, 2024

use std::{hint::black_box, time::Instant};
use imageproc::image::RgbImage;

const TRIES: usize = 20;

fn main() {
    const W: u32 = 2600;
    const H: u32 = W;
    
    let img = black_box(RgbImage::new(W, H));
    let kernel = black_box([0i32; 3 * 3]);

    let t = Instant::now();
    for _ in 0..TRIES {
        let _: RgbImage = black_box(imageproc::filter::filter3x3(&img, &kernel));
    }
    dbg!(t.elapsed());

    let ker = imgproc::kernel::Kernel::new(&kernel, 3, 3);
    let t_new = Instant::now();
    for _ in 0..TRIES {
        let _: RgbImage = black_box(imgproc::filter::filter_clamped(&img, ker));
    }
    dbg!(t_new.elapsed());
}
[dependencies]
imageproc = "0.25.0"
imgproc = { git = "https://github.com/image-rs/imageproc", branch = "master", package = "imageproc" }

@cospectrum
Copy link
Contributor Author

[src/main.rs:17:5] t.elapsed() = 2.099322542s
[src/main.rs:24:5] t_new.elapsed() = 8.245356375s

@ripytide
Copy link
Member

ripytide commented May 27, 2024

Interesting, I wonder what the difference between that test and the 3x3 test in the benchmarks is that's making such a big difference.

@theotherphil
Copy link
Contributor

The regressions are much larger for RGB images: #644 (comment)

Which might explain some of the difference between what your respective benchmarks are showing. I expect we're missing benchmarks on RGB images for quite a lot of functions in this crate!

@cospectrum
Copy link
Contributor Author

@theotherphil By the way, what should happen if P::CHANNEL_COUNT != Q::CHANNEL_COUNT?

@cospectrum
Copy link
Contributor Author

The zip will be trimmed to the minimum length.

@ripytide
Copy link
Member

Should probably panic in that scenario, is there a compile-time assert function?

@cospectrum
Copy link
Contributor Author

Compile time assert is const _: () = assert!(expr)

@cospectrum
Copy link
Contributor Author

But we can't use this, for the same reason we can't allocate the stack by the number of channels

@cospectrum
Copy link
Contributor Author

cospectrum commented May 27, 2024

The only way to do it in stable Rust is to add typenum in Pixel trait, I think

@ripytide
Copy link
Member

I wonder if we could use the tinyvec crate to remove heap-allocation from the previous implementation and what effect that would have on its performance. Or just use a 4-channel array and panic if the pixel type has greater than 4-channels which should be generic enough for most usecases. Or even if P::CHANNEL_COUNT == 1 {[0; 1]} else if P::CHANNEL_COUNT == 2 {[0; 2]} ... up to 4.

@theotherphil
Copy link
Contributor

@cospectrum I’d go with a runtime assert to check that channel counts match. Seems like a less likely user error than many of the places we already use panics to check preconditions.

@cospectrum
Copy link
Contributor Author

@cospectrum I’d go with a runtime assert to check that channel counts match. Seems like a less likely user error than many of the places we already use panics to check preconditions.

Done

@theotherphil
Copy link
Contributor

@cospectrum , @ripytide as both the function signatures and the set of benchmarks have changed since 0.25 comparisons across the two versions are a bit of a chore.

So I'll merge this, check manually that performance now matches 0.25, and we can then use this commit as the baseline for future changes.

@theotherphil theotherphil merged commit 1cbe8d6 into image-rs:master May 27, 2024
15 checks passed
@cospectrum cospectrum deleted the filter branch May 27, 2024 15:35
Comment on lines +6 to +8
where
K: Copy,
{
Copy link
Member

@ripytide ripytide May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having bounds in struct definitions is usually not best practise, bounds should appear where they are used.

#[inline]
pub fn at(&self, x: u32, y: u32) -> &K {
&self.data[(y * self.width + x) as usize]
pub fn get(&self, x: u32, y: u32) -> K {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you rename at() to get()? This was discussed as not being a good idea as get() normally returns an option.

ripytide added a commit to ripytide/imageproc that referenced this pull request May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants