Multiple performance regressions in 3.3.1 compared to 3.2.1 #1915
Comments
Thanks for that testing. Any way you could contribute the wrapper scripts you used to create the above? (I see the separate repo, just thinking it'd be good to integrate into the test metrics.) It would appear DWA (an encoding I don't ever use) has been hard hit; I will take a look at that. As you point out, it is likely that something has inadvertently changed in the compression routines specifically for PIZ and DWA, as that would be the only reason for a regression on the encoding side.
The results above were generated by running I agree it would be good to have test metrics integrated. I'll look into adding a more comprehensive testing mode to
I've got some local code that adds a benchmarking mode in In the meantime, here are some results from testing on
It took a while, as I went down a rabbit hole to make DWA very much better, but the Huffman and DWA compression routines (encode / decode) were indeed not as fast as they used to be. The Huffman coding regression came from my removing some memory cast / copy undefined behavior; the undefined-behavior version turned out to be a little faster. I have instead made other changes to make the routine faster while keeping the original fix for the undefined behavior. The DWA one is a bit more mysterious: when profiling, it was dominated by a memory-bound lookup into the very large table used for quantizing values, so I am not sure why it regressed, given that the code didn't really change between 3.2 and 3.3 other than moving from the C++ to the C library version. However, I have just put in a pull request to remove that table lookup, so writing DWA should be significantly faster than before (0.23s -> 0.15s per frame for me on an AVX2 / F16C enabled compile). The old version of the C++ library used threads (sort of) for reading even when the thread count was set to 0, which produced some misleading results on the read side. Given how the encode and decode pipelines work right now, it doesn't surprise me that performance caps out at the core count rather than the thread count. This could be improved in the future with a slight rework of how the pipelines are evaluated by threads. Thanks again for all the testing.
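To put the quoted per-frame timing in context, here is a minimal sketch that converts the 0.23s -> 0.15s figure into a relative change and a speedup factor. The function names are hypothetical helpers for illustration; only the two timings come from the comment above.

```python
def time_change_pct(before_s: float, after_s: float) -> float:
    """Relative change in time per frame, in percent (negative means faster)."""
    return (after_s - before_s) / before_s * 100.0


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the new per-frame time is than the old one."""
    return before_s / after_s


# Timings quoted in the comment for one DWA frame (AVX2 / F16C build).
before_s, after_s = 0.23, 0.15

print(f"time per frame: {time_change_pct(before_s, after_s):+.1f} %")
print(f"speedup:        {speedup(before_s, after_s):.2f}x")
```

Under these numbers the DWA write takes roughly a third less time per frame, which corresponds to about a 1.5x speedup.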
This is designed to help with performance regression testing and has been used to gather data for issue AcademySoftwareFoundation#1915. Signed-off-by: Peter Urbanec <[email protected]>
There appear to be multiple performance regressions in OpenEXR 3.3.1 when compared to OpenEXR 3.2.1. These regressions were first observed in real-world use in an application. To facilitate regression testing, I have backported exrmetrics to build against OpenEXR 3.2.1 and created a simple series of tests. Please see https://github.com/peterurbanec/exr_perf
To summarise, the largest real-world performance losses are observed with DWA compression, where the slowdown is up to 58% when writing half images. Results from exrmetrics also show slowdowns with DWA, although only of about 15%.
More details, taken from https://github.com/peterurbanec/exr_perf/README.md
Change in performance, measured with a backported version of exrmetrics from this repository. The quantity measured is the time taken for one frame, therefore positive Δ % quantities indicate a performance regression. These figures were collected on an AMD Ryzen Threadripper PRO 3975WX running Linux.
Change in performance as observed in a real-world application. The quantity measured is frames per second, therefore a negative Δ % is a performance regression. These figures were collected on an i7-13800H running Windows.
Rows marked with * are of particular concern.
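Because the two sets of figures use opposite sign conventions, a small sketch of how Δ % could be computed and interpreted may help. The helper names and the sample values below are made up for illustration and are not measurements from this issue.

```python
def delta_pct(baseline: float, candidate: float) -> float:
    """Percentage change of candidate relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (candidate - baseline) / baseline * 100.0


def is_regression(baseline: float, candidate: float, higher_is_better: bool) -> bool:
    """Interpret the sign of the delta according to the kind of metric.

    Time per frame: lower is better, so a positive delta is a regression.
    Frames per second: higher is better, so a negative delta is a regression.
    """
    d = delta_pct(baseline, candidate)
    return d < 0 if higher_is_better else d > 0


# Illustrative values only (not taken from the results in this issue).
time_old_s, time_new_s = 1.00, 1.15   # seconds per frame
fps_old, fps_new = 100.0, 80.0        # frames per second

print(f"time: {delta_pct(time_old_s, time_new_s):+.1f} %",
      is_regression(time_old_s, time_new_s, higher_is_better=False))
print(f"fps:  {delta_pct(fps_old, fps_new):+.1f} %",
      is_regression(fps_old, fps_new, higher_is_better=True))
```

Both sample rows count as regressions even though their deltas have opposite signs, which is why the direction of the metric has to be stated alongside each set of results.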