-
-
Notifications
You must be signed in to change notification settings - Fork 421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modified fast_array_util.py for faster implementation #2940
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2940 +/- ##
==========================================
- Coverage 70.39% 70.02% -0.38%
==========================================
Files 222 222
Lines 16157 16161 +4
==========================================
- Hits 11373 11316 -57
- Misses 4784 4845 +61 ☔ View full report in Codecov by Sentry. |
Hi @andrewfullard @atharva-2001 @jvshields, |
*beep* *bop* Significantly changed benchmarks: All benchmarks: Benchmarks that have stayed the same:
| Change | Before [5e2d0bb3] <master> | After [6922871f] | Ratio | Benchmark (Parameter) |
|----------|------------------------------|---------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------|
| | 3.23±0.4μs | 4.01±0.4μs | ~1.24 | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket_within_shell |
| | 3.87±0.02ms | 4.54±0.08ms | ~1.17 | opacities_opacity_state.BenchmarkOpacitiesOpacityState.time_opacity_state_initialize('macroatom') |
| | 3.40±0.4μs | 3.84±0.4μs | ~1.13 | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_bad_vpacket |
| | 6.69±0.9μs | 7.37±1μs | ~1.10 | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket |
| | 2.68±3μs | 2.26±1μs | ~0.84 | transport_montecarlo_estimators_radfield_estimator_calcs.BenchmarkMontecarloMontecarloNumbaPacket.time_update_line_estimators |
| | 28.2±8μs | 22.0±5μs | ~0.78 | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_generate_rpacket_last_interaction_tracker_list |
| | 2.62±0.5ms | 2.82±0.5ms | 1.08 | transport_montecarlo_single_packet_loop.BenchmarkTransportMontecarloSinglePacketLoop.time_single_packet_loop |
| | 63.8±0.2ms | 67.1±0.03ms | 1.05 | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_rpacket_trackers_to_dataframe |
| | 531±100ns | 551±200ns | 1.04 | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_pair_creation_opacity_calculation |
| | 38.7±0.04s | 39.1±0.02s | 1.01 | run_tardis.BenchmarkRunTardis.time_run_tardis |
| | 41.5±30μs | 41.8±20μs | 1.01 | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_line_emission |
| | 1.05±0m | 1.05±0m | 1.00 | run_tardis.BenchmarkRunTardis.time_run_tardis_rpacket_tracking |
| | 2.08±0m | 2.08±0m | 1.00 | spectrum_formal_integral.BenchmarkTransportMontecarloFormalIntegral.time_FormalIntegrator_functions |
| | 205±0.4ns | 205±0.07ns | 1.00 | spectrum_formal_integral.BenchmarkTransportMontecarloFormalIntegral.time_intensity_black_body |
| | 1.68±0.02ms | 1.68±0.01ms | 1.00 | transport_montecarlo_main_loop.BenchmarkTransportMontecarloMontecarloMainLoop.time_montecarlo_main_loop |
| | 1.21±0.01μs | 1.20±0μs | 0.99 | transport_geometry_calculate_distances.BenchmarkTransportGeometryCalculateDistances.time_calculate_distance_boundary |
| | 37.9±0.08μs | 37.5±0.2μs | 0.99 | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_generate_rpacket_tracker_list |
| | 531±200ns | 521±100ns | 0.98 | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_compton_opacity_calculation |
| | 2.79±0ms | 2.72±0.01ms | 0.98 | opacities_opacity_state.BenchmarkOpacitiesOpacityState.time_opacity_state_initialize('scatter') |
| | 561±200ns | 541±200ns | 0.96 | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_photoabsorption_opacity_calculation |
| | 1.63±0.4μs | 1.57±0.4μs | 0.96 | transport_geometry_calculate_distances.BenchmarkTransportGeometryCalculateDistances.time_calculate_distance_line |
| | 45.4±30μs | 43.7±30μs | 0.96 | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_line_scatter |
| | 758±2ns | 723±0.6ns | 0.95 | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_thomson_scatter |
| | 7.67±2μs | 7.25±2μs | 0.95 | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket_volley |
If you want to see the graph of the results, you can check it here |
Hi @andrewfullard , |
Hi @andrewfullard @KasukabeDefenceForce , |
Please demonstrate the speedup using |
@andrewfullard ,
# It is currently not possible to use scipy.integrate.cumulative_trapezoid in
# numba. So here is my own implementation.
import numpy as np
from numba import njit, prange
import timeit
from tardis.transport.montecarlo import njit_dict
@njit(**njit_dict)
def numba_cumulative_trapezoid(f, x):
"""
Cumulatively integrate f(x) using the composite trapezoidal rule.
Parameters
----------
f : numpy.ndarray, dtype float
Input array to integrate.
x : numpy.ndarray, dtype float
The coordinate to integrate along.
Returns
-------
numpy.ndarray, dtype float
The result of cumulative integration of f along x
"""
integ = (np.diff(x) * (f[1:] + f[:-1]) / 2.0).cumsum()
return integ / integ[-1]
@njit(**njit_dict)
def cumulative_integrate_array_by_blocks(f, x, block_references):
"""
Cumulatively integrate a function over blocks.
This function cumulatively integrates a function `f` defined at
locations `x` over blocks given in `block_references`.
Parameters
----------
f : numpy.ndarray, dtype float
Input array to integrate. Shape is (N_freq, N_shells), where
N_freq is the number of frequency values and N_shells is the number
of computational shells.
x : numpy.ndarray, dtype float
The sample points corresponding to the `f` values. Shape is (N_freq,).
block_references : numpy.ndarray, dtype int
The start indices of the blocks to be integrated. Shape is (N_blocks,).
Returns
-------
numpy.ndarray, dtype float
Array with cumulatively integrated values. Shape is (N_freq, N_shells)
same as f.
"""
n_rows = len(block_references) - 1
integrated = np.zeros_like(f)
for i in prange(f.shape[1]): # columns
# TODO: Avoid this loop through vectorization of cumulative_trapezoid
for j in prange(n_rows): # rows
start = block_references[j]
stop = block_references[j + 1]
integrated[start + 1 : stop, i] = numba_cumulative_trapezoid(
f[start:stop, i], x[start:stop]
)
return integrated
@njit(**njit_dict)
def numba_cumulative_trapezoid2(f, x):
"""
Cumulatively integrate f(x) using the composite trapezoidal rule with manual loop.
"""
n = len(f)
integ = np.zeros(n, dtype=np.float64)
for i in range(1, n):
dx = x[i] - x[i-1]
integ[i] = integ[i-1] + (f[i] + f[i-1]) * dx / 2.0
return integ / integ[-1]
@njit(**njit_dict)
def cumulative_integrate_array_by_blocks2(f, x, block_references):
"""
Cumulatively integrate a function over blocks using a manual loop-based trapezoidal rule.
"""
n_blocks = len(block_references) - 1
integrated = np.zeros_like(f)
for i in prange(f.shape[1]):
for j in prange(n_blocks):
start = block_references[j]
stop = block_references[j + 1]
integrated[start:stop, i] = numba_cumulative_trapezoid2(f[start:stop, i], x[start:stop])
return integrated
N_freq = 100000
N_shells = 1000
N_blocks = 30
x = np.linspace(0, 10, N_freq)
f = np.random.random((N_freq, N_shells))
block_references = np.linspace(0, N_freq, N_blocks + 1, dtype=int)
cumulative_integrate_array_by_blocks(f, x, block_references)
cumulative_integrate_array_by_blocks2(f, x, block_references)
original_time = timeit.timeit(
'cumulative_integrate_array_by_blocks(f, x, block_references)',
globals=globals(),
number=10
)
optimized_time = timeit.timeit(
'cumulative_integrate_array_by_blocks2(f, x, block_references)',
globals=globals(),
number=10
)
print(f"Original implementation average time: {original_time / 10:.4f} seconds")
print(f"Optimized implementation average time: {optimized_time / 10:.4f} seconds")
original_result = cumulative_integrate_array_by_blocks(f, x, block_references)
optimized_result = cumulative_integrate_array_by_blocks2(f, x, block_references)
assert np.allclose(original_result, optimized_result), "Results do not match!"
print("Both implementations give the same result.")
print("Original result (first 10 values):", original_result[:10,:10])
print("Optimized result (first 10 values):", optimized_result[:10,:10]) Output for running my test script is: Original implementation average time: 3.0688 seconds
Optimized implementation average time: 2.8569 seconds
Both implementations give the same result.
Original result (first 10 values): [[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]
[0.00021059 0.00014415 0.00035398 0.00019182 0.00019879 0.00037928
0.00023938 0.00032636 0.00030593 0.00035552]
[0.00035765 0.00041922 0.0007794 0.00043033 0.00054467 0.00085113
0.00035781 0.00068471 0.0005169 0.00071021]
[0.00070914 0.00068628 0.00123511 0.00076725 0.00087767 0.00110585
0.00049022 0.00107013 0.00080128 0.00113117]
[0.00097978 0.00093434 0.00170531 0.00111699 0.00098203 0.00141127
0.00080311 0.00154914 0.00104561 0.00134419]
[0.00127825 0.00107934 0.00217531 0.00152452 0.00106555 0.00171593
0.00109628 0.0019187 0.00114761 0.0015953 ]
[0.0017717 0.00108791 0.00254289 0.00193694 0.00120367 0.00189517
0.00121426 0.00214087 0.00123347 0.00182233]
[0.00207377 0.00124733 0.00272775 0.00228858 0.00145503 0.00221285
0.00151144 0.00233141 0.00126556 0.00191921]
[0.00241064 0.0016145 0.00292188 0.00261326 0.00173571 0.0025099
0.00177101 0.00268684 0.00154277 0.00201923]
[0.00296984 0.00201482 0.00324683 0.00291908 0.00198809 0.00276275
0.00196604 0.00317937 0.00204687 0.00223908]]
Optimized result (first 10 values): [[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]
[0.00021059 0.00014415 0.00035398 0.00019182 0.00019879 0.00037928
0.00023938 0.00032636 0.00030593 0.00035552]
[0.00035765 0.00041922 0.0007794 0.00043033 0.00054467 0.00085113
0.00035781 0.00068471 0.0005169 0.00071021]
[0.00070914 0.00068628 0.00123511 0.00076725 0.00087767 0.00110585
0.00049022 0.00107013 0.00080128 0.00113117]
[0.00097978 0.00093434 0.00170531 0.00111699 0.00098203 0.00141127
0.00080311 0.00154914 0.00104561 0.00134419]
[0.00127825 0.00107934 0.00217531 0.00152452 0.00106555 0.00171593
0.00109628 0.0019187 0.00114761 0.0015953 ]
[0.0017717 0.00108791 0.00254289 0.00193694 0.00120367 0.00189517
0.00121426 0.00214087 0.00123347 0.00182233]
[0.00207377 0.00124733 0.00272775 0.00228858 0.00145503 0.00221285
0.00151144 0.00233141 0.00126556 0.00191921]
[0.00241064 0.0016145 0.00292188 0.00261326 0.00173571 0.0025099
0.00177101 0.00268684 0.00154277 0.00201923]
[0.00296984 0.00201482 0.00324683 0.00291908 0.00198809 0.00276275
0.00196604 0.00317937 0.00204687 0.00223908]] |
So the speedup is ~7% then. I think you should revisit your solution. |
📝 Description
Type: 🪲
bugfix
| 🚀feature
numba_cumulative_trapezoid
method modified to integrate using a manual for loop.Fixes #2757
📌 Resources
Examples, notebooks, and links to useful references.
🚦 Testing
How did you test these changes?
I have tested on a random data, I have compared both, the original and the optimized implementation, Here is the snippet:
The snippets show that the values of outputs with both methods is exactly same even upto high precision and also the optimised implementation is more than
~50%
faster.☑️ Checklist
build_docs
label