Modified fast_array_util.py for faster implementation #2940

Open
wants to merge 7 commits into master

Conversation

Sonu0305
Contributor

📝 Description

Type: 🪲 bugfix | 🚀 feature

The numba_cumulative_trapezoid method was modified to perform the integration with a manual for loop.

Fixes #2757
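
As a rough illustration of the change described above, here is a minimal sketch of the manual-loop cumulative trapezoidal rule (illustrative only; the exact implementation is in the diff, and the full benchmark version appears later in this thread):

import numpy as np
from numba import njit

@njit
def cumulative_trapezoid_loop(f, x):
    # Illustrative manual-loop form: accumulate trapezoid areas term by term,
    # then normalize by the final value, as the existing TARDIS helper does.
    n = len(f)
    integ = np.zeros(n, dtype=np.float64)
    for i in range(1, n):
        dx = x[i] - x[i - 1]
        integ[i] = integ[i - 1] + (f[i] + f[i - 1]) * dx / 2.0
    return integ / integ[-1]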

📌 Resources

Examples, notebooks, and links to useful references.

🚦 Testing

How did you test these changes?

  • Testing pipeline
  • Other method (describe)
  • My changes can't be tested (explain why)

I have tested on random data and compared the original and the optimized implementations; here are the snippets:
The snippets show that the outputs of both methods are exactly the same, even to high precision, and that the optimized implementation is more than ~50% faster.
[Screenshots of the comparison snippets and their output]

☑️ Checklist

  • I requested two reviewers for this pull request
  • I updated the documentation according to my changes
  • I built the documentation by applying the build_docs label

Note: If you are not allowed to perform any of these actions, ping (@) a contributor.

Sonu0305 changed the title from "Fix#2757" to "Modified fast_array_util.py for faster implementations" on Jan 11, 2025
Sonu0305 changed the title from "Modified fast_array_util.py for faster implementations" to "Modified fast_array_util.py for faster implementation" on Jan 11, 2025
@tardis-bot
Contributor

tardis-bot commented Jan 11, 2025

*beep* *bop*
Hi human,
I ran ruff on the latest commit (6922871).
Here are the outputs produced.
Results can also be downloaded as artifacts here.
Summarised output:

Complete output (might be large):

All checks passed!


codecov bot commented Jan 11, 2025

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 70.02%. Comparing base (2a06fdf) to head (6922871).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../properties/continuum_processes/fast_array_util.py | 0.00% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2940      +/-   ##
==========================================
- Coverage   70.39%   70.02%   -0.38%     
==========================================
  Files         222      222              
  Lines       16157    16161       +4     
==========================================
- Hits        11373    11316      -57     
- Misses       4784     4845      +61     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.
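
Since the patch coverage for the changed lines is 0%, a small regression test against scipy could close that gap. A sketch, assuming the modified helper keeps its current name, returns a normalized array the same length as its input (leading zero included, as in the manual-loop version posted later in this thread), and assuming an import path for the module whose path is truncated in the report above:

import numpy as np
from scipy.integrate import cumulative_trapezoid
# The import path below is an assumption; the coverage report truncates it.
from tardis.plasma.properties.continuum_processes.fast_array_util import (
    numba_cumulative_trapezoid,
)

def test_numba_cumulative_trapezoid_matches_scipy():
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 256)
    f = rng.random(256)
    # scipy with initial=0 gives the same shape as the manual-loop version;
    # both results are normalized to their final value before comparison.
    expected = cumulative_trapezoid(f, x, initial=0)
    expected = expected / expected[-1]
    result = numba_cumulative_trapezoid(f, x)
    np.testing.assert_allclose(result, expected)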

@Sonu0305
Contributor Author

Hi @andrewfullard @atharva-2001 @jvshields,
Could you please review when you have a chance? Thank you.

@tardis-bot
Contributor

tardis-bot commented Jan 11, 2025

*beep* *bop*
Hi human,
I ran benchmarks as you asked comparing master (2a06fdf) and the latest commit (6922871).
Here are the logs produced by ASV.
Results can also be downloaded as artifacts here.

Significantly changed benchmarks:

All benchmarks:

Benchmarks that have stayed the same:

| Change   | Before [5e2d0bb3] <master>   | After [6922871f]    | Ratio   | Benchmark (Parameter)                                                                                                               |
|----------|------------------------------|---------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------|
|          | 3.23±0.4μs                   | 4.01±0.4μs          | ~1.24   | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket_within_shell                              |
|          | 3.87±0.02ms                  | 4.54±0.08ms         | ~1.17   | opacities_opacity_state.BenchmarkOpacitiesOpacityState.time_opacity_state_initialize('macroatom')                                   |
|          | 3.40±0.4μs                   | 3.84±0.4μs          | ~1.13   | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_bad_vpacket                                       |
|          | 6.69±0.9μs                   | 7.37±1μs            | ~1.10   | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket                                           |
|          | 2.68±3μs                     | 2.26±1μs            | ~0.84   | transport_montecarlo_estimators_radfield_estimator_calcs.BenchmarkMontecarloMontecarloNumbaPacket.time_update_line_estimators       |
|          | 28.2±8μs                     | 22.0±5μs            | ~0.78   | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_generate_rpacket_last_interaction_tracker_list |
|          | 2.62±0.5ms                   | 2.82±0.5ms          | 1.08    | transport_montecarlo_single_packet_loop.BenchmarkTransportMontecarloSinglePacketLoop.time_single_packet_loop                        |
|          | 63.8±0.2ms                   | 67.1±0.03ms         | 1.05    | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_rpacket_trackers_to_dataframe                  |
|          | 531±100ns                    | 551±200ns           | 1.04    | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_pair_creation_opacity_calculation                                |
|          | 38.7±0.04s                   | 39.1±0.02s          | 1.01    | run_tardis.BenchmarkRunTardis.time_run_tardis                                                                                       |
|          | 41.5±30μs                    | 41.8±20μs           | 1.01    | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_line_emission                                         |
|          | 1.05±0m                      | 1.05±0m             | 1.00    | run_tardis.BenchmarkRunTardis.time_run_tardis_rpacket_tracking                                                                      |
|          | 2.08±0m                      | 2.08±0m             | 1.00    | spectrum_formal_integral.BenchmarkTransportMontecarloFormalIntegral.time_FormalIntegrator_functions                                 |
|          | 205±0.4ns                    | 205±0.07ns          | 1.00    | spectrum_formal_integral.BenchmarkTransportMontecarloFormalIntegral.time_intensity_black_body                                       |
|          | 1.68±0.02ms                  | 1.68±0.01ms         | 1.00    | transport_montecarlo_main_loop.BenchmarkTransportMontecarloMontecarloMainLoop.time_montecarlo_main_loop                             |
|          | 1.21±0.01μs                  | 1.20±0μs            | 0.99    | transport_geometry_calculate_distances.BenchmarkTransportGeometryCalculateDistances.time_calculate_distance_boundary                |
|          | 37.9±0.08μs                  | 37.5±0.2μs          | 0.99    | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_generate_rpacket_tracker_list                  |
|          | 531±200ns                    | 521±100ns           | 0.98    | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_compton_opacity_calculation                                      |
|          | 2.79±0ms                     | 2.72±0.01ms         | 0.98    | opacities_opacity_state.BenchmarkOpacitiesOpacityState.time_opacity_state_initialize('scatter')                                     |
|          | 561±200ns                    | 541±200ns           | 0.96    | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_photoabsorption_opacity_calculation                              |
|          | 1.63±0.4μs                   | 1.57±0.4μs          | 0.96    | transport_geometry_calculate_distances.BenchmarkTransportGeometryCalculateDistances.time_calculate_distance_line                    |
|          | 45.4±30μs                    | 43.7±30μs           | 0.96    | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_line_scatter                                          |
|          | 758±2ns                      | 723±0.6ns           | 0.95    | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_thomson_scatter                                       |
|          | 7.67±2μs                     | 7.25±2μs            | 0.95    | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket_volley                                    |

If you want to see the graph of the results, you can check it here

@Sonu0305
Contributor Author

Sonu0305 commented Jan 13, 2025

Hi @andrewfullard,
The tests that are failing seem to be unrelated to the changes I made; what should I do?
The two failing tests are `not continuum ubuntu-latest` and `not continuum macos-latest`.
I think this is because #2800 is still being worked on; they should all pass once it is merged. Thank you.

@Sonu0305
Contributor Author

Sonu0305 commented Jan 14, 2025

Hi @andrewfullard @KasukabeDefenceForce,
The tests are all passing now; please review when you have a chance. Thank you.

@andrewfullard
Contributor

Please demonstrate the speedup using timeit. Consider the impact of numba compile time.
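
For context on the compile-time point: a numba @njit function is compiled on its first call, so that call can be timed separately from a warmed-up call. A self-contained sketch with a toy stand-in for the modified helper (names and sizes are illustrative, not taken from the PR):

import time
import numpy as np
from numba import njit

@njit
def trapezoid_loop(f, x):
    # Toy stand-in: manual cumulative trapezoid, normalized to the final value.
    n = len(f)
    integ = np.zeros(n)
    for i in range(1, n):
        integ[i] = integ[i - 1] + (f[i] + f[i - 1]) * (x[i] - x[i - 1]) / 2.0
    return integ / integ[-1]

x = np.linspace(0.0, 1.0, 1_000_000)
f = np.random.random(1_000_000)

t0 = time.perf_counter()
trapezoid_loop(f, x)  # first call: includes JIT compilation
first_call = time.perf_counter() - t0

t0 = time.perf_counter()
trapezoid_loop(f, x)  # warm call: compiled code only
warm_call = time.perf_counter() - t0

print(f"first call (incl. compile): {first_call:.3f} s, warm call: {warm_call:.3f} s")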

@Sonu0305
Contributor Author

Please demonstrate the speedup using timeit. Consider the impact of numba compile time.

@andrewfullard,
My script for testing these changes is:

test_fast.py:

# It is currently not possible to use scipy.integrate.cumulative_trapezoid in
# numba. So here is my own implementation.
import numpy as np
from numba import njit, prange
import timeit
from tardis.transport.montecarlo import njit_dict


@njit(**njit_dict)
def numba_cumulative_trapezoid(f, x):
    """
    Cumulatively integrate f(x) using the composite trapezoidal rule.

    Parameters
    ----------
    f : numpy.ndarray, dtype float
        Input array to integrate.
    x : numpy.ndarray, dtype float
        The coordinate to integrate along.

    Returns
    -------
    numpy.ndarray, dtype float
        The result of cumulative integration of f along x
    """
    integ = (np.diff(x) * (f[1:] + f[:-1]) / 2.0).cumsum()
    return integ / integ[-1]


@njit(**njit_dict)
def cumulative_integrate_array_by_blocks(f, x, block_references):
    """
    Cumulatively integrate a function over blocks.

    This function cumulatively integrates a function `f` defined at
    locations `x` over blocks given in `block_references`.

    Parameters
    ----------
    f : numpy.ndarray, dtype float
        Input array to integrate. Shape is (N_freq, N_shells), where
        N_freq is the number of frequency values and N_shells is the number
        of computational shells.
    x : numpy.ndarray, dtype float
        The sample points corresponding to the `f` values. Shape is (N_freq,).
    block_references : numpy.ndarray, dtype int
        The start indices of the blocks to be integrated. Shape is (N_blocks,).

    Returns
    -------
    numpy.ndarray, dtype float
        Array with cumulatively integrated values. Shape is (N_freq, N_shells)
        same as f.
    """
    n_rows = len(block_references) - 1
    integrated = np.zeros_like(f)
    for i in prange(f.shape[1]):  # columns
        # TODO: Avoid this loop through vectorization of cumulative_trapezoid
        for j in prange(n_rows):  # rows
            start = block_references[j]
            stop = block_references[j + 1]
            integrated[start + 1 : stop, i] = numba_cumulative_trapezoid(
                f[start:stop, i], x[start:stop]
            )
    return integrated

@njit(**njit_dict)
def numba_cumulative_trapezoid2(f, x):
    """
    Cumulatively integrate f(x) using the composite trapezoidal rule with manual loop.
    """
    n = len(f)
    integ = np.zeros(n, dtype=np.float64)
    for i in range(1, n):
        dx = x[i] - x[i-1]
        integ[i] = integ[i-1] + (f[i] + f[i-1]) * dx / 2.0
    return integ / integ[-1]

@njit(**njit_dict)
def cumulative_integrate_array_by_blocks2(f, x, block_references):
    """
    Cumulatively integrate a function over blocks using a manual loop-based trapezoidal rule.
    """
    n_blocks = len(block_references) - 1
    integrated = np.zeros_like(f)
    
    for i in prange(f.shape[1]):
        for j in prange(n_blocks):
            start = block_references[j]
            stop = block_references[j + 1]
            integrated[start:stop, i] = numba_cumulative_trapezoid2(f[start:stop, i], x[start:stop])
            
    return integrated

N_freq = 100000
N_shells = 1000
N_blocks = 30

x = np.linspace(0, 10, N_freq)
f = np.random.random((N_freq, N_shells))
block_references = np.linspace(0, N_freq, N_blocks + 1, dtype=int)

cumulative_integrate_array_by_blocks(f, x, block_references)
cumulative_integrate_array_by_blocks2(f, x, block_references)
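# The two calls above are warm-up runs: each njit function compiles on its first
# call, so running them once here keeps compile time out of the timeit numbers below.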

original_time = timeit.timeit(
    'cumulative_integrate_array_by_blocks(f, x, block_references)',
    globals=globals(),
    number=10 
)

optimized_time = timeit.timeit(
    'cumulative_integrate_array_by_blocks2(f, x, block_references)',
    globals=globals(),
    number=10
)

print(f"Original implementation average time: {original_time / 10:.4f} seconds")
print(f"Optimized implementation average time: {optimized_time / 10:.4f} seconds")

original_result = cumulative_integrate_array_by_blocks(f, x, block_references)
optimized_result = cumulative_integrate_array_by_blocks2(f, x, block_references)
assert np.allclose(original_result, optimized_result), "Results do not match!"
print("Both implementations give the same result.")

print("Original result (first 10 values):", original_result[:10,:10])
print("Optimized result (first 10 values):", optimized_result[:10,:10])

The output from running my test script is:

Original implementation average time: 3.0688 seconds
Optimized implementation average time: 2.8569 seconds
Both implementations give the same result.
Original result (first 10 values): [[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.00021059 0.00014415 0.00035398 0.00019182 0.00019879 0.00037928
  0.00023938 0.00032636 0.00030593 0.00035552]
 [0.00035765 0.00041922 0.0007794  0.00043033 0.00054467 0.00085113
  0.00035781 0.00068471 0.0005169  0.00071021]
 [0.00070914 0.00068628 0.00123511 0.00076725 0.00087767 0.00110585
  0.00049022 0.00107013 0.00080128 0.00113117]
 [0.00097978 0.00093434 0.00170531 0.00111699 0.00098203 0.00141127
  0.00080311 0.00154914 0.00104561 0.00134419]
 [0.00127825 0.00107934 0.00217531 0.00152452 0.00106555 0.00171593
  0.00109628 0.0019187  0.00114761 0.0015953 ]
 [0.0017717  0.00108791 0.00254289 0.00193694 0.00120367 0.00189517
  0.00121426 0.00214087 0.00123347 0.00182233]
 [0.00207377 0.00124733 0.00272775 0.00228858 0.00145503 0.00221285
  0.00151144 0.00233141 0.00126556 0.00191921]
 [0.00241064 0.0016145  0.00292188 0.00261326 0.00173571 0.0025099
  0.00177101 0.00268684 0.00154277 0.00201923]
 [0.00296984 0.00201482 0.00324683 0.00291908 0.00198809 0.00276275
  0.00196604 0.00317937 0.00204687 0.00223908]]
Optimized result (first 10 values): [[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.00021059 0.00014415 0.00035398 0.00019182 0.00019879 0.00037928
  0.00023938 0.00032636 0.00030593 0.00035552]
 [0.00035765 0.00041922 0.0007794  0.00043033 0.00054467 0.00085113
  0.00035781 0.00068471 0.0005169  0.00071021]
 [0.00070914 0.00068628 0.00123511 0.00076725 0.00087767 0.00110585
  0.00049022 0.00107013 0.00080128 0.00113117]
 [0.00097978 0.00093434 0.00170531 0.00111699 0.00098203 0.00141127
  0.00080311 0.00154914 0.00104561 0.00134419]
 [0.00127825 0.00107934 0.00217531 0.00152452 0.00106555 0.00171593
  0.00109628 0.0019187  0.00114761 0.0015953 ]
 [0.0017717  0.00108791 0.00254289 0.00193694 0.00120367 0.00189517
  0.00121426 0.00214087 0.00123347 0.00182233]
 [0.00207377 0.00124733 0.00272775 0.00228858 0.00145503 0.00221285
  0.00151144 0.00233141 0.00126556 0.00191921]
 [0.00241064 0.0016145  0.00292188 0.00261326 0.00173571 0.0025099
  0.00177101 0.00268684 0.00154277 0.00201923]
 [0.00296984 0.00201482 0.00324683 0.00291908 0.00198809 0.00276275
  0.00196604 0.00317937 0.00204687 0.00223908]]

@andrewfullard
Contributor

So the speedup is ~7% then. I think you should revisit your solution.

Development

Successfully merging this pull request may close these issues.

Make better implementations for `fast_array_util.py`