Modified fast_array_util.py for faster implementation #2940

Open
wants to merge 7 commits into master

Conversation

Sonu0305
Contributor

📝 Description

Type: 🪲 bugfix | 🚀 feature

The numba_cumulative_trapezoid method was modified to perform the integration with a manual for loop.

Fixes #2757
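
As a rough illustration of the change described above, here is a minimal sketch of the manual-loop cumulative trapezoidal rule (illustrative only; the exact implementation is in the diff, and the full benchmark version appears later in this thread):

import numpy as np
from numba import njit

@njit
def cumulative_trapezoid_loop(f, x):
    # Illustrative manual-loop form: accumulate trapezoid areas term by term,
    # then normalize by the final value, as the existing TARDIS helper does.
    n = len(f)
    integ = np.zeros(n, dtype=np.float64)
    for i in range(1, n):
        dx = x[i] - x[i - 1]
        integ[i] = integ[i - 1] + (f[i] + f[i - 1]) * dx / 2.0
    return integ / integ[-1]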

📌 Resources

Examples, notebooks, and links to useful references.

🚦 Testing

How did you test these changes?

  • Testing pipeline
  • Other method (describe)
  • My changes can't be tested (explain why)

I have tested on random data and compared the original and the optimized implementations; here are the snippets:
The snippets show that the outputs of both methods are exactly the same, even to high precision, and that the optimized implementation is more than ~50% faster.
[Screenshots of the comparison snippets and their output]

☑️ Checklist

  • I requested two reviewers for this pull request
  • I updated the documentation according to my changes
  • I built the documentation by applying the build_docs label

Note: If you are not allowed to perform any of these actions, ping (@) a contributor.

Sonu0305 changed the title from "Fix#2757" to "Modified fast_array_util.py for faster implementations" on Jan 11, 2025
Sonu0305 changed the title from "Modified fast_array_util.py for faster implementations" to "Modified fast_array_util.py for faster implementation" on Jan 11, 2025
@tardis-bot
Contributor

tardis-bot commented Jan 11, 2025

*beep* *bop*
Hi human,
I ran ruff on the latest commit (6922871).
Here are the outputs produced.
Results can also be downloaded as artifacts here.
Summarised output:

Complete output (might be large):

All checks passed!


codecov bot commented Jan 11, 2025

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 70.02%. Comparing base (2a06fdf) to head (6922871).
Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../properties/continuum_processes/fast_array_util.py | 0.00% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2940      +/-   ##
==========================================
- Coverage   70.39%   70.02%   -0.38%     
==========================================
  Files         222      222              
  Lines       16157    16161       +4     
==========================================
- Hits        11373    11316      -57     
- Misses       4784     4845      +61     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.
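
Since the patch coverage for the changed lines is 0%, a small regression test against scipy could close that gap. A sketch, assuming the modified helper keeps its current name, returns a normalized array the same length as its input (leading zero included, as in the manual-loop version posted later in this thread), and assuming an import path for the module whose path is truncated in the report above:

import numpy as np
from scipy.integrate import cumulative_trapezoid
# The import path below is an assumption; the coverage report truncates it.
from tardis.plasma.properties.continuum_processes.fast_array_util import (
    numba_cumulative_trapezoid,
)

def test_numba_cumulative_trapezoid_matches_scipy():
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 256)
    f = rng.random(256)
    # scipy with initial=0 gives the same shape as the manual-loop version;
    # both results are normalized to their final value before comparison.
    expected = cumulative_trapezoid(f, x, initial=0)
    expected = expected / expected[-1]
    result = numba_cumulative_trapezoid(f, x)
    np.testing.assert_allclose(result, expected)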

@Sonu0305
Contributor Author

Hi @andrewfullard @atharva-2001 @jvshields,
Could you please review when you have a chance? Thank you.

@tardis-bot
Contributor

tardis-bot commented Jan 11, 2025

*beep* *bop*
Hi human,
I ran benchmarks as you asked comparing master (2a06fdf) and the latest commit (6922871).
Here are the logs produced by ASV.
Results can also be downloaded as artifacts here.

Significantly changed benchmarks:

All benchmarks:

Benchmarks that have stayed the same:

| Change   | Before [5e2d0bb3] <master>   | After [6922871f]    | Ratio   | Benchmark (Parameter)                                                                                                               |
|----------|------------------------------|---------------------|---------|-------------------------------------------------------------------------------------------------------------------------------------|
|          | 3.23±0.4μs                   | 4.01±0.4μs          | ~1.24   | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket_within_shell                              |
|          | 3.87±0.02ms                  | 4.54±0.08ms         | ~1.17   | opacities_opacity_state.BenchmarkOpacitiesOpacityState.time_opacity_state_initialize('macroatom')                                   |
|          | 3.40±0.4μs                   | 3.84±0.4μs          | ~1.13   | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_bad_vpacket                                       |
|          | 6.69±0.9μs                   | 7.37±1μs            | ~1.10   | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket                                           |
|          | 2.68±3μs                     | 2.26±1μs            | ~0.84   | transport_montecarlo_estimators_radfield_estimator_calcs.BenchmarkMontecarloMontecarloNumbaPacket.time_update_line_estimators       |
|          | 28.2±8μs                     | 22.0±5μs            | ~0.78   | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_generate_rpacket_last_interaction_tracker_list |
|          | 2.62±0.5ms                   | 2.82±0.5ms          | 1.08    | transport_montecarlo_single_packet_loop.BenchmarkTransportMontecarloSinglePacketLoop.time_single_packet_loop                        |
|          | 63.8±0.2ms                   | 67.1±0.03ms         | 1.05    | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_rpacket_trackers_to_dataframe                  |
|          | 531±100ns                    | 551±200ns           | 1.04    | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_pair_creation_opacity_calculation                                |
|          | 38.7±0.04s                   | 39.1±0.02s          | 1.01    | run_tardis.BenchmarkRunTardis.time_run_tardis                                                                                       |
|          | 41.5±30μs                    | 41.8±20μs           | 1.01    | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_line_emission                                         |
|          | 1.05±0m                      | 1.05±0m             | 1.00    | run_tardis.BenchmarkRunTardis.time_run_tardis_rpacket_tracking                                                                      |
|          | 2.08±0m                      | 2.08±0m             | 1.00    | spectrum_formal_integral.BenchmarkTransportMontecarloFormalIntegral.time_FormalIntegrator_functions                                 |
|          | 205±0.4ns                    | 205±0.07ns          | 1.00    | spectrum_formal_integral.BenchmarkTransportMontecarloFormalIntegral.time_intensity_black_body                                       |
|          | 1.68±0.02ms                  | 1.68±0.01ms         | 1.00    | transport_montecarlo_main_loop.BenchmarkTransportMontecarloMontecarloMainLoop.time_montecarlo_main_loop                             |
|          | 1.21±0.01μs                  | 1.20±0μs            | 0.99    | transport_geometry_calculate_distances.BenchmarkTransportGeometryCalculateDistances.time_calculate_distance_boundary                |
|          | 37.9±0.08μs                  | 37.5±0.2μs          | 0.99    | transport_montecarlo_packet_trackers.BenchmarkTransportMontecarloPacketTrackers.time_generate_rpacket_tracker_list                  |
|          | 531±200ns                    | 521±100ns           | 0.98    | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_compton_opacity_calculation                                      |
|          | 2.79±0ms                     | 2.72±0.01ms         | 0.98    | opacities_opacity_state.BenchmarkOpacitiesOpacityState.time_opacity_state_initialize('scatter')                                     |
|          | 561±200ns                    | 541±200ns           | 0.96    | opacities_opacity.BenchmarkMontecarloMontecarloNumbaOpacities.time_photoabsorption_opacity_calculation                              |
|          | 1.63±0.4μs                   | 1.57±0.4μs          | 0.96    | transport_geometry_calculate_distances.BenchmarkTransportGeometryCalculateDistances.time_calculate_distance_line                    |
|          | 45.4±30μs                    | 43.7±30μs           | 0.96    | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_line_scatter                                          |
|          | 758±2ns                      | 723±0.6ns           | 0.95    | transport_montecarlo_interaction.BenchmarkTransportMontecarloInteraction.time_thomson_scatter                                       |
|          | 7.67±2μs                     | 7.25±2μs            | 0.95    | transport_montecarlo_vpacket.BenchmarkMontecarloMontecarloNumbaVpacket.time_trace_vpacket_volley                                    |

If you want to see the graph of the results, you can check it here

@Sonu0305
Contributor Author

Sonu0305 commented Jan 13, 2025

Hi @andrewfullard,
The tests that are failing seem to be unrelated to the changes I made; what should I do?
The two failing tests are `not continuum ubuntu-latest` and `not continuum macos-latest`.
I think this is because #2800 is still being worked on; they should all pass once it is merged. Thank you.

@Sonu0305
Contributor Author

Sonu0305 commented Jan 14, 2025

Hi @andrewfullard @KasukabeDefenceForce,
The tests are all passing now; please review when you have a chance. Thank you.

@andrewfullard
Contributor

Please demonstrate the speedup using timeit. Consider the impact of numba compile time.
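
For context on the compile-time point: a numba @njit function is compiled on its first call, so that call can be timed separately from a warmed-up call. A self-contained sketch with a toy stand-in for the modified helper (names and sizes are illustrative, not taken from the PR):

import time
import numpy as np
from numba import njit

@njit
def trapezoid_loop(f, x):
    # Toy stand-in: manual cumulative trapezoid, normalized to the final value.
    n = len(f)
    integ = np.zeros(n)
    for i in range(1, n):
        integ[i] = integ[i - 1] + (f[i] + f[i - 1]) * (x[i] - x[i - 1]) / 2.0
    return integ / integ[-1]

x = np.linspace(0.0, 1.0, 1_000_000)
f = np.random.random(1_000_000)

t0 = time.perf_counter()
trapezoid_loop(f, x)  # first call: includes JIT compilation
first_call = time.perf_counter() - t0

t0 = time.perf_counter()
trapezoid_loop(f, x)  # warm call: compiled code only
warm_call = time.perf_counter() - t0

print(f"first call (incl. compile): {first_call:.3f} s, warm call: {warm_call:.3f} s")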

@Sonu0305
Contributor Author

Please demonstrate the speedup using timeit. Consider the impact of numba compile time.

@andrewfullard,
My script for testing these changes is:

test_fast.py:

# It is currently not possible to use scipy.integrate.cumulative_trapezoid in
# numba. So here is my own implementation.
import numpy as np
from numba import njit, prange
import timeit
from tardis.transport.montecarlo import njit_dict


@njit(**njit_dict)
def numba_cumulative_trapezoid(f, x):
    """
    Cumulatively integrate f(x) using the composite trapezoidal rule.

    Parameters
    ----------
    f : numpy.ndarray, dtype float
        Input array to integrate.
    x : numpy.ndarray, dtype float
        The coordinate to integrate along.

    Returns
    -------
    numpy.ndarray, dtype float
        The result of cumulative integration of f along x
    """
    integ = (np.diff(x) * (f[1:] + f[:-1]) / 2.0).cumsum()
    return integ / integ[-1]


@njit(**njit_dict)
def cumulative_integrate_array_by_blocks(f, x, block_references):
    """
    Cumulatively integrate a function over blocks.

    This function cumulatively integrates a function `f` defined at
    locations `x` over blocks given in `block_references`.

    Parameters
    ----------
    f : numpy.ndarray, dtype float
        Input array to integrate. Shape is (N_freq, N_shells), where
        N_freq is the number of frequency values and N_shells is the number
        of computational shells.
    x : numpy.ndarray, dtype float
        The sample points corresponding to the `f` values. Shape is (N_freq,).
    block_references : numpy.ndarray, dtype int
        The start indices of the blocks to be integrated. Shape is (N_blocks,).

    Returns
    -------
    numpy.ndarray, dtype float
        Array with cumulatively integrated values. Shape is (N_freq, N_shells)
        same as f.
    """
    n_rows = len(block_references) - 1
    integrated = np.zeros_like(f)
    for i in prange(f.shape[1]):  # columns
        # TODO: Avoid this loop through vectorization of cumulative_trapezoid
        for j in prange(n_rows):  # rows
            start = block_references[j]
            stop = block_references[j + 1]
            integrated[start + 1 : stop, i] = numba_cumulative_trapezoid(
                f[start:stop, i], x[start:stop]
            )
    return integrated

@njit(**njit_dict)
def numba_cumulative_trapezoid2(f, x):
    """
    Cumulatively integrate f(x) using the composite trapezoidal rule with manual loop.
    """
    n = len(f)
    integ = np.zeros(n, dtype=np.float64)
    for i in range(1, n):
        dx = x[i] - x[i-1]
        integ[i] = integ[i-1] + (f[i] + f[i-1]) * dx / 2.0
    return integ / integ[-1]

@njit(**njit_dict)
def cumulative_integrate_array_by_blocks2(f, x, block_references):
    """
    Cumulatively integrate a function over blocks using a manual loop-based trapezoidal rule.
    """
    n_blocks = len(block_references) - 1
    integrated = np.zeros_like(f)
    
    for i in prange(f.shape[1]):
        for j in prange(n_blocks):
            start = block_references[j]
            stop = block_references[j + 1]
            integrated[start:stop, i] = numba_cumulative_trapezoid2(f[start:stop, i], x[start:stop])
            
    return integrated

N_freq = 100000
N_shells = 1000
N_blocks = 30

x = np.linspace(0, 10, N_freq)
f = np.random.random((N_freq, N_shells))
block_references = np.linspace(0, N_freq, N_blocks + 1, dtype=int)

cumulative_integrate_array_by_blocks(f, x, block_references)
cumulative_integrate_array_by_blocks2(f, x, block_references)
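# The two calls above are warm-up runs: each njit function compiles on its first
# call, so running them once here keeps compile time out of the timeit numbers below.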

original_time = timeit.timeit(
    'cumulative_integrate_array_by_blocks(f, x, block_references)',
    globals=globals(),
    number=10 
)

optimized_time = timeit.timeit(
    'cumulative_integrate_array_by_blocks2(f, x, block_references)',
    globals=globals(),
    number=10
)

print(f"Original implementation average time: {original_time / 10:.4f} seconds")
print(f"Optimized implementation average time: {optimized_time / 10:.4f} seconds")

original_result = cumulative_integrate_array_by_blocks(f, x, block_references)
optimized_result = cumulative_integrate_array_by_blocks2(f, x, block_references)
assert np.allclose(original_result, optimized_result), "Results do not match!"
print("Both implementations give the same result.")

print("Original result (first 10 values):", original_result[:10,:10])
print("Optimized result (first 10 values):", optimized_result[:10,:10])

The output from running my test script is:

Original implementation average time: 3.0688 seconds
Optimized implementation average time: 2.8569 seconds
Both implementations give the same result.
Original result (first 10 values): [[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.00021059 0.00014415 0.00035398 0.00019182 0.00019879 0.00037928
  0.00023938 0.00032636 0.00030593 0.00035552]
 [0.00035765 0.00041922 0.0007794  0.00043033 0.00054467 0.00085113
  0.00035781 0.00068471 0.0005169  0.00071021]
 [0.00070914 0.00068628 0.00123511 0.00076725 0.00087767 0.00110585
  0.00049022 0.00107013 0.00080128 0.00113117]
 [0.00097978 0.00093434 0.00170531 0.00111699 0.00098203 0.00141127
  0.00080311 0.00154914 0.00104561 0.00134419]
 [0.00127825 0.00107934 0.00217531 0.00152452 0.00106555 0.00171593
  0.00109628 0.0019187  0.00114761 0.0015953 ]
 [0.0017717  0.00108791 0.00254289 0.00193694 0.00120367 0.00189517
  0.00121426 0.00214087 0.00123347 0.00182233]
 [0.00207377 0.00124733 0.00272775 0.00228858 0.00145503 0.00221285
  0.00151144 0.00233141 0.00126556 0.00191921]
 [0.00241064 0.0016145  0.00292188 0.00261326 0.00173571 0.0025099
  0.00177101 0.00268684 0.00154277 0.00201923]
 [0.00296984 0.00201482 0.00324683 0.00291908 0.00198809 0.00276275
  0.00196604 0.00317937 0.00204687 0.00223908]]
Optimized result (first 10 values): [[0.         0.         0.         0.         0.         0.
  0.         0.         0.         0.        ]
 [0.00021059 0.00014415 0.00035398 0.00019182 0.00019879 0.00037928
  0.00023938 0.00032636 0.00030593 0.00035552]
 [0.00035765 0.00041922 0.0007794  0.00043033 0.00054467 0.00085113
  0.00035781 0.00068471 0.0005169  0.00071021]
 [0.00070914 0.00068628 0.00123511 0.00076725 0.00087767 0.00110585
  0.00049022 0.00107013 0.00080128 0.00113117]
 [0.00097978 0.00093434 0.00170531 0.00111699 0.00098203 0.00141127
  0.00080311 0.00154914 0.00104561 0.00134419]
 [0.00127825 0.00107934 0.00217531 0.00152452 0.00106555 0.00171593
  0.00109628 0.0019187  0.00114761 0.0015953 ]
 [0.0017717  0.00108791 0.00254289 0.00193694 0.00120367 0.00189517
  0.00121426 0.00214087 0.00123347 0.00182233]
 [0.00207377 0.00124733 0.00272775 0.00228858 0.00145503 0.00221285
  0.00151144 0.00233141 0.00126556 0.00191921]
 [0.00241064 0.0016145  0.00292188 0.00261326 0.00173571 0.0025099
  0.00177101 0.00268684 0.00154277 0.00201923]
 [0.00296984 0.00201482 0.00324683 0.00291908 0.00198809 0.00276275
  0.00196604 0.00317937 0.00204687 0.00223908]]

@andrewfullard
Contributor

So the speedup is ~7% then. I think you should revisit your solution.

Development

Successfully merging this pull request may close these issues.

Make better implementations for `fast_array_util.py`