Integer overflow errors in flagging code. #96

Open
tmolteno opened this issue Dec 16, 2024 · 5 comments

Comments

@tmolteno

  • Tricolour version:

Current master; also present in Ben's branch for 0.2.0.

  • Python version:

3.12

  • Operating System:

Debian

Description

The code that computes the number of averaged channels casts nchan, average_freq and related values to the smallest possible integer type, so the subsequent averaged_channels arithmetic can overflow. This is particularly evident when nchan = 255, which is cast to uint8:

>       averaged_channels = ((nchan + self.average_freq - 1) //
                             self.average_freq)
E       OverflowError: Python integer 345 out of bounds for uint8

tricolour/flagging.py:1320: OverflowError
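To make the "smallest possible integer" cast concrete, the snippet below uses `np.min_scalar_type`, which returns the smallest dtype able to hold a given value. Treat this as an assumption about how the narrowing behaves; tricolour's own casting code may do it differently. It shows that 255 fits in uint8, while 345 and 4096 (the values in the error messages) would need uint16.

```python
import numpy as np

# Assumption: np.min_scalar_type approximates the narrowing described
# in this issue; tricolour's actual casting code may differ.
for value in (1, 255, 345, 4096):
    # Smallest dtype that can represent each value.
    print(value, np.min_scalar_type(value))
# 1 uint8
# 255 uint8
# 345 uint16
# 4096 uint16

# Largest value a uint8 can hold.
print(np.iinfo(np.uint8).max)  # 255
```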


What I Did

Ran the test harness, or ran tricolour on the test measurement set (1519747221.subset.ms):

(trivenv) tim@dibbler:~/github/tricolour$ tricolour ~/astro/1519747221.subset.ms/
tricolour - 2024-12-16 12:32:41,801 INFO - 
*******************************************************************************
                  _______   _           _
                 |__   __| (_)         | |
                    | |_ __ _  ___ ___ | | ___  _   _ _ __
                    | | '__| |/ __/ _ \| |/ _ \| | | | '__|
                    | | |  | | (_| (_) | | (_) | |_| | |
                    |_|_|  |_|\___\___/|_|\___/ \__,_|_|

Viva la révolution!

tricolour - 2024-12-16 12:32:41,802 INFO - Flagging on the DATA column
tricolour.mask - 2024-12-16 12:32:41,802 INFO - Looking for static masks...
tricolour.mask - 2024-12-16 12:32:41,803 INFO - Searching /etc/tricolour
tricolour.mask - 2024-12-16 12:32:41,803 INFO - Searching /home/tim/github/tricolour/trivenv/etc/tricolour
tricolour.mask - 2024-12-16 12:32:41,803 INFO - Searching /home/tim/.config/tricolour
tricolour.mask - 2024-12-16 12:32:41,804 INFO - Searching /home/tim/github/tricolour/tricolour/data
tricolour.mask - 2024-12-16 12:32:41,804 INFO - Found static mask file /home/tim/github/tricolour/tricolour/data/4k_lband_meerkat.staticmask
tricolour.mask - 2024-12-16 12:32:41,804 INFO - Found static mask file /home/tim/github/tricolour/tricolour/data/4k_uhfband_meerkat.staticmask
tricolour.mask - 2024-12-16 12:32:41,809 INFO - Loaded mask /home/tim/github/tricolour/tricolour/data/4k_lband_meerkat.staticmask (non-dilated) with 41.50% flagged bandwidth between 0.856 and 1.712 GHz
tricolour.mask - 2024-12-16 12:32:41,811 INFO - Loaded mask /home/tim/github/tricolour/tricolour/data/4k_uhfband_meerkat.staticmask (non-dilated) with 4.64% flagged bandwidth between 0.544 and 1.088 GHz
tricolour - 2024-12-16 12:32:41,812 INFO - *****************************************
tricolour - 2024-12-16 12:32:41,812 INFO - The following strategies will be applied:
tricolour - 2024-12-16 12:32:41,812 INFO - *****************************************
tricolour - 2024-12-16 12:32:41,813 INFO - 0: flag_nans_zeros (nan_dropouts_flag)
tricolour - 2024-12-16 12:32:41,813 INFO - 1: apply_static_mask (background_static_mask)
tricolour - 2024-12-16 12:32:41,813 INFO -      accumulation_mode: or
tricolour - 2024-12-16 12:32:41,813 INFO -      uvrange: 
tricolour - 2024-12-16 12:32:41,813 INFO - 2: sum_threshold (background_flags)
tricolour - 2024-12-16 12:32:41,814 INFO -      outlier_nsigma: 10
tricolour - 2024-12-16 12:32:41,814 INFO -      windows_time: [1, 2, 4, 8]
tricolour - 2024-12-16 12:32:41,814 INFO -      windows_freq: [1, 2, 4, 8]
tricolour - 2024-12-16 12:32:41,814 INFO -      background_reject: 2.0
tricolour - 2024-12-16 12:32:41,815 INFO -      background_iterations: 5
tricolour - 2024-12-16 12:32:41,815 INFO -      spike_width_time: 12.5
tricolour - 2024-12-16 12:32:41,815 INFO -      spike_width_freq: 10.0
tricolour - 2024-12-16 12:32:41,815 INFO -      time_extend: 3
tricolour - 2024-12-16 12:32:41,815 INFO -      freq_extend: 3
tricolour - 2024-12-16 12:32:41,816 INFO -      freq_chunks: 10
tricolour - 2024-12-16 12:32:41,816 INFO -      average_freq: 1
tricolour - 2024-12-16 12:32:41,816 INFO -      flag_all_time_frac: 0.6
tricolour - 2024-12-16 12:32:41,816 INFO -      flag_all_freq_frac: 0.8
tricolour - 2024-12-16 12:32:41,817 INFO -      rho: 1.3
tricolour - 2024-12-16 12:32:41,817 INFO -      num_major_iterations: 5
tricolour - 2024-12-16 12:32:41,817 INFO - 3: uvcontsub_flagger (residual_flag_initial)
tricolour - 2024-12-16 12:32:41,817 INFO -      major_cycles: 7
tricolour - 2024-12-16 12:32:41,817 INFO -      or_original_from_cycle: 1
tricolour - 2024-12-16 12:32:41,818 INFO -      taylor_degrees: 20
tricolour - 2024-12-16 12:32:41,818 INFO -      sigma: 15.0
tricolour - 2024-12-16 12:32:41,818 INFO - 4: flag_nans_zeros (nan_dropouts_reflag)
tricolour - 2024-12-16 12:32:41,818 INFO - 5: apply_static_mask (uvrange_static_mask)
tricolour - 2024-12-16 12:32:41,818 INFO -      accumulation_mode: or
tricolour - 2024-12-16 12:32:41,818 INFO -      uvrange: 0~550
tricolour - 2024-12-16 12:32:41,818 INFO - 6: sum_threshold (final_st_very_broad)
tricolour - 2024-12-16 12:32:41,819 INFO -      outlier_nsigma: 10
tricolour - 2024-12-16 12:32:41,819 INFO -      windows_time: [1, 2, 4, 8]
tricolour - 2024-12-16 12:32:41,819 INFO -      windows_freq: [32, 48, 64, 128]
tricolour - 2024-12-16 12:32:41,819 INFO -      background_reject: 2.0
tricolour - 2024-12-16 12:32:41,819 INFO -      background_iterations: 5
tricolour - 2024-12-16 12:32:41,819 INFO -      spike_width_time: 6.5
tricolour - 2024-12-16 12:32:41,819 INFO -      spike_width_freq: 64.0
tricolour - 2024-12-16 12:32:41,819 INFO -      time_extend: 3
tricolour - 2024-12-16 12:32:41,820 INFO -      freq_extend: 3
tricolour - 2024-12-16 12:32:41,820 INFO -      freq_chunks: 10
tricolour - 2024-12-16 12:32:41,820 INFO -      average_freq: 1
tricolour - 2024-12-16 12:32:41,820 INFO -      flag_all_time_frac: 0.6
tricolour - 2024-12-16 12:32:41,820 INFO -      flag_all_freq_frac: 0.8
tricolour - 2024-12-16 12:32:41,820 INFO -      rho: 1.3
tricolour - 2024-12-16 12:32:41,820 INFO -      num_major_iterations: 1
tricolour - 2024-12-16 12:32:41,821 INFO - 7: sum_threshold (final_st_broad)
tricolour - 2024-12-16 12:32:41,821 INFO -      outlier_nsigma: 10
tricolour - 2024-12-16 12:32:41,821 INFO -      windows_time: [1, 2, 4, 8]
tricolour - 2024-12-16 12:32:41,821 INFO -      windows_freq: [1, 2, 4, 8]
tricolour - 2024-12-16 12:32:41,821 INFO -      background_reject: 2.0
tricolour - 2024-12-16 12:32:41,821 INFO -      background_iterations: 5
tricolour - 2024-12-16 12:32:41,821 INFO -      spike_width_time: 6.5
tricolour - 2024-12-16 12:32:41,822 INFO -      spike_width_freq: 10.0
tricolour - 2024-12-16 12:32:41,822 INFO -      time_extend: 3
tricolour - 2024-12-16 12:32:41,822 INFO -      freq_extend: 3
tricolour - 2024-12-16 12:32:41,822 INFO -      freq_chunks: 10
tricolour - 2024-12-16 12:32:41,822 INFO -      average_freq: 1
tricolour - 2024-12-16 12:32:41,822 INFO -      flag_all_time_frac: 0.6
tricolour - 2024-12-16 12:32:41,822 INFO -      flag_all_freq_frac: 0.8
tricolour - 2024-12-16 12:32:41,822 INFO -      rho: 1.3
tricolour - 2024-12-16 12:32:41,823 INFO -      num_major_iterations: 1
tricolour - 2024-12-16 12:32:41,823 INFO - 8: sum_threshold (final_st_narrow)
tricolour - 2024-12-16 12:32:41,823 INFO -      outlier_nsigma: 10
tricolour - 2024-12-16 12:32:41,823 INFO -      windows_time: [1, 2, 4, 8]
tricolour - 2024-12-16 12:32:41,823 INFO -      windows_freq: [1, 2, 4, 8]
tricolour - 2024-12-16 12:32:41,823 INFO -      background_reject: 2.0
tricolour - 2024-12-16 12:32:41,823 INFO -      background_iterations: 5
tricolour - 2024-12-16 12:32:41,824 INFO -      spike_width_time: 2
tricolour - 2024-12-16 12:32:41,824 INFO -      spike_width_freq: 10.0
tricolour - 2024-12-16 12:32:41,824 INFO -      time_extend: 3
tricolour - 2024-12-16 12:32:41,824 INFO -      freq_extend: 3
tricolour - 2024-12-16 12:32:41,824 INFO -      freq_chunks: 10
tricolour - 2024-12-16 12:32:41,824 INFO -      average_freq: 1
tricolour - 2024-12-16 12:32:41,824 INFO -      flag_all_time_frac: 0.6
tricolour - 2024-12-16 12:32:41,824 INFO -      flag_all_freq_frac: 0.8
tricolour - 2024-12-16 12:32:41,825 INFO -      rho: 1.3
tricolour - 2024-12-16 12:32:41,825 INFO -      num_major_iterations: 1
tricolour - 2024-12-16 12:32:41,825 INFO - 9: uvcontsub_flagger (residual_flag_final)
tricolour - 2024-12-16 12:32:41,825 INFO -      major_cycles: 10
tricolour - 2024-12-16 12:32:41,825 INFO -      or_original_from_cycle: 0
tricolour - 2024-12-16 12:32:41,825 INFO -      taylor_degrees: 25
tricolour - 2024-12-16 12:32:41,825 INFO -      sigma: 13.0
tricolour - 2024-12-16 12:32:41,826 INFO - 10: flag_autos (flag_autos)
tricolour - 2024-12-16 12:32:41,826 INFO - 11: combine_with_input_flags (combine_with_input_flags)
tricolour - 2024-12-16 12:32:41,826 INFO - ***************** END ********************
tricolour - 2024-12-16 12:32:41,826 INFO - Flagging per correlation ('standard' mode)
tricolour - 2024-12-16 12:32:42,986 INFO - Only considering scans '97, 98, 99, 94, 95' as per user selection criterion
tricolour - 2024-12-16 12:32:42,986 INFO - Adding field '3C286' scan 94 to compute graph for processing
tricolour - 2024-12-16 12:32:42,986 CRITICAL - You requested to flag per correlation, but not on residuals. This is not advisable and the flagger may mistake fringes of off-axis sources for broadband RFI.
tricolour - 2024-12-16 12:32:43,954 INFO - Adding field 'PKS1934-63' scan 95 to compute graph for processing
tricolour - 2024-12-16 12:32:43,955 CRITICAL - You requested to flag per correlation, but not on residuals. This is not advisable and the flagger may mistake fringes of off-axis sources for broadband RFI.
tricolour - 2024-12-16 12:32:44,127 INFO - Adding field '3C286' scan 97 to compute graph for processing
tricolour - 2024-12-16 12:32:44,128 CRITICAL - You requested to flag per correlation, but not on residuals. This is not advisable and the flagger may mistake fringes of off-axis sources for broadband RFI.
tricolour - 2024-12-16 12:32:44,292 INFO - Adding field 'PKS1934-63' scan 98 to compute graph for processing
tricolour - 2024-12-16 12:32:44,292 CRITICAL - You requested to flag per correlation, but not on residuals. This is not advisable and the flagger may mistake fringes of off-axis sources for broadband RFI.
tricolour - 2024-12-16 12:32:44,464 INFO - Adding field '3C286' scan 99 to compute graph for processing
tricolour - 2024-12-16 12:32:44,465 CRITICAL - You requested to flag per correlation, but not on residuals. This is not advisable and the flagger may mistake fringes of off-axis sources for broadband RFI.
[########################                ] | 60% Completed | 2.08 s
nchan: 4096
average_freq: 1
[########################                ] | 60% Completed | 2.18 s
Unexpected error. Dropping you into pdb for a post-mortem.
Traceback (most recent call last):
  File "/home/tim/github/tricolour/trivenv/bin/tricolour", line 6, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/tim/github/tricolour/tricolour/apps/tricolour/app.py", line 257, in main
    _main(args)
  File "/home/tim/github/tricolour/tricolour/apps/tricolour/app.py", line 500, in _main
    _, original_stats, final_stats = dask.compute(write_computes,
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/github/tricolour/trivenv/lib/python3.12/site-packages/dask/base.py", line 660, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tim/github/tricolour/tricolour/flagging.py", line 1183, in sum_threshold_flagger
    averaged_channels = (nchan + average_freq - 1) // average_freq
                         ~~~~~~^~~~~~~~~~~~~~
OverflowError: Python integer 4096 out of bounds for uint8
> /home/tim/github/tricolour/tricolour/flagging.py(1183)sum_threshold_flagger()
-> averaged_channels = (nchan + average_freq - 1) // average_freq
(Pdb) nchan: 4096
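A minimal standalone reproduction of the failing line, plus one possible fix, is sketched below. It assumes `average_freq` reaches this expression as a `np.uint8` scalar and `nchan` as a plain Python int, which is what the error message suggests but has not been checked against tricolour's code; `averaged_channels_naive` and `averaged_channels_safe` are hypothetical names. Under NumPy 2 (NEP 50 promotion rules) the Python int is converted to the uint8 operand's dtype and rejected because 4096 does not fit; converting both values to Python ints before the ceiling division avoids any fixed-width intermediate.

```python
import numpy as np


def averaged_channels_naive(nchan, average_freq):
    # Same ceiling division as the line in the traceback, using
    # whatever types the caller passes in.
    return (nchan + average_freq - 1) // average_freq


def averaged_channels_safe(nchan, average_freq):
    # Hypothetical fix: convert to built-in Python ints first. Python
    # ints have arbitrary precision, so the sum cannot overflow.
    nchan = int(nchan)
    average_freq = int(average_freq)
    return (nchan + average_freq - 1) // average_freq


nchan = 4096                # plain Python int, as in the traceback
average_freq = np.uint8(1)  # assumption: narrowed to uint8 upstream

try:
    print(averaged_channels_naive(nchan, average_freq))
except OverflowError as exc:
    # Raised on NumPy >= 2, where 4096 must fit in the uint8 operand.
    print("OverflowError:", exc)

print(averaged_channels_safe(nchan, average_freq))  # 4096
```

Converting with `int()` is only one option; widening to `np.int64` would also work, as long as the intermediate `nchan + average_freq - 1` cannot exceed the chosen dtype.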

@bennahugo
Collaborator

bennahugo commented Dec 16, 2024 via email

tmolteno added a commit to tmolteno/tricolour that referenced this issue Dec 16, 2024
@tmolteno
Author

I think this is expected behaviour. Try the following with your version and see:

>>> import numpy as np
>>> x = np.array(255, np.uint8)
>>> y = np.array(1, np.uint8)
>>> x + y

It returns zero on Python 3.12.
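The zero result is ordinary fixed-width arithmetic: adding two uint8 values is computed modulo 2**8, so 255 + 1 wraps to 0. A small sketch (variable names are illustrative) contrasting the wrapped result with widening both operands to int64 before adding:

```python
import numpy as np

x = np.array(255, np.uint8)
y = np.array(1, np.uint8)

# Both operands are uint8, so the result is uint8 and the sum
# 255 + 1 wraps around modulo 2**8 to 0.
wrapped = x + y

# Casting to a wider integer dtype before adding keeps the true sum.
widened = x.astype(np.int64) + y.astype(np.int64)

print(int(wrapped), int(widened))  # 0 256
```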

@tmolteno
Author

tmolteno commented Dec 16, 2024

Have a look at the following.

I'm on numpy 1.26.4.

>>> import numpy as np
>>> x = np.array(255, np.uint8)
>>> y = np.array(1, np.uint8)
>>> x + y
0
>>> np.__version__
'1.26.4'
>>> 

The same happens on more recent versions:

>>> import numpy as np
>>> np.__version__
'2.0.2'
>>> x = np.array(255, np.uint8)
>>> y = np.array(1, np.uint8)
>>> x + y
np.uint8(0)

@bennahugo
Collaborator

bennahugo commented Dec 16, 2024 via email

@tmolteno
Author

I'm not sure this will always work in this case. NumPy 1.26, which I used above, did not have this PEP in place. Have you confirmed that the sample code produces 256 rather than 0 with NumPy 1.23.5?
