52 GB scan arrays create NUMA issues #25
On a single small 32k scan (32k1934.ms.2min.ms) the difference is not large (this is on stevie, whose disk may be contested).

xarray-ms master:
tricolour - 2019-06-24 11:13:47,654 INFO - Data flagged successfully in 00h07m37s

xarray-ms 0.1.6:
tricolour - 2019-06-24 11:25:57,847 INFO - Data flagged successfully in 00h05m51s

I'll see what happens on a larger dataset.
On my laptop, with a 3-scan 4K dataset (1527016443_sdp_l0.full_1284.hh_vv.ms), the difference is negligible:

xarray-ms master:
tricolour - 2019-06-24 10:51:58,933 INFO - Data flagged successfully in 00h10m33s

xarray-ms 0.1.6:
tricolour - 2019-06-24 11:51:31,495 INFO - Data flagged successfully in 00h10m31s
Working hypothesis: 52 GB scan arrays are being created which are too big to fit on a single DIMM, leading to unnecessary inter-CPU communication. Solution: allocate per-baseline arrays.
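A minimal sketch of the per-baseline idea, using purely illustrative dimensions (the 200 x 2016 x 4096 x 4 complex64 example below works out to roughly 52 GB): instead of one monolithic scan array, each baseline gets its own ~26 MB array, which is small enough to live entirely on whichever NUMA node first touches it.

```python
import numpy as np

# Illustrative dimensions only, chosen so the monolithic array is ~52 GB
ntime, nbl, nchan, ncorr = 200, 2016, 4096, 4

# Monolithic allocation: one ~52 GB complex64 array spanning NUMA nodes
# vis = np.empty((ntime, nbl, nchan, ncorr), dtype=np.complex64)

# Per-baseline allocation: 2016 arrays of ~26 MB each, so each array can
# sit in the local memory of whichever CPU first touches it
vis_per_bl = [
    np.empty((ntime, nchan, ncorr), dtype=np.complex64)
    for _ in range(nbl)
]
```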
Spot the non-linearity:
Can't reproduce. See my email. Are you sure it is not just the page caches playing havoc with you?
…On Mon, 24 Jun 2019, 17:08 Simon Perkins, ***@***.***> wrote:
Spot the non-linearity:
# 7.45 GB
In [11]: %timeit -n 1 -r 1 np.ones(int(1e9), dtype=np.complex64)
1 loop, best of 1: 2.1 s per loop
# 14.9 GB
In [12]: %timeit -n 1 -r 1 np.ones(int(2e9), dtype=np.complex64)
1 loop, best of 1: 4.22 s per loop
# 29.8 GB
In [13]: %timeit -n 1 -r 1 np.ones(int(4e9), dtype=np.complex64)
1 loop, best of 1: 2min 42s per loop
Things seem to get much slower above the 16 GB power-of-two boundary.
# 16.3GB
In [23]: %timeit -n 1 -r 1 np.ones(int(2.2e9), dtype=np.complex64)
1 loop, best of 1: 49.5 s per loop
Where can't you reproduce this? The above timings are on stevie, where the problem was reported.
This is the OS default without NUMA pinning.
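For context, a sketch of what explicit pinning could look like on a Linux host; the core IDs are placeholders for whichever cores belong to one NUMA node (check the topology with `numactl --hardware` or `lscpu`). Note this only pins CPU affinity; binding the memory itself to the node additionally needs `numactl --membind` or libnuma.

```python
import os

# Placeholder: assume cores 0-15 belong to NUMA node 0 on this machine.
# Restricting the allocating process to those cores means its
# first-touched pages should be served from node 0's local DIMMs.
os.sched_setaffinity(0, set(range(16)))
```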
Are you just doing a malloc? mallocs and zeroed mallocs are fast; it's filling the array where we see the slowdown. Anyhow, now that the dead Cubical processes have been cleared out on stevie, things have improved. I suspect that the Cubical shared-memory model was holding onto pages and might have provoked swapping.
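To make the malloc-versus-fill distinction concrete, here is a small, hedged timing sketch (absolute numbers vary by machine and are not measurements from stevie): the allocation itself can return quickly because pages may only be mapped lazily, and the cost then shows up when the array is first written.

```python
import time
import numpy as np

N = int(1e9)  # ~8 GB of complex64

t0 = time.perf_counter()
a = np.zeros(N, dtype=np.complex64)  # allocation: typically fast (pages not yet touched)
t1 = time.perf_counter()
a[:] = 1                             # fill: every page is faulted in and written
t2 = time.perf_counter()

print(f"allocate: {t1 - t0:.2f}s  fill: {t2 - t1:.2f}s")
```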
I made a mistake: I allocated with ones and then filled with ones again, so the work was duplicated. Anyway, the allocation and fill is faster now.
It's malloc and memset with np.zeros.
…On Thu, 27 Jun 2019, 22:36 Simon Perkins, ***@***.***> wrote:
I made a mistake, allocated with ones and then filled with ones again so
duplicated work. Anyway, the allocation and fill is faster now.
Currently, only a single thread allocates all memory for a window. It's pretty slow.
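A minimal sketch of the multi-threaded, per-baseline alternative, assuming placeholder dimensions and thread count rather than tricolour's actual window machinery: each worker allocates and first-touches its own baseline's array, so the pages land on the NUMA node of the core running that worker (this relies on NumPy releasing the GIL during the fill).

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

# Placeholder window dimensions and baseline count
ntime, nchan, ncorr = 200, 4096, 4
nbl = 2016

def alloc_baseline(_):
    # Allocate and first-touch one baseline's chunk of the window in the
    # worker thread, rather than having a single thread touch everything
    a = np.empty((ntime, nchan, ncorr), dtype=np.complex64)
    a[:] = 0
    return a

with ThreadPoolExecutor(max_workers=8) as pool:
    window = list(pool.map(alloc_baseline, range(nbl)))
```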
See further description of related issues in #36.
@sjperkins master now goes nowhere fast.