
Check projected memory does not exceed allowed when finalizing plan #568

Open
tomwhite opened this issue Sep 4, 2024 · 2 comments

tomwhite (Member) commented Sep 4, 2024

Currently, projected memory checks are performed as the array API operations are called:

import cubed
import cubed.array_api as xp

spec = cubed.Spec(work_dir="tmp", allowed_mem=100)  # not enough memory!
a = xp.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]], chunks=(2, 2), spec=spec)
b = xp.asarray([[1, 1, 1], [1, 1, 1], [1, 1, 1]], chunks=(2, 2), spec=spec)
try:
    c = xp.add(a, b)
except ValueError as e:
    print(e)

prints

Projected blockwise memory (192) exceeds allowed_mem (100), including reserved_mem (0)

(from https://cubed-dev.github.io/cubed/cubed-intro.slides.html#/11/0/0)

There are a couple of reasons why this is not ideal:

  1. the optimizer may be able to fuse operations together so that the fused operation uses less memory, and
  2. input Zarr arrays with large chunks may not actually be used in the computation.

The first case could happen with this kind of (contrived) example:

import cubed
import cubed.array_api as xp

def test_allowed_mem_exceeded_before_optimization():
    # this should pass, since c can be fused into an op that takes 700 bytes
    # (ones is 100, conversion to int32 is 400, conversion to int8 is 100, write is 100)
    spec = cubed.Spec(allowed_mem=800, reserved_mem=0)
    a = xp.ones((100,), dtype=xp.int8, chunks=(100,), spec=spec)
    b = xp.astype(a, xp.int32)
    c = xp.astype(b, xp.int8)
    c.compute()
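For reference, here is a back-of-the-envelope accounting of why the eager per-op check rejects this today. This is my reading of the numbers in the test comment, not cubed's exact projected-memory model:

# Hedged, approximate accounting (not verified against cubed's source):
# the unfused a -> b step alone already projects at least
read_a = 100 * 1        # a's int8 chunk read into memory
alloc_b = 100 * 4       # b's int32 output chunk
write_b = alloc_b       # Zarr write buffer, upper-bounded by b.nbytes
projected = read_a + alloc_b + write_b  # >= 900 bytes > allowed_mem (800)

so the eager check raises on the unfused op even though the fused pipeline fits in 700 bytes.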

I hit the second case when loading an Xarray dataset from a collection of Zarr files: some variables had very large chunk sizes that tripped the memory check, even though those particular variables weren't used in the computation at all.
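A minimal sketch of that second case (the store layout, variable names, and chunk sizes are all made up for illustration; assumes cubed.from_zarr):

import cubed
import cubed.array_api as xp

spec = cubed.Spec(allowed_mem=100_000)

# "big_var" has chunks whose projected memory exceeds allowed_mem;
# "small_var" is the only variable the computation actually needs.
big = cubed.from_zarr("store.zarr", path="big_var", spec=spec)
small = cubed.from_zarr("store.zarr", path="small_var", spec=spec)

# Any array API op applied to `big` (e.g. a dtype cast during dataset
# decoding) raises the projected-memory ValueError eagerly, even though
# the result never feeds into the final computation:
unused = xp.astype(big, xp.float64)  # ValueError raised here today

result = xp.sum(small)  # the real computation only needs `small`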

To fix this, we could move the memory checks to the point where the FinalizedPlan is built (see #563), which is still before the computation actually runs.
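Under that proposal, the failing example at the top of this issue would only raise when the plan is finalized, i.e. inside compute() rather than at the point xp.add is called. A behavioral sketch of the proposal, not current behavior:

import cubed
import cubed.array_api as xp

spec = cubed.Spec(work_dir="tmp", allowed_mem=100)
a = xp.asarray([[1, 2, 3], [4, 5, 6], [7, 8, 9]], chunks=(2, 2), spec=spec)
b = xp.asarray([[1, 1, 1], [1, 1, 1], [1, 1, 1]], chunks=(2, 2), spec=spec)

c = xp.add(a, b)  # would no longer raise here...

try:
    c.compute()   # ...the check would run here, when the FinalizedPlan is built
except ValueError as e:
    print(e)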

TomNicholas (Member) commented Sep 6, 2024

# this should pass, since c can be fused into an op that takes 700 bytes
# (ones is 100, conversion to int32 is 400, conversion to int8 is 100, write is 100)

Just for my understanding, why does the write take another 100 bytes of RAM? Haven't you already allocated space for c, and counted that in your total already?

tomwhite (Member, Author) commented Sep 9, 2024

Just for my understanding, why does the write take another 100 bytes of RAM? Haven't you already allocated space for c, and counted that in your total already?

Zarr uses an output buffer to write c out to a compressed file. Even though it is compressed, we don't know how big it is, so we take 100 bytes (c.nbytes) as the upper bound.
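To make the arithmetic concrete (same numbers as the test comment above, with the write buffer bounded as just described):

ones_bytes = 100 * 1       # a: 100 int8 elements materialized by ones
int32_bytes = 100 * 4      # intermediate int32 conversion (b)
int8_bytes = 100 * 1       # conversion back to int8 (c)
write_buffer = int8_bytes  # Zarr compression output buffer, upper bound c.nbytes
total = ones_bytes + int32_bytes + int8_bytes + write_buffer  # 700 < allowed_mem (800)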
