Currently, projected memory checks are performed as the array API operations are called (for an example, see https://cubed-dev.github.io/cubed/cubed-intro.slides.html#/11/0/0).
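The code and output from that slide aren't reproduced here; as a rough, minimal sketch of the current behaviour, assuming `cubed.array_api` imported as `xp` and an arbitrarily small `allowed_mem` chosen so the check trips (the exact threshold depends on cubed's projection model):

```python
import cubed
import cubed.array_api as xp

# with a deliberately small budget, the projected-memory check fails as soon
# as the astype operation is defined, not when compute() is called
spec = cubed.Spec(allowed_mem=400, reserved_mem=0)
a = xp.ones((100,), dtype=xp.int8, chunks=(100,), spec=spec)
b = xp.astype(a, xp.int32)  # expected to raise here, at operation time
b.compute()                 # never reached
```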
There are a couple of reasons why this is not ideal:
1. the optimizer might be able to fuse operations together in such a way that less memory is used for the fused operation, and
2. Zarr arrays with large chunks in the input may not actually be used in the computation.
The first case could happen with this kind of (contrived) example:
```python
import cubed
import cubed.array_api as xp


def test_allowed_mem_exceeded_before_optimization():
    # this should pass, since c can be fused into an op that takes 700 bytes
    # (ones is 100, conversion to int32 is 400, conversion to int8 is 100, write is 100)
    spec = cubed.Spec(allowed_mem=800, reserved_mem=0)
    a = xp.ones((100,), dtype=xp.int8, chunks=(100,), spec=spec)
    b = xp.astype(a, xp.int32)
    c = xp.astype(b, xp.int8)
    c.compute()
```
I hit the second one when I loaded an Xarray dataset from a collection of Zarr files, some of which had very large chunk sizes that hit the memory limit, even though those particular variables weren't being used in the computation.
To fix this, we could move the memory checks to when the FinalizedPlan is being built (see #563), which is still before the computation is actually run.
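As an illustration only, such a check might look something like the sketch below, run once over the optimized plan after fusion; the `operations`, `projected_mem`, and `name` attributes are assumed names, not the actual `FinalizedPlan` interface.

```python
# Hypothetical sketch -- attribute names are assumptions, not cubed's real API.
def check_allowed_mem(finalized_plan, allowed_mem, reserved_mem=0):
    """Raise if any operation in the optimized plan is projected to exceed the budget."""
    for op in finalized_plan.operations:
        projected = op.projected_mem + reserved_mem
        if projected > allowed_mem:
            raise ValueError(
                f"Projected memory ({projected} bytes) for operation {op.name!r} "
                f"exceeds allowed_mem ({allowed_mem} bytes)"
            )
```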
> `# this should pass, since c can be fused into an op that takes 700 bytes`
> `# (ones is 100, conversion to int32 is 400, conversion to int8 is 100, write is 100)`
Just for my understanding, why does the write take another 100 bytes of RAM? Haven't you already allocated space for c and counted that in your total?
Zarr uses an output buffer to write c out to a compressed file. Even though it is compressed, we don't know how big it will be, so we take 100 bytes (c.nbytes) as the upper bound.
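To make the accounting concrete: the 100-byte figure is the uncompressed size of c's single chunk, which bounds the compressed output from above since the compressed size isn't known in advance (the Zstd codec below is used purely for illustration and isn't necessarily the codec in play here):

```python
import numpy as np
from numcodecs import Zstd

chunk = np.ones(100, dtype=np.int8)  # same shape/dtype as a chunk of c
print(chunk.nbytes)                  # 100 -- the upper bound used for the write buffer
print(len(Zstd().encode(chunk)))     # the actual compressed size is typically far smaller
```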