Memory-mapped caching for image translation training #218

Draft · wants to merge 13 commits into base: segmentation-module
Conversation

@ziw-liu (Collaborator) commented on Jan 6, 2025

#195 implemented in-RAM caching for image translation training. However, it does not scale to datasets larger than the available system memory. This PR implements a node-local disk cache via tensordict's memory-mapped tensors.
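As a rough illustration of the idea (not the PR's actual API; the class name `MemmapPatchDataset` and the `cache_path` argument are hypothetical), a dataset can write a preprocessed patch stack once to node-local scratch via tensordict's `MemoryMappedTensor` and then read slices on demand:

```python
import torch
from tensordict import MemoryMappedTensor
from torch.utils.data import Dataset


class MemmapPatchDataset(Dataset):
    """Illustrative sketch: cache a preprocessed patch stack on node-local scratch."""

    def __init__(self, patches: torch.Tensor, cache_path: str):
        # Write the (N, C, D, H, W) stack to disk once; later reads are served
        # through the OS page cache instead of keeping everything resident in RAM.
        self.patches = MemoryMappedTensor.from_tensor(patches, filename=cache_path)

    def __len__(self) -> int:
        return self.patches.shape[0]

    def __getitem__(self, index: int) -> torch.Tensor:
        # Indexing the memory-mapped tensor only pages in the slice it touches.
        return self.patches[index].clone()
```

Pointing `cache_path` at node-local scratch (rather than networked storage) is what keeps the cache reads off the shared filesystem.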

@ziw-liu added the enhancement (New feature or request) and translation (Image translation (VS)) labels on Jan 6, 2025
@ziw-liu changed the base branch from main to segmentation-module on Jan 6, 2025
@ziw-liu (Collaborator, Author) commented on Jan 14, 2025

Progress summary: previously, training on 2.3 TB of data took 20 s/iter; now, training on 3 TB takes 10 s/iter.

Lessons learned:

  • The default local scratch configuration on our compute nodes was not optimal for sustained I/O (ZFS with a very small sector size). This has been fixed on select nodes and will be rolled out more broadly later.
  • Optimizations/mitigations made for in-memory caching can hurt performance in the mmap setup. Moving some transforms back to the CPU improved end-to-end timing. This might be because MONAI transforms are not batched (they execute in a loop), so CPU/GPU synchronization can take much longer than the actual compute.

To be investigated:

  • Moving augmentations back to the CPU recreated the CPU compute bottleneck (removing augmentations entirely further reduces end-to-end training time to 3 s/iter, roughly a 3× reduction). This is potentially fixable with batched augmentation or by distributing the compute better across devices.
  • Precomputing the normalization to simplify training-time logic (d2cd340); see the sketch after this list.
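A minimal sketch of what precomputed normalization could look like, assuming per-channel statistics are computed once at cache-build time (function and key names here are illustrative, not the implementation in d2cd340):

```python
import torch


def precompute_stats(patches: torch.Tensor) -> dict[str, torch.Tensor]:
    # patches: (N, C, D, H, W); reduce over everything except the channel axis,
    # so the expensive pass over the data happens once, before training.
    dims = (0, 2, 3, 4)
    return {"mean": patches.mean(dim=dims), "std": patches.std(dim=dims)}


def normalize(patch: torch.Tensor, stats: dict[str, torch.Tensor]) -> torch.Tensor:
    # patch: (C, D, H, W); training-time work is reduced to a cheap affine rescale.
    mean = stats["mean"].view(-1, 1, 1, 1)
    std = stats["std"].view(-1, 1, 1, 1)
    return (patch - mean) / std.clamp_min(1e-6)
```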

@ziw-liu (Collaborator, Author) commented on Jan 14, 2025

> This might be because MONAI transforms are not batched (executed in a loop), and CPU/GPU sync could be taking much longer than the actual compute.

Benchmark of 3D random affine in 6a88ec4 (10 runs, milliseconds):

| Device | MONAI (sequential) | Kornia (batched) | Speedup (MONAI / Kornia) |
| --- | --- | --- | --- |
| Zen 2 CPU (1 thread) | 9160 | 3800 | 2.4× |
| Zen 2 CPU (16 threads) | 7320 | 556 | 13.2× |
| A40 GPU | 2620 | 210 | 12.5× |
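The gap reflects per-sample versus batched execution. A rough sketch of the two call patterns (the augmentation parameters and tensor shapes here are illustrative, not those of the benchmark in 6a88ec4):

```python
import math

import torch
from kornia.augmentation import RandomAffine3D
from monai.transforms import RandAffine

batch = torch.rand(16, 1, 32, 128, 128)  # (B, C, D, H, W)

# MONAI: the transform is applied to one (C, D, H, W) sample at a time in a Python loop,
# which also forces repeated CPU/GPU synchronization when run on the GPU.
monai_affine = RandAffine(prob=1.0, rotate_range=(math.pi / 12,) * 3)
monai_out = torch.stack([monai_affine(sample) for sample in batch])

# Kornia: a single batched call over the whole (B, C, D, H, W) tensor.
kornia_affine = RandomAffine3D(degrees=15.0, p=1.0)
kornia_out = kornia_affine(batch)
```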
