Improved memory profiling, new features, bugfixes
jaltmayerpizzorno released this 04 Oct 18:26
Overhauled memory attribution logic:
- uses Python's custom memory management APIs to efficiently disambiguate native vs. Python memory allocations, supplanting the prior approach that employed periodic call stack sampling.
- performs immediate lookup of the location in source code responsible for allocation/deallocation, reducing the "smearing" effect in attributions previously caused by delayed attribution.
- computes average memory consumption (rather than total) for each line of code (using the novel technique of "one-shot" tracing); lines executed many times no longer appear to have consumed large amounts of memory.
- no longer reports negative memory growth in its output (caused by lines that free more memory than they allocate), which had been a source of confusion for some users.
- this release also resolves a memory leak.
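To illustrate why per-line averages read differently from totals, here is a minimal sketch. It is not Scalene's implementation and does not reproduce its "one-shot" tracing; the `summarize` function and the `(line number, megabytes)` sample format are hypothetical and exist only for this example.

```python
from collections import defaultdict


def summarize(samples):
    """Return {lineno: (total_mb, average_mb)} for hypothetical allocation samples.

    `samples` is an iterable of (lineno, megabytes) pairs; this format is an
    assumption made for illustration, not Scalene's internal representation.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for lineno, megabytes in samples:
        totals[lineno] += megabytes
        counts[lineno] += 1
    return {ln: (totals[ln], totals[ln] / counts[ln]) for ln in totals}


# Line 10 allocates 8 MB on each of 100 executions; line 20 allocates 64 MB once.
samples = [(10, 8.0)] * 100 + [(20, 64.0)]
report = summarize(samples)
```

In this made-up data, line 10 dominates when ranked by total (800 MB against 64 MB), while the per-execution average shows that line 20 is the larger allocation.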
Overhauled internal signal handling:
- uses signal actors, an actor-based approach that decouples signal handling logic from the main thread, avoiding the risk of races and deadlocks and simplifying the logic.
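The general idea of handing signal work to an actor can be sketched as follows. This is an illustration under assumptions, not Scalene's code: the `SignalActor` class and `install_handler` helper are hypothetical names. The signal handler only enqueues a message; a separate worker thread does the processing.

```python
import queue
import signal
import threading


class SignalActor:
    """Hypothetical actor: a worker thread that processes posted messages in order."""

    def __init__(self, process):
        self._queue = queue.SimpleQueue()
        self._process = process
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def post(self, message):
        # SimpleQueue.put is reentrant in CPython, so the handler can call it
        # without taking locks that the interrupted code might already hold.
        self._queue.put(message)

    def stop(self):
        self._queue.put(None)  # sentinel: tells the worker loop to exit
        self._thread.join()

    def _run(self):
        while True:
            message = self._queue.get()
            if message is None:
                break
            self._process(message)


def install_handler(actor, signum):
    """Route `signum` to the actor; must be called from the main thread."""
    signal.signal(signum, lambda num, frame: actor.post(num))
```

Because the handler does nothing beyond `post`, the interrupted main thread resumes almost immediately, and all bookkeeping runs sequentially on the actor's thread.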
Bug fixes:
- fixed missing handling of pynvml.NVMLError_NotSupported exception (issue #262);
- fixed issue cleaning up after profiling multiprocessor and multithreaded programs;
- fixed issue not accounting for elapsed time when zero frames were recorded (issue #269).
New features:
- added JSON output option (--json);
- added programmatic profile control (scalene_profiler.start() and scalene_profiler.stop()).
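As a rough sketch of how the programmatic control might be used, the snippet below wraps the `start()`/`stop()` pair in a context manager so that profiling is always switched off again, even if the profiled code raises. The `profiled` helper and `run_hot_section` function are hypothetical, and the import path `from scalene import scalene_profiler` is an assumption that may differ between versions.

```python
from contextlib import contextmanager


@contextmanager
def profiled(profiler):
    """Hypothetical helper: profile only the enclosed block.

    `profiler` is any object that exposes start() and stop().
    """
    profiler.start()
    try:
        yield
    finally:
        profiler.stop()


def run_hot_section():
    # Assumed import path; intended to run while the program is launched under Scalene.
    from scalene import scalene_profiler

    with profiled(scalene_profiler):
        return sum(i * i for i in range(1_000_000))
```

The helper accepts the profiler as an argument so it can be exercised with a stand-in object when Scalene is not installed.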
Miscellaneous:
- improved documentation.
Note: this release is for macOS and Linux only.