H2O Core: Consistent mechanism to protect frames or some of their vecs from autodeletion #15858

Closed
Tracked by #15854
sebhrusen opened this issue Oct 23, 2023 · 0 comments · Fixed by #6335

sebhrusen commented Oct 23, 2023

Raised during implementation of #15854

When training complex models, such as a Pipeline model that uses Frames and Vecs with different life cycles, some Vecs are deleted automatically because the model training no longer needs them, even though the pipeline itself may still need them.

This fix provides a consistent mechanism for explicitly declaring the frames and vecs that must not be deleted within a specific context.
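
To make the idea concrete, here is a minimal, self-contained sketch of scoped key tracking with protection. It is a toy model only, not the H2O `Scope` implementation: `ToyScope`, its method names, the string keys, and the plain `Set` standing in for the key/value store are all hypothetical. The rule assumed here is that a key tracked in a nested context is deleted on exit unless some context still on the stack has protected it, in which case it is handed to the parent context.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of scoped key tracking with protection. NOT the H2O Scope API. */
final class ToyScope {
    /** One nesting level: keys created in it, and keys it declares as protected. */
    private static final class Level {
        final Set<String> tracked = new LinkedHashSet<>();
        final Set<String> protectedKeys = new HashSet<>();
    }

    private final Deque<Level> levels = new ArrayDeque<>();
    private final Set<String> store; // hypothetical stand-in for the key/value store

    ToyScope(Set<String> store) { this.store = store; }

    void enter() { levels.push(new Level()); }

    /** Creates a temporary key and registers it for deletion at exit(). */
    void track(String key) {
        current().tracked.add(key);
        store.add(key);
    }

    /** Declares keys that must not be deleted while the current level is open. */
    void protect(String... keys) {
        Level level = current();
        for (String key : keys) level.protectedKeys.add(key);
    }

    /** Deletes unprotected tracked keys; protected ones move to the parent level. */
    void exit() {
        Level done = current();
        List<String> survivors = new ArrayList<>();
        for (String key : done.tracked) {
            if (isProtected(key)) survivors.add(key);
            else store.remove(key);
        }
        levels.pop();
        if (!levels.isEmpty()) levels.peek().tracked.addAll(survivors);
    }

    /** True if any level still on the stack protects the key. */
    private boolean isProtected(String key) {
        for (Level level : levels) {
            if (level.protectedKeys.contains(key)) return true;
        }
        return false;
    }

    /** Fails fast on unbalanced usage instead of silently doing nothing. */
    private Level current() {
        if (levels.isEmpty()) throw new IllegalStateException("no matching enter()");
        return levels.peek();
    }
}

class ScopeProtectionDemo {
    public static void main(String[] args) {
        Set<String> store = new HashSet<>();
        ToyScope scope = new ToyScope(store);

        scope.enter();                    // pipeline-level context
        scope.protect("encoded_vec");     // still needed by later pipeline steps
        scope.enter();                    // model-training context
        scope.track("encoded_vec");
        scope.track("tmp_weights_vec");
        scope.exit();                     // training cleanup

        System.out.println(store.contains("encoded_vec"));     // true
        System.out.println(store.contains("tmp_weights_vec")); // false
        scope.exit();
    }
}
```

In this sketch a key protected by the outermost context is never deleted by the scope, so the caller owns it afterwards; whether the real mechanism behaves that way is not stated in this issue.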

sebhrusen self-assigned this Oct 23, 2023
sebhrusen added the core label Oct 23, 2023
sebhrusen added this to the 3.46.0.1 milestone Oct 23, 2023
sebhrusen added a commit that referenced this issue Nov 30, 2023
…s from autodeletion (#6335)

* consistent mechanism to protect frames or some of their vecs from auto deletion

* fix from old javadoc

* typo

* having fun: proposing some syntactic sugar

* fixed DeepLearning inconsistent use of Scope + enforce early detection of those inconsistent usages

* protect frame params only on expensive init

* fix GLMTest with inconsistent use of Scope.enter/exit

* for simplicity (and backwards compatibility) ensure Frame's Vecs get deleted by default during explicit delete of tmp Frame

* fixed too memory-lenient test in GBMTest

* improve usage consistency of Scope.(un)track in code+tests

* fixed wrong usage of track_generic(fr) in tests

* fixed key leakage on model interruption/timeout (was detected only by AutoML tests)

* clean up

* fixed LeaderboardTest after switch to H2ORunner

* ensure that workspace cleanup (mainly used during CV) doesn't 'accidentally' delete protected keys

* fixed introduced NPE

* comment

* fixed corner case when tmp Vecs are created, e.g. for the validation frame on a CV model, and are still needed for CV scoring

* fixed increasing frustration with inconsistent logic in leaked keys checking: may reveal several leakage issues in some tests

* this should make the priority logic more clear

* improve parent job lifecycle in ModelingStepsExecutorTest as it leaked running job keys (detected with new rule)

* fix rules list for tests inheriting TestUtil

* give the possibility to track some context with some keys and use it for now to bulk delete Vecs at Scope.exit

* optimize Vec tracking so that they can also be bulk removed

* fixed newly introduced NPEs

* added test for null checks

* centralized rules priorities + use annotation for clarity+visibility and better understand how rules are ordered

* disable bulk_remove for some individually tracked Vecs

* use Scope.safe to replace recent usages of 'unsafe' deleteTempFrameAndItsNonSharedVecs

* fixed test compilation issue

* removed unused code and comments

* fixed some of the introduced key leaks missed during the rebase

* removing tmp key tracking logs

* fixed orphan Scope.exit in ModelSelection

* fixed orphan Shap contributions keys leakage

* fixed xgboost score predictions

* fixed more key leakages

* revert accidental changes on failing Py test

* fixed failing test pyunit_plot_functions__add_saving_parameter_and_decorate_plot_result

* prevent other potential race conditions in PartialDependence

---------

Co-authored-by: Sebastien Poirier