PredictionWriter: optional gzip, use ThreadPoolExecutor #286

Conversation
Here begins my stream-of-consciousness during development: The only thing undesirable about this is that the …

Empirically it seems that 8 thread workers do not fall behind systematically, so for now I will change the default to 8.

Ah dang it, it was killed with 8 workers due to OOM after 2.5 hours... Interested in your thoughts @ordabayevy
Okay, I have now implemented a bounded queue. Projected total runtime looks like about 5.5 hours now instead of 5 (the 5-hour projection was with an unbounded queue).
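Since the "unbounded queue" remark points at `ThreadPoolExecutor`'s internal work queue growing without limit, one way to bound it is to gate `submit()` with a semaphore. This is an illustrative sketch (the class and parameter names are hypothetical, not the actual implementation):

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedExecutor:
    """Wrap a ThreadPoolExecutor so that at most `max_pending` tasks are
    in flight; submit() blocks the caller until a slot frees up, which
    keeps the memory held by queued batches bounded."""

    def __init__(self, max_workers: int, max_pending: int):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._slots = threading.Semaphore(max_pending)

    def submit(self, fn, *args, **kwargs):
        self._slots.acquire()  # blocks when max_pending tasks are outstanding
        future = self._executor.submit(fn, *args, **kwargs)
        # Release the slot whenever the task finishes (success or failure).
        future.add_done_callback(lambda _: self._slots.release())
        return future

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)
```

The blocking `submit()` is what trades a little runtime (the projected 5.5 hours vs. 5) for bounded memory: the producing loop stalls instead of piling up pending batches.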
Added an (untested) check to fail fast if it can be projected that the total size of the prediction output files will not fit in the allocated disk space. (I ran into this problem myself, and only found out after several hours. :( )
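Such a fail-fast check might look roughly like this (a hypothetical sketch; the function name and parameters are illustrative, not the actual code):

```python
import shutil


def check_disk_space(output_dir: str, n_batches: int, bytes_per_batch: int) -> None:
    """Fail fast if the projected total size of the prediction output
    files will not fit in the free space available at output_dir."""
    projected = n_batches * bytes_per_batch
    free = shutil.disk_usage(output_dir).free
    if projected > free:
        raise RuntimeError(
            f"Projected output size ({projected / 1e9:.1f} GB) exceeds "
            f"free disk space ({free / 1e9:.1f} GB) at {output_dir}"
        )
```

Raising before the first batch is written is what avoids the several-hours-in failure described above.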
The OOM problem was delayed but never completely disappeared. I now think the issue has to do with the following (see cellarium-ml, cellarium/ml/callbacks/prediction_writer.py, lines 47 to 49 at commit 090161e):
I can get rid of the changes to … The only thing that bothers me about that solution is that somebody like me can come in, not know they have to include that in the config file or where it goes, and waste a bunch of time hitting out-of-memory errors. :)

What I like about modifying … But I can see both sides... and I guess it would be okay with me if we left things as is and just put in some kind of massive …

What do you think @ordabayevy?
I prefer using the config file instead of hard-coding, and doing the following:

...but if we're never really going to need …
Okay, I've got those changes implemented. Got rid of the hard-coded …
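For reference, the config-file form of the setting (matching the example embedded in the warning message quoted below) would be a top-level key alongside `model`, `data`, and `trainer`:

```yaml
# top-level (indent level 0) keys of the predict config file
model: ...
data: ...
trainer: ...
return_predictions: false
```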
cellarium/ml/cli.py (Outdated)

```python
    "This can be set at indent level 0 in the config file. Example:\n"
    "model: ...\ndata: ...\ntrainer: ...\nreturn_predictions: false",
    UserWarning,
)
```
I think this might not work if a config file is used. The config file is parsed later, in the init method of LightningCLI, so this logic probably needs to be moved somewhere there. It should happen after the line self.parse_arguments(self.parser, args). This hook https://github.com/Lightning-AI/pytorch-lightning/blob/a944e7744e57a5a2c13f3c73b9735edf2f71e329/src/lightning/pytorch/cli.py#L554 might be a good place.
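The suggestion, as I read it: override a post-parsing hook (likely `before_instantiate_classes`, judging by the linked line) so the check runs after config-file values are available. A self-contained sketch of the pattern, using a minimal stand-in for `LightningCLI` so it runs without lightning installed (all names here are illustrative):

```python
import warnings


class _MiniCLI:
    """Minimal stand-in for LightningCLI's init order:
    parse arguments, then run the hook, then instantiate classes."""

    def __init__(self, args: dict):
        self.config = dict(args)  # pretend this came from parse_arguments()
        self.before_instantiate_classes()

    def before_instantiate_classes(self) -> None:
        pass  # hook; overridden by subclasses


class MyCLI(_MiniCLI):
    def before_instantiate_classes(self) -> None:
        # Runs after parsing, so values set in the config file are visible here.
        if self.config.get("return_predictions") is not False:
            warnings.warn(
                "Set return_predictions: false at indent level 0 of the "
                "config file to avoid accumulating predictions in memory.",
                UserWarning,
            )
            self.config["return_predictions"] = False
```

The key point is ordering: because the hook fires after parsing, a `return_predictions: false` supplied via the config file correctly suppresses the warning, which the original pre-parse placement could not do.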
Right you are! Thank you.
I have now included an explicit test to make sure I'm actually doing what I intended. Indeed, you're right: it didn't work with config files. I've implemented what you suggested, and the tests pass.
Looks great!
```python
warning_message = r.message.args[0]
if match_str in warning_message:
    n += 1

assert n < 2, "Unexpected UserWarning when running predict with return_predictions=false"
```
What does this test do?
Yeah, so this is asserting that the UserWarning is not emitted when running prediction with return_predictions: false.

I might be doing it in a weird way, but I'm not sure what the right way is. It's easy to assert that a warning is emitted, but not so easy to test that a warning is not emitted (from what I can tell). The only way I could figure out was to count the warnings matching a certain match string. (And I needed at least one such warning, or the counting mechanism would not work; hence the assertion n < 2: there is one "fake" warning to enable counting, and any further matching warning would be the real one.)
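For the record, the count-then-assert pattern can be isolated into a helper like this (an illustrative sketch, not the actual test code); with plain `warnings.catch_warnings(record=True)` the "fake" sentinel warning is not even needed, since an empty recording proves non-emission:

```python
import warnings


def count_matching_warnings(fn, match_str: str) -> int:
    """Call fn while recording warnings; return how many recorded
    warning messages contain match_str."""
    with warnings.catch_warnings(record=True) as recorded:
        warnings.simplefilter("always")  # record even duplicate warnings
        fn()
    return sum(match_str in str(r.message.args[0]) for r in recorded)
```

A test can then assert `count_matching_warnings(run_predict, match_str) == 0` directly. (Under pytest, the built-in `recwarn` fixture records warnings in a similar way.)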
Closes #285

These changes add two init args to `PredictionWriter`: `gzip` (bool) and `max_threadpool_workers` (int), each of which has a default value. `PredictionWriter` now gzips saved CSVs by default, and runs the saving-and-gzip process in a background thread that does not block further `lightning` compute. The `LightningCLI` in `cli.py` also now injects `return_predictions=False` into calls to `trainer.predict()`.

Testing indicates the following outcomes for scvi reconstructions, which involve computing a dense output CSV with 250 columns:

(The 18 hrs is for gzipping the CSVs without the ThreadPool; without gzip, the total time would be 7.5 hours.)
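The save-and-gzip offload described above can be sketched roughly as follows (a simplified, hypothetical sketch; the function names and structure are illustrative, not the actual `PredictionWriter` code):

```python
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def _save_and_gzip(csv_text: str, path: Path) -> Path:
    """Write the CSV, then gzip it; runs in a worker thread so the
    Lightning predict loop is not blocked by disk I/O or compression."""
    path.write_text(csv_text)
    gz_path = path.with_suffix(path.suffix + ".gz")
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()  # keep only the gzipped file
    return gz_path


executor = ThreadPoolExecutor(max_workers=8)  # default of 8 workers, per above


def write_batch_async(csv_text: str, path: Path):
    """Submit one batch's CSV for background save-and-gzip; returns a Future."""
    return executor.submit(_save_and_gzip, csv_text, path)
```

The main loop only pays the cost of `submit()`; the gzip work overlaps with the next batch's compute, which is what collapses the 18 hrs of serial gzipping into roughly the 5.5-hour overlapped runtime.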