Track training and validation loss #204
Comments
We should also report the learning rate as a column in this file.
This might be a good place to add a check for NaN training or validation loss: when printing out the losses, test for NaN and terminate with an error if it occurs.
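A guard along these lines could be a small helper called wherever the losses are reported; the function and argument names here are hypothetical, not part of Casanovo:

```python
import math


def check_loss_finite(loss: float, name: str = "train_loss") -> float:
    """Raise an error if a reported loss is NaN, terminating training early.

    `name` is an illustrative label for the loss being checked.
    """
    if math.isnan(loss):
        raise ValueError(f"{name} is NaN; terminating training")
    return loss
```

For example, `check_loss_finite(0.42)` passes the value through unchanged, while `check_loss_finite(float("nan"))` raises a `ValueError`.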
How granular do we want this log to be (i.e., record on every step or just once per epoch)? I'm looking into a potential solution that uses the `CSVLogger`.
I got a working solution going using PyTorch Lightning's built-in `CSVLogger`.
I guess that's OK, though it's a bit unsatisfying to use a different delimiter for the different output files produced by Casanovo.
Alternatively, we could implement our own TSV writer. This has the drawback of adding more maintenance overhead, but gives us a lot more control over the format of the loss file. Even beyond the delimiter, the file given by the `CSVLogger` has another problem: rows will have different cells populated depending on when the logging operation happened (at the end of a training step, validation step, or validation epoch). While this is workable, I don't think it is exactly desirable.
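A minimal custom TSV writer along these lines could emit one fully populated, long-format row per logging event, avoiding the blank cells entirely; the column names and function are only a sketch, not an agreed-upon format:

```python
import csv
import io

# Long-format loss log: every row carries all columns, so no cells are blank.
COLUMNS = ["loss_type", "epoch", "batch", "loss"]


def write_loss_rows(rows, fh):
    """Write loss records to an open text file handle as tab-separated values."""
    writer = csv.DictWriter(fh, fieldnames=COLUMNS, delimiter="\t")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Example with an in-memory buffer standing in for the loss file.
buf = io.StringIO()
write_loss_rows(
    [
        {"loss_type": "train", "epoch": 0.25, "batch": 100, "loss": 1.73},
        {"loss_type": "validation", "epoch": 1.0, "batch": 400, "loss": 1.21},
    ],
    buf,
)
```

Using the standard library's `csv` module with `delimiter="\t"` keeps the maintenance overhead small while still giving full control over which columns appear in every row.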
I agree that it might be nice to have finer-grained control, but I am not sure how high a priority that is. @bittremieux, what do you think?
The blank entries during loss logging, both by the CSVLogger and in the console, are what sparked this request last year. I agree that this format is silly and it would be better to have it as a long table in a log file, but unfortunately this is not possible with the built-in Lightning loggers. Meanwhile, I also don't consider it a very high priority. Ultimately, what's the goal of this additional feature? To more easily make loss plots (unless I'm missing something). However, rather than making those plots ourselves after training has happened, if you want nice (and interactive) loss plots, it's as simple as enabling TensorBoard logging, and you have everything in there without any effort. This is also not a feature that (m)any users will care about, as it is only relevant for people training Casanovo (i.e., us).
OK, I think we should go ahead with the CSVLogger implementation. Providing this functionality is better than not, and I do think we want to empower as many users as possible to train (or fine-tune) models, not just use our models.
I think we should add a Boolean config file option called something like `loss_file` that triggers creation of a TSV file containing training and validation loss information. This could have the following columns: Loss type ("train" or "validation"), Epoch, Batch, Loss. I would make the epoch a float (just the number of batches divided by the batches-per-epoch).
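The fractional epoch suggested above can be computed directly from a running batch counter; `global_batch` and `batches_per_epoch` are illustrative names, with the latter being whatever the training dataloader reports:

```python
def fractional_epoch(global_batch: int, batches_per_epoch: int) -> float:
    """Express training progress as a float epoch: batches / batches-per-epoch."""
    return global_batch / batches_per_epoch


# Batch 150 of a run with 100 batches per epoch corresponds to epoch 1.5.
progress = fractional_epoch(150, 100)
```

Logging this float in the Epoch column lets train and validation rows from the same run be plotted against a single continuous x-axis.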