-
-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for multiple param groups #1609
Conversation
Pull Request Test Coverage Report for Build 8426226675Details
💛 - Coveralls |
Commenting about this issue here so that I can keep track of it: Right now the update_optimizer utils works like the following: it takes a dictionary of {name: param} mapping of previously optimized parameters, as well as one for new parameters. It works fine if we assume that one same parameter is not going to change name, but this issue can happen, for instance in the case a model is wrapped inside another module while keeping the same parameters. In the case that parameters change name, there is an issue in the current update_optimizer is that it considers parameters that have changed their name as new parameters and resets their state. The OptimizedParameterStructure works well for that because it identifies parameters inside the current optimizer and matches them to a name that correspond to the same parameter id. Maybe I could abandon the use of optimized_param_id dict and fully switch to only using OptimizedParameterStructure, it requires some more work but I think it could work way better and also remove the requirement for storing the previously optimized params in the strategy as it's currently done |
@AntonioCarta I added a test in which I rename plus add one parameter. This test is the typical case where it should not work. But I noticed an additional problem. When some parameter is both renamed and expanded, it can absorb the parameter group of the other parameters without any notice (because in that case the expanded parameter is considered as a new parameter). This is quite an edge case but that could be problematic, because I really don't know how to even notice with a warning this kind of event happens when both the name and parameter id change there is no way to tell this happens. |
Usually expanded parameters are not renamed, right? I don't think we can really fix this (there is no correct behavior here), but maybe we can improve debugging this issues. For example, can we add a |
No, usually they are not renamed, but if you rename them at the same temporality than they are expanded, it can be a problem. It should be fixed however if you call strategy.make_optimizer() manually after renaming and before the parameter expansion |
I tried it and it works inside this pull request |
yes, it was an issue in my env. Now it's working. |
@AntonioCarta I think it's ready to merge. I made a few checks on the continual learning baselines to check that I obtain the same results with old and new implementation. I also added a test where the user adds manually some parameters to the parameter group he wants and it works fine. |
Thank you! Everything looks good now. |
Hey, I just put a prototype for now of the kind of changes I want to bring.
First of all, quick reminder of why it's not as easy as it seems to support multiple parameter groups:
How I plan to remedy to that:
Ideally, we want to have native support for a normal use of multiple parameter groups. If the user puts different parameter groups into the optimizer, he should not have to worry about them being broken. Of course, right now it's not the case and what we do is just to reset the optimizer at the start of training to setup the name, param_id mapping later used for update, and we put an assertion saying that if the optimizer has more than one param group it fails and it requires the user to implement it's own make_optimizer function (which is a pity because then using an existing method with different parameter groups requires to copy the strategy or create a new plugin).
Since users can add parameters both through DynamicModules and Manually, it's better to use a general method that works with any nn.Module object
Provide an automatic and natural way based on the natural module hierarchy to assign parameter groups to new parameters. Leaving more complex logic (which should not happen so often ?) to custom implementations.
The idea is the following, we let the user define it's optimizer with it's own param groups. Then we go over the model named_parameters() and store each name into a tree structure (which is a copy of the torch module tree structure). Each node (not necessarily parameter, a parameter is a leaf node) will contain information about a set of parameter groups that it's children nodes belong to. Then, when a new node is added (new parameter), it's group is inferred based on the group of it's parent node. If his parent node has multiple groups, it means that this node's parameter group cannot be inferred and returns an error asking the user to either change the module creation process or implement his own optimizer update logic.
@AntonioCarta I just provide for now a playground in test/test_optimizer.py that you can run as-is to see how it will work, I will implement the actual thing later. Let me know if you agree on the idea first