Refactor to introduce `Trainer` & `TrainingArguments`, add SetFit ABSA #265
Conversation
Note: This commit involves three deprecations: the `SetFitTrainer` class (and `DistillationSetFitTrainer`), additional arguments to `Trainer.train`, and the `keep_body_frozen` argument to `Trainer.unfreeze`. The first and last of these are 'graceful', i.e. old code will still work, but the `Trainer.train` changes are breaking in some situations. For example, `num_epochs` can no longer be passed to `Trainer.train`. The new 'deprecated' test files are identical to the old test files. The goal here is to test whether old behaviour is still possible. For the most part it is, with the exception of using `Trainer.train` with extra arguments. As a result, I skipped two tests in test_deprecated_trainer.py. Also note that docstrings have yet to be updated!
The reasoning is that with `body_learning_rate`, the tuple is for (training embedding phase, training classifier phase), which matches the tuples that you should give to `num_epochs` and `batch_size`.
Wow, this is a real tour de force of a PR @tomaarsen - great job refactoring all this complexity into clean APIs 🚀 !
Overall the code looks great and most of my comments are just nits + some suggestions for the deprecation warnings. I'll respond to your discussion points in a separate comment.
Regarding the discussion points:

- I would like to remove it in v1 if possible, so one approach for that could be to make a v0.6 or v0.7 release that has all the deprecation warnings included. Alternatively, we could merge this in its current form and tell users it will be removed in v2.
- Yes, please deprecate!
- I think it's OK, but happy to change it if you have a different name :)
- Yes, I've been wondering this too - I think it would be good to put all non-dataset-dependent objects into `TrainingArguments`.
- I think we could aim
Also add DeprecationWarning for DistillationSetFitTrainer
Co-authored-by: lewtun <[email protected]>
The old conditional was True with the default -1, not ideal
Sampling strategy (new sampler)
* Initial version for SetFit ABSA
* Create complete test suite for ABSA (100%, 90%, 96%); only push_to_hub is not under test
* Run formatting
* Allow initializing models with different span_context
* Remove resolved TODO
* Raise error if args is the wrong type
* Update column mapping, allow partial maps
* Remove model_init from ABSA Trainer, not used
* Split train into train_aspect and train_polarity, and reformat
* Prefix logs with aspect/polarity when training
* Add ABSA-specific model cards
* If spaCy doesn't agree with the start/end, just ignore those cases
* If there are no aspects, just return
* Elaborate on the required columns in the datasets
* Add Absa to the WIP docs
As it requires accelerate to be updated
solved an issue that was causing failures
Closes #179, closes #238, closes #320
Hello!
Pull Request overview
- Added `Trainer`, a modified version of the old `SetFitTrainer`.
- Added `DistillationTrainer`, a modified version of the old `DistillationSetFitTrainer`.
- Added `TrainingArguments`, a dataclass used throughout the updated `Trainer` classes.
- Deprecated `SetFitTrainer` and `DistillationSetFitTrainer`: a soft deprecation, i.e. old code should give a warning, but still works.
- `Trainer.unfreeze` and `Trainer.freeze` now simply point to `SetFitModel.unfreeze` and `SetFitModel.freeze`. This involves a soft deprecation for the `keep_body_frozen` parameter, i.e. old code should give a warning, but still works. Furthermore, `Trainer.freeze()` now freezes the full model, not just the head.
- `Trainer.train` no longer accepts training arguments like `num_epochs`: a fairly soft deprecation, i.e. old code no longer works like before as the keyword arguments are ignored, but the code won't crash. A warning is thrown instead. This will likely affect all users of the differentiable head.
- `SupConLoss` was moved to `src/setfit/losses.py`.
- Removed `SetFitBaseModel` and `SKLearnWrapper`, which were both unused.
- Added `test_deprecated_trainer_distillation.py` and `test_deprecated_trainer.py`. These files contain old, unmodified tests, and they show that despite drastic changes, old code still generally works (with the exception of the aforementioned `Trainer.train` training argument deprecation).
- Updated `test_trainer_distillation.py` and `test_trainer.py` to use the new training procedure.
- Added `test_training_args.py`.

Goals
Based on my own experiments and useful feedback from @kgourgou in the refactoring discussion in #238, I have reduced my goal for this refactoring to the following:

- Introduce the `Trainer` and `TrainingArguments` classes.
- Keep the model components decoupled from the `TrainingArguments` class. In short, a fit method of a component (body or head) must not accept a `TrainingArguments` instance as a parameter, but merely the hyperparameters like `learning_rate`, etc.
- Train via a single `trainer.train()`, regardless of the head chosen.

Notably, I'm not introducing an interface for classifier heads like I had originally planned, and I'm not deprecating `SetFitModel.from_pretrained(..., use_differentiable_head=True)` in favour of `SetFitModel.from_pretrained(..., head=...)`. Perhaps some other time we can have a new discussion about those.

Upgrading guide
To update your code to work with this PR, the following changes must be made:

- Replace all uses of `SetFitTrainer` with `Trainer`, and all uses of `DistillationSetFitTrainer` with `DistillationTrainer`.
- Remove `num_iterations`, `num_epochs`, `learning_rate`, `batch_size`, `seed`, `use_amp`, `warmup_proportion`, `distance_metric`, `margin` and `samples_per_label` from a `Trainer` initialisation, and move them to a `TrainingArguments` initialisation instead. This instance should then be passed to the trainer via the `args` argument.
- Combine the `trainer.train()`, `trainer.freeze()` and `trainer.unfreeze()` calls that were previously necessary to train the differentiable head into just one `trainer.train()` call, by setting `batch_size` and `num_epochs` on the `TrainingArguments` dataclass with tuples. The first value is for training the embeddings, and the second is for training the classifier.

I applied this guide to all of the tests to upgrade them to work with the new classes.
An example using a differentiable head:
Before:
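A minimal sketch of the typical pre-refactor workflow with a differentiable head; the model name, dataset, and hyperparameter values below are illustrative rather than taken from this PR.

```python
from datasets import load_dataset
from setfit import SetFitModel, SetFitTrainer, sample_dataset

# Illustrative few-shot dataset: 8 examples per class from SST-2
dataset = load_dataset("sst2")
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
eval_dataset = dataset["validation"]

# Differentiable (torch) head instead of the default logistic regression head
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    use_differentiable_head=True,
    head_params={"out_features": 2},
)

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    num_iterations=20,
    column_mapping={"sentence": "text", "label": "label"},
)

# Phase 1: train the embeddings (body) with the head frozen
trainer.freeze()
trainer.train()

# Phase 2: unfreeze the head, keep the body frozen, and train the classifier
trainer.unfreeze(keep_body_frozen=True)
trainer.train(num_epochs=25, batch_size=16, body_learning_rate=1e-5, learning_rate=1e-2)
```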
After:
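A corresponding sketch using the new `Trainer` and `TrainingArguments` described in this PR, reusing the model and datasets from the sketch above; the argument names follow the descriptions below and the values remain illustrative.

```python
from setfit import Trainer, TrainingArguments

args = TrainingArguments(
    batch_size=(16, 2),               # (embedding phase, classifier phase)
    num_epochs=(1, 25),               # (embedding phase, classifier phase)
    body_learning_rate=(2e-5, 1e-5),  # (embedding phase, classifier phase)
    head_learning_rate=1e-2,
    end_to_end=False,                 # keep the body frozen while training the classifier
)

trainer = Trainer(
    model=model,                      # same differentiable-head model as above
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    column_mapping={"sentence": "text", "label": "label"},
)

# One call covers both the embedding and classifier phases
trainer.train()
```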
Deprecations
`SetFitTrainer` & `DistillationSetFitTrainer`

This is a soft deprecation, i.e. these can still be used like before. For reference, the following snippet still works.
First snippet from the README, using `SetFitTrainer`
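That snippet is roughly the following (a reconstruction of the pre-refactor README example; the dataset and exact hyperparameter values may differ):

```python
from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, SetFitTrainer, sample_dataset

# Load a dataset from the Hugging Face Hub and simulate the few-shot regime
dataset = load_dataset("sst2")
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
eval_dataset = dataset["validation"]

# Load a SetFit model from the Hub
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

# The deprecated trainer class still works, but now emits a deprecation warning
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss_class=CosineSimilarityLoss,
    metric="accuracy",
    batch_size=16,
    num_iterations=20,
    num_epochs=1,
    column_mapping={"sentence": "text", "label": "label"},
)

trainer.train()
metrics = trainer.evaluate()
```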
The corresponding output
`keep_body_frozen` on `trainer.unfreeze`:

This is also a soft deprecation, so we can use this like before. Any use of `keep_body_frozen` results in a warning, and `trainer.unfreeze(keep_body_frozen=True)` is made equivalent to `model.unfreeze("head")`, while `trainer.unfreeze(keep_body_frozen=False)` is like `model.unfreeze()`. However, there should be no need for this method anymore, so we may be better off deprecating it fully.

Please note the following snippet:
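A minimal sketch of the pattern in question, assuming a `trainer` built around a model with a differentiable head (as in the "Before" example above):

```python
# Previously: freeze() froze only the head, so this trained the body first.
trainer.freeze()
# After this PR: freeze() freezes the full model, so this training call fails,
# since there are no trainable sentence transformer parameters left.
trainer.train()
```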
This snippet was common when training with a differentiable head, but stops working after this PR, as `trainer.freeze()` will now fully freeze the model, after which `trainer.train()` fails as it can't perform a `backward` on the frozen sentence transformer body. This causes an error:
on the frozen sentence transformer body. This causes an error:Trainer.train
argumentsThis is a fairly soft deprecation, so code with e.g.
trainer.train(num_epochs=..., ...)
no longer works like before. The arguments passed via keywords are simply ignored now, while a warning is thrown:Training Arguments
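For instance, a call along these lines (hypothetical values, assuming a new-style `trainer`) only triggers that warning; the keywords themselves have no effect:

```python
# Both keyword arguments are ignored under the new Trainer; set them on
# TrainingArguments and pass that via Trainer(args=...) instead.
trainer.train(num_epochs=25, batch_size=16)
```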
Training Arguments

Note that the `TrainingArguments` class must be able to hold separate values for training the embeddings vs the classifier. Most difficult is the case of learning rates, for which 3 separate values must exist: `body_embedding_learning_rate`, `body_classifier_learning_rate` and `head_learning_rate`. For several other arguments (e.g. `num_epochs`, `batch_size`), two values must exist: one for the embedding phase and one for the classifier phase.

For these last two cases, `TrainingArguments(num_epochs=12)` can be used, which will set both `embedding_num_epochs` and `classifier_num_epochs` to 12. Alternatively, `TrainingArguments(num_epochs=(1, 12))` will set `embedding_num_epochs` to 1 and `classifier_num_epochs` to 12. Beyond that, `TrainingArguments(embedding_num_epochs=1, classifier_num_epochs=12)` is also allowed, i.e. they can be set directly. In practice, only `embedding_num_epochs` and `classifier_num_epochs` are used, while `num_epochs` exists only to make setting the arguments more natural.
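As a quick illustration, the three spellings described above (a sketch; whether all three are accepted by the constructor is as stated in this PR's description):

```python
from setfit import TrainingArguments

args_a = TrainingArguments(num_epochs=12)       # both phases train for 12 epochs
args_b = TrainingArguments(num_epochs=(1, 12))  # 1 epoch for the embeddings, 12 for the classifier
args_c = TrainingArguments(embedding_num_epochs=1, classifier_num_epochs=12)  # set the two values directly
```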
The learning rate case is similar. For these, `body_learning_rate` and `head_learning_rate` must be set explicitly. Like before, `body_learning_rate` can either be set as a float or as a tuple of floats. If a tuple is given, then the first value is once again for training the embeddings, while the second value is for training the classifier. If instead only one value is given to `body_learning_rate`, then it is internally expanded to a tuple. Afterwards, `body_embedding_learning_rate` and `body_classifier_learning_rate` are either explicitly provided or implicitly set via `body_learning_rate`. Like before, only `body_embedding_learning_rate` and `body_classifier_learning_rate` are used, while `body_learning_rate` exists only to make setting the arguments more natural.
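In code, that behaviour looks roughly like this (values are illustrative):

```python
from setfit import TrainingArguments

# A single float is expanded internally to (embedding phase, classifier phase)
args = TrainingArguments(body_learning_rate=2e-5, head_learning_rate=1e-2)

# A tuple sets the two phases separately: 2e-5 for the embeddings, 1e-5 for the classifier
args = TrainingArguments(body_learning_rate=(2e-5, 1e-5), head_learning_rate=1e-2)
```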
Beyond that, there is a new training argument called `end_to_end`. If the model is initialized with a differentiable head, then this argument specifies whether the sentence transformer body is frozen (if `end_to_end=False`) or not (if `end_to_end=True`) during training of the classifier. This replaces the `trainer.freeze()` calls that were previously needed before and between `trainer.train()` calls.

See the Upgrading Guide for an example of how the `TrainingArguments` may be initialized in a relatively complex case.
Tests
The new `tests/test_deprecated_trainer.py` and `tests/test_deprecated_trainer_distillation.py` files are equivalent to the `tests/test_trainer.py` and `tests/test_trainer_distillation.py` files at ac958ae, with four exceptions. In `tests/test_deprecated_trainer.py`, three tests are skipped: these tests call `trainer.train(max_length=max_length, ...)` and then expect a warning based on that. However, these arguments are now ignored, so no warning is ever thrown. And as a fourth change, `from setfit.modeling import SupConLoss` was changed to `from setfit.losses import SupConLoss`.

Using these tests, it is clear that users will not immediately face dozens of errors when updating. Instead, they may face several warnings, and perhaps an error if they are using differentiable heads.
Discussion points
- When should we remove `SetFitTrainer` altogether? We should include this in the deprecation warning messages.
- Should we deprecate `Trainer.freeze()` and `Trainer.unfreeze`? They should not be needed anymore, and they are simply equivalent to calling `trainer.model.freeze()` and `trainer.model.unfreeze()`. I'm in favour of the deprecation.
- Related to `Trainer.freeze()`: should we introduce a special error message when people use `trainer.freeze()` followed by `trainer.train()`, which now breaks whereas it worked previously? (See the Deprecation section.)
- Are we happy with the name `end_to_end` for training a differentiable model in full, i.e. not freezing the body when training the classifier?
- Should more parameters of `Trainer` be absorbed into `TrainingArguments`? In particular, `loss_class`?

And perhaps most importantly, I'm interested in all of your feedback on how you feel about these changes.
To do:
There are several things that still need to be done before this can be merged, in particular around `TrainingArguments`: for example, ensuring that `embedding_batch_size` and `classifier_batch_size` are set correctly whenever `batch_size` is set.

I'll make a separate PR for updating the README/Notebook/scripts.
Feel free to ask me anything if you have any questions.
cc: @lewtun