-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GSoC] Create LLM Hyperparameters Optimization API Proposal #2333
Conversation
/area gsoc |
Ref issue: #2291 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this @helenxie-bit.
I left initial questions.
Please take a look @kubeflow/wg-training-leads @deepanker13 @droctothorpe.
@andreyvelich I have removed the In the updated version, users must provide parameters such as Please review the changes and let me know if you have any feedback or suggestions! |
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
…rom tune api Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Co-authored-by: Andrey Velichkevich <[email protected]> Signed-off-by: helenxie-bit <[email protected]>
…ation' part Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Other than consistent naming, it looks good |
…ive function Signed-off-by: helenxie-bit <[email protected]>
"storage_class": None, | ||
"access_modes": constants.PVC_DEFAULT_ACCESS_MODES, | ||
}, | ||
objective: Optional[Callable] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to using the Storage Initializer to download models and datasets from HuggingFace or S3, I added objective
and base_image
parameters to allow users to define their own objective function. This approach enables us to modify the existing tune
API directly instead of developing another one. What do you think? @andreyvelich @johnugeorge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should validate that user sets one of objective
or trainer_parameters
argument in the tune
API. Other idea might be to give user ability to set HuggingFaceTrainerParams
directly in objective argument. E.g.:
objective = HuggingFaceTrainerParams(
training_parameters = transformers.TrainingArguments(
output_dir = "test_trainer",
save_strategy = "no",
eval_strategy = "epoch",
disable_tqdm = True,
log_level = "info",
learning_rate = katib.search.double(min=1e-05, max=5e-05),
per_device_train_batch_size = katib.search.categorical([8, 16, 32]),
num_train_epochs = katib.search.int(min=1, max=10),
weight_decay = katib.search.double(min=0.0, max=1.0),
),
# Set LoRA config to reduce number of trainable model parameters.
lora_config = LoraConfig(
r = katib.search.int(min=8, max=32),
lora_alpha = 8,
lora_dropout = 0.1,
bias = "none",
),
),
I understand that this will be in-consistent with train
API in training client, but it will help us avoid unnecessary validation for one of
WDYT @helenxie-bit @johnugeorge ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my perspective, it's better to define objective
and trainer_parameters
separately. In this way, we can keep the trainer_parameters
argument (instead of setting HuggingFaceTrainerParams
directly in the objective
argument, which would render trainer_parameters
useless), and this approach is also clearer for users. Users will have two options:
- Use external models and datasets: Users need to set
model_provider_parameters
,dataset_provider_parameters
,storage_config
, andtrainer_parameters
. - Define a custom objective function: Users need to set
objective
andtrainer_parameters
.
We can add validation for this logic, and it should not be difficult.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich @johnugeorge According to today's discussion, I will keep these parameters as they are for now. We can make adjustments in the next version.
Signed-off-by: helenxie-bit <[email protected]>
) | ||
``` | ||
|
||
_#WIP_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andreyvelich I have removed the "WIP" tag. Please take a final look at the proposal to identify any other issues.
"storage_class": None, | ||
"access_modes": constants.PVC_DEFAULT_ACCESS_MODES, | ||
}, | ||
objective: Optional[Callable] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should validate that user sets one of objective
or trainer_parameters
argument in the tune
API. Other idea might be to give user ability to set HuggingFaceTrainerParams
directly in objective argument. E.g.:
objective = HuggingFaceTrainerParams(
training_parameters = transformers.TrainingArguments(
output_dir = "test_trainer",
save_strategy = "no",
eval_strategy = "epoch",
disable_tqdm = True,
log_level = "info",
learning_rate = katib.search.double(min=1e-05, max=5e-05),
per_device_train_batch_size = katib.search.categorical([8, 16, 32]),
num_train_epochs = katib.search.int(min=1, max=10),
weight_decay = katib.search.double(min=0.0, max=1.0),
),
# Set LoRA config to reduce number of trainable model parameters.
lora_config = LoraConfig(
r = katib.search.int(min=8, max=32),
lora_alpha = 8,
lora_dropout = 0.1,
bias = "none",
),
),
I understand that this will be in-consistent with train
API in training client, but it will help us avoid unnecessary validation for one of
WDYT @helenxie-bit @johnugeorge ?
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Signed-off-by: helenxie-bit <[email protected]>
Related: #2339 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, we can merge it.
We can create followup PRs for additional changes.
Thanks for driving this @helenxie-bit!
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreyvelich The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@andreyvelich Of course! Thank you for reviewing! |
What this PR does / why we need it:
Give user functionality to tune HyperParameters of LLMs using simple Python SDK APIs
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #