Add support for limit_train_batches to megatron sampler classes #10648
Signed-off-by: trias702 <[email protected]>
[🤖]: Hi @trias702 👋, I just wanted to let you know that, you know, a CICD pipeline for this PR just finished successfully ✨ So it might be time to merge this PR or like to get some approvals 🚀 But I'm just a 🤖 so I'll leave it to you what to do next. Have a great day! //cc @ko3n1g
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
jenkins
jenkins
Jenkins
At first sight it seems to me that this parameter doesn't belong to the sampler, and it should be up to the caller to provide the correct value for `total_samples`. Why do you think we need it in the sampler?
We spoke about this before in a meeting, and I told you we can either hack it in via the algorithm class, or put it inside the Sampler, and you said inside the Sampler was the correct place for it.
Ah, my memory is failing me, though as discussed in NVIDIA/NeMo-Aligner#321 (comment) there's a way to do it without it being a hack -- I assume we hadn't thought of this option when we spoke about it (really can't remember, sorry).
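For readers following the thread, here is a minimal sketch of the caller-side option raised in the review comment above, where the caller adjusts `total_samples` before constructing the sampler. The helper name `effective_total_samples` and the `global_batch_size` argument are hypothetical, and this is not necessarily the approach described in the linked NeMo-Aligner comment.

```python
from typing import Optional, Union


def effective_total_samples(
    total_samples: int,
    global_batch_size: int,
    limit_train_batches: Optional[Union[int, float]] = None,
) -> int:
    """Hypothetical helper: shrink total_samples so a sampler sees only the limited epoch.

    Assumes the PTL convention that a float is a fraction of the batches
    and an int is an absolute number of batches.
    """
    num_batches = total_samples // global_batch_size
    if isinstance(limit_train_batches, float):
        # Fraction of the available batches, rounded down.
        num_batches = int(num_batches * limit_train_batches)
    elif limit_train_batches is not None:
        # Absolute batch count, never more than what is available.
        num_batches = min(num_batches, limit_train_batches)
    return num_batches * global_batch_size
```

With this shape the sampler itself stays unchanged; the caller would pass the returned value as `total_samples`.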
This PR was closed because it has been inactive for 7 days since being marked as stale.
What does this PR do ?
Adds support for `limit_train_batches` (as used in PTL) to the Megatron sampler classes so that it works correctly. Without this PR, setting `limit_train_batches` to a value less than 1.0, or to num_samples, does not work correctly during multi-epoch training in downstream libraries such as NeMo-Aligner.
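As an illustration of the intended behaviour, the sketch below shows one way a sampler could honour `limit_train_batches` using the PTL convention (float = fraction of batches, int = number of batches). `ToyBatchSampler` and `limited_num_batches` are hypothetical names for this example only; they are not the Megatron sampler classes or the code in this PR.

```python
from typing import Iterator, List, Optional, Union


def limited_num_batches(
    num_batches: int, limit_train_batches: Optional[Union[int, float]]
) -> int:
    """Return how many batches one epoch should yield under the limit."""
    if limit_train_batches is None:
        return num_batches
    if isinstance(limit_train_batches, float):
        if not 0.0 <= limit_train_batches <= 1.0:
            raise ValueError("a float limit must lie in [0.0, 1.0]")
        return int(num_batches * limit_train_batches)
    # An int limit is an absolute batch count, capped at what is available.
    return min(num_batches, limit_train_batches)


class ToyBatchSampler:
    """Hypothetical stand-in for a batch sampler; yields lists of sample indices."""

    def __init__(
        self,
        total_samples: int,
        global_batch_size: int,
        limit_train_batches: Optional[Union[int, float]] = None,
    ) -> None:
        self.global_batch_size = global_batch_size
        full_epoch = total_samples // global_batch_size
        self.num_batches = limited_num_batches(full_epoch, limit_train_batches)

    def __len__(self) -> int:
        return self.num_batches

    def __iter__(self) -> Iterator[List[int]]:
        # Each epoch stops after the limited number of batches.
        for batch_idx in range(self.num_batches):
            start = batch_idx * self.global_batch_size
            yield list(range(start, start + self.global_batch_size))
```

Because the limit is applied inside the sampler, both `len(sampler)` and iteration agree on the shortened epoch length, which is the property multi-epoch training loops tend to rely on.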
Collection: NLP
Changelog
Usage
You can now pass an additional parameter, `limit_train_batches`, to all 4 Sampler classes.
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information