
Add Packer Build job for Windows AMI #6064

Merged
@atalman merged 2 commits into main from zxiiro/windows-ami-builder on Dec 17, 2024


@zxiiro (Collaborator) commented Dec 16, 2024

Add a job to create Windows AMIs in the PyTorch AWS Account.

Issue: #5992

vercel bot commented Dec 16, 2024

The latest updates on your projects:

1 Skipped Deployment
Name | Status | Updated (UTC)
torchci | ⬜️ Ignored | Dec 17, 2024 1:20pm

@facebook-github-bot added the CLA Signed label Dec 16, 2024
@zxiiro force-pushed the zxiiro/windows-ami-builder branch from b5c22ad to 841f5b1 on December 16, 2024 at 19:39
@zxiiro force-pushed the zxiiro/windows-ami-builder branch from 841f5b1 to 706494a on December 16, 2024 at 19:42
@zxiiro force-pushed the zxiiro/windows-ami-builder branch from 706494a to 3f8191e on December 16, 2024 at 19:50
Add a job to create Windows AMIs in the PyTorch AWS Account.

Issue: #5992
Signed-off-by: Thanh Ha <[email protected]>
@zxiiro force-pushed the zxiiro/windows-ami-builder branch from 3f8191e to 1bfd5a1 on December 16, 2024 at 21:52
@atalman (Contributor) commented Dec 16, 2024

Hi @zxiiro this looks good. Would be nice to see it in action.

@zxiiro (Collaborator, Author) commented Dec 17, 2024


@atalman You can see it in action here:

https://github.com/pytorch/test-infra/actions/workflows/build-windows-ami.yml

Since this PR is not yet merged, we cannot use the web UI to trigger it, but I've been triggering it manually using the GitHub CLI:

gh workflow run 'Build Windows AMI' --ref zxiiro/windows-ami-builder -f branch=atalman-patch-15

One issue I've been having with this PR, though: even though it is able to create the AMI, the packer command usually hangs at the very end, either after it prints:

==> amazon-ebs.windows_ebs_builder: Waiting for the instance to stop...

In this case the job fails and no AMI is created.

or when it prints:

==> amazon-ebs.windows_ebs_builder: Waiting for AMI to become ready...

In this case the job times out and fails; however, when I check in AWS the AMI is there, so this case effectively succeeds despite the failure. I'm not sure why this happens. I've only seen it pass cleanly, with everything green, twice.

Signed-off-by: Thanh Ha <[email protected]>
@zxiiro (Collaborator, Author) commented Dec 17, 2024

Upon further inspection, it looks like the EBS volume snapshots are taking a very long time to create in AWS. I wonder if we can tell packer not to wait for AMI creation to complete and just end the job once it reaches that point, since the EBS volume snapshots seem like something that could finish in the background.

@zxiiro (Collaborator, Author) commented Dec 17, 2024


I just saw that the snapshots have completed and the AMI state is now "available", yet the packer command is still hanging, waiting for it to complete.

@zxiiro (Collaborator, Author) commented Dec 17, 2024

I think we can proceed with merging this, since it is a manually triggered job and it does everything we need despite the annoying packer hang. It gets us where we want to go, and we can cancel the workflow after checking that the AMI is available in AWS.
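The "check that the AMI is available, then cancel the workflow" step could be scripted. The following is a minimal sketch, not part of this PR, assuming the AWS CLI is configured for the PyTorch AWS account and the AMI ID has been copied from the packer log. The function names get_ami_state and wait_for_ami, and the defaults of 60 attempts spaced 30 seconds apart, are hypothetical choices for illustration. It does not fix the packer hang; it only tells you when cancelling the run is safe.

```shell
# Hypothetical helpers (not from this PR); source this file in a POSIX shell.
# Assumes the AWS CLI is installed and configured for the target account.

# Print the current state of an AMI, e.g. pending, available or failed.
get_ami_state() {
  aws ec2 describe-images --image-ids "$1" \
    --query 'Images[0].State' --output text
}

# wait_for_ami AMI_ID [MAX_ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once the AMI reports "available". Returns 1 if it reports
# failed, error, invalid or deregistered, or if the attempts run out.
# Any other state (e.g. pending) means "keep polling".
wait_for_ami() {
  ami_id="$1"
  max_attempts="${2:-60}"
  delay="${3:-30}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    state="$(get_ami_state "$ami_id")"
    case "$state" in
      available)
        echo "$ami_id is available"
        return 0
        ;;
      failed|error|invalid|deregistered)
        echo "$ami_id entered state: $state" >&2
        return 1
        ;;
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "timed out waiting for $ami_id" >&2
  return 1
}
```

Once wait_for_ami succeeds, the hung run could be cancelled with gh run cancel, passing a run ID obtained from gh run list --workflow build-windows-ami.yml --status in_progress --json databaseId. Keeping the AWS call in its own function also makes it easy to swap in a fake when testing the polling logic.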

@atalman (Contributor) left a comment

lgtm

@atalman merged commit 05db834 into main on Dec 17, 2024
4 of 5 checks passed
@atalman deleted the zxiiro/windows-ami-builder branch on December 17, 2024 at 16:34
Labels: CLA Signed
Projects: None yet
3 participants