Swap spot instances without running temporary on-demand instances #118

Open
cristim opened this issue Jul 25, 2017 · 6 comments

Comments

@cristim
Member

cristim commented Jul 25, 2017

Feature idea

There should be an option to handle the spot termination signal with an agent component running on the spot instances, and to use that signal to replace the terminating instance with another spot instance, without running any temporary on-demand instances.

The terminating spot instance would be detached from the group, and a new spot instance would be launched and added to the group to compensate for the drop in capacity.
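
As a rough illustration of the agent side, the sketch below polls the EC2 instance metadata service for the spot termination notice. The `spot/termination-time` path returns 404 until the instance is marked for termination. The function names, the polling interval, and the `max_polls` parameter are assumptions made for this example, and it assumes unauthenticated (IMDSv1-style) metadata access; instances that enforce IMDSv2 would need a session token first.

```python
import time
import urllib.error
import urllib.request

# Spot termination notice path in the EC2 instance metadata service.
# It answers 404 until the instance has been marked for termination.
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def fetch_termination_time(url=TERMINATION_URL, timeout=2):
    """Return the termination timestamp string, or None if no notice is pending."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode().strip() or None
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def wait_for_termination(fetch=fetch_termination_time, interval=5,
                         sleep=time.sleep, max_polls=None):
    """Poll until a termination notice appears and return its timestamp.

    Returns None if max_polls is set and exhausted without a notice.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        notice = fetch()
        if notice is not None:
            return notice
        polls += 1
        sleep(interval)
    return None
```

Once `wait_for_termination()` returns, the agent would notify AutoSpotting; how that notification is delivered is left open here.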

This should be configurable on a per-group level using a dedicated tag.
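
A per-group opt-in check could look roughly like the sketch below. The tag name is hypothetical (the issue does not propose one), and the tag list format assumes the `Key`/`Value` dictionaries that the AWS APIs return for Auto Scaling group tags.

```python
# Hypothetical tag name; the issue does not specify one.
SWAP_TAG = "autospotting_swap_without_on_demand"


def swap_enabled(tags, tag_key=SWAP_TAG):
    """Return True if the group opted in via the dedicated tag.

    tags: list of {"Key": ..., "Value": ...} dicts describing the group.
    """
    for tag in tags:
        if tag.get("Key") == tag_key:
            return str(tag.get("Value", "")).strip().lower() == "true"
    # Groups without the tag keep the existing behaviour.
    return False
```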

Note: One thing to take into consideration is that the termination notice arrives 2 minutes before the termination. If AutoSpotting relies on its scheduled runs to catch the notice, it has to run at least every two minutes, which might still miss some notices, or better every minute.
This is already the case in CloudFormation, but not in Terraform.
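
The arithmetic behind this note can be sketched as follows. It assumes a simplified model in which a notice is only seen at the next scheduled run and arrives just after the previous one; the function name is made up for the example, and instance launch time is ignored.

```python
NOTICE_SECONDS = 120  # the two-minute spot termination notice


def worst_case_reaction_window(run_interval_seconds, notice_seconds=NOTICE_SECONDS):
    """Seconds left before termination when a notice arrives just after a run.

    Zero or a negative value means the notice can be missed entirely.
    """
    return notice_seconds - run_interval_seconds
```

Under this model a one-minute schedule leaves at least 60 seconds to react, while a two-minute schedule can leave none, which is why some notices might be missed.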

@deinspanjer

deinspanjer commented Sep 26, 2017

Never mind, I misunderstood how the termination works. It is only a two-minute grace period before shutdown.

@cristim
Member Author

cristim commented Sep 26, 2017

@deinspanjer I don't think I understand this fully, please explain a bit more.

The last hour's worth of costs is simply subtracted from the total cost of running that spot instance. This is a billing detail that we don't really need to care about.

The replacement instance will be launched only after the termination notification is received, which is 2 minutes before the outbid spot instance is terminated.

I don't think we need to store anything. We can implement it by firing another event that launches the function with some parameters, and everything should be handled by the logic implemented in the function's code.
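
One way to picture this stateless approach: the function decides what to do purely from the parameters of the event that invoked it. The sketch below is only an illustration; the `action` and `instance_id` field names and the returned labels are hypothetical, not part of AutoSpotting.

```python
def route_event(event):
    """Decide what one invocation should do, based only on the event parameters.

    Nothing is stored between invocations: a termination event carries the
    instance ID it refers to, and any other event is a regular scheduled run.
    """
    instance_id = event.get("instance_id")  # hypothetical field name
    if event.get("action") == "replace_terminating" and instance_id:
        return ("replace", instance_id)
    return ("scheduled_run", None)
```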

@deinspanjer

Yep, this sounds exactly right, I was just working off of very wrong assumptions in my initial question. :)

I'm not able to contribute to this project at the moment, but I am very interested in it and if things get a little more sane at work, I would be happy to help out with pieces of this.

@xlr-8
Contributor

xlr-8 commented Sep 29, 2017

Given the recent exchanges, I took the liberty of editing your issue to add a note about the run frequency, in case someone else implements it.

@cristim
Member Author

cristim commented Sep 29, 2017

@xlr-8 I don't think we need to change or consider the frequency. This could be implemented as another event generator for the Lambda function: the component running on the terminating instance would immediately call the Lambda function and tell it to detach the current instance and launch a new spot instance for that group.
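
The detach-and-replace step could be sketched as below. The client is passed in so the logic can be exercised without AWS access; in real use it would be something like `boto3.client("autoscaling")`, whose `detach_instances` and `attach_instances` calls take these parameters. The `launch_spot` callable is a hypothetical helper standing in for whatever launches the replacement spot instance.

```python
def replace_terminating_instance(asg_client, group_name, instance_id, launch_spot):
    """Detach the terminating instance and attach a newly launched spot instance.

    asg_client:  an Auto Scaling client, e.g. boto3.client("autoscaling")
    launch_spot: hypothetical callable that launches a spot instance for the
                 group and returns its ID once the instance is running
    """
    # Decrement the desired capacity on detach, otherwise the group would
    # launch its own (on-demand) replacement. Note that this call fails if it
    # would take the group below its minimum size.
    asg_client.detach_instances(
        InstanceIds=[instance_id],
        AutoScalingGroupName=group_name,
        ShouldDecrementDesiredCapacity=True,
    )
    new_instance_id = launch_spot(group_name)
    # Attaching raises the desired capacity back by one.
    asg_client.attach_instances(
        InstanceIds=[new_instance_id],
        AutoScalingGroupName=group_name,
    )
    return new_instance_id
```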

We would need to create a new trigger that can run the function. One option is a REST endpoint implemented with an API Gateway, so we don't need any additional IAM permissions. Another might be an SNS topic, but this needs to be investigated.
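
For the REST endpoint variant, the agent's call might look roughly like this. The endpoint URL, the JSON field names, and the function name are all assumptions made for illustration; nothing here reflects an existing AutoSpotting API.

```python
import json
import urllib.request


def build_trigger_request(endpoint_url, instance_id, group_name):
    """Build the POST request the agent would send when it sees the notice."""
    body = json.dumps({
        "action": "replace_terminating",  # hypothetical field names
        "instance_id": instance_id,
        "group_name": group_name,
    }).encode()
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The agent would then send it with `urllib.request.urlopen(request, timeout=5)`, using a short timeout since only about two minutes remain.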

But at the end of the day, in many cases the new per-second billing feature makes this additional complexity harder to justify. The additional cost savings would be relatively small; the main benefit is that we'd have fewer workload transitions before the group converges back to spot.

@cristim
Member Author

cristim commented Jul 15, 2019

This is now relatively easy to do using the infrastructure we already have in place to listen for instance terminations. I'll work on this next.

Once done it should also help with #156, #284, #332 and #343


3 participants