Support for TTLs #1

First off... THIS IS AWESOME! I've been dreaming about this for a while, and then found you've already done most of the legwork! :)

It would be cool to see a lock expire based on a supplied TTL... and/or a method to update the TTL of an existing lock -- basically, I'd like the "dead-man's switch" model...
Seriously @stensonb, thanks for your words 😉 This is still a bit experimental, and any help testing the cookbook is welcome. Please, could you explain in more detail what you need, or send me a link with more information? Unfortunately I'm not very familiar with lock TTLs. By the way, I'm using the zk gem to implement the lockers, so we may be somewhat limited by it.
I'm thinking of the case where the guarded code barfs and causes chef-client to fail... either because of a software exception (which could probably release the lock gracefully), or some hardware failure (the machine shuts down -- it cannot release the lock). The question is this: how does an abandoned lock get released? I suggest a TTL...
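A minimal sketch of the dead-man's-switch idea, assuming plain znodes and the zk gem (the lock path, TTL value, and helper names are made up for illustration; ZooKeeper here has no native per-node TTL, so the expiry lives in the node's data): the holder writes an expiry timestamp into the lock node and must keep refreshing it, while a contender may steal any lock whose timestamp has lapsed.

```ruby
# Hypothetical sketch, not part of the cookbook: a "dead-man's switch" lock
# on top of the zk gem. The lock node is deliberately persistent so it
# survives a holder crash; staleness is detected via the stored timestamp.
require 'zk'
require 'json'

LOCK_PATH = '/locks/my-task'.freeze # hypothetical path
TTL = 30                            # seconds

def try_acquire(zk)
  zk.create(LOCK_PATH, { 'expires_at' => Time.now.to_i + TTL }.to_json)
  true
rescue ZK::Exceptions::NodeExists
  data, stat = zk.get(LOCK_PATH)
  if JSON.parse(data)['expires_at'] < Time.now.to_i
    # Abandoned: the holder died without refreshing. A versioned delete
    # avoids racing another contender that saw the same stale lock.
    zk.delete(LOCK_PATH, version: stat.version)
    retry
  end
  false
end

# The holder's "heartbeat": call this periodically while working,
# or the lock becomes stealable.
def refresh(zk)
  zk.set(LOCK_PATH, { 'expires_at' => Time.now.to_i + TTL }.to_json)
end

zk = ZK.new('localhost:2181') # hypothetical ensemble address
zk.mkdir_p('/locks')
puts try_acquire(zk) ? 'lock acquired' : 'lock busy'
```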
Another feature, which could be tangential to this one, is persisting the lock locally -- in order to recover from chef-client run failures. That is, when Node A runs chef-client, it gets Lock B and performs some task C. If task C fails to complete (software or hardware), it should be possible for Node A to run chef-client again (or in daemon mode), resume by deserializing Lock B from persistent storage, and continue the chef-client run.
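As a rough sketch of what that local persistence could look like (the file location and helper names are invented for illustration): record which lock the node holds before starting task C, and check for that record at the start of the next run.

```ruby
# Hypothetical sketch of persisting lock state across chef-client runs.
require 'json'

STATE_FILE = '/var/chef/zk_lock_state.json'.freeze # made-up location

def remember_lock(lock_path)
  File.write(STATE_FILE, { 'lock_path' => lock_path,
                           'acquired_at' => Time.now.to_i }.to_json)
end

# Returns the previously held lock path, or nil on a clean start.
def resume_lock
  return nil unless File.exist?(STATE_FILE)
  JSON.parse(File.read(STATE_FILE))['lock_path']
end

def forget_lock
  File.delete(STATE_FILE) if File.exist?(STATE_FILE)
end
```

On the next run, a non-nil result from resume_lock would signal that task C never finished and should be retried before anything else.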
Through some local testing, I was able to determine that abandoned locks do get cleaned up: the semaphore nodes disappear as soon as the holding client's session ends.
First of all, thank you for your research. AFAIR, ZooKeeper itself does not support lock TTLs, and this seems to be the typical kind of feature its developers could refuse to include. I'm not sure, but I think this should be implemented by combining other existing lock patterns rather than the way you propose. I'm sorry, but I don't quite understand your example use case (in a real scenario, I mean).
Yup, it looks like the ZK::Locker::Semaphore class uses :ephemeral_sequential when creating the nodes... and this makes ZooKeeper clear the nodes when the TCP connection is closed (a quick demo of that behaviour is sketched below). The frustrating part is that -- currently -- there is no way for zookeeper_bridge_sem to report to other nodes whether the chef-client run completed successfully or not.
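To make the ephemeral behaviour concrete, here is a tiny sketch against the zk gem (server address and paths are made up): the node created below vanishes as soon as the client's session -- and its TCP connection -- goes away.

```ruby
# Demonstration sketch: ephemeral-sequential znodes disappear when the
# creating session ends, which is exactly what clears an abandoned lock.
require 'zk'

zk = ZK.new('localhost:2181') # hypothetical ensemble address
zk.mkdir_p('/semaphores')

path = zk.create('/semaphores/demo-', 'data', mode: :ephemeral_sequential)
puts path # e.g. /semaphores/demo-0000000003

zk.close! # session ends; ZooKeeper removes the node automatically
```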
The result of this gap is that other nodes in the cluster will eventually get the lock and converge their local resources. This could be disastrous for web servers behind a load balancer, for example: a service restart that fails could bring the entire load-balanced solution down... not good. In summary, zookeeper_bridge_sem currently only sequences the beginning of a resource, not its successful convergence. (I understand this discussion has now diverged wildly from the original "TTL" question... maybe I'll move this to the wiki.)
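One conceivable way to report convergence to other nodes, sketched here with plain zk gem calls (the status path and task name are invented for illustration): publish the outcome to a persistent znode, which -- unlike the ephemeral semaphore entries -- survives the client's session.

```ruby
# Hypothetical sketch: record whether the guarded task converged, in a
# persistent znode that other nodes can consult before taking the lock.
require 'zk'

STATUS_PATH = '/status/webserver-restart'.freeze # made-up path

def publish_status(zk, value)
  zk.mkdir_p('/status')
  if zk.exists?(STATUS_PATH)
    zk.set(STATUS_PATH, value)
  else
    zk.create(STATUS_PATH, value) # persistent by default
  end
end

zk = ZK.new('localhost:2181')
begin
  restart_web_server # hypothetical task C guarded by the semaphore
  publish_status(zk, 'ok')
rescue StandardError => e
  publish_status(zk, "failed: #{e.class}")
  raise
end
```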
@stensonb, sorry for the delay 😰 To manage errors inside semaphores, I think you should combine them with other resources, like locks or waits. Maybe we should implement a new resource for that; a rough sketch of the idea follows.
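Purely as an illustration with the zk gem (the cookbook's own resource API isn't shown in this thread, and the lock name and task are assumptions): an exclusive lock around the risky step, combined with the persistent status flag from the previous sketch.

```ruby
# Hypothetical sketch of combining primitives: an exclusive lock guards the
# risky step, and a persistent status flag lets other nodes check success.
require 'zk'

zk = ZK.new('localhost:2181')

zk.with_lock('restart-webserver') do # ZK::Locker exclusive lock
  restart_web_server                 # hypothetical guarded task
  publish_status(zk, 'ok')           # helper from the sketch above
end
```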
I think this place is better than the wiki for discussions, but you can add to the wiki whatever you think is appropriate.