refactor the process of controlling a training job #39
@@ -106,9 +126,11 @@ func New(
// is closed, at which point it will shutdown the workqueue and wait for
// workers to finish processing their current work items.
func (c *TrainingJobController) Run(threadiness int, maxLoadDesired float64, stopCh <-chan struct{}) error {
// TODO add a lock to ensure there is only one controller in the cluster
The resource lock has already been implemented, so this TODO is no longer needed.
@@ -125,9 +147,12 @@ func (c *TrainingJobController) Run(threadiness int, maxLoadDesired float64, sto
go wait.Until(c.runWorker, time.Second, stopCh)
}

// gc := NewGarbageCollector(c.KubeCli, c.trainingjobLister)
// go gc.CleanOrphans(10 * time.Minute)
Please delete these commented-out lines and add a TODO for GC instead.
@@ -34,6 +34,7 @@ var (
func main() {
masterURL := flag.String("master", "", "Address of a kube master.")
kubeConfig := flag.String("kubeconfig", "", "Path to a kube config. Only required if out-of-cluster.")
autoClean := flag.Bool("autoclean", false, "Auto clean pods after terminating job, default false")
So the default is false because users may need to get the logs?
Yes. If the autoclean flag is false, the controller keeps the pods after the job succeeds or fails; otherwise, all pods are deleted automatically. Keeping the pods is useful for debugging and getting logs.
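The autoclean behavior described above reduces to a small decision made once a job terminates. A minimal Go sketch, assuming a hypothetical jobPhase type and podsToDelete helper (neither is taken from this PR):

```go
package main

import "fmt"

// jobPhase is a hypothetical stand-in for a training job's status.
type jobPhase string

const (
	phaseRunning   jobPhase = "Running"
	phaseSucceeded jobPhase = "Succeeded"
	phaseFailed    jobPhase = "Failed"
)

// podsToDelete decides which pods to remove for a job in the given phase.
// Pods are only removed after the job terminates and only when autoClean is
// set; otherwise they are kept so users can inspect them and read logs.
func podsToDelete(phase jobPhase, autoClean bool, pods []string) []string {
	terminated := phase == phaseSucceeded || phase == phaseFailed
	if !terminated || !autoClean {
		return nil
	}
	return pods
}

func main() {
	pods := []string{"job-trainer-0", "job-pserver-0"}
	fmt.Println(podsToDelete(phaseFailed, false, pods)) // prints [] (pods kept for debugging)
	fmt.Println(podsToDelete(phaseFailed, true, pods))  // prints [job-trainer-0 job-pserver-0]
	fmt.Println(podsToDelete(phaseRunning, true, pods)) // prints [] (job still running)
}
```

Defaulting autoClean to false makes the safe choice (keeping evidence of a failure) the one users get without extra configuration.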
I already reviewed this code in the Baidu repository. Merging it now.
fix #26