Workflows, multiple tasks depending on one another #1198
Comments
I think you're right
I don't think it's a good idea, but I can't tell you exactly what will fail. I think you should do this in a separate table instead, but you can add a foreign key as long as you handle the cases where the job will be deleted (so |
Assuming a task C needs to run when task A and B complete, introduce an orchestrator task O that defers task A and B, waits for both of them to complete and then defers task C. Would that work in your scenario? |
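A minimal sketch of the orchestrator idea above, using plain asyncio coroutines as stand-ins for deferred jobs. This is not Procrastinate's API: `task_a`, `task_b`, `task_c` and `orchestrator` are hypothetical names, and a real orchestrator task would have to defer jobs and check their status rather than await coroutines directly.

```python
import asyncio


async def task_a() -> str:
    # Stand-in for the real work of task A (hypothetical).
    return "a-result"


async def task_b() -> str:
    # Stand-in for the real work of task B (hypothetical).
    return "b-result"


async def task_c(a_result: str, b_result: str) -> str:
    # Task C only makes sense once A and B have both finished.
    return f"c ran after {a_result} and {b_result}"


async def orchestrator() -> str:
    # Start A and B concurrently and wait for both.
    # gather() raises if either one fails, so C is never started on failure.
    a_result, b_result = await asyncio.gather(task_a(), task_b())
    return await task_c(a_result, b_result)


result = asyncio.run(orchestrator())
```

Because the orchestrator holds the whole workflow state in memory here, a crash while it waits loses that state. A retry would restart A and B unless they are idempotent or their completion is recorded somewhere durable.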
Hm, I'm not sure the "Orchestrator task" is a concept I like. If the worker running it crashes, the whole set of tasks will be completely broken. |
It is not foolproof but might be good enough. This way, the task O can be retried if the worker crashes. |
So the way Chancy does this seems to work (1b+ workflows so far), and looking at the model for Procrastinate I don't see why something similar wouldn't be an option if there's no issue with adding more tables. It can be done in a way that avoids modifying the existing jobs and instead just builds on top of them. Two tables are used, the first to track the workflows themselves:
And another to track each step inside of a workflow, which will be updated with the ID of the job once it's started (job_id becomes a bigserial for procrastinate, state becomes status, etc):
Periodically, incomplete workflows are picked up and processed:
Fetching the workflows and their steps can be done in a single quick query thanks to json_build_object:
And then the process to progress a workflow becomes trivial, ~15 lines, https://github.com/TkTech/chancy/blob/main/chancy/plugins/workflow/__init__.py#L340. Since procrastinate doesn't have a leadership node, we'd add a This way jobs don't know they are part of a workflow, no persistent job is needed, just a periodic one, and the only relationship between the two is the job ID and state. Since this implements DAG-based workflows, it becomes easy to re-implement Celery's Chain and Group as well - 6 lines. |
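To illustrate the progression step described above, here is a rough in-memory sketch. It is not Chancy's code or schema: `Step`, `ready_steps`, `chain`, `group` and the status values are all hypothetical, and in a real setup the steps would be rows in the workflow step table, with a periodic job deferring whatever `ready_steps` returns and copying job statuses back.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Step:
    name: str
    depends_on: List[str] = field(default_factory=list)
    job_id: Optional[int] = None  # set once a job has been deferred for this step
    status: str = "pending"       # hypothetical values: pending, doing, succeeded, failed


def ready_steps(steps: Dict[str, Step]) -> List[str]:
    """Names of steps that have no job yet and whose dependencies all succeeded."""
    return [
        step.name
        for step in steps.values()
        if step.job_id is None
        and all(steps[dep].status == "succeeded" for dep in step.depends_on)
    ]


def chain(*names: str) -> Dict[str, Step]:
    """Each step depends on the one before it (similar in spirit to Celery's chain)."""
    return {
        name: Step(name, [names[i - 1]] if i else [])
        for i, name in enumerate(names)
    }


def group(names: List[str], then: str) -> Dict[str, Step]:
    """Independent steps, followed by one step that depends on all of them."""
    steps = {name: Step(name) for name in names}
    steps[then] = Step(then, list(names))
    return steps


# Task C runs only after A and B have both succeeded.
workflow = group(["A", "B"], then="C")
first = ready_steps(workflow)  # A and B can start immediately

workflow["A"].job_id, workflow["A"].status = 1, "succeeded"
workflow["B"].job_id, workflow["B"].status = 2, "doing"
waiting = ready_steps(workflow)  # nothing to start while B is still running

workflow["B"].status = "succeeded"
final = ready_steps(workflow)  # now C can be deferred
```

The jobs themselves never reference the workflow here. The only link is the `job_id` and `status` recorded on each step, which matches the point made above.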
Thanks for all the answers, folks. I'll try to investigate more and come up with something.
|
Hi. Let's say I need a job to run only if 2 others (created before it) have completed successfully. AFAIU there's no support for this.
Can I extend the procrastinate jobs table myself with an array column holding the list of required job IDs, so that I can check whether all of them have finished, either manually inside my task or with a decorator? In other words, are extra columns supported, and will they not break the standard flows?