Custom timeout per task and retry doesn't seem possible #1281

Open
saro2-a opened this issue Jan 10, 2025 · 2 comments
@saro2-a

saro2-a commented Jan 10, 2025

I am trying to retry stalled jobs using a custom timeout per job.

We have several jobs whose duration depends on the input: anywhere from 1 minute to 3 hours, uniformly distributed. At submission time we know roughly how long a job will take, but when I call "get_stalled_jobs", the "started_at" of the latest event does not seem to be retained when the job object is created:

It is fetched:
SELECT job.id, status, task_name, priority, lock, queueing_lock, args, scheduled_at, queue_name, attempts, max(event.at) started_at

but not retained
https://github.com/procrastinate-org/procrastinate/blob/main/procrastinate/manager.py#L175
https://github.com/procrastinate-org/procrastinate/blob/main/procrastinate/jobs.py#L77

hence seemingly making the task impossible?

        @self.app.periodic(cron="*/10 * * * *")
        @self.app.task(queueing_lock="retry_stalled_jobs", pass_context=True)
        async def retry_stalled_jobs(context, timestamp):
            stalled_jobs = await self.app.job_manager.get_stalled_jobs(
                nb_seconds=RUNNING_JOBS_MAX_TIME_SECONDS
            )
            # TODO it is currently not possible to have some jobs with custom duration.
            # it needs to be solved at lib level
            for job in stalled_jobs:
                proc_task_max_run_time = job.task_kwargs.get("proc_task_max_run_time")
                if not proc_task_max_run_time or proc_task_max_run_time < now()- {{{ job.started_at ??where to get the start time of the event??}}}:
                    await self.app.job_manager.retry_job(job)

Could we either:

  • support proc_task_max_run_time as a first class parameter (probably preferred)
  • or pass the started_at?
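
To make the second option concrete, here is a minimal sketch of the check I would like to write if the start time were available. The helper name `should_retry` and its `started_at` argument are hypothetical and not part of procrastinate; the sketch assumes `started_at` would be a timezone-aware datetime taken from the latest "started" event, and that `proc_task_max_run_time` is a number of seconds.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def should_retry(
    started_at: datetime,
    max_run_time_seconds: float | None,
    now: datetime | None = None,
) -> bool:
    """Decide whether a stalled job should be retried.

    Hypothetical helper: `started_at` is assumed to be exposed by the
    library, which is not the case today.
    """
    if not max_run_time_seconds:
        # No custom limit on this job: retry it, as in the snippet above.
        return True
    now = now or datetime.now(timezone.utc)
    # Retry only once the job has run longer than its own limit.
    return now - started_at > timedelta(seconds=max_run_time_seconds)


start = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)
half_hour_later = start + timedelta(minutes=30)

one_minute_job = should_retry(start, 60, now=half_hour_later)
three_hour_job = should_retry(start, 3 * 3600, now=half_hour_later)
no_limit_job = should_retry(start, None, now=half_hour_later)
```

With this, a job expected to take 1 minute would be retried after 30 minutes, while a job allowed 3 hours would be left alone.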

Thank you

@ewjoachim
Member

ewjoachim commented Jan 10, 2025

This looks similar to #702 which we wanted to tackle in #740 with heartbeats

EDIT: well, no, timeouts and retrying are different. It's close but not the same. I'll try looking into it in more detail.

@ewjoachim
Member

I think you're right that the manager doesn't get access to the "Events" table. I think what would make the most sense is the ability to inspect the events of a job.
