Expose some identifier for Worker and a way to query its status #1039

Open
felipesere opened this issue Nov 30, 2024 · 10 comments
Labels
enhancement New feature or request
Milestone

Comments

@felipesere
Contributor

Feature Request

Is your feature request related to a problem? Please describe.

In my app I can upload (large) images. For the frontend I use the image crate to generate a much smaller thumbnail, but this can take a few seconds. I would like this work to be scheduled in the background from the request.

Describe the solution you'd like

Scheduling in the background is possible with a Worker, which is awesome.
Ideally, there would be an identifier generated when I call let job_id = ImageResizer::perform_later().
I would then expose that ID to the frontend and let it poll on some endpoint to check when the job is done.

That way I can tell the user that the image has been uploaded, and then pop an extra notification when the background job is done and the image is available to use.

Describe alternatives you've considered

I have considered some kind of ID tracking in the job itself where I track jobs for a given resource (recipe, in my case) in a separate table but that sounds awfully annoying.

@felipesere felipesere added the enhancement New feature or request label Nov 30, 2024
@jondot
Contributor

jondot commented Dec 1, 2024

Hi,
A common solution for this workflow is to decouple a job from the "item at work".
Why does this decoupling matter? Well, for example, imagine you're running via the Async job backend. This means you cannot poll from anywhere other than the server on which a thread is performing the job (a "thread job id" would not mean anything anywhere else).

So what to do? In general, have an "images" or "conversions" table:

  1. create a new conversion record with: [id, resource url, status]
  2. spawn a new job, carrying the conversion record ID
  3. when the job finishes, update the conversion record by ID, set status to 'done'

Your polling happens against the "conversions" table.

This approach lets you:

a) switch backends
b) move from ad-hoc background jobs to bulk job processing (if you wish)
c) have as much tracking of resources as you wish (keep this table with the URLs, be able to refresh the table if URLs change and spawn new jobs to re-resize, etc)
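
The three steps above can be sketched roughly as follows. This is only an illustration, not Loco code: the `Conversions` type is a hypothetical in-memory stand-in for the "conversions" table, `spawn_resize` stands in for enqueueing a worker, and a plain thread plays the role of the background job.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Status column of the hypothetical "conversions" table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Pending,
    Done,
}

/// One row: [id, resource url, status]. The id is the map key.
#[derive(Debug, Clone)]
pub struct Conversion {
    pub resource_url: String,
    pub status: Status,
}

/// Hypothetical in-memory stand-in for the "conversions" table.
#[derive(Clone, Default)]
pub struct Conversions {
    rows: Arc<Mutex<HashMap<u64, Conversion>>>,
}

impl Conversions {
    /// Step 1: create a new conversion record and return its ID.
    pub fn create(&self, resource_url: &str) -> u64 {
        let mut rows = self.rows.lock().unwrap();
        let id = rows.len() as u64 + 1;
        rows.insert(
            id,
            Conversion {
                resource_url: resource_url.to_string(),
                status: Status::Pending,
            },
        );
        id
    }

    /// Step 3: the job calls this by record ID when it finishes.
    pub fn mark_done(&self, id: u64) {
        let mut rows = self.rows.lock().unwrap();
        if let Some(row) = rows.get_mut(&id) {
            row.status = Status::Done;
        }
    }

    /// What a polling endpoint would read; `None` for an unknown ID.
    pub fn status(&self, id: u64) -> Option<Status> {
        let rows = self.rows.lock().unwrap();
        rows.get(&id).map(|row| row.status)
    }
}

/// Step 2: spawn a job that carries only the conversion record ID.
pub fn spawn_resize(table: Conversions, id: u64) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // The actual resize work would happen here.
        table.mark_done(id);
    })
}
```

The frontend only ever sees the record ID returned by `create`, so the polling endpoint does not need to know which job backend ran the work.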

See more in discussions here: https://www.reddit.com/r/rails/comments/naszfk/getting_job_status_in_rails/

Having said that, there are instances where a "job ID" exists, specifically job workflow coordination. We do carry job IDs in each individual implementation, so I'd be more than willing to expose this, because I see no reason not to.

@felipesere
Contributor Author

🤦 Yup, makes total sense. I’ll pretty much build a “jobs” table to track them. Thank you for the reminder!
If I wanted to extract this into a loco plugin, how would I share the schema?

(I say this, yet at the moment I am super time-limited)

@jondot
Contributor

jondot commented Dec 1, 2024

Providing the enhancement you originally wanted should be feasible.
If you look here https://github.com/loco-rs/loco/blob/master/src/bgworker/pg.rs you will see we already have a "jobs" table (it exists in the sqlite provider too).
And if you look here: https://github.com/loco-rs/loco/blob/master/src/bgworker/mod.rs#L52 you will see that at least the SQL-based providers do return a job ID (which is eventually a String). I haven't looked, but off the top of my head I think the Redis-based provider also has a concept of a job ID; I don't know whether it returns one.
So it would probably require:

  1. validate that all providers return an id
  2. "bubble up" the id to the main enqueue method
  3. make the method return the ID in the result
  4. adjust any breaking interface (I don't think there will be any, because we move from "nothing" to "something")
  5. done

NOTE: if the Redis provider cannot supply a job ID, then we can return a Result<Option>, and providers which do not support this return None
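
A minimal sketch of what that `Result<Option>` shape could look like. Everything here is hypothetical and only illustrates the idea: `QueueProvider`, `QueueError`, `SqlLikeQueue`, and `NoIdQueue` are made-up names, not the actual types in `bgworker`.

```rust
/// Hypothetical error type, for illustration only.
#[derive(Debug)]
pub struct QueueError(pub String);

/// Hypothetical provider interface: enqueue returns the job ID when the
/// provider has one, and `None` when it cannot supply one.
pub trait QueueProvider {
    fn enqueue(&mut self, job_name: &str, payload: &str) -> Result<Option<String>, QueueError>;
}

/// Stand-in for a SQL-based provider that hands back a job ID.
#[derive(Default)]
pub struct SqlLikeQueue {
    next_id: u64,
}

impl QueueProvider for SqlLikeQueue {
    fn enqueue(&mut self, _job_name: &str, _payload: &str) -> Result<Option<String>, QueueError> {
        self.next_id += 1;
        Ok(Some(format!("job-{}", self.next_id)))
    }
}

/// Stand-in for a provider that cannot supply a job ID.
#[derive(Default)]
pub struct NoIdQueue;

impl QueueProvider for NoIdQueue {
    fn enqueue(&mut self, _job_name: &str, _payload: &str) -> Result<Option<String>, QueueError> {
        Ok(None)
    }
}

/// The main enqueue method "bubbles up" whatever the provider returned.
pub fn perform_later(
    queue: &mut dyn QueueProvider,
    payload: &str,
) -> Result<Option<String>, QueueError> {
    queue.enqueue("ImageResizer", payload)
}
```

Callers that ignore the return value behave as before, which is why moving from "nothing" to "something" should not break much.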

@anhnmt
Contributor

anhnmt commented Dec 25, 2024

@jondot

Hello, I use workers to process my post-production videos. When they shut down unexpectedly and are restarted, the jobs left in the processing status are not resumed. What should I do so that I can retry them?

I am using the Postgres queue.

@kaplanelad
Contributor

Hey @anhnmt,

You can use admin-jobs for managing admin workers, which also allows you to re-run jobs.

Regarding the "suddenly shut down" issue:

  1. Where are you running the worker? Have you checked the CPU and memory usage?
  2. Could you provide the logs?

@anhnmt
Contributor

anhnmt commented Jan 8, 2025

@kaplanelad
Sometimes I have to shut down the program, for example to deploy a new version. My data is not too important, so I want to be able to update the status of those jobs once the program runs again.

Below is the code I am using

pub struct ProcessingWorkerInitializer;

#[async_trait]
impl Initializer for ProcessingWorkerInitializer {
    fn name(&self) -> String {
        "processing-worker".to_string()
    }

    async fn before_run(&self, ctx: &AppContext) -> Result<()> {
        let pool = ctx.db.get_postgres_connection_pool();

        // Re-queue jobs that a previous shutdown left stuck in 'processing'.
        sqlx::raw_sql(
            r"
                update pg_loco_queue
                set status = 'queued'
                where status = 'processing';
            ",
        )
        .execute(pool)
        .await?;

        Ok(())
    }
}

@kaplanelad
Contributor

Got it.
We can add it as part of the jobs cli:

$ cargo loco jobs --help

Managing jobs queue

Usage: myapp-cli jobs [OPTIONS] <COMMAND>

Commands:
  cancel  Cancels jobs with the specified names, setting their status to `cancelled`
  tidy    Deletes jobs that are either completed or cancelled
  purge   Deletes jobs based on their age in days
  dump    Saves the details of all jobs to files in the specified folder
  import  Imports jobs from a file
  help    Print this message or the help of the given subcommand(s)

Options:
  -e, --environment <ENVIRONMENT>  Specify the environment [default: development]
  -h, --help                       Print help
  -V, --version                    Print version

What do you think?

@anhnmt
Contributor

anhnmt commented Jan 9, 2025

I think we should add a feature to update the status from processing to queued.

@kaplanelad
Contributor

I can add a new update command under cargo loco jobs, which will handle from_status and to_status.
Does that sound good?

@anhnmt
Contributor

anhnmt commented Jan 9, 2025

That sounds really good. You should check the status against the enum to verify that it is valid.
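
A rough sketch of that validation, assuming nothing about the real implementation. `JobStatus`, `Job`, and `update_status` are hypothetical names, and the enum lists only the statuses mentioned in this thread; the real queue may have more.

```rust
use std::str::FromStr;

/// Statuses mentioned in this thread; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Queued,
    Processing,
    Completed,
    Cancelled,
}

impl FromStr for JobStatus {
    type Err = String;

    /// Rejects any string that is not a known status.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "queued" => Ok(Self::Queued),
            "processing" => Ok(Self::Processing),
            "completed" => Ok(Self::Completed),
            "cancelled" => Ok(Self::Cancelled),
            other => Err(format!("unknown job status: {other}")),
        }
    }
}

/// Hypothetical in-memory job row.
#[derive(Debug)]
pub struct Job {
    pub id: String,
    pub status: JobStatus,
}

/// Validates both statuses first, then moves every job in `from` to `to`.
/// Returns the number of jobs updated.
pub fn update_status(jobs: &mut [Job], from: &str, to: &str) -> Result<usize, String> {
    let from: JobStatus = from.parse()?;
    let to: JobStatus = to.parse()?;
    let mut updated = 0;
    for job in jobs.iter_mut().filter(|job| job.status == from) {
        job.status = to;
        updated += 1;
    }
    Ok(updated)
}
```

Parsing both arguments before touching any job means a typo such as `procesing` fails early instead of silently matching nothing.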

@kaplanelad kaplanelad added this to the 0.14.1 milestone Jan 12, 2025
4 participants