client.trainings.async_create() is broken #408

Open
houmie opened this issue Jan 25, 2025 · 1 comment · May be fixed by #409

houmie commented Jan 25, 2025

As the README states, AsyncIO support is available by prepending async_ to the method name.

I already do this for many methods, and it works perfectly. However, I discovered a bug when creating a training asynchronously. The call always fails with this error:

ReplicateError(type=None, title=None, status=404, detail='The requested resource could not be found.', instance=None)

See below:

            training = await self.client.trainings.async_create(
                model=model,
                destination=f"{model.owner}/{model.name}",
                version="ostris/flux-dev-lora-trainer:e440909d3512c31646ee2e0c7d6f6f4923224863a6a10c494606e79fb5844497",
                input={
                    "steps": 1000,
                    "batch_size": 1,
                    "autocaption": True,
                    "input_images": input_images_url, 
                    "trigger_word": internal_trigger_word,
                },
                webhook=webhook_url,
                webhook_events_filter=["completed"],
            )
            return training.id

As far as I can tell, the call is correct, but it is somewhat odd that it requires the model to be passed in addition to the destination. This might be causing confusion internally. The sync version has a different signature; note that no model is passed here:

            training = self.client.trainings.create(
                version="ostris/flux-dev-lora-trainer:e440909d3512c31646ee2e0c7d6f6f4923224863a6a10c494606e79fb5844497",
                input={
                    "steps": 1000,
                    "batch_size": 1,
                    "autocaption": True,
                    "input_images": input_images_url,
                    "trigger_word": internal_trigger_word,
                },
                destination=f"houmie/{model_name}",
                webhook=webhook_url,
                webhook_events_filter=["completed"],
            )
            return training.id

The async approach needs to be investigated as it’s currently broken. It’s a shame since I’m using everything else asynchronously, but this one function has to remain synchronous to work.
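
Until this is fixed, one way to keep the surrounding code asynchronous while this call stays synchronous is to run the sync method in a worker thread. The sketch below assumes Python 3.9+ for `asyncio.to_thread`. The names `create_training_off_loop` and `fake_sync_create` are hypothetical; the stand-in takes the place of the real `client.trainings.create` so the example runs without network access or an API token. It is a stopgap under those assumptions, not a fix for `async_create` itself.

```python
import asyncio
from typing import Any, Callable


async def create_training_off_loop(create: Callable[..., Any], **kwargs: Any) -> Any:
    """Run a blocking ``create(**kwargs)`` call in a worker thread.

    ``create`` is expected to be the synchronous method (for example
    ``client.trainings.create``); the event loop stays free while it runs.
    """
    return await asyncio.to_thread(create, **kwargs)


# Hypothetical stand-in for the real synchronous call, so the sketch runs
# offline. It is NOT part of the replicate library.
def fake_sync_create(**kwargs: Any) -> dict:
    return {"id": "training-123", "destination": kwargs["destination"]}


async def main() -> dict:
    return await create_training_off_loop(
        fake_sync_create,
        version="owner/trainer:version-id",  # placeholder, not a real version
        input={"steps": 1000},
        destination="owner/model-name",  # placeholder destination
    )


result = asyncio.run(main())
print(result["id"])
```

In real code, the first argument would be `self.client.trainings.create`, with the same keyword arguments as the sync example above.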

Thank you

@meatballhat

@houmie Thank you for the report! You're absolutely right. The sync and async versions are too divergent and the async version is clearly not doing the right thing.

@meatballhat meatballhat self-assigned this Feb 5, 2025
meatballhat added a commit that referenced this issue Feb 5, 2025
to better align with `Trainings.create` and the way that the arguments
are being used.

Closes #408
@meatballhat meatballhat linked a pull request Feb 5, 2025 that will close this issue