🔖 Issues Auto-Labeller #2542
base: main
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Fun feature! Do you have a demo repo?
Have you tried the HF API? It could be a free alternative.
Just pushed it to my own fork.
I'll open a batch of issues to test it.
Honestly, this was effortless, since I simply forked a mostly functional Actions extension. Modifying it to work with the HF API will take much more effort. Also, it uses GPT-4o, and there aren't many open-source models that are this accurate. If it's absolutely necessary I can do it, but I honestly don't think it's worth the effort. That said, if you believe it is important, I'll go ahead and do it.
It doesn't seem like a big deal to me. Something like this could probably work:

```python
from huggingface_hub import InferenceClient

# "your_token" is a placeholder for a Hugging Face access token
client = InferenceClient(model="meta-llama/Llama-3.2-1B-Instruct", token="your_token")
content = "Find the label among these: question, issue."
completion = client.chat_completion(messages=[{"role": "user", "content": content}], max_tokens=256)
# The generated text is in the message of the first choice
response = completion.choices[0].message.content
```

This task is very simple; I don't think we absolutely need GPT-4o here. And even if the labeller fails, it's not a big deal.
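Since a small model may answer in free text, the reply still has to be mapped back onto a real label. A minimal sketch, assuming the two labels from the example prompt (`question`, `issue`) and a hypothetical helper `parse_label`: it returns the allowed label mentioned earliest in the reply, or `None`, so a failed prediction simply leaves the issue unlabelled.

```python
import re
from typing import Optional, Sequence

# Hypothetical label set taken from the example prompt in this thread;
# the repository's real labels may differ.
ALLOWED_LABELS = ("question", "issue")


def parse_label(response: str, allowed: Sequence[str] = ALLOWED_LABELS) -> Optional[str]:
    """Return the allowed label mentioned earliest in the reply, or None.

    Matching is case-insensitive and the label must not be glued to other
    word characters, so "Question." maps to "question" while "questionable"
    does not. Lookarounds are used instead of word boundaries so that labels
    starting with an emoji would still match.
    """
    text = response.lower()
    best_label, best_pos = None, len(text) + 1
    for label in allowed:
        pattern = rf"(?<!\w){re.escape(label.lower())}(?!\w)"
        match = re.search(pattern, text)
        if match and match.start() < best_pos:
            best_label, best_pos = label, match.start()
    return best_label  # None -> leave the issue unlabelled
```

For example, `parse_label(response)` could be applied to the `response` string returned by `chat_completion` before calling the GitHub API to add the label.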
OK, got it.
Do you know if you can access the tag description? It could help the model with its prediction.
Tag description as in the label description? If so, yes, it is part of the prompt.
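One way to put the label descriptions into the prompt, as a minimal sketch: the `LABELS` dict (GitHub's default descriptions for `bug` and `question`) and the `build_prompt` helper are hypothetical, and a real workflow would read the labels from the repository instead of hard-coding them.

```python
# Hypothetical labels and descriptions (GitHub's defaults for "bug" and
# "question"); a real workflow would read them from the repository instead.
LABELS = {
    "bug": "Something isn't working",
    "question": "Further information is requested",
}


def build_prompt(title: str, body: str, labels: dict[str, str]) -> str:
    """Build one user message that lists every label with its description."""
    label_lines = "\n".join(f"- {name}: {desc}" for name, desc in labels.items())
    return (
        "Pick the single best label for this GitHub issue.\n"
        f"Available labels:\n{label_lines}\n\n"
        f"Issue title: {title}\n"
        f"Issue body:\n{body}\n\n"
        "Answer with the label name only."
    )


messages = [
    {"role": "user", "content": build_prompt("Crash on load", "Traceback ...", LABELS)}
]
```

The resulting `messages` list has the same shape as the one passed to `client.chat_completion(...)` in the snippet earlier in this thread.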
I tried the Llama 1B model and it "functioned", but for TRL I switched to the 70B model. However, I couldn't test it with the 70B model because it requires a subscription. Don't forget to add the … I got a context length error (limit of 4096 tokens) when using the Llama 1B model, which was weird since the model supports up to 128k tokens. Since I can't use the 70B model, I'm not sure whether this is a problem there as well.
Nice! Just commit the suggested change, please.
Co-authored-by: Quentin Gallouédec <[email protected]>
This can be problematic when dealing with issues that require a long context. The exact error message received was:
A bit hacky, but you can take the first 15000 characters. That should be enough for most issues:

```python
content = content[:15000]
```

More like 4000.
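A character cap can also be derived from the token limit instead of picked by hand. A minimal sketch under loud assumptions: roughly 4 characters per token for English text, the 4096-token limit reported in this thread, and a hypothetical reserve of 756 tokens for the prompt scaffolding and the 256-token reply; `truncate_issue_body` is a hypothetical helper.

```python
# Assumptions: roughly 4 characters per token for English text, the
# 4096-token input limit reported in this thread, and a hypothetical
# reserve for the prompt scaffolding and the 256-token reply.
CHARS_PER_TOKEN = 4
MAX_INPUT_TOKENS = 4096
RESERVED_TOKENS = 756  # hypothetical: 256 for max_tokens + ~500 for the prompt


def truncate_issue_body(body: str) -> str:
    """Keep only as many leading characters as the estimated budget allows."""
    budget_chars = (MAX_INPUT_TOKENS - RESERVED_TOKENS) * CHARS_PER_TOKEN
    return body[:budget_chars]  # slicing is a no-op for short bodies
```

With these numbers the cap is (4096 - 756) × 4 = 13360 characters. The characters-per-token ratio is only an estimate: code blocks, tracebacks, and non-English text usually need more tokens per character, so an exact budget would require counting with the model's own tokenizer.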
I was playing around with the OpenAI API and thinking about useful things we could do with the repos and their data, such as issues and feedback: automating some tasks, or producing something useful like analyses or reports. That is how I came up with an auto-labeller.
This auto-labeller uses the OpenAI API, so it needs an OPENAI_API_KEY secret. It will cost a few dollars per month, but string-based and regex-based labellers also exist and may be faster. If you are interested, we could test it and see how consistent it is.
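For comparison, a regex-based labeller needs no API key and runs instantly. A minimal sketch, assuming hypothetical label names (`bug`, `question`, `documentation`) and hand-written keyword patterns that would need tuning against the repository's real issues:

```python
import re

# Hypothetical keyword rules: each label maps to a regex matched against the
# issue title and body. Real rules would be tuned to the repository's labels.
RULES = {
    "bug": re.compile(r"\b(traceback|error|exception|crash(es|ed)?)\b", re.IGNORECASE),
    "question": re.compile(r"\b(how (do|can|to)|is it possible)\b|\?\s*$", re.IGNORECASE),
    "documentation": re.compile(r"\b(docs?|documentation|typo|readme)\b", re.IGNORECASE),
}


def regex_labels(title: str, body: str) -> list[str]:
    """Return every label whose pattern matches the title or body."""
    text = f"{title}\n{body}"
    return [label for label, pattern in RULES.items() if pattern.search(text)]
```

Unlike the LLM approach, this can return several labels or none, and it only catches the phrasings its patterns anticipate, which is the usual trade-off against the cost and latency of an API call.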
A couple of ideas for improvement:
- We could use one-shot or few-shot prompting for more accuracy, or try different kinds of models; both may cost more.
- As for speed, it takes about 30 seconds to label an issue.
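Few-shot prompting can be done by placing already-labelled issues in the chat history before the new one. A minimal sketch: `few_shot_messages` and the example issues are hypothetical, and it only assumes a chat API that accepts a list of `{"role", "content"}` messages, which both OpenAI's chat completions endpoint and `huggingface_hub`'s `chat_completion` do. One-shot is the same with a single example.

```python
from typing import Sequence, Tuple


def few_shot_messages(
    instructions: str,
    examples: Sequence[Tuple[str, str]],
    issue_text: str,
) -> list[dict[str, str]]:
    """Build a chat history: system instructions, then each example issue as a
    user turn answered by its label as an assistant turn, then the new issue."""
    messages = [{"role": "system", "content": instructions}]
    for example_text, label in examples:
        messages.append({"role": "user", "content": example_text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": issue_text})
    return messages


# Hypothetical labelled examples; real ones would come from already-labelled issues.
EXAMPLES = [
    ("Training crashes with a CUDA out-of-memory error.", "bug"),
    ("How do I change the learning rate?", "question"),
]
msgs = few_shot_messages("Reply with one label: bug or question.", EXAMPLES, "Typo in docs?")
```

Each example adds its tokens to every request, which is where the extra cost of few-shot prompting comes from.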
Any other ideas for useful things we could do with the OpenAI API or other tools, either to automate tasks or to analyze data and get some value out of it?