
🔖 Issues Auto-Labeller #2542

Open · wants to merge 3 commits into main
Conversation

@August-murr (Collaborator)

I was playing around with the OpenAI API and thinking about useful things we could do with the repo and all its data, like issues and feedback: automating some tasks, or doing something useful with them like analysis or reports. One idea was an auto labeller.

This auto labeller uses the OpenAI API, so it needs an OPENAI_API_KEY secret. It will cost a few bucks per month, though there are also string-based and regex-based labellers that may be faster.

If you're interested, we could test it and see how consistent it is.

A couple of ideas for improvement:

We could one-shot or few-shot prompt it for more accuracy, or use different kinds of models; both can cost more.

As for speed, it takes about 30 seconds to label an issue.

Any other ideas for useful things we could do with the OpenAI API, or other ways to automate or analyze things and get some value out of them?
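
For context, here is a minimal sketch of what the labelling call could look like; the model, label list, and prompt wording are illustrative placeholders, not the exact code in this PR:

# Minimal sketch, assuming OPENAI_API_KEY is set in the environment.
# Labels and issue text below are placeholders.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

labels = ["🚀 deepspeed", "❓ question", "🐛 bug"]
issue_title = "Training crashes when using deepspeed zero-3"
issue_body = "..."  # issue body fetched from the GitHub API

prompt = (
    "Pick the most relevant labels for this GitHub issue from the following list:\n"
    + ", ".join(labels)
    + f"\n\nTitle: {issue_title}\n\nBody: {issue_body}\n\n"
    "Answer with a comma-separated list of labels only."
)

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=50,
)
print(completion.choices[0].message.content)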

August-murr self-assigned this on Jan 4, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec (Member)

Fun feature! Do you have a demo repo?

@qgallouedec (Member)

Have you tried with the HF API? It could be a free alternative.

@August-murr (Collaborator, Author)

Fun feature! Do you have a demo repo?

Just pushed it to my own fork

@qgallouedec (Member)

I'll open a batch of issues to test it

@August-murr (Collaborator, Author)

Have you tried with the HF api? It could be a free alternative

Honestly, this was really effortless since I simply forked a mostly functional Actions extension; modifying it to work with the HF API would take much more effort. It also uses GPT-4o, and there aren't many open-source models that are this accurate.

If it's absolutely necessary I can do it, but I honestly don't think it's worth the effort. That said, if you believe it's important, I'll go ahead and do it.

@qgallouedec (Member)

It doesn't seem like a big deal to me. Probably something like this could work:

from huggingface_hub import InferenceClient

# Ask a small open model for the label via the HF Inference API
client = InferenceClient(model="meta-llama/Llama-3.2-1B-Instruct", token="your_token")
content = "Find the label among these: question, issue."
completion = client.chat_completion(messages=[{"role": "user", "content": content}], max_tokens=256)
response = completion.choices[0].message.content

there aren't many open-source models that are this accurate.

This task is very simple; I don't think we absolutely need GPT-4o here. And even if the labelling fails, it's not a big deal.

@August-murr (Collaborator, Author) commented Jan 6, 2025

It doesn't seem like a big deal to me. Probably something like this could work [...] This task is very simple, I don't think we absolutely need GPT-4o here. And even if the labelling fails, it's not a big deal.

OK, got it.

@qgallouedec (Member)

Do you know if you can access the tag description? It could help the model in its prediction

@August-murr (Collaborator, Author)

Do you know if you can access the tag description? It could help the model in its prediction

tag description as in the label description?
like:
🚀 deepspeed --> Related to deepspeed

If so, yes, it is part of the prompt.
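
For illustration, the label descriptions could be folded into the prompt roughly like this; the dictionary contents and the formatting are assumptions, not the PR's exact prompt:

# Illustrative only: label name -> description, as in the deepspeed example above.
label_descriptions = {
    "🚀 deepspeed": "Related to deepspeed",
    "❓ question": "General question about the library",
}

issue_title = "..."  # placeholder
issue_body = "..."   # placeholder

label_lines = "\n".join(f"- {name}: {desc}" for name, desc in label_descriptions.items())
prompt = (
    "Choose the most relevant labels for the issue below.\n"
    f"Available labels:\n{label_lines}\n\n"
    f"Title: {issue_title}\n\nBody: {issue_body}"
)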

@August-murr (Collaborator, Author)

I tried the Llama 1B model, and it "functioned," but for TRL I switched to the 70B model. However, I couldn't test the 70B because it requires a subscription.

Don't forget to add the HF_API_KEY to the secrets.

I got a context length error (limit of 4096 tokens) when using the Llama 1B model, which was weird since it supports up to 128k tokens. Since I can't use the 70B model, I'm unsure if it's a problem or not.

August-murr marked this pull request as ready for review on Jan 12, 2025
@qgallouedec (Member) left a comment


Nice! Just commit the suggested change please

qgallouedec changed the title from "Issues Auto-Labeller" to "🔖 Issues Auto-Labeller" on Jan 12, 2025
Co-authored-by: Quentin Gallouédec <[email protected]>
@August-murr (Collaborator, Author)

I got a context length error (limit of 4096 tokens) when using the Llama 1B model, which was weird since it supports up to 128k tokens. Since I can't use the 70B model, I'm unsure if it's a problem or not.

This can be problematic when dealing with issues that need a long context. The exact error message was:
Input validation error: inputs tokens + max_new_tokens must be <= 4096. Given: 9223 inputs tokens and 50 max_new_tokens
I couldn't find a solution or a parameter to set; it may have to be configured on the inference endpoint side.

@qgallouedec (Member) commented Jan 12, 2025

A bit hacky, but you can take the first 15000 characters. It should be enough for most issues:

content = content[:15000]

@August-murr (Collaborator, Author)

A bit hacky, but you can take the first 15000 characters. It should be enough for most issues:

content = content[:15000]

More like the first 4000, but it works well.
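
For reference, a rough sketch of how the truncation might fit into the HF-based labelling call; the 4000-character cutoff comes from the comment above, while the model, prompt wording, and function name are illustrative assumptions:

import os
from huggingface_hub import InferenceClient

# Illustrative sketch, not the exact code in this PR.
# HF_API_KEY is the repository secret mentioned earlier, exposed as an environment variable.
client = InferenceClient(model="meta-llama/Llama-3.2-1B-Instruct", token=os.environ["HF_API_KEY"])

def label_issue(title: str, body: str, labels: list[str]) -> str:
    content = (
        f"Find the label among these: {', '.join(labels)}\n\n"
        f"Title: {title}\n\nBody: {body}"
    )
    # Truncate so the prompt stays under the endpoint's 4096-token limit.
    content = content[:4000]
    completion = client.chat_completion(
        messages=[{"role": "user", "content": content}],
        max_tokens=50,
    )
    return completion.choices[0].message.content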
