
Custom RAG tool kept being called and as a result llm getting too many tokens error #1809

Open
Kkkassini opened this issue Jan 16, 2025 · 1 comment


@Kkkassini

When using the provided RAG, everything looks good, e.g.:

```python
from typing import List, Optional

knowledge_base = PDFKnowledgeBase(
    path="/xxx/",
    vector_db=PgVector2(schema="xx", collection="xx", db_url=db_url, embedder=embedder),
)

model = OpenAIChat(
    id="casperhansen/llama-3.3-70b-instruct-awq",
    base_url="xxx",
    api_key="xxx",
    model="openai/casperhansen/llama-3.3-70b-instruct-awq",
    temperature=0.3,
)
knowledge_base.load(upsert=True)

storage = PgAssistantStorage(table_name="pdf_assistant", db_url=db_url)

def pdf_assistant(new: bool = False, user: str = "user"):
    run_id: Optional[str] = None
    if not new:
        existing_run_ids: List[str] = storage.get_all_run_ids(user)
        if len(existing_run_ids) > 0:
            run_id = existing_run_ids[0]

    agent = Agent(
        model=model,
        knowledge=knowledge_base,
        add_context=False,
        show_tool_calls=True,
        search_knowledge=True,
        search_mar_knowledge=False,
        markdown=False,
    )
    agent.print_response("blabla", stream=True)
```

When I ask a question that I know has an answer in the knowledge base, it works. When I ask a question without an answer in the base, it gives an answer like:

> The provided text does not contain information about the xxx. However, I can provide a general overview of the key principles of cloud computing.

During the process it called `search_knowledge_base(query=xxx)` just once.

But when I use a custom RAG tool like:

```python
import requests

class CustomToolkit(Toolkit):
    def __init__(self):
        super().__init__(name="xxx")
        self.register(self.custom_search_endpoint)

    def adaptor(self, raw_t):
        return """'[\n  {\n"content": \"""" + raw_t + """\",]'"""

    def custom_search_endpoint(self, query: str) -> str:
        headers = {
            "accept": "application/json",
            "Content-Type": "application/json",
        }

        json_data = {blabla}

        response = requests.post("http://xxx/test_endpoint", headers=headers, json=json_data)
        response_json = response.json()
        raw_t = response_json.get("context", "")
        return self.adaptor(raw_t=raw_t)

def test():
    agent = Agent(
        model=model,
        tools=[CustomToolkit()],
        show_tool_calls=True,
        add_context=False,
        markdown=False,
    )
    agent.print_response("blabla", stream=True)
```

`custom_search_endpoint()` returns a plain text paragraph, but that did not seem to work, so I use the adaptor to mimic the output format of your function:
```python
def search_knowledge_base(self, query: str) -> str:
    """Use this function to search the knowledge base for information about a query.

    Args:
        query: The query to search for.

    Returns:
        str: A string containing the response from the knowledge base.
    """

    # Get the relevant documents from the knowledge base
    retrieval_timer = Timer()
    retrieval_timer.start()
    docs_from_knowledge = self.get_relevant_docs_from_knowledge(query=query)
    if docs_from_knowledge is not None:
        references = MessageReferences(
            query=query, references=docs_from_knowledge, time=round(retrieval_timer.elapsed, 4)
        )
        # Add the references to the run_response
        if self.run_response.extra_data is None:
            self.run_response.extra_data = RunResponseExtraData()
        if self.run_response.extra_data.references is None:
            self.run_response.extra_data.references = []
        self.run_response.extra_data.references.append(references)
    retrieval_timer.stop()
    logger.debug(f"Time to get references: {retrieval_timer.elapsed:.4f}s")

    if docs_from_knowledge is None:
        return "No documents found"
    return self.convert_documents_to_string(docs_from_knowledge)
```
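As an aside, the hand-concatenated string my `adaptor` builds is not actually valid JSON (stray quotes and a trailing `,]`), which may confuse the model. A minimal sketch of a safer adaptor using `json.dumps` instead; the list-of-dicts shape is an assumption inferred from the snippet above, not a documented contract:

```python
import json

def adaptor(raw_t: str) -> str:
    # Build a well-formed list-of-documents JSON string, letting
    # json.dumps handle quoting and escaping instead of manual
    # string concatenation.
    return json.dumps([{"content": raw_t}], indent=2)
```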

By launching `typer.run(test)`: if the question is related, I get a normal answer. However, if the question is less related, it keeps calling `custom_search_endpoint` several times, each time with a slightly different query (similar, and in some cases duplicated), and the process ends with a too-many-tokens error from the LLM. Shouldn't it return an answer like

> The provided text does not contain information about the xxx. However, I can provide a general overview of the key principles of xxx.

instead of calling the tool repeatedly?
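As a stopgap while this is open, the endpoint can be wrapped so duplicate or excessive queries short-circuit to the same "No documents found" sentinel the built-in search returns, nudging the model to answer from what it already has. A plain-Python sketch (`make_capped_search` is a hypothetical helper, not a framework feature):

```python
def make_capped_search(search_fn, max_calls=3):
    """Wrap a search function so repeated or excessive calls
    short-circuit instead of flooding the model's context."""
    state = {"calls": 0, "seen": set()}

    def wrapped(query: str) -> str:
        key = query.strip().lower()
        if key in state["seen"] or state["calls"] >= max_calls:
            # Returning the sentinel tells the model there is nothing
            # more to retrieve, so it should stop issuing tool calls.
            return "No documents found"
        state["seen"].add(key)
        state["calls"] += 1
        return search_fn(query)

    return wrapped
```

The wrapped function would then be registered with `self.register(...)` in place of the raw endpoint.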

@Kkkassini Kkkassini changed the title Custom RAG tool kept being called and ended up getting too many tokens error Custom RAG tool kept being called and as a result llm getting too many tokens error Jan 16, 2025
@touful

touful commented Jan 18, 2025

Yes, I found the same issue. It keeps calling the RAG functions.
