
Custom RAG tool kept being called and as a result llm getting too many tokens error #1809

Open
Kkkassini opened this issue Jan 16, 2025 · 1 comment


@Kkkassini

When using the provided RAG, everything looks good, e.g.:

```python
from typing import List, Optional

knowledge_base = PDFKnowledgeBase(
    path="/xxx/",
    vector_db=PgVector2(schema="xx", collection="xx", db_url=db_url, embedder=embedder),
)

model = OpenAIChat(
    id="casperhansen/llama-3.3-70b-instruct-awq",
    base_url="xxx",
    api_key="xxx",
    model="openai/casperhansen/llama-3.3-70b-instruct-awq",
    temperature=0.3,
)
knowledge_base.load(upsert=True)

storage = PgAssistantStorage(table_name="pdf_assistant", db_url=db_url)

def pdf_assistant(new: bool = False, user: str = "user"):
    run_id: Optional[str] = None
    if not new:
        existing_run_ids: List[str] = storage.get_all_run_ids(user)
        if len(existing_run_ids) > 0:
            run_id = existing_run_ids[0]

    agent = Agent(
        model=model,
        knowledge=knowledge_base,
        add_context=False,
        show_tool_calls=True,
        search_knowledge=True,
        search_mar_knowledge=False,
        markdown=False,
    )
    agent.print_response("blabla", stream=True)
```

When I ask a question that I know has an answer in the knowledge base, it works. When I ask a question without an answer in the base, it gives an answer like:

> The provided text does not contain information about the xxx. However, I can provide a general overview of the key principles of cloud computing.

During the process it called `search_knowledge_base(query=xxx)` just once.

But when I use a custom RAG tool like:

```python
import requests

class CustomToolkit(Toolkit):
    def __init__(self):
        super().__init__(name="xxx")
        self.register(self.custom_search_endpoint)

    def adaptor(self, raw_t):
        return """'[\n  {\n"content": \"""" + raw_t + """\",]'"""

    def custom_search_endpoint(self, query: str) -> str:
        headers = {
            "accept": "application/json",
            "Content-Type": "application/json",
        }

        json_data = {blabla}

        response = requests.post("http://xxx/test_endpoint", headers=headers, json=json_data)
        response_json = response.json()
        raw_t = response_json.get("context", "")
        return self.adaptor(raw_t=raw_t)

def test():
    agent = Agent(
        model=model,
        tools=[CustomToolkit()],
        show_tool_calls=True,
        add_context=False,
        markdown=False,
    )
    agent.print_response("blabla", stream=True)
```

`custom_search_endpoint()` returns a plain text paragraph, but that did not seem to work, so I use the adaptor to mimic the output format of your function:
```python
def search_knowledge_base(self, query: str) -> str:
    """Use this function to search the knowledge base for information about a query.

    Args:
        query: The query to search for.

    Returns:
        str: A string containing the response from the knowledge base.
    """

    # Get the relevant documents from the knowledge base
    retrieval_timer = Timer()
    retrieval_timer.start()
    docs_from_knowledge = self.get_relevant_docs_from_knowledge(query=query)
    if docs_from_knowledge is not None:
        references = MessageReferences(
            query=query, references=docs_from_knowledge, time=round(retrieval_timer.elapsed, 4)
        )
        # Add the references to the run_response
        if self.run_response.extra_data is None:
            self.run_response.extra_data = RunResponseExtraData()
        if self.run_response.extra_data.references is None:
            self.run_response.extra_data.references = []
        self.run_response.extra_data.references.append(references)
    retrieval_timer.stop()
    logger.debug(f"Time to get references: {retrieval_timer.elapsed:.4f}s")

    if docs_from_knowledge is None:
        return "No documents found"
    return self.convert_documents_to_string(docs_from_knowledge)
```
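As an aside, the hand-concatenated string my `adaptor` builds is not actually valid JSON (stray quotes and a trailing `,]`), which may confuse the model. A minimal sketch of a safer adaptor using `json.dumps` instead; the list-of-dicts shape is an assumption inferred from the snippet above, not a documented contract:

```python
import json

def adaptor(raw_t: str) -> str:
    # Build a well-formed list-of-documents JSON string, letting
    # json.dumps handle quoting and escaping instead of manual
    # string concatenation.
    return json.dumps([{"content": raw_t}], indent=2)
```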

By launching `typer.run(test)`: if the question is related, I get a normal answer. However, if the question is less related, it keeps calling `custom_search_endpoint` several times, each time with a slightly different query (similar, and in some cases duplicated), and the process ends with a too-many-tokens error from the LLM. Shouldn't it return an answer like

> The provided text does not contain information about the xxx. However, I can provide a general overview of the key principles of xxx.

instead of calling the tool repeatedly?
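As a stopgap while this is open, the endpoint can be wrapped so duplicate or excessive queries short-circuit to the same "No documents found" sentinel the built-in search returns, nudging the model to answer from what it already has. A plain-Python sketch (`make_capped_search` is a hypothetical helper, not a framework feature):

```python
def make_capped_search(search_fn, max_calls=3):
    """Wrap a search function so repeated or excessive calls
    short-circuit instead of flooding the model's context."""
    state = {"calls": 0, "seen": set()}

    def wrapped(query: str) -> str:
        key = query.strip().lower()
        if key in state["seen"] or state["calls"] >= max_calls:
            # Returning the sentinel tells the model there is nothing
            # more to retrieve, so it should stop issuing tool calls.
            return "No documents found"
        state["seen"].add(key)
        state["calls"] += 1
        return search_fn(query)

    return wrapped
```

The wrapped function would then be registered with `self.register(...)` in place of the raw endpoint.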

@Kkkassini Kkkassini changed the title Custom RAG tool kept being called and ended up getting too many tokens error Custom RAG tool kept being called and as a result llm getting too many tokens error Jan 16, 2025
@touful

touful commented Jan 18, 2025

Yes, I found the same issue. It keeps calling the RAG functions.
