PgVector embedder parameter is not accepting anything other than its default OpenAI Embeddings. Help needed! #1746

Closed
Cipher-unhsiV opened this issue Jan 10, 2025 · 2 comments

@Cipher-unhsiV

@manthanguptaa, following your instructions in issue #1736, I tried several ways of using an open-source embedder, but none of them worked. I have tried the following embedders from the phidata docs:

  1. MistralAI
  2. Together
  3. Huggingface
  4. SentenceTransformers

I ran into multiple errors, including a SQLAlchemy dimensionality mismatch, an httpx ReadTimeout, a pydantic_core validation error, and an incompatible NumPy version, to name a few. This is meant to be a simple agentic RAG pipeline: read a PDF from a URL via PDFUrlKnowledgeBase, store it in PgVector2, and answer a predefined user query using the knowledge_base. It is turning out to be far more involved than expected. Please help! Here is the snippet to illustrate the scenario:

import typer
from phi.agent import Agent, RunResponse
from typing import Optional,List
from phi.assistant import Assistant
from phi.model.deepseek import DeepSeekChat
from phi.model.groq import Groq
from phi.storage.assistant.postgres import PgAssistantStorage
from phi.knowledge.pdf import PDFUrlKnowledgeBase
from phi.vectordb.pgvector import PgVector2
from phi.embedder.mistral import MistralEmbedder
from phi.embedder.huggingface import HuggingfaceCustomEmbedder
from phi.embedder.together import TogetherEmbedder
from phi.embedder.sentence_transformer import SentenceTransformerEmbedder

import os
from dotenv import load_dotenv
load_dotenv()
os.environ["GROQ_API_KEY"]=os.getenv("GROQ_API_KEY")

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge_base=PDFUrlKnowledgeBase(
    urls=['https://phi-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf'],
    vector_db=PgVector2(
        collection="recipies",
        db_url=db_url, 
        embedder=SentenceTransformerEmbedder(dimensions=1536),  # issue here
        )
)

knowledge_base.load(recreate=True, upsert=True)
#knowledge_base.load()

storage=PgAssistantStorage(table_name="pdf-assistant",db_url=db_url)

agent = Agent(
    model=Groq(id="llama-3.3-70b-versatile"),
    #model = SentenceTransformer('all-mpnet-base-v2', truncate_dim=384),
    knowledge=knowledge_base,
    storage=storage,
)

response: RunResponse = agent.run("What is the recipe for chicken curry?")
res = response.content

@manthanguptaa Kindly don't close this issue until I confirm whether the fix works on my local setup.

@dirkbrnd
Contributor

Hi @Cipher-unhsiV
I suggest using PgAgentStorage instead of PgAssistantStorage, which is deprecated, and PgVector instead of PgVector2, which is also deprecated.
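
For reference, here is a minimal sketch of what that swap might look like. It assumes the phidata import paths `phi.storage.agent.postgres` and `phi.vectordb.pgvector`. The `build_agent` helper and both table names are hypothetical. `dimensions=384` assumes the embedder runs its default all-MiniLM-L6-v2 model, which produces 384-dimensional vectors. None of this has been verified against the reporter's setup.

```python
def build_agent(db_url: str = "postgresql+psycopg://ai:ai@localhost:5532/ai"):
    """Hypothetical helper: build the agent with the non-deprecated classes."""
    # Imports are local so that defining this function does not require phidata.
    from phi.agent import Agent
    from phi.embedder.sentence_transformer import SentenceTransformerEmbedder
    from phi.knowledge.pdf import PDFUrlKnowledgeBase
    from phi.model.groq import Groq
    from phi.storage.agent.postgres import PgAgentStorage
    from phi.vectordb.pgvector import PgVector

    knowledge_base = PDFUrlKnowledgeBase(
        urls=["https://phi-public.s3.amazonaws.com/recipes/ThaiRecipes.pdf"],
        vector_db=PgVector(
            table_name="recipes",  # hypothetical table name
            db_url=db_url,
            # Assumption: the default model is all-MiniLM-L6-v2 (384 dimensions).
            embedder=SentenceTransformerEmbedder(dimensions=384),
        ),
    )
    # PgAgentStorage replaces the deprecated PgAssistantStorage.
    storage = PgAgentStorage(table_name="pdf_agent", db_url=db_url)
    return Agent(
        model=Groq(id="llama-3.3-70b-versatile"),
        knowledge=knowledge_base,
        storage=storage,
    )
```

The sketch only builds the objects; the knowledge base would still need to be loaded (as in the original snippet's `knowledge_base.load(recreate=True, upsert=True)`) before the agent is queried.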

I'll let @manthanguptaa test after that, if that's OK.
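
On the dimensionality error mentioned in the report: one plausible cause, not confirmed in this thread, is that the declared `dimensions` value (1536) does not match the length of the vectors the model actually returns (384 for all-MiniLM-L6-v2). The sketch below uses only the standard library to show the kind of check that surfaces the mismatch before anything is written to the database. `check_embedding_dimensions` and `DimensionMismatchError` are hypothetical names, not part of phidata, and the zero vector stands in for a real embedder call.

```python
from typing import Sequence


class DimensionMismatchError(ValueError):
    """Raised when an embedding's length differs from the declared column size."""


def check_embedding_dimensions(embedding: Sequence[float], declared: int) -> None:
    """Raise if the embedding length does not match the declared dimension."""
    actual = len(embedding)
    if actual != declared:
        raise DimensionMismatchError(
            f"embedder returned {actual} dimensions, "
            f"but the vector column expects {declared}"
        )


# Stand-in for a real embedder call; 384 matches all-MiniLM-L6-v2.
fake_embedding = [0.0] * 384

# Matching dimensions: no exception is raised.
check_embedding_dimensions(fake_embedding, declared=384)

# Mismatched dimensions, as in the original snippet (dimensions=1536).
try:
    check_embedding_dimensions(fake_embedding, declared=1536)
except DimensionMismatchError as exc:
    print(exc)
```

Running a check like this on a single sample embedding before calling `knowledge_base.load(...)` would show whether the declared size and the model's output agree.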

This issue has been automatically marked as stale due to 14 days of inactivity and will now be closed.

@github-actions bot added the stale label Jan 29, 2025
@github-actions bot closed this as not planned Jan 29, 2025