OpenSearch custom_query without filters #1268

bakerada opened this issue Dec 30, 2024 · 0 comments
Labels
bug Something isn't working P2

bakerada commented Dec 30, 2024

Describe the bug
Using a custom_query with OpenSearchEmbeddingRetriever currently fails when the default filters are used. In standard use, filters are optional, and the documentation says they are also optional with custom_query. In practice, however, filters appear to be required: passing {} or None as filters to OpenSearchEmbeddingRetriever.run() raises an error.

The bug appears to be here: https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/opensearch/src/haystack_integrations/document_stores/opensearch/document_store.py#L499

if isinstance(custom_query, dict):
  body = self._render_custom_query(
      custom_query, {"$query_embedding": query_embedding, "$filters": normalize_filters(filters)}
  )

The call to normalize_filters raises the following error when the default filters are passed:

FilterError: 'operator' key missing in {}

There is probably a better way to fix this, but I think something like this would make sense:

if isinstance(custom_query, dict):
  query_placeholders = {"$query_embedding": query_embedding}
  if filters:
      query_placeholders["$filters"] = normalize_filters(filters)
  body = self._render_custom_query(
      custom_query, query_placeholders
  )
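The guard proposed above can be tried out without an OpenSearch cluster. The following is a minimal sketch under stated assumptions: `FilterError`, `normalize_filters_stub`, `build_placeholders`, and `render_custom_query` are hypothetical local stand-ins written for this illustration, not the integration's actual code. The stub only mimics the reported behavior of rejecting a dict that has no 'operator' key.

```python
from typing import Any, Dict, List, Optional


class FilterError(Exception):
    """Local stand-in for the error shown in this issue (hypothetical)."""


def normalize_filters_stub(filters: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in: mimics the reported failure for dicts without 'operator'.
    if "operator" not in filters:
        raise FilterError(f"'operator' key missing in {filters}")
    # Placeholder output; the real function returns OpenSearch query DSL.
    return {"normalized": filters}


def build_placeholders(
    query_embedding: List[float], filters: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    # Only normalize filters when some were actually provided ({} and None are skipped).
    placeholders: Dict[str, Any] = {"$query_embedding": query_embedding}
    if filters:
        placeholders["$filters"] = normalize_filters_stub(filters)
    return placeholders


def render_custom_query(query: Any, substitutions: Dict[str, Any]) -> Any:
    # Assumed rendering strategy: recursively swap placeholder strings for their values.
    if isinstance(query, dict):
        return {key: render_custom_query(value, substitutions) for key, value in query.items()}
    if isinstance(query, list):
        return [render_custom_query(item, substitutions) for item in query]
    if isinstance(query, str):
        return substitutions.get(query, query)
    return query


embedding = [0.1, 0.2, 0.3]
query = {"query": {"knn": {"embedding": {"vector": "$query_embedding", "k": 10}}}}

# With the guard, empty filters no longer reach the normalization step.
body = render_custom_query(query, build_placeholders(embedding, filters={}))
```

One caveat with this approach, assuming rendering works as sketched: if a custom query references "$filters" but no filters are passed, the placeholder string would be left in the request body unchanged, so that case may need separate handling.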

To Reproduce
Here's a sample of the code I ran:

import numpy as np
from opensearchpy import RequestsHttpConnection
from haystack_integrations.components.retrievers.opensearch import OpenSearchEmbeddingRetriever
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore

document_store = OpenSearchDocumentStore(
        hosts=[opensearch_endpoint],
        http_auth=awsauth,
        use_ssl=True,
        verify_certs=True,
        connection_class=RequestsHttpConnection,
        index=opensearch_index,
        create_index=False,
        embedding_dim=embedding_dim,
        embedding_field='embedding'
    )

retriever = OpenSearchEmbeddingRetriever(document_store=document_store)


custom_query = {
    "query": {
        "bool": {
            "must": [
                {
                    "knn": {
                        "embedding": {
                            "vector": "$query_embedding",
                            "k": 10000,
                        }
                    }
                }
            ]
        }
    },
    "collapse": {
        "field": "name"
    }
}

embedding = np.random.random(1536).tolist()

documents = retriever.run(
    query_embedding=embedding,
    filters={},
    top_k=10,
    custom_query=custom_query
)

It works as expected when I add the $filters placeholder to the custom query and pass valid filters.
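For reference, the working variant might look roughly like the sketch below. It assumes the "$filters" placeholder goes in the "filter" clause of the bool query and that the filters use Haystack's field/operator/value syntax. The field name "meta.category" and the value "news" are hypothetical and only illustrate the shape. The retriever call is left as a comment because it needs a live document store.

```python
# Custom query that includes the $filters placeholder alongside $query_embedding.
custom_query_with_filters = {
    "query": {
        "bool": {
            "must": [
                {
                    "knn": {
                        "embedding": {
                            "vector": "$query_embedding",
                            "k": 10000,
                        }
                    }
                }
            ],
            "filter": "$filters",
        }
    },
    "collapse": {"field": "name"},
}

# A non-empty filter with an 'operator' key (hypothetical field and value).
filters = {"field": "meta.category", "operator": "==", "value": "news"}

# documents = retriever.run(
#     query_embedding=embedding,
#     filters=filters,
#     top_k=10,
#     custom_query=custom_query_with_filters,
# )
```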

Describe your environment (please complete the following information):

  • OS: macOS Sonoma 14.5
  • Haystack version: 2.7.0
  • Integration version: opensearch-haystack 1.1.0
bakerada added the bug label Dec 30, 2024
julian-risch added the P2 label Jan 2, 2025