Milvus vector db #244
Make Milvus an optional dependency, expand docs with Milvus information Chore
…milvus gh action Resolved if statement resolution issue regarding api key, Added vector db config to milvus test, Added milvus gh action Fix
Rewrite batch search to work as async gather Fix
Feature: Integrate Milvus as a Vector Database Provider
Walkthrough
The changes in this pull request update several files to integrate support for the Milvus vector database within the Cognee framework. Key modifications involve the addition of a new GitHub Actions workflow for testing, enhancements to installation instructions in the README, and the introduction of the
Changes
📜 Recent review details
Configuration used: CodeRabbit UI
📒 Files selected for processing (1)
🔇 Additional comments (3)
README.md (3)
The installation command follows the consistent format used for other database providers.
The poetry command follows the consistent format used for other database providers.
The Milvus entry is correctly added with appropriate type and status indicators, maintaining consistency with other stable database implementations.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (4)
cognee/infrastructure/databases/vector/create_vector_engine.py (2)
9-10: Correct misspelling of 'required' in error messages
The word "required" is misspelled as "requred" in several error messages. This may cause confusion when these errors are encountered.
Apply this diff to fix the typos:
- raise EnvironmentError("Missing requred Weaviate credentials!")
+ raise EnvironmentError("Missing required Weaviate credentials!")
...
- raise EnvironmentError("Missing requred Qdrant credentials!")
+ raise EnvironmentError("Missing required Qdrant credentials!")
...
- raise EnvironmentError("Missing requred pgvector credentials!")
+ raise EnvironmentError("Missing required pgvector credentials!")
...
- raise EnvironmentError("Missing requred FalkorDB credentials!")
+ raise EnvironmentError("Missing required FalkorDB credentials!")
Also applies to: 17-18, 30-31, 41-42
3-7: Use appropriate base class for VectorConfig
The VectorConfig class currently inherits from Dict, which is not intended for subclassing in this context. If you need a dictionary with defined keys, consider using typing.TypedDict or use a dataclass for better type safety and clarity.
Consider changing the class definition to one of the following:
Option 1: Use TypedDict
from typing import TypedDict

class VectorConfig(TypedDict):
    vector_db_url: str
    vector_db_port: str
    vector_db_key: str
    vector_db_provider: str
Option 2: Use dataclass
from dataclasses import dataclass

@dataclass
class VectorConfig:
    vector_db_url: str
    vector_db_port: str
    vector_db_key: str
    vector_db_provider: str
Using TypedDict or dataclass provides better type checking and improves code readability.
cognee/tests/test_milvus.py (1)
35-41: Avoid unintended indentation in multi-line string
The multi-line string assigned to text is indented, which includes leading whitespace in each line. This may affect text processing or embedding results, as the extra spaces become part of the string.
Consider dedenting the string to remove unnecessary whitespace:
import textwrap text = textwrap.dedent("""\ A quantum computer is a computer that takes advantage of quantum mechanical phenomena. At small scales, physical matter exhibits properties of both particles and waves, and quantum computing leverages this behavior, specifically quantum superposition and entanglement, using specialized hardware that supports the preparation and manipulation of quantum states. Classical physics cannot explain the operation of these quantum devices, and a scalable quantum computer could perform some calculations exponentially faster (with respect to input size scaling) than any modern "classical" computer. In particular, a large-scale quantum computer could break widely used encryption schemes and aid physicists in performing physical simulations; however, the current state of the technology is largely experimental and impractical, with several obstacles to useful applications. Moreover, scalable quantum computers do not hold promise for many practical tasks, and for many important tasks quantum speedups are proven impossible. The basic unit of information in quantum computing is the qubit, similar to the bit in traditional digital electronics. Unlike a classical bit, a qubit can exist in a superposition of its two "basis" states. When measuring a qubit, the result is a probabilistic output of a classical bit, therefore making quantum computers nondeterministic in general. If a quantum computer manipulates the qubit in a particular way, wave interference effects can amplify the desired measurement results. The design of quantum algorithms involves creating procedures that allow a quantum computer to perform calculations efficiently and quickly. Physically engineering high-quality qubits has proven challenging. If a physical qubit is not sufficiently isolated from its environment, it suffers from quantum decoherence, introducing noise into calculations. 
Paradoxically, perfectly isolating qubits is also undesirable because quantum computations typically need to initialize qubits, perform controlled qubit interactions, and measure the resulting quantum states. Each of those operations introduces errors and suffers from noise, and such inaccuracies accumulate. In principle, a non-quantum (classical) computer can solve the same computational problems as a quantum computer, given enough time. Quantum advantage comes in the form of time complexity rather than computability, and quantum complexity theory shows that some quantum algorithms for carefully selected tasks require exponentially fewer computational steps than the best known non-quantum algorithms. Such tasks can in theory be solved on a large-scale quantum computer whereas classical computers would not finish computations in any reasonable amount of time. However, quantum speedup is not universal or even typical across computational tasks, since basic tasks such as sorting are proven to not allow any asymptotic quantum speedup. Claims of quantum supremacy have drawn significant attention to the discipline, but are demonstrated on contrived tasks, while near-term practical use cases remain limited. """)Alternatively, you can adjust the indentation:
- text = """A quantum computer is a computer that takes advantage of quantum mechanical phenomena. - At small scales, physical matter exhibits properties of both particles and waves, and quantum computing leverages this behavior, specifically quantum superposition and entanglement, using specialized hardware that supports the preparation and manipulation of quantum states. - Classical physics cannot explain the operation of these quantum devices, and a scalable quantum computer could perform some calculations exponentially faster (with respect to input size scaling) than any modern "classical" computer. In particular, a large-scale quantum computer could break widely used encryption schemes and aid physicists in performing physical simulations; however, the current state of the technology is largely experimental and impractical, with several obstacles to useful applications. Moreover, scalable quantum computers do not hold promise for many practical tasks, and for many important tasks quantum speedups are proven impossible. - The basic unit of information in quantum computing is the qubit, similar to the bit in traditional digital electronics. Unlike a classical bit, a qubit can exist in a superposition of its two "basis" states. When measuring a qubit, the result is a probabilistic output of a classical bit, therefore making quantum computers nondeterministic in general. If a quantum computer manipulates the qubit in a particular way, wave interference effects can amplify the desired measurement results. The design of quantum algorithms involves creating procedures that allow a quantum computer to perform calculations efficiently and quickly. - Physically engineering high-quality qubits has proven challenging. If a physical qubit is not sufficiently isolated from its environment, it suffers from quantum decoherence, introducing noise into calculations. 
Paradoxically, perfectly isolating qubits is also undesirable because quantum computations typically need to initialize qubits, perform controlled qubit interactions, and measure the resulting quantum states. Each of those operations introduces errors and suffers from noise, and such inaccuracies accumulate. - In principle, a non-quantum (classical) computer can solve the same computational problems as a quantum computer, given enough time. Quantum advantage comes in the form of time complexity rather than computability, and quantum complexity theory shows that some quantum algorithms for carefully selected tasks require exponentially fewer computational steps than the best known non-quantum algorithms. Such tasks can in theory be solved on a large-scale quantum computer whereas classical computers would not finish computations in any reasonable amount of time. However, quantum speedup is not universal or even typical across computational tasks, since basic tasks such as sorting are proven to not allow any asymptotic quantum speedup. Claims of quantum supremacy have drawn significant attention to the discipline, but are demonstrated on contrived tasks, while near-term practical use cases remain limited. - """ + text = """A quantum computer is a computer that takes advantage of quantum mechanical phenomena. +At small scales, physical matter exhibits properties of both particles and waves, and quantum computing leverages this behavior, specifically quantum superposition and entanglement, using specialized hardware that supports the preparation and manipulation of quantum states. +Classical physics cannot explain the operation of these quantum devices, and a scalable quantum computer could perform some calculations exponentially faster (with respect to input size scaling) than any modern "classical" computer. 
In particular, a large-scale quantum computer could break widely used encryption schemes and aid physicists in performing physical simulations; however, the current state of the technology is largely experimental and impractical, with several obstacles to useful applications. Moreover, scalable quantum computers do not hold promise for many practical tasks, and for many important tasks quantum speedups are proven impossible. +The basic unit of information in quantum computing is the qubit, similar to the bit in traditional digital electronics. Unlike a classical bit, a qubit can exist in a superposition of its two "basis" states. When measuring a qubit, the result is a probabilistic output of a classical bit, therefore making quantum computers nondeterministic in general. If a quantum computer manipulates the qubit in a particular way, wave interference effects can amplify the desired measurement results. The design of quantum algorithms involves creating procedures that allow a quantum computer to perform calculations efficiently and quickly. +Physically engineering high-quality qubits has proven challenging. If a physical qubit is not sufficiently isolated from its environment, it suffers from quantum decoherence, introducing noise into calculations. Paradoxically, perfectly isolating qubits is also undesirable because quantum computations typically need to initialize qubits, perform controlled qubit interactions, and measure the resulting quantum states. Each of those operations introduces errors and suffers from noise, and such inaccuracies accumulate. +In principle, a non-quantum (classical) computer can solve the same computational problems as a quantum computer, given enough time. Quantum advantage comes in the form of time complexity rather than computability, and quantum complexity theory shows that some quantum algorithms for carefully selected tasks require exponentially fewer computational steps than the best known non-quantum algorithms. 
Such tasks can in theory be solved on a large-scale quantum computer whereas classical computers would not finish computations in any reasonable amount of time. However, quantum speedup is not universal or even typical across computational tasks, since basic tasks such as sorting are proven to not allow any asymptotic quantum speedup. Claims of quantum supremacy have drawn significant attention to the discipline, but are demonstrated on contrived tasks, while near-term practical use cases remain limited. +"""This ensures that the text content is processed correctly without unintended leading spaces.
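The effect described in this comment can be seen on a much shorter string. The sketch below is illustrative only: the function name and sample text are made up and are not taken from cognee/tests/test_milvus.py.

```python
import textwrap


def build_indented() -> str:
    # Hypothetical stand-in for the test's `text` variable: the literal is
    # indented to match the surrounding code, so every line of the resulting
    # string starts with the same run of spaces.
    text = """
        first line
        second line
    """
    return text


raw = build_indented()
# dedent() strips the whitespace prefix shared by all non-blank lines;
# strip() then drops the blank first and last lines.
clean = textwrap.dedent(raw).strip()

print(repr(raw.splitlines()[1]))
print(repr(clean))
```

Whether the extra whitespace actually changes embedding or chunking results depends on the pipeline, so this only shows what ends up in the string.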
README.md (1)
Line range hint 1-400: Documentation needs to be updated to fully reflect Milvus integration.
Please update the following sections to include Milvus:
- In the "Vector Stores" section under "Vector retrieval, Graphs and LLMs", add Milvus to the list:
- **Vector Stores**: Cognee supports LanceDB, Qdrant, PGVector and Weaviate for vector storage.
+ **Vector Stores**: Cognee supports LanceDB, Qdrant, PGVector, Weaviate, and Milvus for vector storage.
- Add Milvus to the Vector & Graph Databases Implementation State table:
 | Name     | Type   | Current state | Known Issues |
 |----------|--------|---------------|--------------|
 | Qdrant   | Vector | Stable ✅     |              |
 | Weaviate | Vector | Stable ✅     |              |
 | LanceDB  | Vector | Stable ✅     |              |
+| Milvus   | Vector | Stable ✅     |              |
 | Neo4j    | Graph  | Stable ✅     |              |
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
- poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
- .env.template (1 hunks)
- .github/workflows/test_milvus.yml (1 hunks)
- README.md (2 hunks)
- cognee/infrastructure/databases/vector/create_vector_engine.py (4 hunks)
- cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (1 hunks)
- cognee/infrastructure/databases/vector/milvus/__init__.py (1 hunks)
- cognee/tests/test_milvus.py (1 hunks)
- pyproject.toml (2 hunks)
✅ Files skipped from review due to trivial changes (2)
- cognee/infrastructure/databases/vector/milvus/__init__.py
- .env.template
🧰 Additional context used
🪛 actionlint (1.7.4)
.github/workflows/test_milvus.yml
26-26: if: condition "needs.get_docs_changes.outputs.changes_outside_docs == 'true' && ${{ github.event.label.name == 'run-checks' }}" is always evaluated to true because extra characters are around ${{ }}
(if-cond)
🪛 Ruff (0.8.0)
cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py
35-35: Undefined name MilvusClient
(F821)
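Since this PR makes Milvus an optional dependency, one common way to avoid an undefined-name error like the one Ruff reports is to import pymilvus lazily, at the point where a client is actually built. The following is a minimal sketch, not the adapter's actual code: `require_optional`, `LazyMilvusAdapter`, and the `cognee[milvus]` install hint are assumptions made for illustration.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or raise an error naming the extra to install."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        # The install hint is an assumption about how the extras group is named.
        raise ImportError(
            f"'{module_name}' is not installed. "
            f"Install it with: pip install 'cognee[{extra}]'"
        ) from error


class LazyMilvusAdapter:
    """Hypothetical adapter that resolves pymilvus only when a client is needed."""

    def __init__(self, url: str):
        self.url = url

    def get_milvus_client(self):
        # The import happens here, so constructing the adapter never fails
        # when pymilvus is absent; only client creation does.
        pymilvus = require_optional("pymilvus", "milvus")
        return pymilvus.MilvusClient(uri=self.url)
```

With this shape, users who never select Milvus as their provider are not required to have pymilvus installed.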
🔇 Additional comments (4)
cognee/infrastructure/databases/vector/milvus/MilvusAdapter.py (1)
159-174: Handle exceptions when retrieving data points
In the retrieve method, if the client.query operation does not return any results, the method may not handle the empty response appropriately, potentially leading to errors downstream.
Please verify that the method handles empty query results without causing exceptions.
This script searches for the retrieve method across the codebase to review its implementation and usage, focusing on error handling when no results are found.
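One way to make the empty-result case explicit is to normalize whatever the query returns into a list before handing it to callers. This is a rough sketch under stated assumptions: `FakeClient` is a made-up stand-in rather than the pymilvus client, and this `retrieve` is a hypothetical free function, not the method in MilvusAdapter.py.

```python
import asyncio
from typing import Any, Dict, List, Optional


class FakeClient:
    """Made-up stand-in for a vector DB client; returns rows whose id is requested."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows = rows

    def query(self, collection_name: str, ids: List[str]) -> Optional[List[Dict[str, Any]]]:
        matches = [row for row in self.rows if row["id"] in ids]
        # Simulate a client that may return None instead of an empty list.
        return matches or None


async def retrieve(client, collection_name: str, data_point_ids: List[str]) -> List[Dict[str, Any]]:
    # Nothing to look up: skip the round trip entirely.
    if not data_point_ids:
        return []
    results = client.query(collection_name=collection_name, ids=data_point_ids)
    # Normalize None or any empty container to an empty list for callers.
    return list(results or [])


client = FakeClient([{"id": "a", "text": "hello"}])
found = asyncio.run(retrieve(client, "docs", ["a"]))
missing = asyncio.run(retrieve(client, "docs", ["zzz"]))
```

Callers can then iterate over the result without checking for None first.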
README.md (1)
56-61: LGTM: Installation instructions for Milvus are clear and consistent.
The new installation instructions for Milvus follow the established pattern and provide clear guidance for both pip and poetry users.
Also applies to: 92-97
pyproject.toml (2)
88-88: LGTM! The milvus extras group is well-structured.
The addition of the milvus extras group follows the established pattern in the project for optional database providers.
73-73: Verify pymilvus version compatibility
The version constraint ^2.5.0 allows updates to minor versions. Let's verify this version for compatibility and security.
✅ Verification successful
Version constraint ^2.5.0 is appropriate and secure
The version constraint is well-chosen because:
- 2.5.0 is the latest stable release (released on Nov 26, 2024)
- No known security vulnerabilities are reported
- The caret (^) allows minor version updates while maintaining compatibility
- The version follows the same pattern as other vector DB dependencies in the project
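To make the caret rule concrete: in Poetry, a caret constraint permits any update that leaves the leftmost non-zero component unchanged, so ^2.5.0 is equivalent to >=2.5.0,<3.0.0. The helper below is an illustrative sketch that assumes plain three-part numeric versions; `caret_bounds` and `caret_allows` are hypothetical names, not part of Poetry.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_bounds(version: str) -> Tuple[str, str]:
    """Return the inclusive lower and exclusive upper bound of ^version."""
    major, minor, patch = parse(version)
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return version, ".".join(str(part) for part in upper)


def caret_allows(constraint: str, candidate: str) -> bool:
    """Check whether candidate satisfies ^constraint."""
    lower, upper = caret_bounds(constraint)
    return parse(lower) <= parse(candidate) < parse(upper)


print(caret_bounds("2.5.0"))
print(caret_allows("2.5.0", "2.9.3"), caret_allows("2.5.0", "3.0.0"))
```

Under this rule a future pymilvus 2.6.x would be picked up on update, while 3.0.0 would not.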
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check PyPI for latest versions and release dates
curl -s https://pypi.org/pypi/pymilvus/json | jq -r '.releases | keys_unsorted[-5:] as $versions | $versions[] as $v | "\($v): \(.[$v][0].upload_time)"'
# Check for security advisories
gh api graphql -f query='
{
  securityVulnerabilities(first: 5, ecosystem: PIP, package: "pymilvus") {
    nodes {
      advisory {
        summary
        severity
        publishedAt
      }
      vulnerableVersionRange
      firstPatchedVersion {
        identifier
      }
    }
  }
}'
Length of output: 631
async def has_collection(self, collection_name: str) -> bool:
    future = asyncio.Future()
    client = self.get_milvus_client()
    future.set_result(client.has_collection(collection_name=collection_name))

    return await future
🛠️ Refactor suggestion
Simplify asynchronous method that wraps synchronous call
In the has_collection method, using asyncio.Future to wrap a synchronous operation adds unnecessary complexity and does not provide actual asynchronous benefits.
Consider refactoring the method to either make it synchronous or properly handle the synchronous call without unnecessary wrapping:
- async def has_collection(self, collection_name: str) -> bool:
-     future = asyncio.Future()
-     client = self.get_milvus_client()
-     future.set_result(client.has_collection(collection_name=collection_name))
-     return await future
+ def has_collection(self, collection_name: str) -> bool:
+     client = self.get_milvus_client()
+     return client.has_collection(collection_name=collection_name)
If you need to keep the method asynchronous to conform to the interface, you can run the synchronous operation in an executor to avoid blocking the event loop. Note that run_in_executor forwards positional arguments only, so the collection name is passed positionally:
async def has_collection(self, collection_name: str) -> bool:
    client = self.get_milvus_client()
    loop = asyncio.get_running_loop()
    # Runs the blocking call in the default thread pool executor.
    return await loop.run_in_executor(None, client.has_collection, collection_name)
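On Python 3.9 or newer, asyncio.to_thread offers a shorter way to push a blocking call onto a worker thread, and it accepts keyword arguments. The sketch below uses a made-up `SlowSyncClient` in place of the real Milvus client, and `check_many` is a hypothetical helper showing how several such checks can then be awaited together with asyncio.gather.

```python
import asyncio
import time
from typing import Dict, Iterable, List


class SlowSyncClient:
    """Made-up blocking client standing in for a synchronous database client."""

    def __init__(self, collections: Iterable[str]):
        self.collections = set(collections)

    def has_collection(self, collection_name: str) -> bool:
        time.sleep(0.05)  # simulate a blocking network round trip
        return collection_name in self.collections


async def has_collection(client: SlowSyncClient, collection_name: str) -> bool:
    # to_thread runs the blocking call in a worker thread and forwards kwargs.
    return await asyncio.to_thread(client.has_collection, collection_name=collection_name)


async def check_many(client: SlowSyncClient, names: List[str]) -> Dict[str, bool]:
    # gather preserves input order, so results line up with names.
    results = await asyncio.gather(*(has_collection(client, name) for name in names))
    return dict(zip(names, results))


client = SlowSyncClient(["documents", "chunks"])
status = asyncio.run(check_many(client, ["documents", "missing", "chunks"]))
print(status)
```

Whether the underlying client is safe to call from several threads at once is a separate question that would need checking against the client library's documentation.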
Update README.md with state of stable databases Docs
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
.env.template to include "milvus" as an option for the VECTOR_DB_PROVIDER.
Tests