Commit

Merge branch 'main' into fix-dlt-for-metadata
dexters1 authored Dec 4, 2024
2 parents ceebcdb + 4678aae commit c505ee5
Showing 25 changed files with 1,263 additions and 775 deletions.
2 changes: 1 addition & 1 deletion .env.template
Original file line number Diff line number Diff line change
@@ -14,7 +14,7 @@ GRAPH_DATABASE_URL=
 GRAPH_DATABASE_USERNAME=
 GRAPH_DATABASE_PASSWORD=

-# "qdrant", "pgvector", "weaviate" or "lancedb"
+# "qdrant", "pgvector", "weaviate", "milvus" or "lancedb"
 VECTOR_DB_PROVIDER="lancedb"
 # Not needed if using "lancedb" or "pgvector"
 VECTOR_DB_URL=
64 changes: 64 additions & 0 deletions .github/workflows/test_milvus.yml
@@ -0,0 +1,64 @@
+name: test | milvus
+
+on:
+  workflow_dispatch:
+  pull_request:
+    branches:
+      - main
+    types: [labeled, synchronize]
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
+  cancel-in-progress: true
+
+env:
+  RUNTIME__LOG_LEVEL: ERROR
+  ENV: 'dev'
+
+jobs:
+  get_docs_changes:
+    name: docs changes
+    uses: ./.github/workflows/get_docs_changes.yml
+
+  run_milvus:
+    name: test
+    needs: get_docs_changes
+    if: needs.get_docs_changes.outputs.changes_outside_docs == 'true' && ${{ github.event.label.name == 'run-checks' }}
+    runs-on: ubuntu-latest
+    strategy:
+      fail-fast: false
+    defaults:
+      run:
+        shell: bash
+
+    steps:
+      - name: Check out
+        uses: actions/checkout@master
+
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11.x'
+
+      - name: Install Poetry
+        # https://github.com/snok/install-poetry#running-on-windows
+        uses: snok/[email protected]
+        with:
+          virtualenvs-create: true
+          virtualenvs-in-project: true
+          installer-parallel: true
+
+      - name: Install dependencies
+        run: poetry install -E milvus --no-interaction
+
+      - name: Run default basic pipeline
+        env:
+          ENV: 'dev'
+          LLM_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+        run: poetry run python ./cognee/tests/test_milvus.py
+
+      - name: Clean up disk space
+        run: |
+          sudo rm -rf ~/.cache
+          sudo rm -rf /tmp/*
+          df -h
2 changes: 1 addition & 1 deletion .github/workflows/test_neo4j.yml
@@ -46,7 +46,7 @@ jobs:
           installer-parallel: true

       - name: Install dependencies
-        run: poetry install --no-interaction
+        run: poetry install -E neo4j --no-interaction

       - name: Run default Neo4j
         env:
2 changes: 1 addition & 1 deletion .github/workflows/test_qdrant.yml
@@ -47,7 +47,7 @@ jobs:
           installer-parallel: true

       - name: Install dependencies
-        run: poetry install --no-interaction
+        run: poetry install -E qdrant --no-interaction

       - name: Run default Qdrant
         env:
2 changes: 1 addition & 1 deletion .github/workflows/test_weaviate.yml
@@ -47,7 +47,7 @@ jobs:
           installer-parallel: true

       - name: Install dependencies
-        run: poetry install --no-interaction
+        run: poetry install -E weaviate --no-interaction

       - name: Run default Weaviate
         env:
66 changes: 47 additions & 19 deletions README.md
@@ -13,37 +13,64 @@ We build for developers who need a reliable, production-ready data layer for AI
 ## What is cognee?

 Cognee implements scalable, modular ECL (Extract, Cognify, Load) pipelines that allow you to interconnect and retrieve past conversations, documents, and audio transcriptions while reducing hallucinations, developer effort, and cost.
-Try it in a Google Colab <a href="https://colab.research.google.com/drive/1g-Qnx6l_ecHZi0IOw23rg0qC4TYvEvWZ?usp=sharing">notebook</a> or have a look at our <a href="https://topoteretes.github.io/cognee">documentation</a>
+Try it in a Google Colab <a href="https://colab.research.google.com/drive/1g-Qnx6l_ecHZi0IOw23rg0qC4TYvEvWZ?usp=sharing">notebook</a> or have a look at our <a href="https://docs.cognee.ai">documentation</a>

 If you have questions, join our <a href="https://discord.gg/NQPKmU5CCg">Discord</a> community


 ## 📦 Installation

+You can install Cognee using either **pip** or **poetry**.
+Support for various databases and vector stores is available through extras.
+
 ### With pip

 ```bash
 pip install cognee
 ```

-### With pip with PostgreSQL support
+### With poetry

 ```bash
-pip install 'cognee[postgres]'
+poetry add cognee
 ```

-### With poetry
+### With pip with specific database support

+To install Cognee with support for specific databases use the appropriate command below. Replace \<database> with the name of the database you need.
 ```bash
-poetry add cognee
+pip install 'cognee[<database>]'
 ```

-### With poetry with PostgreSQL support
+Replace \<database> with any of the following databases:
+- postgres
+- weaviate
+- qdrant
+- neo4j
+- milvus

+Installing Cognee with PostgreSQL and Neo4j support example:
 ```bash
-poetry add cognee -E postgres
+pip install 'cognee[postgres, neo4j]'
 ```
+
+### With poetry with specific database support
+
+To install Cognee with support for specific databases use the appropriate command below. Replace \<database> with the name of the database you need.
+```bash
+poetry add cognee -E <database>
+```
+Replace \<database> with any of the following databases:
+- postgres
+- weaviate
+- qdrant
+- neo4j
+- milvus
+
+Installing Cognee with PostgreSQL and Neo4j support example:
+```bash
+poetry add cognee -E postgres -E neo4j
+```

 ## 💻 Basic Usage

@@ -61,7 +88,7 @@ import cognee
 cognee.config.set_llm_api_key("YOUR_OPENAI_API_KEY")
 ```
 You can also set the variables by creating .env file, here is our <a href="https://github.com/topoteretes/cognee/blob/main/.env.template">template.</a>
-To use different LLM providers, for more info check out our <a href="https://topoteretes.github.io/cognee">documentation</a>
+To use different LLM providers, for more info check out our <a href="https://docs.cognee.ai">documentation</a>

 If you are using Network, create an account on Graphistry to visualize results:
 ```
@@ -282,7 +309,7 @@ Check out our demo notebook [here](https://github.com/topoteretes/cognee/blob/ma

 ### Install Server

-Please see the [cognee Quick Start Guide](https://topoteretes.github.io/cognee/quickstart/) for important configuration information.
+Please see the [cognee Quick Start Guide](https://docs.cognee.ai/quickstart/) for important configuration information.

 ```bash
 docker compose up
@@ -291,7 +318,7 @@ docker compose up

 ### Install SDK

-Please see the cognee [Development Guide](https://topoteretes.github.io/cognee/quickstart/) for important beta information and usage instructions.
+Please see the cognee [Development Guide](https://docs.cognee.ai/quickstart/) for important beta information and usage instructions.

 ```bash
 pip install cognee
@@ -317,12 +344,13 @@ pip install cognee
 }
 </style>

-| Name | Type | Current state | Known Issues |
-|------------------|--------------------|-------------------|---------------------------------------|
-| Qdrant | Vector | Stable &#x2705; | |
-| Weaviate | Vector | Stable &#x2705; | |
-| LanceDB | Vector | Stable &#x2705; | |
-| Neo4j | Graph | Stable &#x2705; | |
-| NetworkX | Graph | Stable &#x2705; | |
-| FalkorDB | Vector/Graph | Unstable &#x274C; | |
-| PGVector | Vector | Unstable &#x274C; | Postgres DB returns the Timeout error |
+| Name | Type | Current state | Known Issues |
+|----------|--------------------|-------------------|--------------|
+| Qdrant | Vector | Stable &#x2705; | |
+| Weaviate | Vector | Stable &#x2705; | |
+| LanceDB | Vector | Stable &#x2705; | |
+| Neo4j | Graph | Stable &#x2705; | |
+| NetworkX | Graph | Stable &#x2705; | |
+| FalkorDB | Vector/Graph | Unstable &#x274C; | |
+| PGVector | Vector | Stable &#x2705; | |
+| Milvus | Vector | Stable &#x2705; | |
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
-from functools import lru_cache
+# from functools import lru_cache

 from .config import get_relational_config
 from .create_relational_engine import create_relational_engine

-@lru_cache
+# @lru_cache
 def get_relational_engine():
     relational_config = get_relational_config()

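Review note: the hunk above comments out `@lru_cache` on `get_relational_engine()`. As a minimal sketch of what that decorator does for a zero-argument factory (the `FakeEngine` class and both function names are hypothetical stand-ins, not cognee code), compare a cached and an uncached factory:

```python
from functools import lru_cache


class FakeEngine:
    """Hypothetical stand-in for a relational engine; not cognee's class."""


@lru_cache
def get_engine_cached() -> FakeEngine:
    # With lru_cache, the factory body runs once; later calls reuse the result.
    return FakeEngine()


def get_engine_uncached() -> FakeEngine:
    # Without the decorator, every call builds a new object.
    return FakeEngine()


same = get_engine_cached() is get_engine_cached()
fresh = get_engine_uncached() is get_engine_uncached()
print(same, fresh)
```

If the real function behaves like this sketch, each call after this commit would build a new engine instead of reusing one. That is presumably intended so that configuration changes take effect, but the diff itself does not state the reason.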
43 changes: 29 additions & 14 deletions cognee/infrastructure/databases/vector/create_vector_engine.py
@@ -1,11 +1,13 @@
 from typing import Dict

+
 class VectorConfig(Dict):
     vector_db_url: str
     vector_db_port: str
     vector_db_key: str
     vector_db_provider: str

+
 def create_vector_engine(config: VectorConfig, embedding_engine):
     if config["vector_db_provider"] == "weaviate":
         from .weaviate_db import WeaviateAdapter
@@ -16,24 +18,37 @@ def create_vector_engine(config: VectorConfig, embedding_engine):
         return WeaviateAdapter(
             config["vector_db_url"],
             config["vector_db_key"],
-            embedding_engine = embedding_engine
+            embedding_engine=embedding_engine
         )

     elif config["vector_db_provider"] == "qdrant":
         if not (config["vector_db_url"] and config["vector_db_key"]):
             raise EnvironmentError("Missing requred Qdrant credentials!")

         from .qdrant.QDrantAdapter import QDrantAdapter

         return QDrantAdapter(
-            url = config["vector_db_url"],
-            api_key = config["vector_db_key"],
-            embedding_engine = embedding_engine
+            url=config["vector_db_url"],
+            api_key=config["vector_db_key"],
+            embedding_engine=embedding_engine
         )

+    elif config['vector_db_provider'] == 'milvus':
+        from .milvus.MilvusAdapter import MilvusAdapter
+
+        if not config["vector_db_url"]:
+            raise EnvironmentError("Missing required Milvus credentials!")
+
+        return MilvusAdapter(
+            url=config["vector_db_url"],
+            api_key=config['vector_db_key'],
+            embedding_engine=embedding_engine
+        )
+
+
     elif config["vector_db_provider"] == "pgvector":
         from cognee.infrastructure.databases.relational import get_relational_config

         # Get configuration for postgres database
         relational_config = get_relational_config()
         db_username = relational_config.db_username
@@ -52,8 +67,8 @@ def create_vector_engine(config: VectorConfig, embedding_engine):
         from .pgvector.PGVectorAdapter import PGVectorAdapter

         return PGVectorAdapter(
-            connection_string, 
-            config["vector_db_key"], 
+            connection_string,
+            config["vector_db_key"],
             embedding_engine,
         )

@@ -64,16 +79,16 @@ def create_vector_engine(config: VectorConfig, embedding_engine):
         from ..hybrid.falkordb.FalkorDBAdapter import FalkorDBAdapter

         return FalkorDBAdapter(
-            database_url = config["vector_db_url"],
-            database_port = config["vector_db_port"],
-            embedding_engine = embedding_engine,
+            database_url=config["vector_db_url"],
+            database_port=config["vector_db_port"],
+            embedding_engine=embedding_engine,
         )

     else:
         from .lancedb.LanceDBAdapter import LanceDBAdapter

         return LanceDBAdapter(
-            url = config["vector_db_url"],
-            api_key = config["vector_db_key"],
-            embedding_engine = embedding_engine,
+            url=config["vector_db_url"],
+            api_key=config["vector_db_key"],
+            embedding_engine=embedding_engine,
         )
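Review note: the new `milvus` branch requires only `vector_db_url`, whereas the `qdrant` branch requires both URL and key. A minimal sketch of just those two checks, pulled out into a hypothetical helper (`check_vector_config` is not part of cognee, and the example URL is an assumed value), might look like this:

```python
from typing import Dict, Optional


def check_vector_config(config: Dict[str, Optional[str]]) -> str:
    """Hypothetical helper: validate settings the way the diff's branches do."""
    # Assumption: unknown or empty providers fall through to LanceDB, as in the else branch.
    provider = config.get("vector_db_provider") or "lancedb"
    url = config.get("vector_db_url")
    key = config.get("vector_db_key")

    if provider == "qdrant" and not (url and key):
        raise EnvironmentError("Missing required Qdrant credentials!")
    if provider == "milvus" and not url:
        # Milvus needs a URL; the API key may be empty.
        raise EnvironmentError("Missing required Milvus credentials!")
    return provider


# Assumed example values, for illustration only.
print(check_vector_config({
    "vector_db_provider": "milvus",
    "vector_db_url": "./milvus.db",
    "vector_db_key": "",
}))
```

The sketch leaves out the Weaviate check because that part of the function is collapsed in the hunk above.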
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
import asyncio
import logging
import math
from typing import List, Optional
import litellm
from cognee.infrastructure.databases.vector.embeddings.EmbeddingEngine import EmbeddingEngine
@@ -36,11 +38,26 @@ async def embed_text(self, text: List[str]) -> List[List[float]]:
                 api_base = self.endpoint,
                 api_version = self.api_version
             )
-        except litellm.exceptions.BadRequestError as error:
-            logger.error("Error embedding text: %s", str(error))
+            return [data["embedding"] for data in response.data]
+
+        except litellm.exceptions.ContextWindowExceededError as error:
+            if isinstance(text, list):
+                parts = [text[0:math.ceil(len(text)/2)], text[math.ceil(len(text)/2):]]
+                parts_futures = [self.embed_text(part) for part in parts]
+                embeddings = await asyncio.gather(*parts_futures)
+
+                all_embeddings = []
+                for embeddings_part in embeddings:
+                    all_embeddings.extend(embeddings_part)
+
+                return [data["embedding"] for data in all_embeddings]
+
+            logger.error("Context window exceeded for embedding text: %s", str(error))
             raise error

-        return [data["embedding"] for data in response.data]
+        except Exception as error:
+            logger.error("Error embedding text: %s", str(error))
+            raise error

     def get_vector_size(self) -> int:
         return self.dimensions
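Review note: the hunk above retries an oversized batch by splitting it in half and embedding the halves concurrently. The following is a self-contained sketch of that split-and-retry idea, not the committed code: `fake_embed`, `ContextWindowExceeded`, and `MAX_BATCH` are hypothetical stand-ins for the litellm call, its exception, and the provider limit. Two details differ from the hunk and are assumptions about the intended behaviour: the recursive calls already return plain vectors, so the halves are concatenated directly rather than indexed with `data["embedding"]`, and a single-item batch is not split again, which avoids recursing on the same input.

```python
import asyncio
import math
from typing import List


class ContextWindowExceeded(Exception):
    """Hypothetical stand-in for litellm's ContextWindowExceededError."""


MAX_BATCH = 2  # assumed limit, for illustration only


async def fake_embed(batch: List[str]) -> List[List[float]]:
    # Toy backend: rejects oversized batches, else returns one vector per text.
    if len(batch) > MAX_BATCH:
        raise ContextWindowExceeded(f"batch of {len(batch)} is too large")
    return [[float(len(text))] for text in batch]


async def embed_with_split(texts: List[str]) -> List[List[float]]:
    try:
        return await fake_embed(texts)
    except ContextWindowExceeded:
        if len(texts) <= 1:
            # A single text cannot be split further, so surface the error.
            raise
        middle = math.ceil(len(texts) / 2)
        halves = await asyncio.gather(
            embed_with_split(texts[:middle]),
            embed_with_split(texts[middle:]),
        )
        # Each half is already a list of vectors; concatenate them in order.
        return [vector for half in halves for vector in half]


result = asyncio.run(embed_with_split(["a", "bb", "ccc", "dddd", "eeeee"]))
print(result)
```

Because `asyncio.gather` returns results in argument order, the output vectors stay aligned with the input texts.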