Build the docker and push #33

Merged: 43 commits, Jan 23, 2024

Conversation


@Vasilije1990 Vasilije1990 commented Dec 16, 2023

Summary by CodeRabbit

  • New Features

    • Enhanced CI/CD workflows with Docker image repository configuration and Terraform integration.
    • Introduced new FastAPI-based web service with multiple endpoints for memory management and database interactions.
    • Added support for language detection and translation using AWS Translate in language processing utilities.
    • Implemented new classes and methods for managing embeddings, vector databases, job statuses, and graph databases.
    • Expanded vector database functionality with support for different search types and filtering options.
  • Improvements

    • Upgraded license from MIT to Apache License, Version 2.0.
    • Improved Docker image with the addition of libgl1-mesa-glx package.
    • Refined database configuration management with centralized Config object.
    • Enhanced classifier function to accept different parameters and return classifier ID.
    • Added memory_category field to database models for better memory categorization.
  • Documentation

    • Updated environment variable template with new AWS and QDRANT API keys and URLs.
  • Bug Fixes

    • Fixed issues with database connection parameter retrieval in database creation scripts.
    • Adjusted logging for the search type in vector database operations.
  • Refactor

    • Transitioned synchronous retry mechanisms to asynchronous in OpenAI tools.
    • Reorganized database settings to use centralized configuration for PostgreSQL.
    • Streamlined chunk processing and document loading in vector database operations.
    • Centralized configuration management in various database-related scripts.
  • Style

    • Cleaned up code by removing trailing whitespace and commented-out code blocks.
    • Reorganized imports and modified exception handling in vector database chunkers.
  • Chores

    • Updated entrypoint script to export ENV variable and commented out the execution of the create_database.py script.

coderabbitai bot commented Dec 16, 2023

Walkthrough

The changes reflect a significant update to a software project, enhancing its CI/CD pipelines, refining database interactions, and expanding its cognitive architecture. The project shifts to the Apache License and introduces new functionalities for language processing, memory management, and vector database operations. It also improves Docker configurations and adds a FastAPI-based web service, signaling a broad enhancement of the system's capabilities and infrastructure.

Changes

File Path Change Summary
.github/actions/image_builder/action.yaml
.github/workflows/cd.yaml
.github/workflows/cd_prd.yaml
Updated CI/CD workflows, Docker image repo configurations, libgl1-mesa-glx package installation, and Terraform triggers.
LICENSE Changed from MIT to Apache License, Version 2.0.
bin/dockerize
.env.template
entrypoint.sh
Altered Docker-related script, environment variable declarations, and script execution.
cognitive_architecture/openai_tools.py
cognitive_architecture/database/postgres/database.py
cognitive_architecture/database/vectordb/vectordb.py
Refactored asynchronous retry mechanisms, centralized database settings, and expanded database functionalities.
api.py
Dockerfile
cognitive_architecture/classifiers/classifier.py
Introduced a FastAPI-based web service, updated Docker configurations, and modified classifier logic.
cognitive_architecture/database/postgres/models/...
cognitive_architecture/database/vectordb/...
Added and updated functionalities for database models and vector database management.
cognitive_architecture/database/graph_database/...
cognitive_architecture/shared/language_processing.py
Enhanced graph database logic and added language processing functionalities.
cognitive_architecture/utils.py
cognitive_architecture/fetch_secret.py
cognitive_architecture/database/create_database_tst.py
main.py
Updated utility functions, enhanced secret fetching, introduced database interaction functionalities, and expanded memory management and document processing.

🐇 "With code refined and systems aligned,
🌟 New features shine, so deftly designed.
📜 From Apache's embrace to a FastAPI face,
Our project leaps through the digital space."

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 05aaee6 and c43536c.
Files selected for processing (1)
  • README.md (1 hunks)
Files skipped from review due to trivial changes (1)
  • README.md

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 5

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between c43536c and dfb13a2.
Files ignored due to filter (6)
  • assets/Dashboard_example.png
  • assets/User_graph.png
  • assets/infographic_final.png
  • docker-compose.yml
  • poetry.lock
  • pyproject.toml
Files selected for processing (7)
  • .github/actions/image_builder/action.yaml (1 hunks)
  • .github/workflows/cd.yaml (2 hunks)
  • .github/workflows/cd_prd.yaml (2 hunks)
  • .github/workflows/ci.yaml (1 hunks)
  • LICENSE (1 hunks)
  • bin/dockerize (2 hunks)
  • cognitive_architecture/openai_tools.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • .github/actions/image_builder/action.yaml
Additional comments: 13
.github/workflows/cd.yaml (2)
  • 50-51: The ecr_image_repo_name and dockerfile_location have been updated. Verify that the new Docker image repository name and Dockerfile location are consistent with the intended Docker configurations and naming conventions used across the project.

  • 61-76: The new apply_tf job is introduced to trigger a Terraform apply workflow. Ensure that the PAT_FOR_CROSS_REPOS_CICD_TRIGGERING secret is properly configured for cross-repository workflow dispatch and that the workflow terraform.apply.yml exists and is set up to be triggered from this repository.

.github/workflows/cd_prd.yaml (3)
  • 23-57: The steps added to the publish_docker_to_ecr job for setting environment variables, configuring AWS credentials, and creating and pushing Docker images appear to be correctly implemented. However, it's important to verify that the .github/actions/image_builder action exists and is configured properly, as it is being used to build and push the Docker image.
Verification successful

The verification of the .github/actions/image_builder action has been successful. The directory exists within the .github/actions/ path as confirmed by the output of the shell script.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the image_builder GitHub action exists and check its configuration.
fd --exec ls -la '{}'/image_builder '*/.github/actions/'

Length of output: 34680



Script:

#!/bin/bash
# Search for the 'image_builder' directory within the '.github/actions/' path.
fd 'image_builder' '.github/actions/'

Length of output: 66

  • 81-82: The output `promethai_docker_tag_backend` is correctly set to be used by downstream jobs or steps. This is a good practice for passing data between jobs in a GitHub Actions workflow.
  • 84-99: The apply_tf job is configured to trigger a Terraform apply workflow after the Docker image is published. The use of actions/github-script to programmatically dispatch a workflow is correct. Ensure that the PAT_FOR_CROSS_REPOS_CICD_TRIGGERING secret is set with the appropriate permissions and that the PromethAI-Infra repository and terraform.apply.yml workflow exist and are configured to accept dispatch events.
.github/workflows/ci.yaml (2)
  • 23-23: The docker_login and publish flags are set to "false" in the dockerize command. Confirm that this is the intended behavior for the CI workflow and that there are no scenarios where these should be set to "true".

  • 24-24: The Docker image version is now being read from /tmp/.DOCKER_IMAGE_VERSION. Verify that the dockerize script writes the Docker image version to this new path and that there are no conflicts with other processes that may use the /tmp directory.

LICENSE (1)
  • 1-201: The change from the MIT License to the Apache License, Version 2.0, is a significant alteration to the terms under which the software is provided. Ensure that this change is intentional and that all contributors are aware of the implications of this change. Additionally, verify that all necessary documentation and code headers are updated to reflect the new licensing terms.
bin/dockerize (2)
  • 11-11: The change to the IMAGE_NAME variable removes the "-memory" suffix from the Docker image name. Ensure that this new naming convention is consistent with the intended Docker image naming strategy across the project.

  • 15-15: The APP_DIR variable has been changed from "level_2/" to the current directory ".". Verify that this change aligns with the directory structure of the application and that the Docker build context is correctly set to include all necessary files.

cognitive_architecture/openai_tools.py (3)
  • 108-110: The use of the asynchronous retry decorator @aretry_with_exponential_backoff on acompletions_with_backoff is correct, but ensure that all calls to acompletions_with_backoff have been updated to be awaited, as this change affects the outward-facing interface of the codebase.

  • 119-125: The function async_get_embedding_with_backoff is correctly implemented as an asynchronous function. It's important to verify that this function is awaited wherever it's called, and that the introduction of this function does not conflict with any existing synchronous code paths.

  • 128-137: The synchronous functions create_embedding_with_backoff and get_embedding_with_backoff remain unchanged and should continue to work as before. However, it's important to ensure that the introduction of their asynchronous counterparts does not lead to confusion or misuse in the codebase.
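
For illustration, a usage sketch of the paired helpers discussed above (the parameter name is an assumption, not the module's documented signature): the asynchronous variant must be awaited from async code, while the synchronous variant keeps working for existing call sites.

import asyncio
from cognitive_architecture.openai_tools import (
    async_get_embedding_with_backoff,  # async variant: must be awaited
    get_embedding_with_backoff,        # sync variant: unchanged behaviour
)

async def embed_async(text: str):
    # Inside async code paths, the new coroutine has to be awaited.
    return await async_get_embedding_with_backoff(text)

def embed_sync(text: str):
    # Existing synchronous call sites can keep using the old helper.
    return get_embedding_with_backoff(text)

# asyncio.run(embed_async("hello world"))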

Comment on lines 33 to +36
fi
docker push "${FULL_IMAGE_NAME}" &&
echo "Successfully pushed docker image ${FULL_IMAGE_NAME} to ECR repository"
fi
fi

The nested if statement checking the value of PUBLISH is redundant and can be removed for clarity and simplicity.

  if [ "${PUBLISH}" = true ]; then
-   echo "logging in"
-   aws ecr get-login-password --region "${AWS_REGION}" | docker login --username AWS --password-stdin "${AWS_REPOSITORY}"
-   if [ "${PUBLISH}" = true ]; then
-     echo "logging in"
-     aws ecr get-login-password --region "${AWS_REGION}" | docker login --username AWS --password-stdin "${AWS_REPOSITORY}"
  fi

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
fi
docker push "${FULL_IMAGE_NAME}" &&
echo "Successfully pushed docker image ${FULL_IMAGE_NAME} to ECR repository"
fi
fi
fi
docker push "${FULL_IMAGE_NAME}" &&
echo "Successfully pushed docker image ${FULL_IMAGE_NAME} to ECR repository"
fi

Comment on lines 8 to 18
SHA_SHORT="$(git rev-parse --short HEAD)"
CUR_DATE="$(date +%Y%m%d%H%M%S)"
VERSION="$STAGE-$CUR_DATE-$SHA_SHORT"
IMAGE_NAME=${image_name:-promethai-${STAGE}-promethai-backend-memory}
IMAGE_NAME=${image_name:-promethai-${STAGE}-promethai-backend}

REPO_NAME="${AWS_REPOSITORY}/${IMAGE_NAME}"
FULL_IMAGE_NAME="${REPO_NAME}:${VERSION}"
APP_DIR=${app_dir:-"level_2/"} # Updated this line
APP_DIR=${app_dir:-"."}

PUBLISH=${publish:-false}


Note: This review was outside the patches, and no patch overlapping with it was found. Original lines [19-19]

The echo statement should use the APP_DIR variable instead of app_dir to reflect the directory being used in the build process.

- echo "Building docker image ${FULL_IMAGE_NAME} located in dir ${app_dir}"
+ echo "Building docker image ${FULL_IMAGE_NAME} located in dir ${APP_DIR}"

Comment on lines 78 to +99
# with:
# tag_name: ${{ steps.bump_version_and_push_tag.outputs.tag }}
# release_name: Release ${{ steps.bump_version_and_push_tag.outputs.tag }}
# outputs:
# promethai_docker_tag_backend: ${{ steps.export-promethai-docker-tag.outputs.promethai_docker_tag_backend }}
outputs:
promethai_docker_tag_backend: ${{ steps.export-promethai-docker-tag.outputs.promethai_docker_tag_backend }}

# apply_tf:
# name: Trigger terraform apply workflow
# runs-on: ubuntu-latest
# needs: publish_docker_to_ecr
# steps:
# - name: TF apply workflow triggers step
# uses: actions/github-script@v6
# with:
# github-token: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
# script: |
# await github.rest.actions.createWorkflowDispatch({
# owner: 'topoteretes',
# repo: 'PromethAI-Infra',
# workflow_id: 'terraform.apply.yml',
# ref: 'main'
# })
apply_tf:
name: Trigger terraform apply workflow
runs-on: ubuntu-latest
needs: publish_docker_to_ecr
steps:
- name: TF apply workflow triggers step
uses: actions/github-script@v6
with:
github-token: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
script: |
await github.rest.actions.createWorkflowDispatch({
owner: 'topoteretes',
repo: 'PromethAI-Infra',
workflow_id: 'terraform.apply.yml',
ref: 'main'
})

Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [59-80]

The commented-out code for creating tags and releases is still present in the workflow file. If these steps have been implemented elsewhere, this commented-out code should be removed to avoid confusion and maintain cleanliness of the workflow file.

- #      - name: Create Tag and Release
- #        runs-on: ubuntu-latest
- #        uses: actions/checkout@v3
- #        needs: publish_docker_to_ecr  # ensure this job runs after Docker image is pushed
- #        steps:
- #          - name: Check out code
- #            uses: actions/checkout@v3
- #          - name: Bump version and push tag
- #            id: bump_version_and_push_tag
- #            uses: anothrNick/[email protected]
- #            env:
- #              GITHUB_TOKEN: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
- #              WITH_V: true
- #              DEFAULT_BUMP: 'minor'  # or 'minor' or 'major'
- #          - name: Create Release
- #            id: create_release
- #            uses: actions/create-release@v1
- #            env:
- #              GITHUB_TOKEN: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
- #            with:
- #              tag_name: ${{ steps.bump_version_and_push_tag.outputs.tag }}
- #              release_name: Release ${{ steps.bump_version_and_push_tag.outputs.tag }}

Comment on lines +65 to +105
def aretry_with_exponential_backoff(
    func,
    initial_delay: float = 1,
    exponential_base: float = 2,
    jitter: bool = True,
    max_retries: int = 20,
    errors: tuple = (openai.error.RateLimitError,),
):
    """Retry a function with exponential backoff."""

    async def wrapper(*args, **kwargs):
        # Initialize variables
        num_retries = 0
        delay = initial_delay

        # Loop until a successful response or max_retries is hit or an exception is raised
        while True:
            try:
                return await func(*args, **kwargs)

            # Retry on specified errors
            except errors as e:
                print(f"acreate (backoff): caught error: {e}")
                # Increment retries
                num_retries += 1

                # Check if max retries has been reached
                if num_retries > max_retries:
                    raise Exception(f"Maximum number of retries ({max_retries}) exceeded.")

                # Increment the delay
                delay *= exponential_base * (1 + jitter * random.random())

                # Sleep for the delay
                await asyncio.sleep(delay)

            # Raise exceptions for any errors not specified
            except Exception as e:
                raise e

    return wrapper

The new asynchronous retry function aretry_with_exponential_backoff seems correctly implemented, mirroring the logic of the synchronous version but using async and await keywords. However, the exception handling prints an error message for caught RateLimitError but does not log or handle other exceptions, which could be inconsistent with the synchronous version's behavior. Consider adding logging for all exceptions or re-raising them to maintain consistency with the synchronous version.

-            except errors as e:
+            except errors as rate_limit_error:
-                print(f"acreate (backoff): caught error: {e}")
+                logging.error(f"Retrying due to rate limit: {rate_limit_error}")
-            except Exception as e:
+            except Exception as general_error:
-                raise e
+                logging.error(f"An unexpected error occurred: {general_error}")
+                raise

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def aretry_with_exponential_backoff(
func,
initial_delay: float = 1,
exponential_base: float = 2,
jitter: bool = True,
max_retries: int = 20,
errors: tuple = (openai.error.RateLimitError,),
):
"""Retry a function with exponential backoff."""
async def wrapper(*args, **kwargs):
# Initialize variables
num_retries = 0
delay = initial_delay
# Loop until a successful response or max_retries is hit or an exception is raised
while True:
try:
return await func(*args, **kwargs)
# Retry on specified errors
except errors as e:
print(f"acreate (backoff): caught error: {e}")
# Increment retries
num_retries += 1
# Check if max retries has been reached
if num_retries > max_retries:
raise Exception(f"Maximum number of retries ({max_retries}) exceeded.")
# Increment the delay
delay *= exponential_base * (1 + jitter * random.random())
# Sleep for the delay
await asyncio.sleep(delay)
# Raise exceptions for any errors not specified
except Exception as e:
raise e
return wrapper
def aretry_with_exponential_backoff(
func,
initial_delay: float = 1,
exponential_base: float = 2,
jitter: bool = True,
max_retries: int = 20,
errors: tuple = (openai.error.RateLimitError,),
):
"""Retry a function with exponential backoff."""
async def wrapper(*args, **kwargs):
# Initialize variables
num_retries = 0
delay = initial_delay
# Loop until a successful response or max_retries is hit or an exception is raised
while True:
try:
return await func(*args, **kwargs)
# Retry on specified errors
except errors as rate_limit_error:
logging.error(f"Retrying due to rate limit: {rate_limit_error}")
# Increment retries
num_retries += 1
# Check if max retries has been reached
if num_retries > max_retries:
raise Exception(f"Maximum number of retries ({max_retries}) exceeded.")
# Increment the delay
delay *= exponential_base * (1 + jitter * random.random())
# Sleep for the delay
await asyncio.sleep(delay)
# Raise exceptions for any errors not specified
except Exception as general_error:
logging.error(f"An unexpected error occurred: {general_error}")
raise
return wrapper
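
As a quick numeric illustration of how the delay in the wrapper above grows: with initial_delay=1 and exponential_base=2, each retry multiplies the delay by a factor between 2 and 4 when jitter is enabled (the 1 + jitter * random.random() term). A minimal standalone sketch:

import random

def simulated_delays(initial_delay=1.0, exponential_base=2.0, jitter=True, retries=4):
    """Show how the wrapper's delay grows: each step multiplies by base * (1 + jitter * rand)."""
    delay, delays = initial_delay, []
    for _ in range(retries):
        delay *= exponential_base * (1 + jitter * random.random())
        delays.append(round(delay, 2))
    return delays

print(simulated_delays())  # e.g. [3.4, 8.1, 24.7, 61.9] -- roughly 2-4x growth per retry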

Comment on lines +113 to +116
@aretry_with_exponential_backoff
async def acreate_embedding_with_backoff(**kwargs):
"""Wrapper around Embedding.acreate w/ backoff"""
return await openai.Embedding.acreate(**kwargs)

💡 NOTE
Codebase verification is a beta feature.

The verification process has identified that the asynchronous function acreate_embedding_with_backoff is used correctly with the await keyword in at least one instance within the codebase. However, there is also a synchronous call to create_embedding_with_backoff present, which may indicate that not all parts of the code have been updated to use the new asynchronous function. It is recommended to verify whether the synchronous call is intentional or if it requires updating to the asynchronous pattern to maintain consistency and prevent potential issues.

  • The asynchronous call is found at ./cognitive_architecture/openai_tools.py:123.
  • The synchronous call is found at ./cognitive_architecture/openai_tools.py:135.
Analysis chain

The function acreate_embedding_with_backoff is correctly implemented as an asynchronous function with the retry decorator. Ensure that all calls to this function are awaited and that the function's introduction does not break any existing code that relies on the synchronous version.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for calls to `acreate_embedding_with_backoff` to ensure they are awaited.
ast-grep --lang python --pattern $'await acreate_embedding_with_backoff($$$)'

Length of output: 202



Script:

#!/bin/bash
# Search for all instances of `acreate_embedding_with_backoff` to ensure they are all awaited.
ast-grep --lang python --pattern $'await acreate_embedding_with_backoff($$$)'

Length of output: 202


Script:

#!/bin/bash
# Search for any calls to a synchronous version of the function, which might be named without the 'a' prefix.
ast-grep --lang python --pattern $'create_embedding_with_backoff($$$)'

Length of output: 188

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 13

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between dfb13a2 and 6e47c1f.
Files ignored due to filter (2)
  • poetry.lock
  • pyproject.toml
Files selected for processing (5)
  • .env.template (1 hunks)
  • cognitive_architecture/database/postgres/database_crud.py (1 hunks)
  • cognitive_architecture/database/vectordb/vectordb.py (5 hunks)
  • cognitive_architecture/vectorstore_manager.py (1 hunks)
  • main.py (6 hunks)
Files skipped from review due to trivial changes (2)
  • .env.template
  • cognitive_architecture/vectorstore_manager.py
Additional comments: 9
cognitive_architecture/database/postgres/database_crud.py (1)
  • 54-54: Explicit commit inside the update_entity function is correct and aligns with the async nature of the function.
cognitive_architecture/database/vectordb/vectordb.py (5)
  • 154-154: The addition of params['user_id'] ensures that the user ID is included in the parameters for memory operations, which is likely important for associating memories with the correct user.

  • 176-176: Incrementing params['chunk_order'] within the loop correctly assigns an order to each chunk, which is important for maintaining the sequence of the data.

  • 258-289: The addition of a new search_type == 'summary' condition with a specific filter object is a significant change. Ensure that this new search type and its logic are documented and tested to verify that it behaves as expected.

  • 314-315: The return statement for the fetch_memories method is correctly placed after all conditional logic, ensuring that the method returns the query output regardless of the search_type.

  • 320-321: The check for namespace being None and the subsequent assignment to self.namespace is consistent with the rest of the methods in the class, ensuring that the default namespace is used when none is provided.

main.py (3)
  • 8-8: The addition of update_entity_graph_summary to imports aligns with the PR objectives and summary.

  • 180-209: The signature of the add_documents_to_graph_db function has been updated by removing the stupid_local_testing_flag parameter, which aligns with the PR objectives and summary.

  • 332-332: The update operation for DocsModel in the main function aligns with the PR objectives and summary.

Comment on lines +37 to +46
def update_entity_graph_summary(session, model, entity_id, new_value):
    with session_scope(session) as s:
        # Retrieve the entity from the database
        entity = s.query(model).filter_by(id=entity_id).first()
        if entity:
            entity.graph_summary = new_value
            s.commit()
            return "Successfully updated entity"
        else:
            return "Entity not found"

The update_entity_graph_summary function is synchronous, which is inconsistent with the rest of the database functions that are designed to be asynchronous. Consider refactoring this function to be asynchronous for consistency and to avoid potential integration issues with other parts of the system.

- def update_entity_graph_summary(session, model, entity_id, new_value):
+ async def update_entity_graph_summary(session, model, entity_id, new_value):
- with session_scope(session) as s:
+ async with session_scope(session) as s:
- s.commit()
+ await s.commit()

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def update_entity_graph_summary(session, model, entity_id, new_value):
with session_scope(session) as s:
# Retrieve the entity from the database
entity = s.query(model).filter_by(id=entity_id).first()
if entity:
entity.graph_summary = new_value
s.commit()
return "Successfully updated entity"
else:
return "Entity not found"
async def update_entity_graph_summary(session, model, entity_id, new_value):
async with session_scope(session) as s:
# Retrieve the entity from the database
entity = s.query(model).filter_by(id=entity_id).first()
if entity:
entity.graph_summary = new_value
await s.commit()
return "Successfully updated entity"
else:
return "Entity not found"

@@ -34,16 +34,24 @@ async def add_entity(session, entity):
return "Successfully added entity"



def update_entity_graph_summary(session, model, entity_id, new_value):
with session_scope(session) as s:

The session_scope context manager is asynchronous and should be used with async with, not with. This will cause a runtime error as it stands.

- with session_scope(session) as s:
+ async with session_scope(session) as s:

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
with session_scope(session) as s:
async with session_scope(session) as s:
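
For context, session_scope in this codebase is asynchronous, which is also why the explicit commit flagged below is redundant. A typical shape for such a context manager (a sketch assuming SQLAlchemy's asyncio session API, not the project's actual implementation):

from contextlib import asynccontextmanager
from sqlalchemy.ext.asyncio import AsyncSession

@asynccontextmanager
async def session_scope(session: AsyncSession):
    """Yield a session, commit on success, roll back on any error."""
    try:
        yield session
        await session.commit()  # a single commit here makes per-call s.commit() unnecessary
    except Exception:
        await session.rollback()
        raise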

entity = s.query(model).filter_by(id=entity_id).first()
if entity:
entity.graph_summary = new_value
s.commit()

The session.commit() call inside update_entity_graph_summary is redundant because the session_scope context manager already takes care of committing the session. This line should be removed to avoid confusion and redundancy.

- s.commit()

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
s.commit()

# neo4j_graph_db.update_document_node_with_namespace(user_id, vectordb_namespace=memory_name, document_id=doc_id)
# else:
try:
logging.info("Docs are", docs)

Incorrect logging call: the extra positional argument is treated as a %-format argument, so the logging module will report a formatting error at runtime and the value will not be logged as intended. Use %-style lazy formatting.

- logging.info("Docs are", docs)
+ logging.info("Docs are %s", docs)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("Docs are", docs)
logging.info("Docs are %s", docs)

main.py Outdated
try:
logging.info("Docs are", docs)
for doc, memory_name in zip(docs, memory_names):
logging.info("Memory names are", memory_name)

Incorrect logging call: the extra positional argument is treated as a %-format argument, so the logging module will report a formatting error at runtime and the value will not be logged as intended. Use %-style lazy formatting.

- logging.info("Memory names are", memory_name)
+ logging.info("Memory names are %s", memory_name)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("Memory names are", memory_name)
logging.info("Memory names are %s", memory_name)

main.py Outdated
await update_entity(session, DocsModel, doc_id, True)
# except:
# pass
logging.info("Classification content is", classification_content)

Incorrect logging call: the extra positional argument is treated as a %-format argument, so the logging module will report a formatting error at runtime and the value will not be logged as intended. Use %-style lazy formatting.

- logging.info("Classification content is", classification_content)
+ logging.info("Classification content is %s", classification_content)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("Classification content is", classification_content)
logging.info("Classification content is %s", classification_content)

doc_name, doc_id = doc
classification = await classify_documents(doc_name, document_id =doc_id, content=concatenated_retrievals)

logging.info("Classification is", str(classification))

Incorrect logging call: the extra positional argument is treated as a %-format argument, so the logging module will report a formatting error at runtime and the value will not be logged as intended. Use %-style lazy formatting.

- logging.info("Classification is", str(classification))
+ logging.info("Classification is %s", str(classification))

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("Classification is", str(classification))
logging.info("Classification is %s", str(classification))

main.py Outdated
neo4j_graph_db.query(rs, classification)
neo4j_graph_db.update_document_node_with_namespace(user_id, vectordb_namespace=memory_name,
document_id=doc_id)
logging.info("hERE IS THE OUT2323", doc_id)

Incorrect logging call: the extra positional argument is treated as a %-format argument, so the logging module will report a formatting error at runtime and the value will not be logged as intended. Use %-style lazy formatting.

- logging.info("hERE IS THE OUT2323", doc_id)
+ logging.info("hERE IS THE OUT2323 %s", doc_id)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("hERE IS THE OUT2323", doc_id)
logging.info("hERE IS THE OUT2323 %s", doc_id)

main.py Outdated
Comment on lines 332 to 356
await update_entity(session, DocsModel, "8cd9a022-5a7a-4af5-815a-f988415536ae", True)
# out = await get_vectordb_namespace(session, user_id)
params = {
"version": "1.0",
"agreement_id": "AG123456",
"privacy_policy": "https://example.com/privacy",
"terms_of_service": "https://example.com/terms",
"format": "json",
"schema_version": "1.1",
"checksum": "a1b2c3d4e5f6",
"owner": "John Doe",
"license": "MIT",
"validity_start": "2023-08-01",
"validity_end": "2024-07-31",
}
loader_settings = {
"format": "PDF",
"source": "DEVICE",
"path": [".data"],
"strategy": "SUMMARY",
}
await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings)
await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday")
# params = {
# "version": "1.0",
# "agreement_id": "AG123456",
# "privacy_policy": "https://example.com/privacy",
# "terms_of_service": "https://example.com/terms",
# "format": "json",
# "schema_version": "1.1",
# "checksum": "a1b2c3d4e5f6",
# "owner": "John Doe",
# "license": "MIT",
# "validity_start": "2023-08-01",
# "validity_end": "2024-07-31",
# }
# loader_settings = {
# "format": "PDF",
# "source": "DEVICE",
# "path": [".data"],
# "strategy": "SUMMARY",
# }
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
# await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
# await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings)
# await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday")

Commented-out code should be removed or clarified with a comment explaining why it's retained.

Comment on lines 204 to 210
search_type = 'hybrid'
logging.info("The search type is s%", search_type)

if search_type == 'summary':
from weaviate.classes import Filter
client = weaviate.connect_to_wcs(
cluster_url=config.weaviate_url,
auth_credentials=weaviate.AuthApiKey(config.weaviate_api_key)
)

summary_collection = client.collections.get(self.namespace)
response = summary_collection.query.fetch_objects(
filters=Filter("user_id").equal(self.user_id) &
Filter("chunk_order").less_than(25),
limit=15
)
return response

if not namespace:
namespace = self.namespace

The check for search_type being None and the subsequent assignment to 'hybrid' should be moved outside of the fetch_memories method to ensure that the default value is set before the method is called. This will prevent the method from being called with an undefined search_type.

- if search_type is None:
-     search_type = 'hybrid'
+ # Default search_type should be set before calling fetch_memories

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
search_type = 'hybrid'
logging.info("The search type is s%", search_type)
if search_type == 'summary':
from weaviate.classes import Filter
client = weaviate.connect_to_wcs(
cluster_url=config.weaviate_url,
auth_credentials=weaviate.AuthApiKey(config.weaviate_api_key)
)
summary_collection = client.collections.get(self.namespace)
response = summary_collection.query.fetch_objects(
filters=Filter("user_id").equal(self.user_id) &
Filter("chunk_order").less_than(25),
limit=15
)
return response
if not namespace:
namespace = self.namespace
search_type = 'hybrid'
logging.info("The search type is s%", search_type)
# Default search_type should be set before calling fetch_memories
if not namespace:
namespace = self.namespace

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 6

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 6e47c1f and 5ed04de.
Files selected for processing (7)
  • .env.template (1 hunks)
  • cognitive_architecture/config.py (1 hunks)
  • cognitive_architecture/database/create_database.py (2 hunks)
  • cognitive_architecture/database/postgres/database.py (1 hunks)
  • cognitive_architecture/database/vectordb/vectordb.py (5 hunks)
  • entrypoint.sh (1 hunks)
  • main.py (7 hunks)
Files skipped from review due to trivial changes (1)
  • .env.template
Additional comments: 8
entrypoint.sh (1)
  • 1-6: The addition of the ENV environment variable should be verified to ensure it's utilized within the Python scripts or the Gunicorn service that this script is starting. If ENV is not used, consider removing it to avoid unnecessary environment variables.
cognitive_architecture/config.py (1)
  • 34-53: The conditional assignments for configuration parameters based on environment variables are correctly implemented. Ensure that the environment variables used here are documented and that their values are set in the environment where the application runs.
cognitive_architecture/database/vectordb/vectordb.py (3)
  • 154-154: The addition of user_id to params within the add_memories method should be verified to ensure that it aligns with the method's expected behavior and that the user_id is indeed required for the operation.

  • 172-177: The change in chunk_size to 300 should be verified to ensure it aligns with the expected behavior of the chunk_data function and that it does not negatively impact the performance or accuracy of the memory addition process.

  • 258-289: The addition of the 'summary' search type in the fetch_memories method should be verified to ensure that it aligns with the expected behavior and that the logic for fetching summaries is correctly implemented.

main.py (3)
  • 8-8: The addition of update_entity_graph_summary to the import statement should be verified to ensure that it is used within the file and that the function is correctly implemented where it is called.

  • 180-209: The modifications in the add_documents_to_graph_db function, including the addition of a try-except block and changes in the classification process, should be verified to ensure that they are correctly implemented and that they align with the expected behavior of the function.

  • 332-332: The addition of the update_entity call within the main function should be verified to ensure that it is correctly implemented and that it aligns with the expected behavior of the function.

Comment on lines 18 to 32
from cognitive_architecture.config import Config
config = Config()
config.load()


# in seconds
MAX_RETRIES = 3
RETRY_DELAY = 5

username = os.getenv('POSTGRES_USER')
password = os.getenv('POSTGRES_PASSWORD')
database_name = os.getenv('POSTGRES_DB')
import os

environment = os.environ.get("ENVIRONMENT")

if environment == "local":
host= os.getenv('POSTGRES_HOST')
host = config.postgres_host
username = config.postgres_user
password = config.postgres_password
database_name = config.postgres_db

The refactoring to use a Config object for database configuration parameters is a good practice for maintainability and centralization of configuration. Ensure that all instances in the codebase that previously accessed these environment variables directly are updated to use the Config object.

Comment on lines 12 to 18
from cognitive_architecture.config import Config
config = Config()
config.load()



from postgres.database import Base

The use of the Config object to load and assign database connection parameters is a good practice for maintainability and centralization of configuration. Ensure that all instances in the codebase that previously accessed these environment variables directly are updated to use the Config object.

Comment on lines 61 to 71
#
# username = os.getenv('POSTGRES_USER')
# password = os.getenv('POSTGRES_PASSWORD')
# database_name = os.getenv('POSTGRES_DB')
# environment = os.environ.get("ENV")
host = config.postgres_host
username = config.postgres_user
password = config.postgres_password
database_name = config.postgres_db

engine = create_admin_engine(username, password, host, database_name)

Ensure that the create_admin_engine function is updated to use the new config object for database connection parameters, as the old environment variable access has been replaced.

chunk_overlap=20)]
for doc in documents[0]:
chunk_count += 1
params['chunk_order'] = chunk_count

# document_to_load = self._stuct(observation, params, metadata_schema_class)

logging.info("Loading document with defautl loader settings %s", str(doc))

There is a typo in the log message: "defautl" should be corrected to "default".

- logging.info("Loading document with defautl loader settings %s", str(doc))
+ logging.info("Loading document with default loader settings %s", str(doc))

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("Loading document with defautl loader settings %s", str(doc))
logging.info("Loading document with default loader settings %s", str(doc))

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 5ed04de and 9293c74.
Files selected for processing (2)
  • cognitive_architecture/database/create_database.py (2 hunks)
  • cognitive_architecture/database/postgres/database.py (1 hunks)
Additional comments: 7
cognitive_architecture/database/postgres/database.py (4)
  • 18-20: The introduction of the Config object and the call to config.load() are good practices for centralizing configuration management.

  • 29-32: The assignment of database connection parameters from the Config object is correct and improves maintainability by centralizing configuration settings.

  • 37-37: Verify that the SQLALCHEMY_DATABASE_URL is correctly formatted and that the Config object is loaded before this line to ensure database connectivity.

  • 36-37: The retry mechanism in safe_db_operation appears to be correctly implemented, using the MAX_RETRIES and RETRY_DELAY constants; a rough sketch of what such a retry wrapper typically looks like appears after this list.

cognitive_architecture/database/create_database.py (3)
  • 12-14: The introduction of the Config object and the call to config.load() are good practices for centralizing configuration management.

  • 66-69: The assignment of database connection parameters from the Config object is correct and improves maintainability by centralizing configuration settings.

  • 61-71: Ensure that the create_admin_engine, database_exists, create_database, and create_tables functions are updated to use the new Config object for database connection parameters, as the old environment variable access has been replaced.
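
safe_db_operation itself is not shown in this diff; as a rough sketch of the pattern referred to in the database.py comment above (assuming it retries transient SQLAlchemy errors using the MAX_RETRIES and RETRY_DELAY constants, which is an assumption rather than the project's exact code):

import time
import logging
from sqlalchemy.exc import OperationalError

MAX_RETRIES = 3
RETRY_DELAY = 5  # seconds

def safe_db_operation(operation):
    """Retry a database call a bounded number of times on transient connection errors."""
    def wrapper(*args, **kwargs):
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                return operation(*args, **kwargs)
            except OperationalError as exc:
                logging.warning("DB operation failed (attempt %s/%s): %s", attempt, MAX_RETRIES, exc)
                if attempt == MAX_RETRIES:
                    raise
                time.sleep(RETRY_DELAY)
    return wrapper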

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 9293c74 and cbde083.
Files selected for processing (1)
  • cognitive_architecture/database/postgres/database.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • cognitive_architecture/database/postgres/database.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between cbde083 and 078d697.
Files selected for processing (1)
  • cognitive_architecture/database/postgres/database.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • cognitive_architecture/database/postgres/database.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 078d697 and 523f893.
Files selected for processing (1)
  • cognitive_architecture/database/create_database.py (3 hunks)
Files skipped from review as they are similar to previous changes (1)
  • cognitive_architecture/database/create_database.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 523f893 and 98383b7.
Files selected for processing (1)
  • api.py (2 hunks)
Additional comments: 1
api.py (1)
  • 157-158: The exception handling in the user_query_classfier function has been changed to raise an HTTPException directly instead of returning a JSONResponse. This is a positive change as it leverages FastAPI's built-in exception handling to return the appropriate response.

api.py Outdated
Comment on lines 162 to 169
@app.post("/drop-db")
async def drop_db(payload: Payload):
try:
decoded_payload = payload.payload
return JSONResponse(content={"response": "dropped"}, status_code=200)

except Exception as e:
return HTTPException(status_code=500, detail=str(e))

💡 NOTE
Codebase verification is a beta feature.

The drop_db function in api.py does not appear to have any active authorization checks associated with it. The references to authentication are commented out, indicating that the function could be executed without any security checks, which is a serious security concern.

  • The drop_db function lacks active authorization checks, potentially allowing unauthorized database operations.
Analysis chain

The drop_db function currently returns a static JSONResponse with the message "dropped" and does not perform any action. This could be a placeholder, but it's important to ensure that the actual database drop logic is implemented securely and with proper authorization checks to prevent accidental or malicious data loss.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify if there are any authorization checks for sensitive operations like dropping the database.
rg --type py 'drop_db'

Length of output: 66



Script:

#!/bin/bash
# Search for any references to authorization or authentication in the api.py file.
rg --type py 'auth|Auth|authorize|Authorize' api.py

Length of output: 154

Comment on lines +168 to +169
except Exception as e:
return HTTPException(status_code=500, detail=str(e))

Consider adding more detailed error handling for the drop_db function. Returning a generic 500 error for all exceptions may not be informative for the client. It would be beneficial to distinguish between different types of errors and handle them accordingly.
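
One way to address both this point and the earlier note about the missing authorization check is sketched below; the header-based guard, the placeholder drop logic, and the exception split are illustrative assumptions, not the project's actual API.

from fastapi import Depends, FastAPI, Header, HTTPException
from fastapi.responses import JSONResponse
from sqlalchemy.exc import OperationalError, SQLAlchemyError

app = FastAPI()

async def require_admin(x_api_key: str = Header(...)):
    # Hypothetical guard; replace with the project's real auth mechanism.
    if x_api_key != "expected-admin-key":
        raise HTTPException(status_code=403, detail="Not authorized")

@app.post("/drop-db")
async def drop_db(_: None = Depends(require_admin)):
    try:
        # ... actual drop logic goes here (placeholder) ...
        return JSONResponse(content={"response": "dropped"}, status_code=200)
    except OperationalError as e:
        # Transient connectivity problem: signal that the service is unavailable.
        raise HTTPException(status_code=503, detail=f"Database unreachable: {e}")
    except SQLAlchemyError as e:
        # Other database failures: still a server error, but with a distinct message.
        raise HTTPException(status_code=500, detail=f"Database error: {e}")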

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 98383b7 and 118613c.
Files selected for processing (1)
  • cognitive_architecture/config.py (1 hunks)
Additional comments: 3
cognitive_architecture/config.py (3)
  • 34-37: The environment variable GRAPH_DB_PW_PROD is used for the production password, but there is no equivalent for the development password. Verify if this is intentional or an oversight.

  • 57-57: The load method layers values from environment variables and from the config file onto the class attributes. Verify that the intended order of precedence (environment variables over config file) is what the code actually implements, and document it; a minimal sketch of making that precedence explicit appears after this list.

  • 57-57: The to_dict and from_dict methods are correctly implemented for converting the Config object to and from a dictionary representation. This is good for interoperability with other parts of the system that may require configuration data in dictionary format.
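
A minimal sketch of making that precedence explicit, lowest to highest: class default, then config file, then environment variable (attribute and option names are illustrative; this is not the project's actual load()):

import os
import configparser

class Config:
    postgres_host: str = "localhost"  # class-level default

    def load(self, path: str = "config.ini") -> None:
        # Config file overrides the default; environment variable overrides both.
        parser = configparser.ConfigParser()
        parser.read(path)
        if parser.has_option("database", "postgres_host"):
            self.postgres_host = parser.get("database", "postgres_host")
        self.postgres_host = os.getenv("POSTGRES_HOST", self.postgres_host)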

Comment on lines 34 to 53
if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or os.getenv('AWS_ENV') == 'prd':
graph_database_url: str = os.getenv('GRAPH_DB_URL_PROD')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW_PROD')
else:
graph_database_url: str = os.getenv('GRAPH_DB_URL')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW')
weaviate_url: str = os.getenv('WEAVIATE_URL')
weaviate_api_key: str = os.getenv('WEAVIATE_API_KEY')
postgres_user: str = os.getenv('POSTGRES_USER')
postgres_password: str = os.getenv('POSTGRES_PASSWORD')
postgres_db: str = os.getenv('POSTGRES_DB')
if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or os.getenv('AWS_ENV') == 'prd':
postgres_host: str = os.getenv('POSTGRES_PROD_HOST')
elif os.getenv('ENV') == 'docker':
postgres_host: str = os.getenv('POSTGRES_HOST_DOCKER')
elif os.getenv('ENV') == 'local':
postgres_host: str = os.getenv('POSTGRES_HOST_LOCAL')


The conditional logic for setting database parameters based on environment variables is complex and may be error-prone. Consider refactoring to improve readability and maintainability.

- if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or   os.getenv('AWS_ENV') == 'prd':
+ env = os.getenv('ENV')
+ aws_env = os.getenv('AWS_ENV')
+ is_prod_or_dev = env in ['prod', 'dev'] or aws_env in ['dev', 'prd']
+ if is_prod_or_dev:

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or os.getenv('AWS_ENV') == 'prd':
graph_database_url: str = os.getenv('GRAPH_DB_URL_PROD')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW_PROD')
else:
graph_database_url: str = os.getenv('GRAPH_DB_URL')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW')
weaviate_url: str = os.getenv('WEAVIATE_URL')
weaviate_api_key: str = os.getenv('WEAVIATE_API_KEY')
postgres_user: str = os.getenv('POSTGRES_USER')
postgres_password: str = os.getenv('POSTGRES_PASSWORD')
postgres_db: str = os.getenv('POSTGRES_DB')
if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or os.getenv('AWS_ENV') == 'prd':
postgres_host: str = os.getenv('POSTGRES_PROD_HOST')
elif os.getenv('ENV') == 'docker':
postgres_host: str = os.getenv('POSTGRES_HOST_DOCKER')
elif os.getenv('ENV') == 'local':
postgres_host: str = os.getenv('POSTGRES_HOST_LOCAL')
env = os.getenv('ENV')
aws_env = os.getenv('AWS_ENV')
is_prod_or_dev = env in ['prod', 'dev'] or aws_env in ['dev', 'prd']
if is_prod_or_dev:
graph_database_url: str = os.getenv('GRAPH_DB_URL_PROD')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW_PROD')
else:
graph_database_url: str = os.getenv('GRAPH_DB_URL')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW')
weaviate_url: str = os.getenv('WEAVIATE_URL')
weaviate_api_key: str = os.getenv('WEAVIATE_API_KEY')
postgres_user: str = os.getenv('POSTGRES_USER')
postgres_password: str = os.getenv('POSTGRES_PASSWORD')
postgres_db: str = os.getenv('POSTGRES_DB')
if is_prod_or_dev:
postgres_host: str = os.getenv('POSTGRES_PROD_HOST')
elif env == 'docker':
postgres_host: str = os.getenv('POSTGRES_HOST_DOCKER')
elif env == 'local':
postgres_host: str = os.getenv('POSTGRES_HOST_LOCAL')







The save method splits attribute names on the first underscore to determine the section name for the config file. This could lead to unexpected behavior if attribute names contain more than one underscore. Consider a more robust approach to determine section names.
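A minimal sketch of one more robust approach, assuming the config class keeps an explicit list of known section names (the names below are illustrative, not necessarily the project's actual sections):

# Illustrative sketch: resolve the section from an explicit list of known
# section names instead of splitting the attribute name on the first underscore.
KNOWN_SECTIONS = ("graph_database", "postgres", "weaviate")  # assumed names

def section_for(attribute_name: str) -> str:
    # Prefer the longest matching known prefix so multi-underscore names resolve correctly.
    for section in sorted(KNOWN_SECTIONS, key=len, reverse=True):
        if attribute_name == section or attribute_name.startswith(section + "_"):
            return section
    # Fall back to the old behaviour for anything unrecognised.
    return attribute_name.split("_", 1)[0]

print(section_for("graph_database_url"))   # -> graph_database
print(section_for("postgres_host_local"))  # -> postgres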

…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 21

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 118613c and 2928f51.
Files ignored due to filter (2)
  • poetry.lock
  • pyproject.toml
Files selected for processing (21)
  • .env.template (1 hunks)
  • Dockerfile (1 hunks)
  • api.py (1 hunks)
  • cognitive_architecture/classifiers/classifier.py (1 hunks)
  • cognitive_architecture/database/create_database.py (3 hunks)
  • cognitive_architecture/database/graph_database/graph.py (14 hunks)
  • cognitive_architecture/database/postgres/models/docs.py (1 hunks)
  • cognitive_architecture/database/postgres/models/memory.py (1 hunks)
  • cognitive_architecture/database/vectordb/chunkers/chunkers.py (2 hunks)
  • cognitive_architecture/database/vectordb/cognee_manager.py (1 hunks)
  • cognitive_architecture/database/vectordb/embeddings/embeddings.py (1 hunks)
  • cognitive_architecture/database/vectordb/embeddings/embeddings_type.py (1 hunks)
  • cognitive_architecture/database/vectordb/job.py (1 hunks)
  • cognitive_architecture/database/vectordb/loaders/loaders.py (1 hunks)
  • cognitive_architecture/database/vectordb/response.py (1 hunks)
  • cognitive_architecture/database/vectordb/vector_db_type.py (1 hunks)
  • cognitive_architecture/database/vectordb/vectordb.py (5 hunks)
  • cognitive_architecture/shared/language_processing.py (1 hunks)
  • cognitive_architecture/utils.py (3 hunks)
  • cognitive_architecture/vectorstore_manager.py (3 hunks)
  • main.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • Dockerfile
Additional comments: 56
cognitive_architecture/database/vectordb/embeddings/embeddings_type.py (1)
  • 1-8: The EmbeddingsType enum is well-defined and covers a variety of embedding sources.
cognitive_architecture/database/vectordb/vector_db_type.py (1)
  • 1-13: The VectorDBType enum is well-defined and covers a variety of vector database types.
cognitive_architecture/database/vectordb/job.py (1)
  • 1-18: The Job class is well-defined with clear string representation methods.
cognitive_architecture/database/postgres/models/memory.py (1)
  • 12-18: The addition of the memory_category field to the MemoryModel is correctly implemented. Ensure that corresponding database migration scripts are in place to handle this schema change.
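If the project manages schema changes with Alembic, the migration for this column is small; a sketch only, with the revision identifiers and the memories table name assumed rather than taken from the repository:

# Hypothetical Alembic migration adding the nullable memory_category column.
from alembic import op
import sqlalchemy as sa

revision = "add_memory_category"   # assumed revision id
down_revision = None               # set to the previous revision in practice

def upgrade():
    op.add_column("memories", sa.Column("memory_category", sa.String(), nullable=True))

def downgrade():
    op.drop_column("memories", "memory_category")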
cognitive_architecture/database/vectordb/embeddings/embeddings.py (1)
  • 5-29: The Embeddings class is updated with new parameters and a serialize method. Ensure that the serialization logic correctly handles all new parameters and that the serialize method is tested for various embedding configurations.
cognitive_architecture/database/vectordb/response.py (1)
  • 4-72: The Response class is updated with new fields and class methods for JSON conversion. Ensure that the JSON conversion logic correctly handles all new fields and that the class methods are tested for various response scenarios.
cognitive_architecture/shared/language_processing.py (1)
  • 50-81: The translate_text function uses AWS Translate and has error handling for BotoCoreError and ClientError. Ensure that the error messages are logged and that the function is tested for various error scenarios.
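For reference, a minimal sketch of how such a call is usually wrapped, assuming boto3 credentials come from the environment; the parameter handling here is illustrative and not the module's exact code:

# Sketch of an AWS Translate call with both AWS error types logged and surfaced.
import logging
from typing import Optional

import boto3
from botocore.exceptions import BotoCoreError, ClientError

def translate_text_example(text: str, source_lang: str, target_lang: str) -> Optional[str]:
    try:
        client = boto3.client("translate")
        result = client.translate_text(
            Text=text,
            SourceLanguageCode=source_lang,
            TargetLanguageCode=target_lang,
        )
        return result["TranslatedText"]
    except (BotoCoreError, ClientError) as error:
        # Log the failure so callers can decide whether to retry or fall back.
        logging.error("AWS Translate call failed: %s", error)
        return None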
cognitive_architecture/database/create_database.py (2)
  • 27-44: The environment variable handling has been updated to use AWS_ENV. Ensure that this change is reflected throughout the codebase and that the AWS_ENV variable is set in all required environments.

  • 74-83: The drop_database function has been added to handle database deletion. Ensure that this function is protected against accidental execution in production environments and that it is tested thoroughly.

+ # Add a safety check to prevent accidental drops in production
+ if os.environ.get('AWS_ENV') != 'local':
+     raise RuntimeError("Database dropping is not allowed in non-local environments")
cognitive_architecture/database/vectordb/chunkers/chunkers.py (1)
  • 1-1: The import path for ChunkStrategy has been updated. Verify that this new path is correct and that the ChunkStrategy is accessible from this new location.
Verification successful

The shell script output confirms the presence of the ChunkStrategy class within the chunk_strategy.py file. This verifies that the updated import path in chunkers.py is correct and that the ChunkStrategy class is indeed accessible from the new location.

  • The import path for ChunkStrategy in chunkers.py is correct and the class is accessible.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the ChunkStrategy import is correct and accessible
rg --type py 'from cognitive_architecture.database.vectordb.chunkers.chunk_strategy import ChunkStrategy'
# Expecting to find the ChunkStrategy definition in the specified path

Length of output: 260



Script:

#!/bin/bash
# Check for the presence of the ChunkStrategy class within the chunk_strategy.py file
rg --type py 'class ChunkStrategy' cognitive_architecture/database/vectordb/chunkers/chunk_strategy.py
# Expecting to find the class definition to confirm the import is correct

Length of output: 131

cognitive_architecture/classifiers/classifier.py (1)
  • 76-104: The classify_call function has been updated with new parameters. Verify that all calls to this function have been updated to pass the correct arguments and that the function's logic correctly handles the new document_summaries parameter.
Verification successful

The verification process has confirmed that the calls to the classify_call function within the codebase have been updated to use the new document_summaries parameter. The function's logic in classifier.py and its invocation in main.py are consistent with the changes.

  • The classify_call function update has been properly handled in the codebase.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that all calls to classify_call have been updated
rg --type py 'classify_call' --files-with-matches | xargs rg 'document_summaries'
# Expecting to find the updated calls with the new parameter

Length of output: 759

cognitive_architecture/database/vectordb/cognee_manager.py (4)
  • 33-42: The serialize method correctly converts instance attributes to a dictionary, excluding any attributes with None values. This is a clean and efficient way to prepare data for serialization.

  • 68-93: The get_job_statuses method is well-structured and handles different HTTP status codes appropriately. It uses the requests library to make a POST request and constructs the request payload and headers correctly.

  • 121-142: The get_job_status method correctly handles HTTP status codes and uses the requests library to make a GET request. It constructs the URL and headers appropriately and returns a Response object based on the JSON response.

  • 144-151: The generate_headers function constructs request headers and filters out any None values. Ensure that API keys and other sensitive information are securely managed and not hardcoded or exposed.
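On the header construction point, dropping unset values is typically a one-line dict comprehension; a small illustrative sketch (the header names are assumptions, not the project's actual keys):

# Build request headers and drop any entries whose value is None.
def generate_headers_example(internal_api_key=None, webhook_key=None):
    headers = {
        "x-api-key": internal_api_key,
        "x-webhook-key": webhook_key,
    }
    return {name: value for name, value in headers.items() if value is not None}

print(generate_headers_example(internal_api_key="secret"))  # {'x-api-key': 'secret'}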

cognitive_architecture/database/vectordb/loaders/loaders.py (3)
  • 23-25: The fetch_text_content function uses the UnstructuredURLLoader to load text content from a URL. This is a straightforward use of an external loader class.

  • 27-46: The process_content function chunks data and translates it if necessary. It's important to ensure that the language detection and translation services are secure and efficient. Additionally, consider handling exceptions that may arise from these operations (a sketch follows after this list).

  • 48-92: The _document_loader function is an asynchronous function that handles document loading and processing. It correctly awaits other async functions and raises a ValueError for unsupported formats. Ensure that the error handling is consistent and that all resources are managed correctly.
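For the exception-handling note on process_content above, a minimal sketch of guarding the detection and translation steps; the helper names come from the shared language_processing module but their argument names are assumed, and the fallback is simply to keep the untranslated chunk:

# Guard language detection/translation so one bad chunk does not abort loading.
import logging

def process_chunk_safely(chunk: str) -> str:
    try:
        from cognitive_architecture.shared.language_processing import detect_language, translate_text
        language = detect_language(chunk)
        if language and language != "en":
            # Keyword names are assumptions; adjust to the real translate_text signature.
            return translate_text(chunk, source_lang=language, target_lang="en")
        return chunk
    except Exception as error:
        logging.warning("Skipping translation for one chunk: %s", error)
        return chunk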

cognitive_architecture/utils.py (2)
  • 10-13: The addition of AsyncSessionLocal and load_dotenv imports followed by the invocation of load_dotenv() is a common pattern for loading environment variables. Ensure that .env files are not included in version control for security reasons.

  • 222-229: The modification in the get_unsumarized_vector_db_namespace function to return memory_details and docs instead of just memory_names is a significant change. Ensure that all callers of this function are updated to handle the new return type.

api.py (12)
  • 1-17: The imports and initial setup of the FastAPI application look standard and correct. The use of environment variables for configuration is a good practice.

  • 19-23: The logging configuration is set globally, which is a common practice. However, it's important to ensure that this configuration does not override other logging configurations elsewhere in the application.

  • 48-55: The root endpoint is correctly implemented and returns a simple welcome message.

  • 58-63: The health check endpoint is implemented correctly and follows best practices by providing a simple status check.

  • 128-141: The /user-query-to-graph endpoint's error handling raises an HTTPException directly, which is consistent with FastAPI's practices.

  • 144-162: The /document-to-graph-db endpoint is implemented correctly, but ensure that the add_documents_to_graph_db function handles exceptions appropriately.

  • 164-176: The /cognitive-context-enrichment endpoint is implemented correctly, but ensure that the user_context_enrichment function handles exceptions appropriately.

  • 179-192: The /user-query-classifier endpoint is implemented correctly, but ensure that the classify_user_query function handles exceptions appropriately.

  • 238-266: The /create-public-memory endpoint is implemented correctly, but ensure that the create_public_memory function handles exceptions appropriately.

  • 269-292: The /attach-user-to-public-memory endpoint is implemented correctly, but ensure that the attach_user_to_memory function handles exceptions appropriately.

  • 295-313: The /unlink-user-from-public-memory endpoint is implemented correctly, but ensure that the unlink_user_from_memory function handles exceptions appropriately.

  • 315-329: The start_api_server function is implemented correctly, and the logging of exceptions is a good practice.
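Several of these endpoint comments ask for the same thing, so one hedged sketch of the pattern may be enough: catch the failure the handler knows about and convert it into an HTTPException, leaving truly unexpected errors to FastAPI's default 500 handling. The route and helper below are illustrative only:

# Illustrative FastAPI endpoint: translate a known failure into an HTTP error.
from fastapi import FastAPI, HTTPException

app = FastAPI()

async def classify_user_query_example(query: str) -> str:
    # Stand-in for the real classifier call; raises ValueError on bad input.
    if not query:
        raise ValueError("empty query")
    return "classified"

@app.post("/user-query-classifier-example")
async def classify_endpoint(query: str):
    try:
        label = await classify_user_query_example(query)
        return {"response": label}
    except ValueError as error:
        raise HTTPException(status_code=400, detail=str(error))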

cognitive_architecture/database/vectordb/vectordb.py (3)
  • 149-181: The add_memories method has been updated with new logic for chunk processing and document loading. Ensure that the OpenAIEmbeddings and _document_loader functions are tested with the new logic.

  • 201-207: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [183-206]

The fetch_memories method has been extended to support different search types. The default assignment of search_type to 'hybrid' should be moved outside the method to ensure that the default value is set before the method is called, as previously commented.

  • 255-313: New search types 'summary' and 'summary_filter_by_object_name' have been added. Ensure that these new search types are properly tested and that the filtering logic is correct.
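On the default-value remark, one low-risk way to express the fallback is as a parameter default, so callers that omit search_type still get 'hybrid'; a sketch, not the module's actual signature:

# Sketch: make the fallback explicit in the signature instead of assigning it in the body.
async def fetch_memories_example(observation: str, search_type: str = "hybrid", params: dict = None):
    supported = {"hybrid", "summary", "summary_filter_by_object_name"}
    if search_type not in supported:
        raise ValueError(f"Unsupported search_type: {search_type}")
    # ... dispatch to the concrete search implementation here ...
    return {"search_type": search_type, "observation": observation, "params": params}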
cognitive_architecture/vectorstore_manager.py (3)
  • 173-173: The addition of default values for job_id and memory_label in the create_memory method signature is a good practice for optional parameters. Ensure that all invocations of this method are updated to reflect these changes if they rely on the previous signature.

  • 254-254: The new memory_category parameter with a default value of 'PUBLIC' in the handle_new_memory method is a sensible addition for categorization purposes. Confirm that this default aligns with the business logic and that all calls to this method are compatible with the new parameter.

Verification successful

The search with rg has provided us with the locations where the handle_new_memory method is called within the vectorstore_manager.py file. From the output, we can see that the method is called twice, and in both instances, it is called with four parameters: user_id, session, job_id, and memory_name. The calls do not explicitly pass the new memory_category parameter, which means they rely on the default value of 'PUBLIC' as specified in the method's definition.

Given this information, we can conclude the following:

  • The handle_new_memory method has been updated to include a new parameter memory_category with a default value of 'PUBLIC'.
  • The method is called twice in the codebase, and both calls do not specify the memory_category parameter, thus defaulting to 'PUBLIC'.
  • There is no evidence from the provided output that the new parameter is causing any issues with existing calls to the method.

Based on the above points, the addition of the memory_category parameter with a default value seems to be correctly handled in the current codebase.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the `handle_new_memory` method is called correctly with the new parameter.
ast-grep --lang python --pattern $'handle_new_memory($_, $_, $_, $_, $_)'

Length of output: 74



Script:

#!/bin/bash
# Search for calls to the `handle_new_memory` method to check if it's being called with the correct number of parameters.
ast-grep --lang python --pattern $'handle_new_memory($_, $_, $$$)'

Length of output: 67



Script:

#!/bin/bash
# Use ripgrep to search for the string "handle_new_memory" to find any calls to the method.
rg "handle_new_memory"

Length of output: 554

* 278-278: Adding a new field `memory_category` to the `MemoryModel` instantiation within the `handle_new_memory` method. Ensure that the database schema has been updated to include this new field and that it is properly indexed if necessary for performance.
Verification successful

I will generate a new script to verify the inclusion of the memory_category field in the MemoryModel class using ast-grep with the correct language specification and pattern syntax.


I will generate a new script to search for the memory_category field directly within the files that define the MemoryModel class using rg.


The verification process has confirmed the presence of the memory_category field within the MemoryModel class definition in the file cognitive_architecture/database/postgres/models/memory.py. This aligns with the changes mentioned in the pull request.

  • The memory_category field is a column in the MemoryModel with the type String and is nullable.

Based on this information, the database schema has indeed been updated to include the new memory_category field as required by the changes in the vectorstore_manager.py file.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the database schema includes the new `memory_category` field.
rg --type python 'memory_category' --files-with-matches | xargs cat | ast-grep --lang python --pattern $'class MemoryModel { $$$ memory_category: str $$$ }'

Length of output: 531



Script:

#!/bin/bash
# Verify that the database schema includes the new `memory_category` field.
rg --type py 'class MemoryModel' --files-with-matches | xargs cat | ast-grep --lang python --pattern $'class MemoryModel { $$$ memory_category: str $$$ }'

Length of output: 500



Script:

#!/bin/bash
# Search for the 'memory_category' field within the 'MemoryModel' class definition.
rg --type py 'class MemoryModel' -A 10 | rg 'memory_category'

Length of output: 171

main.py (6)
  • 1-40: The import section is generally clean, but it's important to ensure that all imported modules and functions are used within the file to avoid unnecessary imports which can lead to confusion and slight performance issues.
Verification successful

The executed scripts have checked the usage of Optional and List from the typing module, as well as Neo4jError from the neo4j.exceptions module within main.py. The results show that these imports are indeed used in main.py.

However, the script output also includes matches from other files, which are not relevant to the verification of main.py. For the purpose of this task, we are only concerned with the usage of imports in main.py.

Based on the provided output:

  • Optional and List are used in function signatures and type hints within main.py.
  • Neo4jError is used in exception handling within main.py.

Given this information, the imports of Optional, List, and Neo4jError are justified and correctly used in main.py.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that all imported modules are used in the file.
rg --type python --files-with-matches 'from typing import Optional, List' main.py | xargs -L1 rg 'Optional|List'
rg --type python --files-with-matches 'from neo4j.exceptions import Neo4jError' main.py | xargs -L1 rg 'Neo4jError'
# ... continue for all imports

Length of output: 21390

* 45-45:
  • 71-71:

  • 256-256:

  • 278-278:

  • 331-453: The function user_context_enrichment is complex and integrates various memory systems. Ensure that the logic is thoroughly tested, especially the dynamic memory class management and the exception handling.

cognitive_architecture/database/graph_database/graph.py (12)
  • 8-9: The addition of AsyncSession and Neo4jError imports indicates the use of asynchronous database sessions and improved error handling. Ensure that the rest of the codebase is compatible with these changes.

  • 121-122: The GraphQLQuery class is a straightforward model for encapsulating GraphQL queries. This is a good use of Pydantic models for type safety and validation.

  • 194-212: The Neo4jGraphDB class now uses GraphDatabase.driver for database connections, which is a standard practice. The exception handling in the query method has been improved to log errors and re-raise them, which is good for debugging and control flow.

  • 277-290: The cypher_statement_correcting method uses an AI client to check and correct Cypher query syntax. Ensure that the AI client is properly configured and that this functionality is covered by tests to verify its effectiveness.

  • 439-445: The retrieve_public_memory method is well-defined for fetching public memory nodes. It's important to ensure that the method is used consistently across the codebase and that the necessary indexes are in place for performance.

  • 736-772: The create_memory_node method is designed to create or find a memory node with specific labels and a description. It's crucial to validate the input parameters and handle potential database errors as done here.

  • 790-803: The delete_memory_node method correctly uses a Cypher query to delete a memory node by ID. It's good practice to validate inputs and handle exceptions as implemented.

  • 805-834: The unlink_memory_from_user method provides functionality to unlink a memory node from a user, which is a necessary feature for managing user data. The input validation and error handling are appropriate.

  • 627-705: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [604-673]

The create_document_node_cypher method generates a Cypher query to create a Document node and link it to a memory node. Ensure that the method is used correctly and that the input parameters are validated before the query is executed.

Verification successful

The create_document_node_cypher method is used in main.py and is defined in graph.py. To verify its correct usage, we need to check if the method calls in main.py align with the updated method signature and intended functionality.


The method calls to create_document_node_cypher in main.py match the updated method signature in graph.py. The calls correctly pass the required parameters, including document_summary, user_id, and conditionally public_memory_id or memory_type based on the context.

  • The first call passes classification, user_id, and public_memory_id when memory_type is 'PUBLIC'.
  • The second call passes classification, user_id, and sets memory_type to 'SemanticMemory'.
  • The commented-out call at the bottom suggests a previous or alternative usage where memory_type is set to "PUBLIC".

Based on the provided context, the usage of create_document_node_cypher in main.py appears to be consistent with the method's signature and intended functionality.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the `create_document_node_cypher` method is used correctly in the codebase.
fd '.*\.py' --exec grep -Hn 'create_document_node_cypher'

Length of output: 735



Script:

#!/bin/bash
# Check the usage of `create_document_node_cypher` in main.py and ensure it aligns with the updated method signature.
cat main.py | grep -A 5 -B 5 'create_document_node_cypher'

Length of output: 2126

* 675-704: The `update_document_node_with_db_ids` method updates the namespace of a Document node in the database. It's important to ensure that the document ID and namespace are properly sanitized before being included in the Cypher query to prevent injection attacks (see the parameterized-query sketch after this list).
  • 733-1010: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [705-734]

The get_namespaces_by_document_category method is a useful addition for retrieving Vectordb namespaces for documents of a specified category. Ensure that the method is used consistently and that the category parameter is validated.

  • 736-854: The methods for managing memory nodes and their relationships, such as create_memory_node, link_user_to_public, delete_memory_node, and unlink_memory_from_user, are well-implemented with proper error handling and input validation.
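On the sanitisation point for update_document_node_with_db_ids above, binding the values as Cypher parameters instead of interpolating them into the query string is the usual fix; a sketch assuming a synchronous neo4j driver session, with the property names taken from the surrounding discussion rather than verified against the module:

# Sketch: bind document id and namespace as query parameters to avoid injection.
from neo4j import GraphDatabase

def update_document_namespace_example(uri, user, password, document_id, namespace):
    query = (
        "MATCH (d:Document {d_id: $document_id}) "
        "SET d.vectordbNamespace = $namespace "
        "RETURN d"
    )
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            # The driver handles escaping of $document_id and $namespace.
            result = session.run(query, document_id=document_id, namespace=namespace)
            return [record["d"] for record in result]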

main.py Outdated
Comment on lines 43 to 73

async def fetch_document_vectordb_namespace(session: AsyncSession, user_id: str, namespace_id:str, doc_id:str=None):
logging.info("user id is", user_id)
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, memory_label=namespace_id)


# Managing memory attributes
existing_user = await Memory.check_existing_user(user_id, session)
print("here is the existing user", existing_user)
await memory.manage_memory_attributes(existing_user)
print("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)
namespace_class = namespace_id + "_class"

dynamic_memory_class = getattr(memory, namespace_class.lower(), None)

methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]

if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
print(f"Memory method {method_name} has been added")
else:
print(f"No attribute named in memory.")

print("Available memory classes:", await memory.list_memory_classes())
result = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
observation="placeholder", search_type="summary_filter_by_object_name", params=doc_id)
logging.info("Result is", result)

return result, namespace_id

Consider replacing print statements with logging for better control over log levels and outputs, which is more suitable for production environments.

- print("here is the existing user", existing_user)
+ logging.debug("here is the existing user %s", existing_user)
# ... apply this change to all print statements

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
async def fetch_document_vectordb_namespace(session: AsyncSession, user_id: str, namespace_id:str, doc_id:str=None):
logging.info("user id is", user_id)
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, memory_label=namespace_id)
# Managing memory attributes
existing_user = await Memory.check_existing_user(user_id, session)
print("here is the existing user", existing_user)
await memory.manage_memory_attributes(existing_user)
print("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)
namespace_class = namespace_id + "_class"
dynamic_memory_class = getattr(memory, namespace_class.lower(), None)
methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]
if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
print(f"Memory method {method_name} has been added")
else:
print(f"No attribute named in memory.")
print("Available memory classes:", await memory.list_memory_classes())
result = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
observation="placeholder", search_type="summary_filter_by_object_name", params=doc_id)
logging.info("Result is", result)
return result, namespace_id
async def fetch_document_vectordb_namespace(session: AsyncSession, user_id: str, namespace_id:str, doc_id:str=None):
logging.info("user id is", user_id)
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, memory_label=namespace_id)
# Managing memory attributes
existing_user = await Memory.check_existing_user(user_id, session)
logging.debug("here is the existing user %s", existing_user)
await memory.manage_memory_attributes(existing_user)
logging.debug("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)
namespace_class = namespace_id + "_class"
dynamic_memory_class = getattr(memory, namespace_class.lower(), None)
methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]
if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
logging.debug(f"Memory method {method_name} has been added")
else:
logging.debug(f"No attribute named in memory.")
logging.debug("Available memory classes: %s", await memory.list_memory_classes())
result = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
observation="placeholder", search_type="summary_filter_by_object_name", params=doc_id)
logging.info("Result is %s", result)
return result, namespace_id

Comment on lines +69 to +91
@app.post("/add-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore

if 'settings' in decoded_payload and decoded_payload['settings'] is not None:
settings_for_loader = decoded_payload['settings']
else:
settings_for_loader = None

if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None

output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader)
return JSONResponse(content={"response": output}, status_code=200)

The /add-memory endpoint is well-structured. However, the error handling could be improved by providing more specific HTTP status codes and messages based on the exception type.

- return JSONResponse(content={"response": {"error": str(e)}}, status_code=503)
+ if isinstance(e, SomeSpecificException):
+     status_code = 400  # Bad Request or other appropriate status code
+ else:
+     status_code = 500  # Internal Server Error
+ return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
@app.post("/add-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'settings' in decoded_payload and decoded_payload['settings'] is not None:
settings_for_loader = decoded_payload['settings']
else:
settings_for_loader = None
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None
output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader)
return JSONResponse(content={"response": output}, status_code=200)
@app.post("/add-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'settings' in decoded_payload and decoded_payload['settings'] is not None:
settings_for_loader = decoded_payload['settings']
else:
settings_for_loader = None
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None
output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader)
return JSONResponse(content={"response": output}, status_code=200)
except Exception as e:
if isinstance(e, SomeSpecificException):
status_code = 400 # Bad Request or other appropriate status code
else:
status_code = 500 # Internal Server Error
return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code)

Comment on lines +9 to +31
class CogneeManager:
def __init__(self, embeddings: Embeddings = None,
vector_db: VectorDB = None,
vector_db_key: str = None,
embedding_api_key: str = None,
webhook_url: str = None,
lines_per_batch: int = 1000,
webhook_key: str = None,
document_id: str = None,
chunk_validation_url: str = None,
internal_api_key: str = "test123",
base_url="http://localhost:8000"):
self.embeddings = embeddings if embeddings else Embeddings()
self.vector_db = vector_db if vector_db else VectorDB()
self.webhook_url = webhook_url
self.lines_per_batch = lines_per_batch
self.webhook_key = webhook_key
self.document_id = document_id
self.chunk_validation_url = chunk_validation_url
self.vector_db_key = vector_db_key
self.embeddings_api_key = embedding_api_key
self.internal_api_key = internal_api_key
self.base_url = base_url

The constructor of CogneeManager class initializes several instance variables. It's important to ensure that the default values provided, such as internal_api_key and base_url, are appropriate and secure. The use of a default API key ("test123") in production code is a security risk and should be replaced with a secure method of configuration.

- internal_api_key: str = "test123",
+ internal_api_key: str = None,

[SECURITY]


Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
class CogneeManager:
def __init__(self, embeddings: Embeddings = None,
vector_db: VectorDB = None,
vector_db_key: str = None,
embedding_api_key: str = None,
webhook_url: str = None,
lines_per_batch: int = 1000,
webhook_key: str = None,
document_id: str = None,
chunk_validation_url: str = None,
internal_api_key: str = "test123",
base_url="http://localhost:8000"):
self.embeddings = embeddings if embeddings else Embeddings()
self.vector_db = vector_db if vector_db else VectorDB()
self.webhook_url = webhook_url
self.lines_per_batch = lines_per_batch
self.webhook_key = webhook_key
self.document_id = document_id
self.chunk_validation_url = chunk_validation_url
self.vector_db_key = vector_db_key
self.embeddings_api_key = embedding_api_key
self.internal_api_key = internal_api_key
self.base_url = base_url
class CogneeManager:
def __init__(self, embeddings: Embeddings = None,
vector_db: VectorDB = None,
vector_db_key: str = None,
embedding_api_key: str = None,
webhook_url: str = None,
lines_per_batch: int = 1000,
webhook_key: str = None,
document_id: str = None,
chunk_validation_url: str = None,
internal_api_key: str = None,
base_url="http://localhost:8000"):
self.embeddings = embeddings if embeddings else Embeddings()
self.vector_db = vector_db if vector_db else VectorDB()
self.webhook_url = webhook_url
self.lines_per_batch = lines_per_batch
self.webhook_key = webhook_key
self.document_id = document_id
self.chunk_validation_url = chunk_validation_url
self.vector_db_key = vector_db_key
self.embeddings_api_key = embedding_api_key
self.internal_api_key = internal_api_key
self.base_url = base_url
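A common alternative to a baked-in default is to resolve the key from the environment at construction time and fail fast when it is missing; a minimal sketch, assuming an INTERNAL_API_KEY variable (the name is an assumption):

# Sketch: pull the internal API key from the environment instead of a hard-coded default.
import os

def resolve_internal_api_key(explicit_key: str = None) -> str:
    key = explicit_key or os.getenv("INTERNAL_API_KEY")
    if not key:
        raise RuntimeError("INTERNAL_API_KEY is not configured")
    return key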

Comment on lines +4 to +18
ENV = docker
POSTGRES_USER = bla
POSTGRES_PASSWORD = bla
POSTGRES_DB = bubu
POSTGRES_HOST = localhost
POSTGRES_HOST_DOCKER = postgres
SEGMENT_KEY = Etl4WJwzOkeDPAjaOXOMgyU16hO7mV7B
COG_ARCH_DIR = cognitive_architecture
GRAPH_DB_URL =
GRAPH_DB_PW =
GRAPH_DB_USER =
GRAPH_DB_URL =
GRAPH_DB_PW =
GRAPH_DB_USER =
AWS_ACCESS_KEY_ID =
AWS_SECRET_ACCESS_KEY =
QDRANT_API_KEY
QDRANT_API_URL

💡 NOTE
Codebase verification is a beta feature.

The search results indicate that the ENVIRONMENT variable is still being used in several places within the codebase:

  • entrypoint.sh
  • README.md (twice, in instructions)
  • iterations/level_3/create_database.py
  • iterations/level_3/entrypoint.sh
  • iterations/level_3/database/database.py
  • iterations/level_2/entrypoint.sh

Given this information, the renaming of ENVIRONMENT to ENV in the .env.template file is not consistently applied across the entire project. The references to ENVIRONMENT in the README.md are likely instructional and may need to be updated to reflect the changes in the .env.template file. The occurrences in the scripts and Python files suggest that the code still expects an ENVIRONMENT variable to be set.

Based on these findings, the following issues have been identified:

  • The ENVIRONMENT variable is still being used in multiple places, which could lead to inconsistencies or errors if the environment variable is expected to be ENV now.
  • The README.md file contains outdated instructions regarding the ENVIRONMENT variable.
Analysis chain

Ensure that the new AWS and QDRANT API keys are handled securely and are not exposed in the codebase or version control. Verify that the renaming of ENVIRONMENT to ENV is consistently applied across the entire project.


Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that ENV is used instead of ENVIRONMENT throughout the codebase
rg --type env 'ENVIRONMENT'
# Expecting no results for ENVIRONMENT, only ENV should be present

Length of output: 55



Script:

#!/bin/bash
# Verify that ENV is used instead of ENVIRONMENT throughout the codebase
# This time, search all text files without specifying a file type
rg 'ENVIRONMENT'
# Expecting no results for ENVIRONMENT, only ENV should be present

Length of output: 495

Comment on lines +44 to +66
def upload(self, file_paths: list[str], base_url=None):
if base_url:
url = base_url + "/jobs"
else:
url = self.base_url + "/jobs"

data = self.serialize()
headers = self.generate_headers()
multipart_form_data = [('file', (os.path.basename(filepath), open(filepath, 'rb'), 'application/octet-stream'))
for filepath in file_paths]

print(f"embedding {len(file_paths)} documents at {url}")
response = requests.post(url, files=multipart_form_data, headers=headers, stream=True, data=data)

if response.status_code == 500:
print(response.text)
return Response(error=response.text, status_code=response.status_code)

response_json = response.json()
if response.status_code >= 400 and response.status_code < 500:
print(f"Error: {response_json['error']}")

return Response.from_json(response_json, response.status_code)

The upload method opens files for reading without ensuring they are closed, which can lead to resource leaks. Use a context manager to ensure files are properly closed.

- multipart_form_data = [('file', (os.path.basename(filepath), open(filepath, 'rb'), 'application/octet-stream'))
+ multipart_form_data = [('file', (os.path.basename(filepath), open(filepath, 'rb'), 'application/octet-stream')) for filepath in file_paths]
+ with contextlib.ExitStack() as stack:
+     files = [stack.enter_context(open(filepath, 'rb')) for filepath in file_paths]
+     multipart_form_data = [('file', (os.path.basename(filepath), file, 'application/octet-stream')) for filepath, file in zip(file_paths, files)]

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def upload(self, file_paths: list[str], base_url=None):
if base_url:
url = base_url + "/jobs"
else:
url = self.base_url + "/jobs"
data = self.serialize()
headers = self.generate_headers()
multipart_form_data = [('file', (os.path.basename(filepath), open(filepath, 'rb'), 'application/octet-stream'))
for filepath in file_paths]
print(f"embedding {len(file_paths)} documents at {url}")
response = requests.post(url, files=multipart_form_data, headers=headers, stream=True, data=data)
if response.status_code == 500:
print(response.text)
return Response(error=response.text, status_code=response.status_code)
response_json = response.json()
if response.status_code >= 400 and response.status_code < 500:
print(f"Error: {response_json['error']}")
return Response.from_json(response_json, response.status_code)
def upload(self, file_paths: list[str], base_url=None):
if base_url:
url = base_url + "/jobs"
else:
url = self.base_url + "/jobs"
data = self.serialize()
headers = self.generate_headers()
with contextlib.ExitStack() as stack:
files = [stack.enter_context(open(filepath, 'rb')) for filepath in file_paths]
multipart_form_data = [('file', (os.path.basename(filepath), file, 'application/octet-stream')) for filepath, file in zip(file_paths, files)]
print(f"embedding {len(file_paths)} documents at {url}")
response = requests.post(url, files=multipart_form_data, headers=headers, stream=True, data=data)
if response.status_code == 500:
print(response.text)
return Response(error=response.text, status_code=response.status_code)
response_json = response.json()
if response.status_code >= 400 and response.status_code < 500:
print(f"Error: {response_json['error']}")
return Response.from_json(response_json, response.status_code)

Comment on lines +341 to 345


async def delete_memories(self, namespace:str, params: dict = None):
if namespace is None:
namespace = self.namespace

The delete_memories method should handle exceptions and provide appropriate logging or error messages.

+ try:
+     # existing deletion logic
+ except Exception as e:
+     logging.error(f"Failed to delete memories: {e}")
+     raise

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
async def delete_memories(self, namespace:str, params: dict = None):
if namespace is None:
namespace = self.namespace
async def delete_memories(self, namespace:str, params: dict = None):
try:
if namespace is None:
namespace = self.namespace
# existing deletion logic
except Exception as e:
logging.error(f"Failed to delete memories: {e}")
raise

main.py Outdated
Comment on lines 455 to 491
async def create_public_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]:
"""
Create a public memory node associated with a user in a Neo4j graph database.
If Public Memory exists, it will return the id of the memory.
This is intended as standalone node that can be attached to any user.
It is not attached to any user by default.

Args:
user_id (str): The unique identifier for the user.
session (AsyncSession): An asynchronous session for database operations.

Returns:
Optional[int]: The ID of the created public memory node or None if an error occurs.
:param labels: Label for the memory, to help filter for different countries
:param topic: Topic for the memory, to help provide a name

"""
# Validate input parameters
if not labels:
labels = ['sr'] # Labels for the memory node

if not topic:
topic = "PublicMemory"


try:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)

# Assuming the topic for public memory is predefined, e.g., "PublicMemory"
# Create the memory node
memory_id = await neo4j_graph_db.create_memory_node(labels=labels, topic=topic)
return memory_id
except Neo4jError as e:
logging.error(f"Error creating public memory node: {e}")
return None

The function create_public_memory returns None in case of a Neo4jError. Consider if this is the best error handling strategy for the context in which this function is used. It might be better to re-raise the exception or handle it differently.
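If re-raising is the preferred strategy, the smallest change is to log and propagate so the API layer can map the failure onto a response; a sketch of that shape, where graph_db stands in for the project's Neo4jGraphDB instance:

# Sketch: log the Neo4j failure and propagate it instead of returning None.
import logging
from neo4j.exceptions import Neo4jError

async def create_public_memory_example(graph_db, labels=None, topic="PublicMemory"):
    labels = labels or ["sr"]
    try:
        return await graph_db.create_memory_node(labels=labels, topic=topic)
    except Neo4jError as error:
        logging.error("Error creating public memory node: %s", error)
        raise  # let the caller decide whether to retry or surface an HTTP error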

main.py Outdated
Comment on lines 573 to 720
# CREATE (episodic)-[:HAS_EVENT]->(location5_5e43f4c45b3c44ea897c12220db4c051)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(location5_5e43f4c45b3c44ea897c12220db4c051)
# CREATE (episodic)-[:HAS_EVENT]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d)
# CREATE (episodic)-[:HAS_EVENT]->(information7_f756e3f3720c4fe5aeb01287badaf088)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(information7_f756e3f3720c4fe5aeb01287badaf088)
# CREATE (episodic)-[:HAS_EVENT]->(event8_da6334e744454264900296319e14b532)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(event8_da6334e744454264900296319e14b532)
# CREATE (episodic)-[:HAS_EVENT]->(action9_48e45419604e4d66b3e718ee1d6c095f)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action9_48e45419604e4d66b3e718ee1d6c095f)
# CREATE (episodic)-[:HAS_EVENT]->(action10_f48acb1db4da4934afbe17363e9e63a4)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action10_f48acb1db4da4934afbe17363e9e63a4)"""
#
# out = cypher_statement_correcting(query)
# print(out)
#
# out = await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
# print(out)
# load_doc_to_graph = await add_documents_to_graph_db(session, user_id)
# print(load_doc_to_graph)
# user_id = 'user'
# loader_settings = {
# "format": "PDF",
# "source": "DEVICE",
# "path": [".data"]
# }
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
# await create_public_memory(user_id=user_id, labels=['sr'], topic="PublicMemory")
# await add_documents_to_graph_db(session, user_id)
#
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)

# await attach_user_to_memory(user_id=user_id, labels=['sr'], topic="PublicMemory")

# return_ = await user_context_enrichment(user_id=user_id, query="Koja je minimalna visina ograde na balkonu na stambenom objektu", session=session)
# print(return_)

# document_summary = {
# 'DocumentCategory': 'Science',
# 'Title': 'The Future of AI',
# 'Summary': 'An insightful article about the advancements in AI.',
# 'd_id': 'doc123'
# }
#
# # Example user ID
# user_id = 'user'
#
# # value = await neo4j_graph_db.create_memory_node(labels=['sr'])
# # print(value)
# # neo4j_graph_db.close()
#
# await add_documents_to_graph_db(session, user_id)
# neo4j_graph_db.link_public_memory_to_user(memory_id = 17,user_id=user_id)
#
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="Document")
print(ids)

for id in ids:
print(id.get('memoryId'))

neo4j_graph_db.delete_memory_node(memory_id = id.get('memoryId'), topic="Document")
#
# neo4j_graph_db.delete_memory_node(memory_id=16, topic="PublicSerbianArchitecture")
# neo4j_graph_db.unlink_memory_from_user(memory_id = 17,user_id=user_id)
# cypher_query_public = neo4j_graph_db.create_document_node_cypher(document_summary, user_id, memory_type="PUBLIC")
# neo4j_graph_db.query(cypher_query_public)
# link_memory_to_user(user_id, session)

# neo4j_graph_db.create_memory_node(labels=['sr'])
# out = await get_vectordb_namespace(session, user_id)
# params = {
# "version": "1.0",
# "agreement_id": "AG123456",
# "privacy_policy": "https://example.com/privacy",
# "terms_of_service": "https://example.com/terms",
# "format": "json",
# "schema_version": "1.1",
# "checksum": "a1b2c3d4e5f6",
# "owner": "John Doe",
# "license": "MIT",
# "validity_start": "2023-08-01",
# "validity_end": "2024-07-31",
# }
# loader_settings = {
# "format": "PDF",
# "source": "DEVICE",
# "path": [".data"],
# "strategy": "SUMMARY",
# }
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
# await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
# await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings)
# await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday")


if __name__ == "__main__":
import asyncio

asyncio.run(main())

The main function is commented out. If this is intentional and the code is not needed, consider removing it to avoid confusion. If it's meant to be part of the application, uncomment and ensure it's being used correctly.

Comment on lines +15 to +46
def detect_language(text):
"""
Detect the language of the given text and return its ISO 639-1 language code.
If the detected language is Croatian ('hr'), it maps to Serbian ('sr').
The text is trimmed to the first 100 characters for efficient processing.

Parameters:
text (str): The text for language detection.

Returns:
str: The ISO 639-1 language code of the detected language, or 'None' in case of an error.
"""

# Trim the text to the first 100 characters
trimmed_text = text[:100]

try:
# Detect the language using langdetect
detected_lang_iso639_1 = detect(trimmed_text)
logging.info(f"Detected ISO 639-1 code: {detected_lang_iso639_1}")

# Special case: map 'hr' (Croatian) to 'sr' (Serbian ISO 639-2)
if detected_lang_iso639_1 == 'hr':
return 'sr'
return detected_lang_iso639_1

except LangDetectException as e:
logging.error(f"Language detection error: {e}")
except Exception as e:
logging.error(f"Unexpected error: {e}")

return -1

The detect_language function has an error path that returns -1, which is inconsistent with the documented return type of a string. Consider returning None or an empty string instead to maintain consistency with the return type.

-    return -1
+    return None  # or return an empty string if that's more appropriate

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def detect_language(text):
"""
Detect the language of the given text and return its ISO 639-1 language code.
If the detected language is Croatian ('hr'), it maps to Serbian ('sr').
The text is trimmed to the first 100 characters for efficient processing.
Parameters:
text (str): The text for language detection.
Returns:
str: The ISO 639-1 language code of the detected language, or 'None' in case of an error.
"""
# Trim the text to the first 100 characters
trimmed_text = text[:100]
try:
# Detect the language using langdetect
detected_lang_iso639_1 = detect(trimmed_text)
logging.info(f"Detected ISO 639-1 code: {detected_lang_iso639_1}")
# Special case: map 'hr' (Croatian) to 'sr' (Serbian ISO 639-2)
if detected_lang_iso639_1 == 'hr':
return 'sr'
return detected_lang_iso639_1
except LangDetectException as e:
logging.error(f"Language detection error: {e}")
except Exception as e:
logging.error(f"Unexpected error: {e}")
return -1
def detect_language(text):
"""
Detect the language of the given text and return its ISO 639-1 language code.
If the detected language is Croatian ('hr'), it maps to Serbian ('sr').
The text is trimmed to the first 100 characters for efficient processing.
Parameters:
text (str): The text for language detection.
Returns:
str: The ISO 639-1 language code of the detected language, or 'None' in case of an error.
"""
# Trim the text to the first 100 characters
trimmed_text = text[:100]
try:
# Detect the language using langdetect
detected_lang_iso639_1 = detect(trimmed_text)
logging.info(f"Detected ISO 639-1 code: {detected_lang_iso639_1}")
# Special case: map 'hr' (Croatian) to 'sr' (Serbian ISO 639-2)
if detected_lang_iso639_1 == 'hr':
return 'sr'
return detected_lang_iso639_1
except LangDetectException as e:
logging.error(f"Language detection error: {e}")
except Exception as e:
logging.error(f"Unexpected error: {e}")
return None # or return an empty string if that's more appropriate

Comment on lines +32 to 40
# try:
# pages = text_splitter.create_documents([source_data])
# except:
# try:
pages = text_splitter.create_documents([source_data])
# except:
# pages = text_splitter.create_documents(source_data.content)
# pages = source_data.load_and_split()
return pages

The commented-out exception handling block in the vanilla_chunker function should be removed or updated to reflect the current exception handling strategy.

-    # try:
-    #     pages = text_splitter.create_documents([source_data])
-    # except:
-    # try:
-    #     pages = text_splitter.create_documents(source_data.content)
-    # except:
-    #     pages = text_splitter.create_documents(source_data.content)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
# try:
# pages = text_splitter.create_documents([source_data])
# except:
# try:
pages = text_splitter.create_documents([source_data])
# except:
# pages = text_splitter.create_documents(source_data.content)
# pages = source_data.load_and_split()
return pages
pages = text_splitter.create_documents([source_data])
return pages

…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 2928f51 and be3f179.
Files selected for processing (1)
  • Dockerfile (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • Dockerfile

…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between be3f179 and 3632a85.
Files ignored due to filter (2)
  • poetry.lock
  • pyproject.toml
Files selected for processing (1)
  • entrypoint.sh (1 hunks)
Files skipped from review due to trivial changes (1)
  • entrypoint.sh

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 22

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 7afe9a7 and 0323ab6.
Files selected for processing (5)
  • cognitive_architecture/classifiers/classifier.py (1 hunks)
  • cognitive_architecture/database/create_database_tst.py (1 hunks)
  • cognitive_architecture/database/graph_database/graph.py (14 hunks)
  • cognitive_architecture/database/vectordb/basevectordb.py (1 hunks)
  • cognitive_architecture/database/vectordb/vectordb.py (6 hunks)
Additional comments: 8
cognitive_architecture/classifiers/classifier.py (1)
  • 76-109: The classify_call function has been modified to accept document_summaries instead of query, context, and document_types. The JSON structure and function invocation have been updated accordingly. The return value has been changed from the classifier value to the classifier ID. Ensure that all calls to this function throughout the codebase have been updated to match the new signature and that the change in return value is handled appropriately.
Verification successful

Based on the information gathered from the executed shell scripts, the classify_call function has been updated in cognitive_architecture/classifiers/classifier.py and is correctly used with the new signature in main.py. There are no other occurrences of the function in the codebase, which indicates that the changes have been properly reflected where necessary. The old function signature does not appear to be used anymore.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search to find files that import classifier module. Expecting list of dependent files.
rg -C3 --type python $'from cognitive_architecture.classifiers import classifier'
# Search old signature to find omissions. Expecting no results.
ast-grep --lang python --pattern $'classify_call($_, $_, $_)'
# Search new signature to validate usage. Expecting list of locations where the new signature is used.
ast-grep --lang python --pattern $'classify_call($_, $_)'

Length of output: 342



Script:

#!/bin/bash
# Correct the search to find files that import classifier module without specifying the file type.
rg -C3 'from cognitive_architecture.classifiers import classifier'
# Search for any other usages of the new `classify_call` function signature.
ast-grep --lang python --pattern $'classify_call($_, $_)'
# Search for any usages of the old `classify_call` function signature to ensure it's not used anymore.
ast-grep --lang python --pattern $'classify_call($_, $_, $_)'

Length of output: 297



Script:

#!/bin/bash
# Search for any file that might be using the `classify_call` function, regardless of the import statement.
rg 'classify_call' --type py

Length of output: 310

cognitive_architecture/database/vectordb/basevectordb.py (1)
  • 281-281: The change in the hunk modifies the logging statement for the search type by adding str() around the search_type variable. This is a good practice to ensure that the variable is converted to a string before being concatenated in the logging message.
cognitive_architecture/database/vectordb/vectordb.py (3)
  • 63-63: The init_weaviate method has been updated to include a retriever_type parameter. Ensure that this change is reflected wherever the method is called and that the new parameter is used correctly.

  • 152-179: The add_memories method has been updated to handle document loading and chunking. Ensure that the changes are compatible with the expected document structure and that the chunk_data function is implemented correctly.

  • 252-316: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [183-313]

The fetch_memories method has been modified to include different search types. Verify that the new search types are implemented correctly and that the method's callers are updated to handle the new functionality.

cognitive_architecture/database/graph_database/graph.py (3)
  • 8-9: New imports from neo4j have been added, which suggests that the application is interacting with a Neo4j database. Ensure that these new dependencies are included in the project's dependency management files (e.g., requirements.txt or Pipfile).

  • 361-364: The commented-out code for corrected_cypher_statements is related to a previous comment about uncommenting or removing the code. Since the comment is still valid, we can skip generating a new comment.

  • 805-834: The method unlink_memory_from_user is correctly using parameterized queries, which is a good practice for security. However, ensure that the memory_id is validated as an integer before using it in the query to prevent type-related issues.
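
A minimal sketch of the suggested validation, assuming the rest of the method body stays as-is:

def unlink_memory_from_user(self, memory_id, user_id: str):
    # Coerce/validate before the value reaches the parameterized query.
    try:
        memory_id = int(memory_id)
    except (TypeError, ValueError):
        raise ValueError(f"memory_id must be an integer, got {memory_id!r}")
    ...  # existing parameterized query logic, unchanged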

Comment on lines +1 to +5
# this is needed to import classes from other modules
# script_dir = os.path.dirname(os.path.abspath(__file__))
# # Get the parent directory of your script and add it to sys.path
# parent_dir = os.path.dirname(script_dir)
# sys.path.append(parent_dir)

The commented-out code for setting the script directory and appending the parent directory to sys.path is still present. As previously mentioned, if this code is no longer needed, it should be removed to avoid confusion.

- # script_dir = os.path.dirname(os.path.abspath(__file__))
- # # Get the parent directory of your script and add it to sys.path
- # parent_dir = os.path.dirname(script_dir)
- # sys.path.append(parent_dir)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
# this is needed to import classes from other modules
# script_dir = os.path.dirname(os.path.abspath(__file__))
# # Get the parent directory of your script and add it to sys.path
# parent_dir = os.path.dirname(script_dir)
# sys.path.append(parent_dir)
# this is needed to import classes from other modules

Comment on lines +12 to +14
# from cognitive_architecture.config import Config
# config = Config()
# config.load()

The commented-out code for loading configuration using a Config object is still present. As previously mentioned, if the new pattern is to use environment variables directly, this commented-out code should be removed to prevent confusion.

- # from cognitive_architecture.config import Config
- # config = Config()
- # config.load()

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
# from cognitive_architecture.config import Config
# config = Config()
# config.load()

Comment on lines +30 to +44
if os.environ.get('AWS_ENV') == 'prd' or os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
elif os.environ.get('AWS_ENV') == 'local':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')

The code for setting database connection parameters based on the AWS_ENV environment variable is still redundant across the prd, dev, and local cases. This can be simplified since the assignments are identical in each case.

- if os.environ.get('AWS_ENV') == 'prd' or os.environ.get('AWS_ENV') == 'dev':
-     host = os.environ.get('POSTGRES_HOST')
-     username = os.environ.get('POSTGRES_USER')
-     password = os.environ.get('POSTGRES_PASSWORD')
-     database_name = os.environ.get('POSTGRES_DB')
- elif os.environ.get('AWS_ENV') == 'local':
-     host = os.environ.get('POSTGRES_HOST')
-     username = os.environ.get('POSTGRES_USER')
-     password = os.environ.get('POSTGRES_PASSWORD')
-     database_name = os.environ.get('POSTGRES_DB')
- else:
-     host = os.environ.get('POSTGRES_HOST')
-     username = os.environ.get('POSTGRES_USER')
-     password = os.environ.get('POSTGRES_PASSWORD')
-     database_name = os.environ.get('POSTGRES_DB')
+ host = os.environ.get('POSTGRES_HOST')
+ username = os.environ.get('POSTGRES_USER')
+ password = os.environ.get('POSTGRES_PASSWORD')
+ database_name = os.environ.get('POSTGRES_DB')

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
if os.environ.get('AWS_ENV') == 'prd' or os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
elif os.environ.get('AWS_ENV') == 'local':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')

Comment on lines +53 to +60
def database_exists(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.connect()
query = text(f"SELECT 1 FROM pg_database WHERE datname='{db_name}'")
result = connection.execute(query).fetchone()
connection.close()
engine.dispose()
return result is not None

The database_exists function constructs a raw SQL query using string formatting, which could be vulnerable to SQL injection if the db_name variable is not properly sanitized. As previously mentioned, use query parameters to mitigate this risk.

- query = text(f"SELECT 1 FROM pg_database WHERE datname='{db_name}'")
+ query = text("SELECT 1 FROM pg_database WHERE datname=:db_name")
+ result = connection.execute(query, {'db_name': db_name}).fetchone()

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def database_exists(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.connect()
query = text(f"SELECT 1 FROM pg_database WHERE datname='{db_name}'")
result = connection.execute(query).fetchone()
connection.close()
engine.dispose()
return result is not None
def database_exists(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.connect()
query = text("SELECT 1 FROM pg_database WHERE datname=:db_name")
result = connection.execute(query, {'db_name': db_name}).fetchone()
connection.close()
engine.dispose()
return result is not None

Comment on lines +63 to +71
def create_database(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.raw_connection()
connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cursor = connection.cursor()
cursor.execute(f"CREATE DATABASE {db_name}")
cursor.close()
connection.close()
engine.dispose()

The create_database function interpolates db_name directly into a SQL command, which could be vulnerable to SQL injection. PostgreSQL does not allow identifiers such as database names to be bound as query parameters, so compose the statement with psycopg2.sql instead of an f-string.

- cursor.execute(f"CREATE DATABASE {db_name}")
+ from psycopg2 import sql
+ cursor.execute(sql.SQL("CREATE DATABASE {}").format(sql.Identifier(db_name)))

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def create_database(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.raw_connection()
connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cursor = connection.cursor()
cursor.execute(f"CREATE DATABASE {db_name}")
cursor.close()
connection.close()
engine.dispose()
def create_database(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.raw_connection()
connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cursor = connection.cursor()
cursor.execute("CREATE DATABASE %s", (db_name,))
cursor.close()
connection.close()
engine.dispose()

Comment on lines +194 to +196
# self.graph = Neo4jGraph(url=url, username=username, password=password)
from neo4j import GraphDatabase
self.driver = GraphDatabase.driver(url, auth=(username, password))

The commented-out line # self.graph = Neo4jGraph(url=url, username=username, password=password) suggests that there was a previous implementation that has been replaced. Ensure that this commented-out code is no longer needed and remove it to avoid confusion.

Comment on lines 372 to 378
def delete_all_user_memories(self, user_id):
try:
# Check if the user exists
user_exists = self.graph.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
if not user_exists:
return f"No user found with ID: {user_id}"


Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [375-389]

The method delete_all_user_memories uses string formatting to inject the user_id directly into the Cypher query, which could lead to Cypher injection if the user_id is not properly sanitized. Use parameterization to prevent potential security issues.

- user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
+ user_exists = self.query("MATCH (user:User {userId: $user_id}) RETURN user", params={"user_id": user_id})
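For context, a sketch of how a query helper can forward parameters to the neo4j driver (assuming the class holds a self.driver created via GraphDatabase.driver, as shown on lines 194-196 of graph.py):

from typing import Optional

def query(self, cypher: str, params: Optional[dict] = None):
    # The driver binds $-parameters itself, so values never touch the query string.
    with self.driver.session() as session:
        result = session.run(cypher, params or {})
        return [record.data() for record in result]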

Comment on lines 386 to 400
MATCH (user)-[:HAS_BUFFER]->(buffer)
DETACH DELETE semantic, episodic, buffer
"""
self.graph.query(delete_query)
self.query(delete_query)
return f"All memories deleted for user ID: {user_id}"
except Exception as e:
return f"An error occurred: {str(e)}"

def delete_specific_memory_type(self, user_id, memory_type):
try:
# Check if the user exists
user_exists = self.graph.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
if not user_exists:
return f"No user found with ID: {user_id}"


Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [397-410]

Similar to the previous comment, the delete_specific_memory_type method should use parameterized queries to prevent Cypher injection.

- user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
+ user_exists = self.query("MATCH (user:User {userId: $user_id}) RETURN user", params={"user_id": user_id})

Comment on lines 675 to 702
def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: str = None):
"""
Update the namespace of a Document node in the database. The document can be linked
either to a SemanticMemory node (if a user ID is provided) or to a PublicMemory node.

Parameters:
- vectordb_namespace (str): The namespace to set for the vectordb.
- document_id (str): The unique identifier of the document.
- user_id (str, optional): The unique identifier for the user. Default is None.

Returns:
- str: A Cypher query string to perform the update.
"""

if user_id:
# Update for a document linked to a SemanticMemory node
cypher_query = f'''
MATCH (user:User {{userId: '{user_id}' }})-[:HAS_SEMANTIC_MEMORY]->(:SemanticMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''
else:
# Update for a document linked to a PublicMemory node
cypher_query = f'''
MATCH (:PublicMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''

The method update_document_node_with_db_ids already defaults user_id to None, but the str annotation does not convey that the argument is optional. Annotate it as Optional[str] (imported from typing) so the signature matches the conditional logic inside the method.

- def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: str = None):
+ def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: Optional[str] = None):

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: str = None):
"""
Update the namespace of a Document node in the database. The document can be linked
either to a SemanticMemory node (if a user ID is provided) or to a PublicMemory node.
Parameters:
- vectordb_namespace (str): The namespace to set for the vectordb.
- document_id (str): The unique identifier of the document.
- user_id (str, optional): The unique identifier for the user. Default is None.
Returns:
- str: A Cypher query string to perform the update.
"""
if user_id:
# Update for a document linked to a SemanticMemory node
cypher_query = f'''
MATCH (user:User {{userId: '{user_id}' }})-[:HAS_SEMANTIC_MEMORY]->(:SemanticMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''
else:
# Update for a document linked to a PublicMemory node
cypher_query = f'''
MATCH (:PublicMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''
def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: Optional[str] = None):
"""
Update the namespace of a Document node in the database. The document can be linked
either to a SemanticMemory node (if a user ID is provided) or to a PublicMemory node.
Parameters:
- vectordb_namespace (str): The namespace to set for the vectordb.
- document_id (str): The unique identifier of the document.
- user_id (str, optional): The unique identifier for the user. Default is None.
Returns:
- str: A Cypher query string to perform the update.
"""
if user_id:
# Update for a document linked to a SemanticMemory node
cypher_query = f'''
MATCH (user:User {{userId: '{user_id}' }})-[:HAS_SEMANTIC_MEMORY]->(:SemanticMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''
else:
# Update for a document linked to a PublicMemory node
cypher_query = f'''
MATCH (:PublicMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''

Comment on lines +341 to +342



The delete_memories method allows for deletion of memories from the vector database. As previously mentioned, ensure that proper authorization checks are in place to prevent unauthorized deletions and that exceptions are handled appropriately.

+ # TODO: Add authorization checks before allowing deletion of memories
+ try:
+     # existing deletion logic
+ except Exception as e:
+     logging.error(f"Failed to delete memories: {e}")
+     raise

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
# TODO: Add authorization checks before allowing deletion of memories
try:
# existing deletion logic
except Exception as e:
logging.error(f"Failed to delete memories: {e}")
raise
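
A slightly fuller sketch of the same idea; is_authorized is a hypothetical helper, not existing code, and the deletion body is left as a placeholder:

async def delete_memories(self, namespace: str, params: dict = None):
    # Hypothetical guard; replace with whatever authorization model the project adopts.
    if not is_authorized(self.user_id, namespace):
        raise PermissionError(f"user {self.user_id} may not delete namespace {namespace}")
    try:
        ...  # existing deletion logic, unchanged
    except Exception as e:
        logging.error(f"Failed to delete memories: {e}")
        raise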

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 9

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 0323ab6 and 50e4d7c.
Files selected for processing (1)
  • main.py (1 hunks)
Additional comments: 1
main.py (1)
  • 26-26: The instantiation of aclient using instructor.patch(OpenAI()) is not explained. It's unclear what instructor.patch does. If it's a custom wrapper or patch for the OpenAI client, ensure that it's functioning as expected and that the OpenAI client is properly configured.

main.py Outdated
Comment on lines 44 to 73
async def fetch_document_vectordb_namespace(session: AsyncSession, user_id: str, namespace_id:str, doc_id:str=None):
logging.info("user id is", user_id)
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, memory_label=namespace_id)


# Managing memory attributes
existing_user = await Memory.check_existing_user(user_id, session)
print("here is the existing user", existing_user)
await memory.manage_memory_attributes(existing_user)
print("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)
namespace_class = namespace_id + "_class"

dynamic_memory_class = getattr(memory, namespace_class.lower(), None)

methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]

if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
print(f"Memory method {method_name} has been added")
else:
print(f"No attribute named in memory.")

print("Available memory classes:", await memory.list_memory_classes())
result = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
observation="placeholder", search_type="summary_filter_by_object_name", params=doc_id)
logging.info("Result is", result)

return result, namespace_id

The function fetch_document_vectordb_namespace has several print statements which are generally not recommended for production code as they can clutter logs and are not easily manageable. Consider using a logging framework with appropriate log levels.
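
A minimal sketch of the usual replacement: configure a module-level logger once and route the current print calls through it.

import logging

logger = logging.getLogger(__name__)

# instead of: print("here is the existing user", existing_user)
logger.debug("Existing user: %s", existing_user)
# instead of: print("Namespace id is %s", namespace_id)
logger.info("Namespace id is %s", namespace_id)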

Comment on lines +77 to +163
async def load_documents_to_vectorstore(session: AsyncSession, user_id: str, content:str=None, job_id:str=None, loader_settings:dict=None, memory_type:str="PRIVATE"):
namespace_id = str(generate_letter_uuid()) + "_" + "SEMANTICMEMORY"
namespace_class = namespace_id + "_class"

logging.info("Namespace created with id %s", namespace_id)
try:
new_user = User(id=user_id)
await add_entity(session, new_user)
except:
pass
if job_id is None:
job_id = str(uuid.uuid4())

await add_entity(
session,
Operation(
id=job_id,
user_id=user_id,
operation_status="RUNNING",
operation_type="DATA_LOAD",
),
)
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, job_id=job_id, memory_label=namespace_id)
if content is not None:
document_names = [content[:30]]
if loader_settings is not None:
document_source = loader_settings.get("document_names") if isinstance(loader_settings.get("document_names"),
list) else loader_settings.get("path", "None")
logging.info("Document source is %s", document_source)
# try:
document_names = get_document_names(document_source[0])
logging.info(str(document_names))
# except:
# document_names = document_source
for doc in document_names:
from cognitive_architecture.shared.language_processing import translate_text, detect_language
#translates doc titles to english
if loader_settings is not None:
logging.info("Detecting language of document %s", doc)
loader_settings["single_document_path"]= loader_settings.get("path", "None")[0] +"/"+doc
logging.info("Document path is %s", loader_settings.get("single_document_path", "None"))
memory_category = loader_settings.get("memory_category", "PUBLIC")
if loader_settings is None:
memory_category = "CUSTOM"
if detect_language(doc) != "en":
doc_ = doc.strip(".pdf").replace("-", " ")
doc_ = translate_text(doc_, "sr", "en")
else:
doc_=doc
doc_id = str(uuid.uuid4())

logging.info("Document name is %s", doc_)
await add_entity(
session,
DocsModel(
id=doc_id,
operation_id=job_id,
graph_summary= False,
memory_category= memory_category,
doc_name=doc_
)
)
# Managing memory attributes
existing_user = await Memory.check_existing_user(user_id, session)
await memory.manage_memory_attributes(existing_user)
params = {
"doc_id":doc_id
}
print("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)

dynamic_memory_class = getattr(memory, namespace_class.lower(), None)

methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]

if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
print(f"Memory method {method_name} has been added")
else:
print(f"No attribute named in memory.")

print("Available memory classes:", await memory.list_memory_classes())
result = await memory.dynamic_method_call(dynamic_memory_class, 'add_memories',
observation=content, params=params, loader_settings=loader_settings)
await update_entity(session, Operation, job_id, "SUCCESS")


The function load_documents_to_vectorstore is complex and does a lot of work. It could benefit from being broken down into smaller, more focused functions for better maintainability and readability. Additionally, there are commented-out try-except blocks and print statements that should be cleaned up or replaced with proper logging.
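
One possible decomposition, sketched with hypothetical helper names (none of these helpers exist yet); only Memory.create_memory and update_entity are taken from the current code:

async def load_documents_to_vectorstore(session, user_id, content=None, job_id=None,
                                        loader_settings=None, memory_type="PRIVATE"):
    namespace_id = make_semantic_namespace()                      # wraps the UUID/label logic
    job_id = await register_load_operation(session, user_id, job_id)
    memory = await Memory.create_memory(user_id, session, namespace=namespace_id,
                                        job_id=job_id, memory_label=namespace_id)
    for doc in resolve_document_names(content, loader_settings):  # path vs. raw-content handling
        doc_id = await register_document(session, job_id, doc, loader_settings)
        await add_document_to_memory(memory, doc_id, content, loader_settings)
    await update_entity(session, Operation, job_id, "SUCCESS")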

Comment on lines +164 to +191
async def user_query_to_graph_db(session: AsyncSession, user_id: str, query_input: str):

try:
new_user = User(id=user_id)
await add_entity(session, new_user)
except:
pass

job_id = str(uuid.uuid4())

await add_entity(
session,
Operation(
id=job_id,
user_id=user_id,
operation_status="RUNNING",
operation_type="USER_QUERY_TO_GRAPH_DB",
),
)

neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, password=config.graph_database_password)
cypher_query = await neo4j_graph_db.generate_cypher_query_for_user_prompt_decomposition(user_id,query_input)
result = neo4j_graph_db.query(cypher_query)
neo4j_graph_db.close()

await update_entity(session, Operation, job_id, "SUCCESS")

return result

The function user_query_to_graph_db performs its graph-database operations without any error handling; the only try/except is a bare except/pass around user creation. If the Neo4j calls fail, the exception propagates after the operation row has already been marked RUNNING. Add error handling so failures are logged and the operation status is updated accordingly.
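A sketch of what that could look like; the FAILURE status label is an assumption, since only RUNNING and SUCCESS appear in the current code:

neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
                              username=config.graph_database_username,
                              password=config.graph_database_password)
try:
    cypher_query = await neo4j_graph_db.generate_cypher_query_for_user_prompt_decomposition(user_id, query_input)
    result = neo4j_graph_db.query(cypher_query)
    await update_entity(session, Operation, job_id, "SUCCESS")
    return result
except Exception as e:
    logging.error("user_query_to_graph_db failed: %s", e)
    await update_entity(session, Operation, job_id, "FAILURE")  # assumed status label
    raise
finally:
    neo4j_graph_db.close()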

Comment on lines +247 to +322
async def add_documents_to_graph_db(session: AsyncSession, user_id: str= None, document_memory_types:list=None):
""""""
if document_memory_types is None:
document_memory_types = ['PUBLIC']

logging.info("Document memory types are", document_memory_types)
try:
# await update_document_vectordb_namespace(postgres_session, user_id)
memory_details, docs = await get_unsumarized_vector_db_namespace(session, user_id)

logging.info("Docs are", docs)
memory_details= [detail for detail in memory_details if detail[1] in document_memory_types]
logging.info("Memory details", memory_details)
for doc in docs:
logging.info("Memory names are", memory_details)
doc_name, doc_id = doc
logging.info("Doc id is", doc_id)
try:
classification_content = await fetch_document_vectordb_namespace(session, user_id, memory_details[0][0], doc_id)
retrieval_chunks = [item['text'] for item in
classification_content[0]['data']['Get'][memory_details[0][0]]]
logging.info("Classification content is", classification_content)
except:
classification_content = ""
retrieval_chunks = ""
# retrieval_chunks = [item['text'] for item in classification_content[0]['data']['Get'][memory_details[0]]]
# Concatenating the extracted text values
concatenated_retrievals = ' '.join(retrieval_chunks)
print(concatenated_retrievals)
logging.info("Retrieval chunks are", retrieval_chunks)
classification = await classify_documents(doc_name, document_id =doc_id, content=concatenated_retrievals)

logging.info("Classification is", str(classification))
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
if document_memory_types == ['PUBLIC']:
await create_public_memory(user_id=user_id, labels=['sr'], topic="PublicMemory")
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="PublicMemory")
neo4j_graph_db.close()
print(ids)
else:
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="SemanticMemory")
neo4j_graph_db.close()
print(ids)

for id in ids:
print(id.get('memoryId'))
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
if document_memory_types == ['PUBLIC']:

rs = neo4j_graph_db.create_document_node_cypher(classification, user_id, public_memory_id=id.get('memoryId'))
neo4j_graph_db.close()
else:
rs = neo4j_graph_db.create_document_node_cypher(classification, user_id, memory_type='SemanticMemory')
neo4j_graph_db.close()
logging.info("Cypher query is", rs)
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
neo4j_graph_db.query(rs)
neo4j_graph_db.close()
logging.info("WE GOT HERE")
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
if memory_details[0][1] == "PUBLIC":

neo4j_graph_db.update_document_node_with_db_ids( vectordb_namespace=memory_details[0][0],
document_id=doc_id)
neo4j_graph_db.close()
else:
neo4j_graph_db.update_document_node_with_db_ids( vectordb_namespace=memory_details[0][0],
document_id=doc_id, user_id=user_id)
neo4j_graph_db.close()
# await update_entity_graph_summary(session, DocsModel, doc_id, True)
except Exception as e:
return e

The function add_documents_to_graph_db wraps its entire body in a single try-except that catches every exception and returns the exception object as if it were a normal result. This obscures the source of errors, makes debugging difficult, and forces callers to type-check the return value. Handle the specific exceptions you expect, log them with context, and re-raise.
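For example, the broad handler could be replaced with something along these lines (the exception classes are a guess at the likely failure modes):

    except Neo4jError as e:
        logging.error("Graph update failed for user %s: %s", user_id, e)
        raise
    except (KeyError, IndexError) as e:
        logging.error("Unexpected document/memory structure for user %s: %s", user_id, e)
        raise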

main.py Outdated
Comment on lines 346 to 474
enriched_context = await user_context_enrichment(session, "user123", "How does cognitive architecture work?")
```
"""
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)

await user_query_to_graph_db(session, user_id, query)

semantic_mem = neo4j_graph_db.retrieve_semantic_memory(user_id=user_id)
neo4j_graph_db.close()
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
episodic_mem = neo4j_graph_db.retrieve_episodic_memory(user_id=user_id)
neo4j_graph_db.close()
# public_mem = neo4j_graph_db.retrieve_public_memory(user_id=user_id)



if detect_language(query) != "en":
query = translate_text(query, "sr", "en")
logging.info("Translated query is", query)

neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
summaries = await neo4j_graph_db.get_memory_linked_document_summaries(user_id=user_id, memory_type=memory_type)
neo4j_graph_db.close()
logging.info("Summaries are is %s", summaries)
# logging.info("Context from graphdb is %s", context)
# result = neo4j_graph_db.query(document_categories_query)
# summaries = [record.get("summary") for record in result]
# logging.info('Possible document categories are', str(result))
# logging.info('Possible document categories are', str(categories))
relevant_summary_id = await classify_call( query= query, document_summaries=str(summaries))


# logging.info("Relevant categories after the classifier are %s", relevant_categories)
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
postgres_id = await neo4j_graph_db.get_memory_linked_document_ids(user_id, summary_id = relevant_summary_id, memory_type=memory_type)
neo4j_graph_db.close()
# postgres_id = neo4j_graph_db.query(get_doc_ids)
logging.info("Postgres ids are %s", postgres_id)
namespace_id = await get_memory_name_by_doc_id(session, postgres_id[0])
logging.info("Namespace ids are %s", namespace_id)
params= {"doc_id":postgres_id[0]}
namespace_id = namespace_id[0]
namespace_class = namespace_id + "_class"
if memory_type =='PublicMemory':
user_id = 'system_user'

memory = await Memory.create_memory(user_id, session, namespace=namespace_id, job_id="23232",
memory_label=namespace_id)

existing_user = await Memory.check_existing_user(user_id, session)
print("here is the existing user", existing_user)
await memory.manage_memory_attributes(existing_user)

print("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)

dynamic_memory_class = getattr(memory, namespace_class.lower(), None)

methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]

if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
print(f"Memory method {method_name} has been added")
else:
print(f"No attribute named in memory.")

print("Available memory classes:", await memory.list_memory_classes())
results = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
observation=query, params=postgres_id[0], search_type="summary_filter_by_object_name")
logging.info("Result is", str(results))

search_context = ""

for result in results['data']['Get'][namespace_id]:
# Assuming 'result' is a dictionary and has keys like 'source', 'text'
source = result['source']
text = result['text']
search_context += f"Document source: {source}, Document text: {text} \n"

context = f""" You are a memory system that uses cognitive architecture to enrich the
LLM context and provide better query response.
You have access to the following information:
EPISODIC MEMORY: {episodic_mem}
SEMANTIC MEMORY: {semantic_mem}
PROCEDURAL MEMORY: NULL
SEARCH CONTEXT: The following documents provided with sources they were
extracted from could be used to provide an answer {search_context}
The original user query: {query}
"""
if generative_response is not True:
return context
else:
generative_result = generate_graph(context)
translation_to_srb = translate_text(generative_result.response, "en", "sr")
return translation_to_srb

The function user_context_enrichment is very long and does multiple things. It should be refactored into smaller functions, each with a single responsibility. Also, the function uses print statements for debugging, which should be replaced with proper logging.
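
As a first step, the prompt assembly at the end of the function could be pulled into a small helper (build_context is a suggested name, not existing code):

def build_context(episodic_mem, semantic_mem, search_context, query):
    # Assembles the enrichment prompt from the retrieved memory fragments.
    return (
        "You are a memory system that uses cognitive architecture to enrich the "
        "LLM context and provide better query responses.\n"
        f"EPISODIC MEMORY: {episodic_mem}\n"
        f"SEMANTIC MEMORY: {semantic_mem}\n"
        "PROCEDURAL MEMORY: NULL\n"
        f"SEARCH CONTEXT: {search_context}\n"
        f"The original user query: {query}"
    )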

Comment on lines +477 to +514
async def create_public_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]:
"""
Create a public memory node associated with a user in a Neo4j graph database.
If Public Memory exists, it will return the id of the memory.
This is intended as standalone node that can be attached to any user.
It is not attached to any user by default.

Args:
user_id (str): The unique identifier for the user.
session (AsyncSession): An asynchronous session for database operations.

Returns:
Optional[int]: The ID of the created public memory node or None if an error occurs.
:param labels: Label for the memory, to help filter for different countries
:param topic: Topic for the memory, to help provide a name

"""
# Validate input parameters
if not labels:
labels = ['sr'] # Labels for the memory node

if not topic:
topic = "PublicMemory"


try:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)

# Assuming the topic for public memory is predefined, e.g., "PublicMemory"
# Create the memory node
memory_id = await neo4j_graph_db.create_memory_node(labels=labels, topic=topic)
neo4j_graph_db.close()
return memory_id
except Neo4jError as e:
logging.error(f"Error creating public memory node: {e}")
return None

The function create_public_memory has a try-except block that catches a Neo4jError but only logs the error without re-raising it or handling it in a way that the caller can react to. This could lead to silent failures where the caller is unaware that an error has occurred.
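
A minimal change that avoids the silent failure is to re-raise after logging (or return an explicit error value that the caller checks for):

    except Neo4jError as e:
        logging.error("Error creating public memory node: %s", e)
        raise  # let the caller decide how to recover instead of returning None silently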

Comment on lines +516 to +559
async def attach_user_to_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]:
"""
Link user to public memory

Args:
user_id (str): The unique identifier for the user.
topic (str): Memory name


Returns:
Optional[int]: The ID of the created public memory node or None if an error occurs.
:param labels: Label for the memory, to help filter for different countries
:param topic: Topic for the memory, to help provide a name

"""
# Validate input parameters
if not user_id:
raise ValueError("User ID is required.")
if not labels:
labels = ['sr'] # Labels for the memory node

if not topic:
topic = "PublicMemory"


try:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)

# Assuming the topic for public memory is predefined, e.g., "PublicMemory"
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic=topic)
neo4j_graph_db.close()

for id in ids:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)
linked_memory = neo4j_graph_db.link_public_memory_to_user(memory_id=id.get('memoryId'), user_id=user_id)
neo4j_graph_db.close()
return 1
except Neo4jError as e:
logging.error(f"Error creating public memory node: {e}")
return None

The function attach_user_to_memory raises a ValueError if the user_id is not provided, which is good for input validation. However, the rest of the function is wrapped in a try-except that catches Neo4jError and logs it, similar to the previous function. This could also lead to silent failures.

Comment on lines +561 to +603
async def unlink_user_from_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]:
"""
Unlink user from memory

Args:
user_id (str): The unique identifier for the user.
topic (str): Memory name

Returns:
Optional[int]: The ID of the created public memory node or None if an error occurs.
:param labels: Label for the memory, to help filter for different countries
:param topic: Topic for the memory, to help provide a name

"""
# Validate input parameters
if not user_id:
raise ValueError("User ID is required.")
if not labels:
labels = ['sr'] # Labels for the memory node

if not topic:
topic = "PublicMemory"


try:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)

# Assuming the topic for public memory is predefined, e.g., "PublicMemory"
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic=topic)
neo4j_graph_db.close()

for id in ids:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)
linked_memory = neo4j_graph_db.unlink_memory_from_user(memory_id=id.get('memoryId'), user_id=user_id)
neo4j_graph_db.close()
return 1
except Neo4jError as e:
logging.error(f"Error creating public memory node: {e}")
return None

The function unlink_user_from_memory has the same issue as the attach_user_to_memory function, where it catches and logs Neo4jError without proper error handling.

main.py Outdated
Comment on lines 607 to 754
# CREATE (episodic)-[:HAS_EVENT]->(location5_5e43f4c45b3c44ea897c12220db4c051)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(location5_5e43f4c45b3c44ea897c12220db4c051)
# CREATE (episodic)-[:HAS_EVENT]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d)
# CREATE (episodic)-[:HAS_EVENT]->(information7_f756e3f3720c4fe5aeb01287badaf088)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(information7_f756e3f3720c4fe5aeb01287badaf088)
# CREATE (episodic)-[:HAS_EVENT]->(event8_da6334e744454264900296319e14b532)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(event8_da6334e744454264900296319e14b532)
# CREATE (episodic)-[:HAS_EVENT]->(action9_48e45419604e4d66b3e718ee1d6c095f)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action9_48e45419604e4d66b3e718ee1d6c095f)
# CREATE (episodic)-[:HAS_EVENT]->(action10_f48acb1db4da4934afbe17363e9e63a4)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action10_f48acb1db4da4934afbe17363e9e63a4)"""
#
# out = cypher_statement_correcting(query)
# print(out)
#
# out = await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
# print(out)
# load_doc_to_graph = await add_documents_to_graph_db(session, user_id)
# print(load_doc_to_graph)
user_id = 'test_user'
# loader_settings = {
# "format": "PDF",
# "source": "DEVICE",
# "path": [".data"]
# }
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
# await create_public_memory(user_id=user_id, labels=['sr'], topic="PublicMemory")
# await add_documents_to_graph_db(session, user_id)
#
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)

# await attach_user_to_memory(user_id=user_id, labels=['sr'], topic="PublicMemory")

return_ = await user_context_enrichment(user_id=user_id, query="Koja je minimalna visina ograde na balkonu na stambenom objektu", session=session, memory_type="PublicMemory", generative_response=True)
print(return_)
# document_summary = {
# 'DocumentCategory': 'Science',
# 'Title': 'The Future of AI',
# 'Summary': 'An insightful article about the advancements in AI.',
# 'd_id': 'doc123'
# }
#
# # Example user ID
# user_id = 'user'
#
# # value = await neo4j_graph_db.create_memory_node(labels=['sr'])
# # print(value)
# # neo4j_graph_db.close()
#
# await add_documents_to_graph_db(session, user_id)
# neo4j_graph_db.link_public_memory_to_user(memory_id = 17,user_id=user_id)
#
# ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="Document")
# print(ids)
#
# for id in ids:
# print(id.get('memoryId'))
#
# neo4j_graph_db.delete_memory_node(memory_id = id.get('memoryId'), topic="Document")
#
# neo4j_graph_db.delete_memory_node(memory_id=16, topic="PublicSerbianArchitecture")
# neo4j_graph_db.unlink_memory_from_user(memory_id = 17,user_id=user_id)
# cypher_query_public = neo4j_graph_db.create_document_node_cypher(document_summary, user_id, memory_type="PUBLIC")
# neo4j_graph_db.query(cypher_query_public)
# link_memory_to_user(user_id, session)

# neo4j_graph_db.create_memory_node(labels=['sr'])
# out = await get_vectordb_namespace(session, user_id)
# params = {
# "version": "1.0",
# "agreement_id": "AG123456",
# "privacy_policy": "https://example.com/privacy",
# "terms_of_service": "https://example.com/terms",
# "format": "json",
# "schema_version": "1.1",
# "checksum": "a1b2c3d4e5f6",
# "owner": "John Doe",
# "license": "MIT",
# "validity_start": "2023-08-01",
# "validity_end": "2024-07-31",
# }
# loader_settings = {
# "format": "PDF",
# "source": "DEVICE",
# "path": [".data"],
# "strategy": "SUMMARY",
# }
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
# await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
# await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings)
# await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday")


if __name__ == "__main__":
import asyncio


asyncio.run(main())

The main function is mostly commented out, which suggests that it's either a work in progress or that the commented code is no longer needed. If it's the latter, the commented code should be removed to avoid confusion.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 5

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 50e4d7c and f41f091.
Files selected for processing (3)
  • api.py (1 hunks)
  • cognitive_architecture/classifiers/classifier.py (1 hunks)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • cognitive_architecture/classifiers/classifier.py
  • main.py
Additional comments: 1
api.py (1)
  • 325-337: The start_api_server function is a utility to start the FastAPI server using uvicorn. The exception handling within this function is appropriate, logging the exception and potentially allowing for cleanup or recovery code to be added.

Comment on lines +28 to +30
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
app = FastAPI(debug=True)

The environment variable OPENAI_API_KEY is being loaded with a default empty string. This contradicts the previous review comment which suggested adding error handling if the key is not set. The application should not start without this key.

- OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
+ if not OPENAI_API_KEY:
+     raise RuntimeError("OPENAI_API_KEY is not set in the environment variables.")

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
app = FastAPI(debug=True)
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
raise RuntimeError("OPENAI_API_KEY is not set in the environment variables.")
app = FastAPI(debug=True)

Comment on lines +69 to +96
@app.post("/add-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore

if 'settings' in decoded_payload and decoded_payload['settings'] is not None:
settings_for_loader = decoded_payload['settings']
else:
settings_for_loader = None

if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None

output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader)
return JSONResponse(content={"response": output}, status_code=200)

except Exception as e:
return JSONResponse(
content={"response": {"error": str(e)}}, status_code=503
)

The /add-memory endpoint's error handling could be improved by providing more specific HTTP status codes and messages based on the exception type, as previously suggested.

- return JSONResponse(content={"response": {"error": str(e)}}, status_code=503)
+ if isinstance(e, SomeSpecificException):
+     status_code = 400  # Bad Request or other appropriate status code
+ else:
+     status_code = 500  # Internal Server Error
+ return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
@app.post("/add-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'settings' in decoded_payload and decoded_payload['settings'] is not None:
settings_for_loader = decoded_payload['settings']
else:
settings_for_loader = None
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None
output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader)
return JSONResponse(content={"response": output}, status_code=200)
except Exception as e:
return JSONResponse(
content={"response": {"error": str(e)}}, status_code=503
)
except Exception as e:
if isinstance(e, SomeSpecificException):
status_code = 400 # Bad Request or other appropriate status code
else:
status_code = 500 # Internal Server Error
return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code)

Comment on lines +98 to +126
@app.post("/add-architecture-public-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None

user_id = 'system_user'
loader_settings = {
"format": "PDF",
"source": "DEVICE",
"path": [".data"]
}

output = await load_documents_to_vectorstore(session, user_id=user_id, content=content, loader_settings=loader_settings)
return JSONResponse(content={"response": output}, status_code=200)

except Exception as e:
return JSONResponse(
content={"response": {"error": str(e)}}, status_code=503
)

The /add-architecture-public-memory endpoint duplicates the /add-memory endpoint's functionality with only minor differences, and both handlers share the name add_memory. FastAPI registers each route when the decorator runs, so both routes still work, but the second definition shadows the first in the module namespace and makes the code confusing. The function name should be unique.

- async def add_memory(
+ async def add_architecture_public_memory(

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
@app.post("/add-architecture-public-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None
user_id = 'system_user'
loader_settings = {
"format": "PDF",
"source": "DEVICE",
"path": [".data"]
}
output = await load_documents_to_vectorstore(session, user_id=user_id, content=content, loader_settings=loader_settings)
return JSONResponse(content={"response": output}, status_code=200)
except Exception as e:
return JSONResponse(
content={"response": {"error": str(e)}}, status_code=503
)
@app.post("/add-architecture-public-memory", response_model=dict)
async def add_architecture_public_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None
user_id = 'system_user'
loader_settings = {
"format": "PDF",
"source": "DEVICE",
"path": [".data"]
}
output = await load_documents_to_vectorstore(session, user_id=user_id, content=content, loader_settings=loader_settings)
return JSONResponse(content={"response": output}, status_code=200)
except Exception as e:
return JSONResponse(
content={"response": {"error": str(e)}}, status_code=503
)

Comment on lines +206 to +245
@app.post("/drop-db")
async def drop_db(payload: Payload):
try:
decoded_payload = payload.payload

if decoded_payload['operation'] == 'drop':

if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass

from cognitive_architecture.database.create_database_tst import drop_database, create_admin_engine

engine = create_admin_engine(username, password, host, database_name)
drop_database(engine)
return JSONResponse(content={"response": "DB dropped"}, status_code=200)
else:

if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass

from cognitive_architecture.database.create_database_tst import create_database, create_admin_engine

engine = create_admin_engine(username, password, host, database_name)
create_database(engine)
return JSONResponse(content={"response": " DB created"}, status_code=200)



except Exception as e:
return HTTPException(status_code=500, detail=str(e))

The /drop-db endpoint has been updated to include logic for both dropping and creating the database based on the operation specified in the payload. However, the exception handling at the end of the function uses HTTPException incorrectly. It should be raised, not returned.

- return HTTPException(status_code=500, detail=str(e))
+ raise HTTPException(status_code=500, detail=str(e))

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
@app.post("/drop-db")
async def drop_db(payload: Payload):
try:
decoded_payload = payload.payload
if decoded_payload['operation'] == 'drop':
if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass
from cognitive_architecture.database.create_database_tst import drop_database, create_admin_engine
engine = create_admin_engine(username, password, host, database_name)
drop_database(engine)
return JSONResponse(content={"response": "DB dropped"}, status_code=200)
else:
if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass
from cognitive_architecture.database.create_database_tst import create_database, create_admin_engine
engine = create_admin_engine(username, password, host, database_name)
create_database(engine)
return JSONResponse(content={"response": " DB created"}, status_code=200)
except Exception as e:
return HTTPException(status_code=500, detail=str(e))
@app.post("/drop-db")
async def drop_db(payload: Payload):
try:
decoded_payload = payload.payload
if decoded_payload['operation'] == 'drop':
if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass
from cognitive_architecture.database.create_database_tst import drop_database, create_admin_engine
engine = create_admin_engine(username, password, host, database_name)
drop_database(engine)
return JSONResponse(content={"response": "DB dropped"}, status_code=200)
else:
if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass
from cognitive_architecture.database.create_database_tst import create_database, create_admin_engine
engine = create_admin_engine(username, password, host, database_name)
create_database(engine)
return JSONResponse(content={"response": " DB created"}, status_code=200)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
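
For context on why the change matters: FastAPI only turns an HTTPException into an error response when it is raised; a returned HTTPException is treated as an ordinary return value, so the client never sees the intended 500. A minimal, self-contained sketch (not the project's endpoint, just the pattern):

from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.post("/broken")
async def broken():
    # Returning the exception object: FastAPI serializes it like any other
    # return value, and the client does not get the intended 500 status.
    return HTTPException(status_code=500, detail="boom")

@app.post("/correct")
async def correct():
    # Raising it lets FastAPI's exception handling produce a real 500
    # response with {"detail": "boom"} as the body.
    raise HTTPException(status_code=500, detail="boom")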

Comment on lines +248 to +276
@app.post("/create-public-memory")
async def create_public_memory(payload: Payload):
try:
decoded_payload = payload.payload

if 'user_id' in decoded_payload and decoded_payload['user_id'] is not None:
user_id = decoded_payload['user_id']
else:
user_id = None

if 'labels' in decoded_payload and decoded_payload['labels'] is not None:
labels = decoded_payload['labels']
else:
labels = None

if 'topic' in decoded_payload and decoded_payload['topic'] is not None:
topic = decoded_payload['topic']
else:
topic = None

# Execute the query - replace this with the actual execution method
# async with session_scope(session=AsyncSessionLocal()) as session:
# from main import create_public_memory
# Assuming you have a method in Neo4jGraphDB to execute the query
result = await create_public_memory(user_id=user_id, labels=labels, topic=topic)
return JSONResponse(content={"response": result}, status_code=200)

except Exception as e:
raise HTTPException(status_code=500, detail=str(e))

The /create-public-memory endpoint seems to have been updated with new logic. However, the commented-out code from lines 259-261 should be removed if it is no longer needed, so the codebase stays clean.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between f41f091 and fb542a3.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between fb542a3 and 7b39813.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 7b39813 and 3d6d0a0.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 3d6d0a0 and db18bd8.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between db18bd8 and eee6049.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between eee6049 and 444d4b3.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 444d4b3 and d8f621f.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between d8f621f and 94ef465.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 9

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 94ef465 and 0b1cefb.
Files selected for processing (3)
  • cognitive_architecture/database/graph_database/graph.py (1 hunks)
  • cognitive_architecture/database/vectordb/basevectordb.py (1 hunks)
  • cognitive_architecture/database/vectordb/vectordb.py (6 hunks)
Files not summarized due to errors (1)
  • cognitive_architecture/database/graph_database/graph.py: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (2)
  • cognitive_architecture/database/vectordb/basevectordb.py
  • cognitive_architecture/database/vectordb/vectordb.py


from neo4j import AsyncSession
from neo4j.exceptions import Neo4jError


The debugging print statement should be removed or replaced with a logging statement if necessary.

- print(os.getcwd())
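
If the working-directory information is still useful, the logging replacement the comment mentions is a one-liner, for example:

import logging
import os

logging.debug("Current working directory: %s", os.getcwd())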

Comment on lines +31 to +33
from typing import List, Dict, Optional
from ...utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \
create_node_variable_mapping, get_unsumarized_vector_db_namespace

The relative import from ...utils should be replaced with an absolute import for clarity and to avoid issues with changing directory structures.

- from ...utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \
-     create_node_variable_mapping, get_unsumarized_vector_db_namespace
+ from cognitive_architecture.utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \
+     create_node_variable_mapping, get_unsumarized_vector_db_namespace

Suggested change
from typing import List, Dict, Optional
from cognitive_architecture.utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \
create_node_variable_mapping, get_unsumarized_vector_db_namespace

Comment on lines +232 to +238
try:
with self.driver.session() as session:
result = session.run(query, params).data()
return result
except Exception as e:
logging.error(f"An error occurred while executing the query: {e}")
raise e

The query method in the Neo4jGraphDB class catches a generic Exception. It should catch more specific exceptions to avoid hiding unexpected errors.

-        except Exception as e:
+        except Neo4jError as e:

Suggested change
try:
with self.driver.session() as session:
result = session.run(query, params).data()
return result
except Neo4jError as e:
logging.error(f"An error occurred while executing the query: {e}")
raise e

Comment on lines +444 to +447
logging.info("User Cypher Query raw: %s", cypher_statements_joined)
# corrected_cypher_statements = self.cypher_statement_correcting(input = cypher_statements_joined)
# logging.info("User Cypher Query: %s", corrected_cypher_statements.query)
# return corrected_cypher_statements.query

The commented-out code for correcting Cypher statements should be either uncommented and tested or removed to avoid confusion.

-        # corrected_cypher_statements = self.cypher_statement_correcting(input = cypher_statements_joined)
-        # logging.info("User Cypher Query: %s", corrected_cypher_statements.query)
-        # return corrected_cypher_statements.query

Suggested change
logging.info("User Cypher Query raw: %s", cypher_statements_joined)

Comment on lines +889 to +925
async def create_memory_node(self, labels, topic=None):
"""
Create or find a memory node of the specified type with labels and a description.

Args:
labels (List[str]): A list of labels for the node.
topic (str, optional): The type of memory node to create or find. Defaults to "PublicMemory".

Returns:
int: The ID of the created or found memory node.

Raises:
ValueError: If input parameters are invalid.
Neo4jError: If an error occurs during the database operation.
"""
if topic is None:
topic = "PublicMemory"

# Prepare labels as a string
label_list = ', '.join(f"'{label}'" for label in labels)

# Cypher query to find or create the memory node with the given description and labels
memory_cypher = f"""
MERGE (memory:{topic} {{description: '{topic}', label: [{label_list}]}})
SET memory.memoryId = ID(memory)
RETURN id(memory) AS memoryId
"""

try:
result = self.query(memory_cypher)
# Assuming the result is a list of records, where each record contains 'memoryId'
memory_id = result[0]['memoryId'] if result else None
self.close()
return memory_id
except Neo4jError as e:
logging.error(f"Error creating or finding memory node: {e}")
raise

The method create_memory_node uses string formatting to create a Cypher query, which could be vulnerable to injection attacks. It should be parameterized to ensure safety.

- MERGE (memory:{topic} {{description: '{topic}', label: [{label_list}]}})
+ MERGE (memory:$topic {description: $description, label: $labels})

And update the query call to use parameters:

self.query(memory_cypher, params={"topic": topic, "description": topic, "labels": labels})
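
One caveat about the suggested rewrite: Cypher parameters can only replace property values, not labels, so MERGE (memory:$topic ...) will not parse. The property values should be parameterized as shown, but the label itself still has to be validated, for example against a whitelist, before it is interpolated. A rough sketch; the whitelist contents and helper name are illustrative, not taken from the codebase:

from typing import List, Tuple

ALLOWED_TOPICS = {"PublicMemory", "SemanticMemory", "EpisodicMemory"}

def build_memory_node_query(topic: str, labels: List[str]) -> Tuple[str, dict]:
    # Labels cannot be bound as Cypher parameters, so the topic is checked
    # against a whitelist and then interpolated; the property values stay
    # parameterized and are passed to the driver separately.
    if topic not in ALLOWED_TOPICS:
        raise ValueError(f"Unsupported memory topic: {topic}")
    cypher = (
        f"MERGE (memory:{topic} {{description: $description, label: $labels}}) "
        "RETURN id(memory) AS memoryId"
    )
    return cypher, {"description": topic, "labels": labels}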

Comment on lines +990 to +998
def link_public_memory_to_user(self, memory_id, user_id):
# Link an existing Public Memory node to a User node
link_cypher = f"""
MATCH (user:User {{userId: '{user_id}'}})
MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {memory_id}
MERGE (user)-[:HAS_PUBLIC_MEMORY]->(publicMemory)
"""
self.query(link_cypher)


The method link_public_memory_to_user does not validate the memory_id before using it in the Cypher query. It should ensure that memory_id is an integer to prevent potential issues.

- MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {memory_id}
+ MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = $memory_id

And update the query call to use parameters:

self.query(link_cypher, params={"memory_id": memory_id})
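
To cover the integer check the comment asks for, the value can be coerced before it is bound (illustrative):

link_params = {"memory_id": int(memory_id)}  # raises ValueError/TypeError early if not an integer
self.query(link_cypher, params=link_params)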

Comment on lines +1000 to +1006
link_cypher = f""" MATCH(publicMemory: {topic})
RETURN
id(publicMemory)
AS
memoryId """
node_ids = self.query(link_cypher)
return node_ids

The method retrieve_node_id_for_memory_type is using a Cypher query without parameterization. It should be updated to use parameters for topic to prevent potential injection issues.

- MATCH(publicMemory: {topic})
+ MATCH(publicMemory: $topic)

And update the query call to use parameters:

self.query(link_cypher, params={"topic": topic})
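
The same label caveat applies here: $topic sits in the label position of the MATCH clause, which Cypher does not allow. As with create_memory_node above, the topic needs to be validated (for example against a whitelist) and then interpolated into the label, while any property values stay parameterized.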

Comment on lines 1018 to 1095
#
# async def link_memory_to_user(self, memory_id: int, user_id: str, relationship_type: str = 'HAS_MEMORY') -> None:
# """
# Link a memory node to a user with a specified relationship type.
#
# Args:
# memory_id (int): The ID of the memory node.
# user_id (str): The user ID to link the memory to.
# relationship_type (str): The type of relationship.
#
# Raises:
# ValueError: If input parameters are invalid.
# Neo4jError: If an error occurs during the database operation.
# """
# if not user_id or not memory_id:
# raise ValueError("User ID and Memory ID are required for linking.")
#
# try:
# link_cypher = f"""
# MATCH (user:User {{userId: '{user_id}'}})
# MATCH (memory) WHERE id(memory) = {memory_id}
# MERGE (user)-[:{relationship_type}]->(memory)
# """
# await self.query(link_cypher)
# except Neo4jError as e:
# logging.error(f"Error linking memory to user: {e}")
# raise
#
# async def delete_memory_node(self, memory_id: int, memory_type: str) -> None:
# """
# Delete a memory node of a specified type.
#
# Args:
# memory_id (int): The ID of the memory node to delete.
# memory_type (str): The type of the memory node.
#
# Raises:
# ValueError: If input parameters are invalid.
# Neo4jError: If an error occurs during the database operation.
# """
# if not memory_id or not memory_type:
# raise ValueError("Memory ID and Memory Type are required for deletion.")
#
# try:
# delete_cypher = f"""
# MATCH (memory:{memory_type}) WHERE id(memory) = {memory_id}
# DETACH DELETE memory
# """
# await self.query(delete_cypher)
# except Neo4jError as e:
# logging.error(f"Error deleting memory node: {e}")
# raise
#
# async def unlink_memory_from_user(self, memory_id: int, user_id: str, relationship_type: str = 'HAS_MEMORY') -> None:
# """
# Unlink a memory node from a user.
#
# Args:
# memory_id (int): The ID of the memory node.
# user_id (str): The user ID to unlink from the memory.
# relationship_type (str): The type of relationship.
#
# Raises:
# ValueError: If input parameters are invalid.
# Neo4jError: If an error occurs during the database operation.
# """
# if not user_id or not memory_id:
# raise ValueError("User ID and Memory ID are required for unlinking.")
#
# try:
# unlink_cypher = f"""
# MATCH (user:User {{userId: '{user_id}'}})-[r:{relationship_type}]->(memory) WHERE id(memory) = {memory_id}
# DELETE r
# """
# await self.query(unlink_cypher)
# except Neo4jError as e:
# logging.error(f"Error unlinking memory from user: {e}")
# raise

There are large blocks of commented-out code that appear to be previous versions of methods or alternative implementations. These blocks should be cleaned up to avoid confusion and keep the code clear.

Comment on lines 1099 to 1160
# def create_public_memory(self, labels, topic=None):
# if topic is None:
# topic = "SerbianArchitecture"
# topicMemoryId = topic + "MemoryId"
# # Create an independent Architecture Memory node with countries as properties
# label_list = ', '.join(f"'{label}'" for label in labels) # Prepare countries list as a string
# architecture_memory_cypher = f"""
# CREATE ({topic.lower()}:{topic} {{description: '{topic}', label: [{label_list}]}})
# RETURN id({topic.lower()}) AS {topicMemoryId}
# """
# return self.query(architecture_memory_cypher)
#
# def link_public_memory_to_user(self, public_memory_id, user_id):
# # Link an existing Public Memory node to a User node
# link_cypher = f"""
# MATCH (user:User {{userId: '{user_id}'}})
# MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id}
# MERGE (user)-[:HAS_PUBLIC_MEMORY]->(publicMemory)
# """
# self.query(link_cypher)
#
# def link_public_memory_to_architecture(self, public_memory_id):
# # Link the Public Memory node to the Architecture Memory node
# link_cypher = f"""
# MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id}
# MATCH (architecture:Architecture {{description: 'Architecture'}})
# MERGE (publicMemory)-[:INCLUDES]->(architecture)
# """
# self.query(link_cypher)
#
# def delete_public_memory(self, public_memory_id):
# # Delete a Public Memory node by its ID
# delete_cypher = f"""
# MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id}
# DETACH DELETE publicMemory
# """
# self.query(delete_cypher)
#
# def delete_architecture_memory(self, architecture_memory_id):
# # Delete an Architecture Memory node by its ID
# delete_cypher = f"""
# MATCH (architecture:Architecture) WHERE id(architecture) = {architecture_memory_id}
# DETACH DELETE architecture
# """
# self.query(delete_cypher)
#
# def unlink_public_memory_from_user(self, public_memory_id, user_id):
# # Unlink a Public Memory node from a User node
# unlink_cypher = f"""
# MATCH (user:User {{userId: '{user_id}'}})-[r:HAS_PUBLIC_MEMORY]->(publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id}
# DELETE r
# """
# self.query(unlink_cypher)
#
# def unlink_public_memory_from_architecture(self, public_memory_id):
# # Unlink the Public Memory node from the Architecture Memory node
# unlink_cypher = f"""
# MATCH (publicMemory:PublicMemory)-[r:INCLUDES]->(architecture:Architecture) WHERE id(publicMemory) = {public_memory_id}
# DELETE r
# """
# self.query(unlink_cypher)


More commented-out code is present at the end of the file. If it is not needed, it should be removed; if it is meant to be used in the future, a comment should explain why it is being retained.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 6

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 0b1cefb and dab514f.
Files selected for processing (7)
  • cognitive_architecture/database/graph_database/graph.py (1 hunks)
  • cognitive_architecture/database/graph_database/networkx_graph.py (1 hunks)
  • cognitive_architecture/database/vectordb/basevectordb.py (2 hunks)
  • cognitive_architecture/llm/prompts/generate_graph_prompt.txt (1 hunks)
  • cognitive_architecture/llm/queries.py (1 hunks)
  • cognitive_architecture/shared/data_models.py (1 hunks)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • cognitive_architecture/database/vectordb/basevectordb.py
  • main.py

id: int
description: str
category: str
color: str ="blue"

The color attribute in the Node class is given a bare inline default. A str default is immutable, so this is not the classic mutable-default pitfall, but declaring it with Pydantic's Field(default=...) makes the intent explicit and keeps the field declarations consistent.

- color: str ="blue"
+ color: str = Field(default="blue")

Suggested change
color: str = Field(default="blue")

source: int
target: int
description: str
color: str= "blue"

The Edge class declares its color default the same way, so the same Field(default=...) style applies there.

- color: str= "blue"
+ color: str = Field(default="blue")

Suggested change
color: str = Field(default="blue")
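
Both suggestions assume Field is available in the module; if data_models.py does not already import it, the import needs to be added as well:

from pydantic import Field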



def generate_graph(input) -> KnowledgeGraph:
model = "gpt-4-1106-preview" # Define the model here

Hardcoding the model name within the generate_graph function limits flexibility and maintainability. Consider using a configuration setting for the model name.

- model = "gpt-4-1106-preview"  # Define the model here
+ model = config.model  # Use the model from the configuration

Suggested change
model = config.model # Use the model from the configuration
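
The config object referenced in the suggestion is not shown in this diff; if no such setting exists yet, even an environment variable with a default keeps the model name out of the function body. A minimal sketch (the variable name is illustrative):

import os

# Falls back to the currently hard-coded value when the variable is unset.
OPENAI_MODEL = os.environ.get("OPENAI_MODEL", "gpt-4-1106-preview")

generate_graph would then read model = OPENAI_MODEL (or config.model once a configuration object is in place).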




async def generate_summary(input) -> MemorySummary:

The generate_summary function is defined as async but does not perform any await operations. If the aclient.chat.completions.create method is not asynchronous, the async keyword should be removed.

- async def generate_summary(input) -> MemorySummary:
+ def generate_summary(input) -> MemorySummary:

Suggested change
def generate_summary(input) -> MemorySummary:
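
The opposite fix is also plausible: if aclient is an AsyncOpenAI client, chat.completions.create returns a coroutine and the missing piece is the await, not the async keyword. A sketch under that assumption (the function name and prompt are illustrative, not the project's generate_summary):

from openai import AsyncOpenAI

aclient = AsyncOpenAI()  # assumes OPENAI_API_KEY is set in the environment

async def summarize(text: str) -> str:
    # The async client's create() returns a coroutine, so it must be awaited
    # inside an async function for the request to actually run.
    completion = await aclient.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return completion.choices[0].message.content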

Comment on lines +11 to +12
except (FileNotFoundError, EOFError, pickle.UnpicklingError):
self.graph = nx.Graph() # Create a new graph if loading failed

The exceptions caught here are already specific, but the failure is silent: a load error quietly falls back to an empty graph. Logging the error before creating the new graph makes that failure visible.

- except (FileNotFoundError, EOFError, pickle.UnpicklingError):
+ except (FileNotFoundError, EOFError, pickle.UnpicklingError) as e:
+     logging.error(f"Error loading graph: {e}")

Suggested change
except (FileNotFoundError, EOFError, pickle.UnpicklingError) as e:
logging.error(f"Error loading graph: {e}")
self.graph = nx.Graph() # Create a new graph if loading failed

Comment on lines +63 to +67
def generate_graph_semantic_memory_document_summary(self, document_summary, unique_graphdb_mapping_values, document_namespace, user_id):
for node, attributes in unique_graphdb_mapping_values.items():
self.graph.add_node(node, **attributes)
self.graph.add_edge(f"{user_id}_semantic", node, relation='HAS_KNOWLEDGE')
self.save_graph()

The method generate_graph_semantic_memory_document_summary does not use the document_summary parameter. If it's not needed, consider removing it.

- def generate_graph_semantic_memory_document_summary(self, document_summary, unique_graphdb_mapping_values, document_namespace, user_id):
+ def generate_graph_semantic_memory_document_summary(self, unique_graphdb_mapping_values, document_namespace, user_id):

Suggested change
def generate_graph_semantic_memory_document_summary(self, unique_graphdb_mapping_values, document_namespace, user_id):
for node, attributes in unique_graphdb_mapping_values.items():
self.graph.add_node(node, **attributes)
self.graph.add_edge(f"{user_id}_semantic", node, relation='HAS_KNOWLEDGE')
self.save_graph()

@Vasilije1990 Vasilije1990 merged commit e189213 into main Jan 23, 2024
3 of 4 checks passed
@Vasilije1990 Vasilije1990 deleted the feature/test_deployment_of_docker branch March 25, 2024 13:43