Build the docker and push #33

Merged: 43 commits, Jan 23, 2024

Conversation


@Vasilije1990 Vasilije1990 commented Dec 16, 2023

Summary by CodeRabbit

  • New Features

    • Enhanced CI/CD workflows with Docker image repository configuration and Terraform integration.
    • Introduced new FastAPI-based web service with multiple endpoints for memory management and database interactions.
    • Added support for language detection and translation using AWS Translate in language processing utilities.
    • Implemented new classes and methods for managing embeddings, vector databases, job statuses, and graph databases.
    • Expanded vector database functionality with support for different search types and filtering options.
  • Improvements

    • Upgraded license from MIT to Apache License, Version 2.0.
    • Improved Docker image with the addition of libgl1-mesa-glx package.
    • Refined database configuration management with centralized Config object.
    • Enhanced classifier function to accept different parameters and return classifier ID.
    • Added memory_category field to database models for better memory categorization.
  • Documentation

    • Updated environment variable template with new AWS and QDRANT API keys and URLs.
  • Bug Fixes

    • Fixed issues with database connection parameter retrieval in database creation scripts.
    • Adjusted logging for the search type in vector database operations.
  • Refactor

    • Transitioned synchronous retry mechanisms to asynchronous in OpenAI tools.
    • Reorganized database settings to use centralized configuration for PostgreSQL.
    • Streamlined chunk processing and document loading in vector database operations.
    • Centralized configuration management in various database-related scripts.
  • Style

    • Cleaned up code by removing trailing whitespace and commented-out code blocks.
    • Reorganized imports and modified exception handling in vector database chunkers.
  • Chores

    • Updated entrypoint script to export ENV variable and commented out the execution of the create_database.py script.

coderabbitai bot commented Dec 16, 2023

Walkthrough

The changes reflect a significant update to a software project, enhancing its CI/CD pipelines, refining database interactions, and expanding its cognitive architecture. The project shifts to the Apache License and introduces new functionalities for language processing, memory management, and vector database operations. It also improves Docker configurations and adds a FastAPI-based web service, signaling a broad enhancement of the system's capabilities and infrastructure.

Changes

File Path Change Summary
.github/actions/image_builder/action.yaml
.github/workflows/cd.yaml
.github/workflows/cd_prd.yaml
Updated CI/CD workflows, Docker image repo configurations, libgl1-mesa-glx package installation, and Terraform triggers.
LICENSE Changed from MIT to Apache License, Version 2.0.
bin/dockerize
.env.template
entrypoint.sh
Altered Docker-related script, environment variable declarations, and script execution.
cognitive_architecture/openai_tools.py
cognitive_architecture/database/postgres/database.py
cognitive_architecture/database/vectordb/vectordb.py
Refactored asynchronous retry mechanisms, centralized database settings, and expanded database functionalities.
api.py
Dockerfile
cognitive_architecture/classifiers/classifier.py
Introduced a FastAPI-based web service, updated Docker configurations, and modified classifier logic.
cognitive_architecture/database/postgres/models/...
cognitive_architecture/database/vectordb/...
Added and updated functionalities for database models and vector database management.
cognitive_architecture/database/graph_database/...
cognitive_architecture/shared/language_processing.py
Enhanced graph database logic and added language processing functionalities.
cognitive_architecture/utils.py
cognitive_architecture/fetch_secret.py
cognitive_architecture/database/create_database_tst.py
main.py
Updated utility functions, enhanced secret fetching, introduced database interaction functionalities, and expanded memory management and document processing.

🐇 "With code refined and systems aligned,
🌟 New features shine, so deftly designed.
📜 From Apache's embrace to a FastAPI face,
Our project leaps through the digital space."

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 05aaee6 and c43536c.
Files selected for processing (1)
  • README.md (1 hunks)
Files skipped from review due to trivial changes (1)
  • README.md

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 5

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between c43536c and dfb13a2.
Files ignored due to filter (6)
  • assets/Dashboard_example.png
  • assets/User_graph.png
  • assets/infographic_final.png
  • docker-compose.yml
  • poetry.lock
  • pyproject.toml
Files selected for processing (7)
  • .github/actions/image_builder/action.yaml (1 hunks)
  • .github/workflows/cd.yaml (2 hunks)
  • .github/workflows/cd_prd.yaml (2 hunks)
  • .github/workflows/ci.yaml (1 hunks)
  • LICENSE (1 hunks)
  • bin/dockerize (2 hunks)
  • cognitive_architecture/openai_tools.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • .github/actions/image_builder/action.yaml
Additional comments: 13
.github/workflows/cd.yaml (2)
  • 50-51: The ecr_image_repo_name and dockerfile_location have been updated. Verify that the new Docker image repository name and Dockerfile location are consistent with the intended Docker configurations and naming conventions used across the project.

  • 61-76: The new apply_tf job is introduced to trigger a Terraform apply workflow. Ensure that the PAT_FOR_CROSS_REPOS_CICD_TRIGGERING secret is properly configured for cross-repository workflow dispatch and that the workflow terraform.apply.yml exists and is set up to be triggered from this repository.

.github/workflows/cd_prd.yaml (3)
  • 23-57: The steps added to the publish_docker_to_ecr job for setting environment variables, configuring AWS credentials, and creating and pushing Docker images appear to be correctly implemented. However, it's important to verify that the .github/actions/image_builder action exists and is configured properly, as it is being used to build and push the Docker image.
Verification successful

The verification of the .github/actions/image_builder action has been successful. The directory exists within the .github/actions/ path as confirmed by the output of the shell script.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the image_builder GitHub action exists and check its configuration.
fd --exec ls -la '{}'/image_builder '*/.github/actions/'

Length of output: 34680



Script:

#!/bin/bash
# Search for the 'image_builder' directory within the '.github/actions/' path.
fd 'image_builder' '.github/actions/'

Length of output: 66

  • 81-82: The output `promethai_docker_tag_backend` is correctly set to be used by downstream jobs or steps. This is a good practice for passing data between jobs in a GitHub Actions workflow.
  • 84-99: The apply_tf job is configured to trigger a Terraform apply workflow after the Docker image is published. The use of actions/github-script to programmatically dispatch a workflow is correct. Ensure that the PAT_FOR_CROSS_REPOS_CICD_TRIGGERING secret is set with the appropriate permissions and that the PromethAI-Infra repository and terraform.apply.yml workflow exist and are configured to accept dispatch events.
.github/workflows/ci.yaml (2)
  • 23-23: The docker_login and publish flags are set to "false" in the dockerize command. Confirm that this is the intended behavior for the CI workflow and that there are no scenarios where these should be set to "true".

  • 24-24: The Docker image version is now being read from /tmp/.DOCKER_IMAGE_VERSION. Verify that the dockerize script writes the Docker image version to this new path and that there are no conflicts with other processes that may use the /tmp directory.

LICENSE (1)
  • 1-201: The change from the MIT License to the Apache License, Version 2.0, is a significant alteration to the terms under which the software is provided. Ensure that this change is intentional and that all contributors are aware of the implications of this change. Additionally, verify that all necessary documentation and code headers are updated to reflect the new licensing terms.
bin/dockerize (2)
  • 11-11: The change to the IMAGE_NAME variable removes the "-memory" suffix from the Docker image name. Ensure that this new naming convention is consistent with the intended Docker image naming strategy across the project.

  • 15-15: The APP_DIR variable has been changed from "level_2/" to the current directory ".". Verify that this change aligns with the directory structure of the application and that the Docker build context is correctly set to include all necessary files.

cognitive_architecture/openai_tools.py (3)
  • 108-110: The use of the asynchronous retry decorator @aretry_with_exponential_backoff on acompletions_with_backoff is correct, but ensure that all calls to acompletions_with_backoff have been updated to be awaited, as this change affects the outward-facing interface of the codebase.

  • 119-125: The function async_get_embedding_with_backoff is correctly implemented as an asynchronous function. It's important to verify that this function is awaited wherever it's called, and that the introduction of this function does not conflict with any existing synchronous code paths.

  • 128-137: The synchronous functions create_embedding_with_backoff and get_embedding_with_backoff remain unchanged and should continue to work as before. However, it's important to ensure that the introduction of their asynchronous counterparts does not lead to confusion or misuse in the codebase.
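
For illustration, a usage sketch of the paired helpers discussed above (the parameter name is an assumption, not the module's documented signature): the asynchronous variant must be awaited from async code, while the synchronous variant keeps working for existing call sites.

import asyncio
from cognitive_architecture.openai_tools import (
    async_get_embedding_with_backoff,  # async variant: must be awaited
    get_embedding_with_backoff,        # sync variant: unchanged behaviour
)

async def embed_async(text: str):
    # Inside async code paths, the new coroutine has to be awaited.
    return await async_get_embedding_with_backoff(text)

def embed_sync(text: str):
    # Existing synchronous call sites can keep using the old helper.
    return get_embedding_with_backoff(text)

# asyncio.run(embed_async("hello world"))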

Comment on lines 33 to +36
fi
docker push "${FULL_IMAGE_NAME}" &&
echo "Successfully pushed docker image ${FULL_IMAGE_NAME} to ECR repository"
fi
fi

The nested if statement checking the value of PUBLISH is redundant and can be removed for clarity and simplicity.

  if [ "${PUBLISH}" = true ]; then
-   echo "logging in"
-   aws ecr get-login-password --region "${AWS_REGION}" | docker login --username AWS --password-stdin "${AWS_REPOSITORY}"
-   if [ "${PUBLISH}" = true ]; then
-     echo "logging in"
-     aws ecr get-login-password --region "${AWS_REGION}" | docker login --username AWS --password-stdin "${AWS_REPOSITORY}"
  fi

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
fi
docker push "${FULL_IMAGE_NAME}" &&
echo "Successfully pushed docker image ${FULL_IMAGE_NAME} to ECR repository"
fi
fi
fi
docker push "${FULL_IMAGE_NAME}" &&
echo "Successfully pushed docker image ${FULL_IMAGE_NAME} to ECR repository"
fi

Comment on lines 8 to 18
SHA_SHORT="$(git rev-parse --short HEAD)"
CUR_DATE="$(date +%Y%m%d%H%M%S)"
VERSION="$STAGE-$CUR_DATE-$SHA_SHORT"
IMAGE_NAME=${image_name:-promethai-${STAGE}-promethai-backend-memory}
IMAGE_NAME=${image_name:-promethai-${STAGE}-promethai-backend}

REPO_NAME="${AWS_REPOSITORY}/${IMAGE_NAME}"
FULL_IMAGE_NAME="${REPO_NAME}:${VERSION}"
APP_DIR=${app_dir:-"level_2/"} # Updated this line
APP_DIR=${app_dir:-"."}

PUBLISH=${publish:-false}


Note: This review was outside the patches, and no patch overlapping with it was found. Original lines [19-19]

The echo statement should use the APP_DIR variable instead of app_dir to reflect the directory being used in the build process.

- echo "Building docker image ${FULL_IMAGE_NAME} located in dir ${app_dir}"
+ echo "Building docker image ${FULL_IMAGE_NAME} located in dir ${APP_DIR}"

Comment on lines 78 to +99
# with:
# tag_name: ${{ steps.bump_version_and_push_tag.outputs.tag }}
# release_name: Release ${{ steps.bump_version_and_push_tag.outputs.tag }}
# outputs:
# promethai_docker_tag_backend: ${{ steps.export-promethai-docker-tag.outputs.promethai_docker_tag_backend }}
outputs:
promethai_docker_tag_backend: ${{ steps.export-promethai-docker-tag.outputs.promethai_docker_tag_backend }}

# apply_tf:
# name: Trigger terraform apply workflow
# runs-on: ubuntu-latest
# needs: publish_docker_to_ecr
# steps:
# - name: TF apply workflow triggers step
# uses: actions/github-script@v6
# with:
# github-token: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
# script: |
# await github.rest.actions.createWorkflowDispatch({
# owner: 'topoteretes',
# repo: 'PromethAI-Infra',
# workflow_id: 'terraform.apply.yml',
# ref: 'main'
# })
apply_tf:
name: Trigger terraform apply workflow
runs-on: ubuntu-latest
needs: publish_docker_to_ecr
steps:
- name: TF apply workflow triggers step
uses: actions/github-script@v6
with:
github-token: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
script: |
await github.rest.actions.createWorkflowDispatch({
owner: 'topoteretes',
repo: 'PromethAI-Infra',
workflow_id: 'terraform.apply.yml',
ref: 'main'
})

Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [59-80]

The commented-out code for creating tags and releases is still present in the workflow file. If these steps have been implemented elsewhere, this commented-out code should be removed to avoid confusion and maintain cleanliness of the workflow file.

- #      - name: Create Tag and Release
- #        runs-on: ubuntu-latest
- #        uses: actions/checkout@v3
- #        needs: publish_docker_to_ecr  # ensure this job runs after Docker image is pushed
- #        steps:
- #          - name: Check out code
- #            uses: actions/checkout@v3
- #          - name: Bump version and push tag
- #            id: bump_version_and_push_tag
- #            uses: anothrNick/[email protected]
- #            env:
- #              GITHUB_TOKEN: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
- #              WITH_V: true
- #              DEFAULT_BUMP: 'minor'  # or 'minor' or 'major'
- #          - name: Create Release
- #            id: create_release
- #            uses: actions/create-release@v1
- #            env:
- #              GITHUB_TOKEN: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
- #            with:
- #              tag_name: ${{ steps.bump_version_and_push_tag.outputs.tag }}
- #              release_name: Release ${{ steps.bump_version_and_push_tag.outputs.tag }}

Comment on lines +65 to +105
def aretry_with_exponential_backoff(
    func,
    initial_delay: float = 1,
    exponential_base: float = 2,
    jitter: bool = True,
    max_retries: int = 20,
    errors: tuple = (openai.error.RateLimitError,),
):
    """Retry a function with exponential backoff."""

    async def wrapper(*args, **kwargs):
        # Initialize variables
        num_retries = 0
        delay = initial_delay

        # Loop until a successful response or max_retries is hit or an exception is raised
        while True:
            try:
                return await func(*args, **kwargs)

            # Retry on specified errors
            except errors as e:
                print(f"acreate (backoff): caught error: {e}")
                # Increment retries
                num_retries += 1

                # Check if max retries has been reached
                if num_retries > max_retries:
                    raise Exception(f"Maximum number of retries ({max_retries}) exceeded.")

                # Increment the delay
                delay *= exponential_base * (1 + jitter * random.random())

                # Sleep for the delay
                await asyncio.sleep(delay)

            # Raise exceptions for any errors not specified
            except Exception as e:
                raise e

    return wrapper

The new asynchronous retry function aretry_with_exponential_backoff seems correctly implemented, mirroring the logic of the synchronous version but using async and await keywords. However, the exception handling prints an error message for caught RateLimitError but does not log or handle other exceptions, which could be inconsistent with the synchronous version's behavior. Consider adding logging for all exceptions or re-raising them to maintain consistency with the synchronous version.

-            except errors as e:
+            except errors as rate_limit_error:
-                print(f"acreate (backoff): caught error: {e}")
+                logging.error(f"Retrying due to rate limit: {rate_limit_error}")
-            except Exception as e:
+            except Exception as general_error:
-                raise e
+                logging.error(f"An unexpected error occurred: {general_error}")
+                raise

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def aretry_with_exponential_backoff(
func,
initial_delay: float = 1,
exponential_base: float = 2,
jitter: bool = True,
max_retries: int = 20,
errors: tuple = (openai.error.RateLimitError,),
):
"""Retry a function with exponential backoff."""
async def wrapper(*args, **kwargs):
# Initialize variables
num_retries = 0
delay = initial_delay
# Loop until a successful response or max_retries is hit or an exception is raised
while True:
try:
return await func(*args, **kwargs)
# Retry on specified errors
except errors as e:
print(f"acreate (backoff): caught error: {e}")
# Increment retries
num_retries += 1
# Check if max retries has been reached
if num_retries > max_retries:
raise Exception(f"Maximum number of retries ({max_retries}) exceeded.")
# Increment the delay
delay *= exponential_base * (1 + jitter * random.random())
# Sleep for the delay
await asyncio.sleep(delay)
# Raise exceptions for any errors not specified
except Exception as e:
raise e
return wrapper
def aretry_with_exponential_backoff(
func,
initial_delay: float = 1,
exponential_base: float = 2,
jitter: bool = True,
max_retries: int = 20,
errors: tuple = (openai.error.RateLimitError,),
):
"""Retry a function with exponential backoff."""
async def wrapper(*args, **kwargs):
# Initialize variables
num_retries = 0
delay = initial_delay
# Loop until a successful response or max_retries is hit or an exception is raised
while True:
try:
return await func(*args, **kwargs)
# Retry on specified errors
except errors as rate_limit_error:
logging.error(f"Retrying due to rate limit: {rate_limit_error}")
# Increment retries
num_retries += 1
# Check if max retries has been reached
if num_retries > max_retries:
raise Exception(f"Maximum number of retries ({max_retries}) exceeded.")
# Increment the delay
delay *= exponential_base * (1 + jitter * random.random())
# Sleep for the delay
await asyncio.sleep(delay)
# Raise exceptions for any errors not specified
except Exception as general_error:
logging.error(f"An unexpected error occurred: {general_error}")
raise
return wrapper
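
As a quick numeric illustration of how the delay in the wrapper above grows: with initial_delay=1 and exponential_base=2, each retry multiplies the delay by a factor between 2 and 4 when jitter is enabled (the 1 + jitter * random.random() term). A minimal standalone sketch:

import random

def simulated_delays(initial_delay=1.0, exponential_base=2.0, jitter=True, retries=4):
    """Show how the wrapper's delay grows: each step multiplies by base * (1 + jitter * rand)."""
    delay, delays = initial_delay, []
    for _ in range(retries):
        delay *= exponential_base * (1 + jitter * random.random())
        delays.append(round(delay, 2))
    return delays

print(simulated_delays())  # e.g. [3.4, 8.1, 24.7, 61.9] -- roughly 2-4x growth per retry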

Comment on lines +113 to +116
@aretry_with_exponential_backoff
async def acreate_embedding_with_backoff(**kwargs):
"""Wrapper around Embedding.acreate w/ backoff"""
return await openai.Embedding.acreate(**kwargs)

💡 NOTE
Codebase verification is a beta feature.

The verification process has identified that the asynchronous function acreate_embedding_with_backoff is used correctly with the await keyword in at least one instance within the codebase. However, there is also a synchronous call to create_embedding_with_backoff present, which may indicate that not all parts of the code have been updated to use the new asynchronous function. It is recommended to verify whether the synchronous call is intentional or if it requires updating to the asynchronous pattern to maintain consistency and prevent potential issues.

  • The asynchronous call is found at ./cognitive_architecture/openai_tools.py:123.
  • The synchronous call is found at ./cognitive_architecture/openai_tools.py:135.
Analysis chain

The function acreate_embedding_with_backoff is correctly implemented as an asynchronous function with the retry decorator. Ensure that all calls to this function are awaited and that the function's introduction does not break any existing code that relies on the synchronous version.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for calls to `acreate_embedding_with_backoff` to ensure they are awaited.
ast-grep --lang python --pattern $'await acreate_embedding_with_backoff($$$)'

Length of output: 202



Script:

#!/bin/bash
# Search for all instances of `acreate_embedding_with_backoff` to ensure they are all awaited.
ast-grep --lang python --pattern $'await acreate_embedding_with_backoff($$$)'

Length of output: 202


Script:

#!/bin/bash
# Search for any calls to a synchronous version of the function, which might be named without the 'a' prefix.
ast-grep --lang python --pattern $'create_embedding_with_backoff($$$)'

Length of output: 188

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 13

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between dfb13a2 and 6e47c1f.
Files ignored due to filter (2)
  • poetry.lock
  • pyproject.toml
Files selected for processing (5)
  • .env.template (1 hunks)
  • cognitive_architecture/database/postgres/database_crud.py (1 hunks)
  • cognitive_architecture/database/vectordb/vectordb.py (5 hunks)
  • cognitive_architecture/vectorstore_manager.py (1 hunks)
  • main.py (6 hunks)
Files skipped from review due to trivial changes (2)
  • .env.template
  • cognitive_architecture/vectorstore_manager.py
Additional comments: 9
cognitive_architecture/database/postgres/database_crud.py (1)
  • 54-54: Explicit commit inside the update_entity function is correct and aligns with the async nature of the function.
cognitive_architecture/database/vectordb/vectordb.py (5)
  • 154-154: The addition of params['user_id'] ensures that the user ID is included in the parameters for memory operations, which is likely important for associating memories with the correct user.

  • 176-176: Incrementing params['chunk_order'] within the loop correctly assigns an order to each chunk, which is important for maintaining the sequence of the data.

  • 258-289: The addition of a new search_type == 'summary' condition with a specific filter object is a significant change. Ensure that this new search type and its logic are documented and tested to verify that it behaves as expected.

  • 314-315: The return statement for the fetch_memories method is correctly placed after all conditional logic, ensuring that the method returns the query output regardless of the search_type.

  • 320-321: The check for namespace being None and the subsequent assignment to self.namespace is consistent with the rest of the methods in the class, ensuring that the default namespace is used when none is provided.

main.py (3)
  • 8-8: The addition of update_entity_graph_summary to imports aligns with the PR objectives and summary.

  • 180-209: The signature of the add_documents_to_graph_db function has been updated by removing the stupid_local_testing_flag parameter, which aligns with the PR objectives and summary.

  • 332-332: The update operation for DocsModel in the main function aligns with the PR objectives and summary.

Comment on lines +37 to +46
def update_entity_graph_summary(session, model, entity_id, new_value):
    with session_scope(session) as s:
        # Retrieve the entity from the database
        entity = s.query(model).filter_by(id=entity_id).first()
        if entity:
            entity.graph_summary = new_value
            s.commit()
            return "Successfully updated entity"
        else:
            return "Entity not found"

The update_entity_graph_summary function is synchronous, which is inconsistent with the rest of the database functions that are designed to be asynchronous. Consider refactoring this function to be asynchronous for consistency and to avoid potential integration issues with other parts of the system.

- def update_entity_graph_summary(session, model, entity_id, new_value):
+ async def update_entity_graph_summary(session, model, entity_id, new_value):
- with session_scope(session) as s:
+ async with session_scope(session) as s:
- s.commit()
+ await s.commit()

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def update_entity_graph_summary(session, model, entity_id, new_value):
with session_scope(session) as s:
# Retrieve the entity from the database
entity = s.query(model).filter_by(id=entity_id).first()
if entity:
entity.graph_summary = new_value
s.commit()
return "Successfully updated entity"
else:
return "Entity not found"
async def update_entity_graph_summary(session, model, entity_id, new_value):
async with session_scope(session) as s:
# Retrieve the entity from the database
entity = s.query(model).filter_by(id=entity_id).first()
if entity:
entity.graph_summary = new_value
await s.commit()
return "Successfully updated entity"
else:
return "Entity not found"

@@ -34,16 +34,24 @@ async def add_entity(session, entity):
return "Successfully added entity"



def update_entity_graph_summary(session, model, entity_id, new_value):
with session_scope(session) as s:

The session_scope context manager is asynchronous and should be used with async with, not with. This will cause a runtime error as it stands.

- with session_scope(session) as s:
+ async with session_scope(session) as s:

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
with session_scope(session) as s:
async with session_scope(session) as s:
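
For context, session_scope in this codebase is asynchronous, which is also why the explicit commit flagged below is redundant. A typical shape for such a context manager (a sketch assuming SQLAlchemy's asyncio session API, not the project's actual implementation):

from contextlib import asynccontextmanager
from sqlalchemy.ext.asyncio import AsyncSession

@asynccontextmanager
async def session_scope(session: AsyncSession):
    """Yield a session, commit on success, roll back on any error."""
    try:
        yield session
        await session.commit()  # a single commit here makes per-call s.commit() unnecessary
    except Exception:
        await session.rollback()
        raise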

entity = s.query(model).filter_by(id=entity_id).first()
if entity:
entity.graph_summary = new_value
s.commit()

The session.commit() call inside update_entity_graph_summary is redundant because the session_scope context manager already takes care of committing the session. This line should be removed to avoid confusion and redundancy.

- s.commit()

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
s.commit()

# neo4j_graph_db.update_document_node_with_namespace(user_id, vectordb_namespace=memory_name, document_id=doc_id)
# else:
try:
logging.info("Docs are", docs)

Incorrect logging call: the extra positional argument is treated as a %-format argument, so the logging module will report a formatting error at runtime and the value will not be logged as intended. Use %-style lazy formatting.

- logging.info("Docs are", docs)
+ logging.info("Docs are %s", docs)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("Docs are", docs)
logging.info("Docs are %s", docs)

main.py Outdated
try:
logging.info("Docs are", docs)
for doc, memory_name in zip(docs, memory_names):
logging.info("Memory names are", memory_name)

Incorrect logging call: the extra positional argument is treated as a %-format argument, so the logging module will report a formatting error at runtime and the value will not be logged as intended. Use %-style lazy formatting.

- logging.info("Memory names are", memory_name)
+ logging.info("Memory names are %s", memory_name)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("Memory names are", memory_name)
logging.info("Memory names are %s", memory_name)

main.py Outdated
await update_entity(session, DocsModel, doc_id, True)
# except:
# pass
logging.info("Classification content is", classification_content)

Incorrect logging call: the extra positional argument is treated as a %-format argument, so the logging module will report a formatting error at runtime and the value will not be logged as intended. Use %-style lazy formatting.

- logging.info("Classification content is", classification_content)
+ logging.info("Classification content is %s", classification_content)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("Classification content is", classification_content)
logging.info("Classification content is %s", classification_content)

doc_name, doc_id = doc
classification = await classify_documents(doc_name, document_id =doc_id, content=concatenated_retrievals)

logging.info("Classification is", str(classification))

Incorrect logging call: the extra positional argument is treated as a %-format argument, so the logging module will report a formatting error at runtime and the value will not be logged as intended. Use %-style lazy formatting.

- logging.info("Classification is", str(classification))
+ logging.info("Classification is %s", str(classification))

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("Classification is", str(classification))
logging.info("Classification is %s", str(classification))

main.py Outdated
neo4j_graph_db.query(rs, classification)
neo4j_graph_db.update_document_node_with_namespace(user_id, vectordb_namespace=memory_name,
document_id=doc_id)
logging.info("hERE IS THE OUT2323", doc_id)

Incorrect logging call: the extra positional argument is treated as a %-format argument, so the logging module will report a formatting error at runtime and the value will not be logged as intended. Use %-style lazy formatting.

- logging.info("hERE IS THE OUT2323", doc_id)
+ logging.info("hERE IS THE OUT2323 %s", doc_id)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("hERE IS THE OUT2323", doc_id)
logging.info("hERE IS THE OUT2323 %s", doc_id)

main.py Outdated
Comment on lines 332 to 356
await update_entity(session, DocsModel, "8cd9a022-5a7a-4af5-815a-f988415536ae", True)
# out = await get_vectordb_namespace(session, user_id)
params = {
"version": "1.0",
"agreement_id": "AG123456",
"privacy_policy": "https://example.com/privacy",
"terms_of_service": "https://example.com/terms",
"format": "json",
"schema_version": "1.1",
"checksum": "a1b2c3d4e5f6",
"owner": "John Doe",
"license": "MIT",
"validity_start": "2023-08-01",
"validity_end": "2024-07-31",
}
loader_settings = {
"format": "PDF",
"source": "DEVICE",
"path": [".data"],
"strategy": "SUMMARY",
}
await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings)
await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday")
# params = {
# "version": "1.0",
# "agreement_id": "AG123456",
# "privacy_policy": "https://example.com/privacy",
# "terms_of_service": "https://example.com/terms",
# "format": "json",
# "schema_version": "1.1",
# "checksum": "a1b2c3d4e5f6",
# "owner": "John Doe",
# "license": "MIT",
# "validity_start": "2023-08-01",
# "validity_end": "2024-07-31",
# }
# loader_settings = {
# "format": "PDF",
# "source": "DEVICE",
# "path": [".data"],
# "strategy": "SUMMARY",
# }
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
# await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
# await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings)
# await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday")

Commented-out code should be removed or clarified with a comment explaining why it's retained.

Comment on lines 204 to 210
search_type = 'hybrid'
logging.info("The search type is s%", search_type)

if search_type == 'summary':
from weaviate.classes import Filter
client = weaviate.connect_to_wcs(
cluster_url=config.weaviate_url,
auth_credentials=weaviate.AuthApiKey(config.weaviate_api_key)
)

summary_collection = client.collections.get(self.namespace)
response = summary_collection.query.fetch_objects(
filters=Filter("user_id").equal(self.user_id) &
Filter("chunk_order").less_than(25),
limit=15
)
return response

if not namespace:
namespace = self.namespace

The check for search_type being None and the subsequent assignment to 'hybrid' should be moved outside of the fetch_memories method to ensure that the default value is set before the method is called. This will prevent the method from being called with an undefined search_type.

- if search_type is None:
-     search_type = 'hybrid'
+ # Default search_type should be set before calling fetch_memories

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
search_type = 'hybrid'
logging.info("The search type is s%", search_type)
if search_type == 'summary':
from weaviate.classes import Filter
client = weaviate.connect_to_wcs(
cluster_url=config.weaviate_url,
auth_credentials=weaviate.AuthApiKey(config.weaviate_api_key)
)
summary_collection = client.collections.get(self.namespace)
response = summary_collection.query.fetch_objects(
filters=Filter("user_id").equal(self.user_id) &
Filter("chunk_order").less_than(25),
limit=15
)
return response
if not namespace:
namespace = self.namespace
search_type = 'hybrid'
logging.info("The search type is s%", search_type)
# Default search_type should be set before calling fetch_memories
if not namespace:
namespace = self.namespace

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 6

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 6e47c1f and 5ed04de.
Files selected for processing (7)
  • .env.template (1 hunks)
  • cognitive_architecture/config.py (1 hunks)
  • cognitive_architecture/database/create_database.py (2 hunks)
  • cognitive_architecture/database/postgres/database.py (1 hunks)
  • cognitive_architecture/database/vectordb/vectordb.py (5 hunks)
  • entrypoint.sh (1 hunks)
  • main.py (7 hunks)
Files skipped from review due to trivial changes (1)
  • .env.template
Additional comments: 8
entrypoint.sh (1)
  • 1-6: The addition of the ENV environment variable should be verified to ensure it's utilized within the Python scripts or the Gunicorn service that this script is starting. If ENV is not used, consider removing it to avoid unnecessary environment variables.
cognitive_architecture/config.py (1)
  • 34-53: The conditional assignments for configuration parameters based on environment variables are correctly implemented. Ensure that the environment variables used here are documented and that their values are set in the environment where the application runs.
cognitive_architecture/database/vectordb/vectordb.py (3)
  • 154-154: The addition of user_id to params within the add_memories method should be verified to ensure that it aligns with the method's expected behavior and that the user_id is indeed required for the operation.

  • 172-177: The change in chunk_size to 300 should be verified to ensure it aligns with the expected behavior of the chunk_data function and that it does not negatively impact the performance or accuracy of the memory addition process.

  • 258-289: The addition of the 'summary' search type in the fetch_memories method should be verified to ensure that it aligns with the expected behavior and that the logic for fetching summaries is correctly implemented.

main.py (3)
  • 8-8: The addition of update_entity_graph_summary to the import statement should be verified to ensure that it is used within the file and that the function is correctly implemented where it is called.

  • 180-209: The modifications in the add_documents_to_graph_db function, including the addition of a try-except block and changes in the classification process, should be verified to ensure that they are correctly implemented and that they align with the expected behavior of the function.

  • 332-332: The addition of the update_entity call within the main function should be verified to ensure that it is correctly implemented and that it aligns with the expected behavior of the function.

Comment on lines 18 to 32
from cognitive_architecture.config import Config
config = Config()
config.load()


# in seconds
MAX_RETRIES = 3
RETRY_DELAY = 5

username = os.getenv('POSTGRES_USER')
password = os.getenv('POSTGRES_PASSWORD')
database_name = os.getenv('POSTGRES_DB')
import os

environment = os.environ.get("ENVIRONMENT")

if environment == "local":
host= os.getenv('POSTGRES_HOST')
host = config.postgres_host
username = config.postgres_user
password = config.postgres_password
database_name = config.postgres_db

The refactoring to use a Config object for database configuration parameters is a good practice for maintainability and centralization of configuration. Ensure that all instances in the codebase that previously accessed these environment variables directly are updated to use the Config object.

Comment on lines 12 to 18
from cognitive_architecture.config import Config
config = Config()
config.load()



from postgres.database import Base

The use of the Config object to load and assign database connection parameters is a good practice for maintainability and centralization of configuration. Ensure that all instances in the codebase that previously accessed these environment variables directly are updated to use the Config object.

Comment on lines 61 to 71
#
# username = os.getenv('POSTGRES_USER')
# password = os.getenv('POSTGRES_PASSWORD')
# database_name = os.getenv('POSTGRES_DB')
# environment = os.environ.get("ENV")
host = config.postgres_host
username = config.postgres_user
password = config.postgres_password
database_name = config.postgres_db

engine = create_admin_engine(username, password, host, database_name)

Ensure that the create_admin_engine function is updated to use the new config object for database connection parameters, as the old environment variable access has been replaced.

chunk_overlap=20)]
for doc in documents[0]:
chunk_count += 1
params['chunk_order'] = chunk_count

# document_to_load = self._stuct(observation, params, metadata_schema_class)

logging.info("Loading document with defautl loader settings %s", str(doc))

There is a typo in the log message: "defautl" should be corrected to "default".

- logging.info("Loading document with defautl loader settings %s", str(doc))
+ logging.info("Loading document with default loader settings %s", str(doc))

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
logging.info("Loading document with defautl loader settings %s", str(doc))
logging.info("Loading document with default loader settings %s", str(doc))

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 5ed04de and 9293c74.
Files selected for processing (2)
  • cognitive_architecture/database/create_database.py (2 hunks)
  • cognitive_architecture/database/postgres/database.py (1 hunks)
Additional comments: 7
cognitive_architecture/database/postgres/database.py (4)
  • 18-20: The introduction of the Config object and the call to config.load() are good practices for centralizing configuration management.

  • 29-32: The assignment of database connection parameters from the Config object is correct and improves maintainability by centralizing configuration settings.

  • 37-37: Verify that the SQLALCHEMY_DATABASE_URL is correctly formatted and that the Config object is loaded before this line to ensure database connectivity.

  • 36-37: The retry mechanism in safe_db_operation appears to be correctly implemented, using the MAX_RETRIES and RETRY_DELAY constants; a rough sketch of what such a retry wrapper typically looks like appears after this list.

cognitive_architecture/database/create_database.py (3)
  • 12-14: The introduction of the Config object and the call to config.load() are good practices for centralizing configuration management.

  • 66-69: The assignment of database connection parameters from the Config object is correct and improves maintainability by centralizing configuration settings.

  • 61-71: Ensure that the create_admin_engine, database_exists, create_database, and create_tables functions are updated to use the new Config object for database connection parameters, as the old environment variable access has been replaced.
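
safe_db_operation itself is not shown in this diff; as a rough sketch of the pattern referred to in the database.py comment above (assuming it retries transient SQLAlchemy errors using the MAX_RETRIES and RETRY_DELAY constants, which is an assumption rather than the project's exact code):

import time
import logging
from sqlalchemy.exc import OperationalError

MAX_RETRIES = 3
RETRY_DELAY = 5  # seconds

def safe_db_operation(operation):
    """Retry a database call a bounded number of times on transient connection errors."""
    def wrapper(*args, **kwargs):
        for attempt in range(1, MAX_RETRIES + 1):
            try:
                return operation(*args, **kwargs)
            except OperationalError as exc:
                logging.warning("DB operation failed (attempt %s/%s): %s", attempt, MAX_RETRIES, exc)
                if attempt == MAX_RETRIES:
                    raise
                time.sleep(RETRY_DELAY)
    return wrapper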

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 9293c74 and cbde083.
Files selected for processing (1)
  • cognitive_architecture/database/postgres/database.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • cognitive_architecture/database/postgres/database.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between cbde083 and 078d697.
Files selected for processing (1)
  • cognitive_architecture/database/postgres/database.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • cognitive_architecture/database/postgres/database.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 078d697 and 523f893.
Files selected for processing (1)
  • cognitive_architecture/database/create_database.py (3 hunks)
Files skipped from review as they are similar to previous changes (1)
  • cognitive_architecture/database/create_database.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 523f893 and 98383b7.
Files selected for processing (1)
  • api.py (2 hunks)
Additional comments: 1
api.py (1)
  • 157-158: The exception handling in the user_query_classfier function has been changed to raise an HTTPException directly instead of returning a JSONResponse. This is a positive change as it leverages FastAPI's built-in exception handling to return the appropriate response.

api.py Outdated
Comment on lines 162 to 169
@app.post("/drop-db")
async def drop_db(payload: Payload):
try:
decoded_payload = payload.payload
return JSONResponse(content={"response": "dropped"}, status_code=200)

except Exception as e:
return HTTPException(status_code=500, detail=str(e))

💡 NOTE
Codebase verification is a beta feature.

The drop_db function in api.py does not appear to have any active authorization checks associated with it. The references to authentication are commented out, indicating that the function could be executed without any security checks, which is a serious security concern.

  • The drop_db function lacks active authorization checks, potentially allowing unauthorized database operations.
Analysis chain

The drop_db function currently returns a static JSONResponse with the message "dropped" and does not perform any action. This could be a placeholder, but it's important to ensure that the actual database drop logic is implemented securely and with proper authorization checks to prevent accidental or malicious data loss.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify if there are any authorization checks for sensitive operations like dropping the database.
rg --type py 'drop_db'

Length of output: 66



Script:

#!/bin/bash
# Search for any references to authorization or authentication in the api.py file.
rg --type py 'auth|Auth|authorize|Authorize' api.py

Length of output: 154

Comment on lines +168 to +169
except Exception as e:
return HTTPException(status_code=500, detail=str(e))

Consider adding more detailed error handling for the drop_db function. Returning a generic 500 error for all exceptions may not be informative for the client. It would be beneficial to distinguish between different types of errors and handle them accordingly.
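
One way to address both this point and the earlier note about the missing authorization check is sketched below; the header-based guard, the placeholder drop logic, and the exception split are illustrative assumptions, not the project's actual API.

from fastapi import Depends, FastAPI, Header, HTTPException
from fastapi.responses import JSONResponse
from sqlalchemy.exc import OperationalError, SQLAlchemyError

app = FastAPI()

async def require_admin(x_api_key: str = Header(...)):
    # Hypothetical guard; replace with the project's real auth mechanism.
    if x_api_key != "expected-admin-key":
        raise HTTPException(status_code=403, detail="Not authorized")

@app.post("/drop-db")
async def drop_db(_: None = Depends(require_admin)):
    try:
        # ... actual drop logic goes here (placeholder) ...
        return JSONResponse(content={"response": "dropped"}, status_code=200)
    except OperationalError as e:
        # Transient connectivity problem: signal that the service is unavailable.
        raise HTTPException(status_code=503, detail=f"Database unreachable: {e}")
    except SQLAlchemyError as e:
        # Other database failures: still a server error, but with a distinct message.
        raise HTTPException(status_code=500, detail=f"Database error: {e}")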

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 98383b7 and 118613c.
Files selected for processing (1)
  • cognitive_architecture/config.py (1 hunks)
Additional comments: 3
cognitive_architecture/config.py (3)
  • 34-37: The environment variable GRAPH_DB_PW_PROD is used for the production password, but there is no equivalent for the development password. Verify if this is intentional or an oversight.

  • 57-57: The load method layers values from environment variables and from the config file onto the class attributes. Verify that the intended order of precedence (environment variables over config file) is what the code actually implements, and document it; a minimal sketch of making that precedence explicit appears after this list.

  • 57-57: The to_dict and from_dict methods are correctly implemented for converting the Config object to and from a dictionary representation. This is good for interoperability with other parts of the system that may require configuration data in dictionary format.
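
A minimal sketch of making that precedence explicit, lowest to highest: class default, then config file, then environment variable (attribute and option names are illustrative; this is not the project's actual load()):

import os
import configparser

class Config:
    postgres_host: str = "localhost"  # class-level default

    def load(self, path: str = "config.ini") -> None:
        # Config file overrides the default; environment variable overrides both.
        parser = configparser.ConfigParser()
        parser.read(path)
        if parser.has_option("database", "postgres_host"):
            self.postgres_host = parser.get("database", "postgres_host")
        self.postgres_host = os.getenv("POSTGRES_HOST", self.postgres_host)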

Comment on lines 34 to 53
if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or os.getenv('AWS_ENV') == 'prd':
graph_database_url: str = os.getenv('GRAPH_DB_URL_PROD')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW_PROD')
else:
graph_database_url: str = os.getenv('GRAPH_DB_URL')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW')
weaviate_url: str = os.getenv('WEAVIATE_URL')
weaviate_api_key: str = os.getenv('WEAVIATE_API_KEY')
postgres_user: str = os.getenv('POSTGRES_USER')
postgres_password: str = os.getenv('POSTGRES_PASSWORD')
postgres_db: str = os.getenv('POSTGRES_DB')
if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or os.getenv('AWS_ENV') == 'prd':
postgres_host: str = os.getenv('POSTGRES_PROD_HOST')
elif os.getenv('ENV') == 'docker':
postgres_host: str = os.getenv('POSTGRES_HOST_DOCKER')
elif os.getenv('ENV') == 'local':
postgres_host: str = os.getenv('POSTGRES_HOST_LOCAL')


The conditional logic for setting database parameters based on environment variables is complex and may be error-prone. Consider refactoring to improve readability and maintainability.

- if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or   os.getenv('AWS_ENV') == 'prd':
+ env = os.getenv('ENV')
+ aws_env = os.getenv('AWS_ENV')
+ is_prod_or_dev = env in ['prod', 'dev'] or aws_env in ['dev', 'prd']
+ if is_prod_or_dev:

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or os.getenv('AWS_ENV') == 'prd':
graph_database_url: str = os.getenv('GRAPH_DB_URL_PROD')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW_PROD')
else:
graph_database_url: str = os.getenv('GRAPH_DB_URL')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW')
weaviate_url: str = os.getenv('WEAVIATE_URL')
weaviate_api_key: str = os.getenv('WEAVIATE_API_KEY')
postgres_user: str = os.getenv('POSTGRES_USER')
postgres_password: str = os.getenv('POSTGRES_PASSWORD')
postgres_db: str = os.getenv('POSTGRES_DB')
if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or os.getenv('AWS_ENV') == 'prd':
postgres_host: str = os.getenv('POSTGRES_PROD_HOST')
elif os.getenv('ENV') == 'docker':
postgres_host: str = os.getenv('POSTGRES_HOST_DOCKER')
elif os.getenv('ENV') == 'local':
postgres_host: str = os.getenv('POSTGRES_HOST_LOCAL')
env = os.getenv('ENV')
aws_env = os.getenv('AWS_ENV')
is_prod_or_dev = env in ['prod', 'dev'] or aws_env in ['dev', 'prd']
if is_prod_or_dev:
graph_database_url: str = os.getenv('GRAPH_DB_URL_PROD')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW_PROD')
else:
graph_database_url: str = os.getenv('GRAPH_DB_URL')
graph_database_username: str = os.getenv('GRAPH_DB_USER')
graph_database_password: str = os.getenv('GRAPH_DB_PW')
weaviate_url: str = os.getenv('WEAVIATE_URL')
weaviate_api_key: str = os.getenv('WEAVIATE_API_KEY')
postgres_user: str = os.getenv('POSTGRES_USER')
postgres_password: str = os.getenv('POSTGRES_PASSWORD')
postgres_db: str = os.getenv('POSTGRES_DB')
if is_prod_or_dev:
postgres_host: str = os.getenv('POSTGRES_PROD_HOST')
elif env == 'docker':
postgres_host: str = os.getenv('POSTGRES_HOST_DOCKER')
elif env == 'local':
postgres_host: str = os.getenv('POSTGRES_HOST_LOCAL')







The save method splits attribute names on the first underscore to determine the section name for the config file. This could lead to unexpected behavior if attribute names contain more than one underscore. Consider a more robust approach to determine section names.
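A minimal sketch of one more robust approach, assuming the config class keeps an explicit list of known section names (the names below are illustrative, not necessarily the project's actual sections):

# Illustrative sketch: resolve the section from an explicit list of known
# section names instead of splitting the attribute name on the first underscore.
KNOWN_SECTIONS = ("graph_database", "postgres", "weaviate")  # assumed names

def section_for(attribute_name: str) -> str:
    # Prefer the longest matching known prefix so multi-underscore names resolve correctly.
    for section in sorted(KNOWN_SECTIONS, key=len, reverse=True):
        if attribute_name == section or attribute_name.startswith(section + "_"):
            return section
    # Fall back to the old behaviour for anything unrecognised.
    return attribute_name.split("_", 1)[0]

print(section_for("graph_database_url"))   # -> graph_database
print(section_for("postgres_host_local"))  # -> postgres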

…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 21

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 118613c and 2928f51.
Files ignored due to filter (2)
  • poetry.lock
  • pyproject.toml
Files selected for processing (21)
  • .env.template (1 hunks)
  • Dockerfile (1 hunks)
  • api.py (1 hunks)
  • cognitive_architecture/classifiers/classifier.py (1 hunks)
  • cognitive_architecture/database/create_database.py (3 hunks)
  • cognitive_architecture/database/graph_database/graph.py (14 hunks)
  • cognitive_architecture/database/postgres/models/docs.py (1 hunks)
  • cognitive_architecture/database/postgres/models/memory.py (1 hunks)
  • cognitive_architecture/database/vectordb/chunkers/chunkers.py (2 hunks)
  • cognitive_architecture/database/vectordb/cognee_manager.py (1 hunks)
  • cognitive_architecture/database/vectordb/embeddings/embeddings.py (1 hunks)
  • cognitive_architecture/database/vectordb/embeddings/embeddings_type.py (1 hunks)
  • cognitive_architecture/database/vectordb/job.py (1 hunks)
  • cognitive_architecture/database/vectordb/loaders/loaders.py (1 hunks)
  • cognitive_architecture/database/vectordb/response.py (1 hunks)
  • cognitive_architecture/database/vectordb/vector_db_type.py (1 hunks)
  • cognitive_architecture/database/vectordb/vectordb.py (5 hunks)
  • cognitive_architecture/shared/language_processing.py (1 hunks)
  • cognitive_architecture/utils.py (3 hunks)
  • cognitive_architecture/vectorstore_manager.py (3 hunks)
  • main.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • Dockerfile
Additional comments: 56
cognitive_architecture/database/vectordb/embeddings/embeddings_type.py (1)
  • 1-8: The EmbeddingsType enum is well-defined and covers a variety of embedding sources.
cognitive_architecture/database/vectordb/vector_db_type.py (1)
  • 1-13: The VectorDBType enum is well-defined and covers a variety of vector database types.
cognitive_architecture/database/vectordb/job.py (1)
  • 1-18: The Job class is well-defined with clear string representation methods.
cognitive_architecture/database/postgres/models/memory.py (1)
  • 12-18: The addition of the memory_category field to the MemoryModel is correctly implemented. Ensure that corresponding database migration scripts are in place to handle this schema change.
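If the project manages schema changes with Alembic, the migration for this column is small; a sketch only, with the revision identifiers and the memories table name assumed rather than taken from the repository:

# Hypothetical Alembic migration adding the nullable memory_category column.
from alembic import op
import sqlalchemy as sa

revision = "add_memory_category"   # assumed revision id
down_revision = None               # set to the previous revision in practice

def upgrade():
    op.add_column("memories", sa.Column("memory_category", sa.String(), nullable=True))

def downgrade():
    op.drop_column("memories", "memory_category")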
cognitive_architecture/database/vectordb/embeddings/embeddings.py (1)
  • 5-29: The Embeddings class is updated with new parameters and a serialize method. Ensure that the serialization logic correctly handles all new parameters and that the serialize method is tested for various embedding configurations.
cognitive_architecture/database/vectordb/response.py (1)
  • 4-72: The Response class is updated with new fields and class methods for JSON conversion. Ensure that the JSON conversion logic correctly handles all new fields and that the class methods are tested for various response scenarios.
cognitive_architecture/shared/language_processing.py (1)
  • 50-81: The translate_text function uses AWS Translate and has error handling for BotoCoreError and ClientError. Ensure that the error messages are logged and that the function is tested for various error scenarios.
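For reference, a minimal sketch of how such a call is usually wrapped, assuming boto3 credentials come from the environment; the parameter handling here is illustrative and not the module's exact code:

# Sketch of an AWS Translate call with both AWS error types logged and surfaced.
import logging
from typing import Optional

import boto3
from botocore.exceptions import BotoCoreError, ClientError

def translate_text_example(text: str, source_lang: str, target_lang: str) -> Optional[str]:
    try:
        client = boto3.client("translate")
        result = client.translate_text(
            Text=text,
            SourceLanguageCode=source_lang,
            TargetLanguageCode=target_lang,
        )
        return result["TranslatedText"]
    except (BotoCoreError, ClientError) as error:
        # Log the failure so callers can decide whether to retry or fall back.
        logging.error("AWS Translate call failed: %s", error)
        return None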
cognitive_architecture/database/create_database.py (2)
  • 27-44: The environment variable handling has been updated to use AWS_ENV. Ensure that this change is reflected throughout the codebase and that the AWS_ENV variable is set in all required environments.

  • 74-83: The drop_database function has been added to handle database deletion. Ensure that this function is protected against accidental execution in production environments and that it is tested thoroughly.

+ # Add a safety check to prevent accidental drops in production
+ if os.environ.get('AWS_ENV') != 'local':
+     raise RuntimeError("Database dropping is not allowed in non-local environments")
cognitive_architecture/database/vectordb/chunkers/chunkers.py (1)
  • 1-1: The import path for ChunkStrategy has been updated. Verify that this new path is correct and that the ChunkStrategy is accessible from this new location.
Verification successful

The shell script output confirms the presence of the ChunkStrategy class within the chunk_strategy.py file. This verifies that the updated import path in chunkers.py is correct and that the ChunkStrategy class is indeed accessible from the new location.

  • The import path for ChunkStrategy in chunkers.py is correct and the class is accessible.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the ChunkStrategy import is correct and accessible
rg --type py 'from cognitive_architecture.database.vectordb.chunkers.chunk_strategy import ChunkStrategy'
# Expecting to find the ChunkStrategy definition in the specified path

Length of output: 260



Script:

#!/bin/bash
# Check for the presence of the ChunkStrategy class within the chunk_strategy.py file
rg --type py 'class ChunkStrategy' cognitive_architecture/database/vectordb/chunkers/chunk_strategy.py
# Expecting to find the class definition to confirm the import is correct

Length of output: 131

cognitive_architecture/classifiers/classifier.py (1)
  • 76-104: The classify_call function has been updated with new parameters. Verify that all calls to this function have been updated to pass the correct arguments and that the function's logic correctly handles the new document_summaries parameter.
Verification successful

The verification process has confirmed that the calls to the classify_call function within the codebase have been updated to use the new document_summaries parameter. The function's logic in classifier.py and its invocation in main.py are consistent with the changes.

  • The classify_call function update has been properly handled in the codebase.
Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that all calls to classify_call have been updated
rg --type py 'classify_call' --files-with-matches | xargs rg 'document_summaries'
# Expecting to find the updated calls with the new parameter

Length of output: 759

cognitive_architecture/database/vectordb/cognee_manager.py (4)
  • 33-42: The serialize method correctly converts instance attributes to a dictionary, excluding any attributes with None values. This is a clean and efficient way to prepare data for serialization.

  • 68-93: The get_job_statuses method is well-structured and handles different HTTP status codes appropriately. It uses the requests library to make a POST request and constructs the request payload and headers correctly.

  • 121-142: The get_job_status method correctly handles HTTP status codes and uses the requests library to make a GET request. It constructs the URL and headers appropriately and returns a Response object based on the JSON response.

  • 144-151: The generate_headers function constructs request headers and filters out any None values. Ensure that API keys and other sensitive information are securely managed and not hardcoded or exposed.
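On the header construction point, dropping unset values is typically a one-line dict comprehension; a small illustrative sketch (the header names are assumptions, not the project's actual keys):

# Build request headers and drop any entries whose value is None.
def generate_headers_example(internal_api_key=None, webhook_key=None):
    headers = {
        "x-api-key": internal_api_key,
        "x-webhook-key": webhook_key,
    }
    return {name: value for name, value in headers.items() if value is not None}

print(generate_headers_example(internal_api_key="secret"))  # {'x-api-key': 'secret'}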

cognitive_architecture/database/vectordb/loaders/loaders.py (3)
  • 23-25: The fetch_text_content function uses the UnstructuredURLLoader to load text content from a URL. This is a straightforward use of an external loader class.

  • 27-46: The process_content function chunks data and translates it if necessary. It's important to ensure that the language detection and translation services are secure and efficient. Additionally, consider handling exceptions that may arise from these operations (a sketch follows after this list).

  • 48-92: The _document_loader function is an asynchronous function that handles document loading and processing. It correctly awaits other async functions and raises a ValueError for unsupported formats. Ensure that the error handling is consistent and that all resources are managed correctly.
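For the exception-handling note on process_content above, a minimal sketch of guarding the detection and translation steps; the helper names come from the shared language_processing module but their argument names are assumed, and the fallback is simply to keep the untranslated chunk:

# Guard language detection/translation so one bad chunk does not abort loading.
import logging

def process_chunk_safely(chunk: str) -> str:
    try:
        from cognitive_architecture.shared.language_processing import detect_language, translate_text
        language = detect_language(chunk)
        if language and language != "en":
            # Keyword names are assumptions; adjust to the real translate_text signature.
            return translate_text(chunk, source_lang=language, target_lang="en")
        return chunk
    except Exception as error:
        logging.warning("Skipping translation for one chunk: %s", error)
        return chunk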

cognitive_architecture/utils.py (2)
  • 10-13: The addition of AsyncSessionLocal and load_dotenv imports followed by the invocation of load_dotenv() is a common pattern for loading environment variables. Ensure that .env files are not included in version control for security reasons.

  • 222-229: The modification in the get_unsumarized_vector_db_namespace function to return memory_details and docs instead of just memory_names is a significant change. Ensure that all callers of this function are updated to handle the new return type.

api.py (12)
  • 1-17: The imports and initial setup of the FastAPI application look standard and correct. The use of environment variables for configuration is a good practice.

  • 19-23: The logging configuration is set globally, which is a common practice. However, it's important to ensure that this configuration does not override other logging configurations elsewhere in the application.

  • 48-55: The root endpoint is correctly implemented and returns a simple welcome message.

  • 58-63: The health check endpoint is implemented correctly and follows best practices by providing a simple status check.

  • 128-141: The /user-query-to-graph endpoint's error handling raises an HTTPException directly, which is consistent with FastAPI's practices.

  • 144-162: The /document-to-graph-db endpoint is implemented correctly, but ensure that the add_documents_to_graph_db function handles exceptions appropriately.

  • 164-176: The /cognitive-context-enrichment endpoint is implemented correctly, but ensure that the user_context_enrichment function handles exceptions appropriately.

  • 179-192: The /user-query-classifier endpoint is implemented correctly, but ensure that the classify_user_query function handles exceptions appropriately.

  • 238-266: The /create-public-memory endpoint is implemented correctly, but ensure that the create_public_memory function handles exceptions appropriately.

  • 269-292: The /attach-user-to-public-memory endpoint is implemented correctly, but ensure that the attach_user_to_memory function handles exceptions appropriately.

  • 295-313: The /unlink-user-from-public-memory endpoint is implemented correctly, but ensure that the unlink_user_from_memory function handles exceptions appropriately.

  • 315-329: The start_api_server function is implemented correctly, and the logging of exceptions is a good practice.
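Several of these endpoint comments ask for the same thing, so one hedged sketch of the pattern may be enough: catch the failure the handler knows about and convert it into an HTTPException, leaving truly unexpected errors to FastAPI's default 500 handling. The route and helper below are illustrative only:

# Illustrative FastAPI endpoint: translate a known failure into an HTTP error.
from fastapi import FastAPI, HTTPException

app = FastAPI()

async def classify_user_query_example(query: str) -> str:
    # Stand-in for the real classifier call; raises ValueError on bad input.
    if not query:
        raise ValueError("empty query")
    return "classified"

@app.post("/user-query-classifier-example")
async def classify_endpoint(query: str):
    try:
        label = await classify_user_query_example(query)
        return {"response": label}
    except ValueError as error:
        raise HTTPException(status_code=400, detail=str(error))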

cognitive_architecture/database/vectordb/vectordb.py (3)
  • 149-181: The add_memories method has been updated with new logic for chunk processing and document loading. Ensure that the OpenAIEmbeddings and _document_loader functions are tested with the new logic.

  • 201-207: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [183-206]

The fetch_memories method has been extended to support different search types. The default assignment of search_type to 'hybrid' should be moved outside the method to ensure that the default value is set before the method is called, as previously commented.

  • 255-313: New search types 'summary' and 'summary_filter_by_object_name' have been added. Ensure that these new search types are properly tested and that the filtering logic is correct.
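On the default-value remark, one low-risk way to express the fallback is as a parameter default, so callers that omit search_type still get 'hybrid'; a sketch, not the module's actual signature:

# Sketch: make the fallback explicit in the signature instead of assigning it in the body.
async def fetch_memories_example(observation: str, search_type: str = "hybrid", params: dict = None):
    supported = {"hybrid", "summary", "summary_filter_by_object_name"}
    if search_type not in supported:
        raise ValueError(f"Unsupported search_type: {search_type}")
    # ... dispatch to the concrete search implementation here ...
    return {"search_type": search_type, "observation": observation, "params": params}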
cognitive_architecture/vectorstore_manager.py (3)
  • 173-173: The addition of default values for job_id and memory_label in the create_memory method signature is a good practice for optional parameters. Ensure that all invocations of this method are updated to reflect these changes if they rely on the previous signature.

  • 254-254: The new memory_category parameter with a default value of 'PUBLIC' in the handle_new_memory method is a sensible addition for categorization purposes. Confirm that this default aligns with the business logic and that all calls to this method are compatible with the new parameter.

Verification successful

The search with rg has provided us with the locations where the handle_new_memory method is called within the vectorstore_manager.py file. From the output, we can see that the method is called twice, and in both instances, it is called with four parameters: user_id, session, job_id, and memory_name. The calls do not explicitly pass the new memory_category parameter, which means they rely on the default value of 'PUBLIC' as specified in the method's definition.

Given this information, we can conclude the following:

  • The handle_new_memory method has been updated to include a new parameter memory_category with a default value of 'PUBLIC'.
  • The method is called twice in the codebase, and both calls do not specify the memory_category parameter, thus defaulting to 'PUBLIC'.
  • There is no evidence from the provided output that the new parameter is causing any issues with existing calls to the method.

Based on the above points, the addition of the memory_category parameter with a default value seems to be correctly handled in the current codebase.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the `handle_new_memory` method is called correctly with the new parameter.
ast-grep --lang python --pattern $'handle_new_memory($_, $_, $_, $_, $_)'

Length of output: 74



Script:

#!/bin/bash
# Search for calls to the `handle_new_memory` method to check if it's being called with the correct number of parameters.
ast-grep --lang python --pattern $'handle_new_memory($_, $_, $$$)'

Length of output: 67



Script:

#!/bin/bash
# Use ripgrep to search for the string "handle_new_memory" to find any calls to the method.
rg "handle_new_memory"

Length of output: 554

* 278-278: Adding a new field `memory_category` to the `MemoryModel` instantiation within the `handle_new_memory` method. Ensure that the database schema has been updated to include this new field and that it is properly indexed if necessary for performance.
Verification successful

I will generate a new script to verify the inclusion of the memory_category field in the MemoryModel class using ast-grep with the correct language specification and pattern syntax.


I will generate a new script to search for the memory_category field directly within the files that define the MemoryModel class using rg.


The verification process has confirmed the presence of the memory_category field within the MemoryModel class definition in the file cognitive_architecture/database/postgres/models/memory.py. This aligns with the changes mentioned in the pull request.

  • The memory_category field is a column in the MemoryModel with the type String and is nullable.

Based on this information, the database schema has indeed been updated to include the new memory_category field as required by the changes in the vectorstore_manager.py file.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the database schema includes the new `memory_category` field.
rg --type python 'memory_category' --files-with-matches | xargs cat | ast-grep --lang python --pattern $'class MemoryModel { $$$ memory_category: str $$$ }'

Length of output: 531



Script:

#!/bin/bash
# Verify that the database schema includes the new `memory_category` field.
rg --type py 'class MemoryModel' --files-with-matches | xargs cat | ast-grep --lang python --pattern $'class MemoryModel { $$$ memory_category: str $$$ }'

Length of output: 500



Script:

#!/bin/bash
# Search for the 'memory_category' field within the 'MemoryModel' class definition.
rg --type py 'class MemoryModel' -A 10 | rg 'memory_category'

Length of output: 171

main.py (6)
  • 1-40: The import section is generally clean, but it's important to ensure that all imported modules and functions are used within the file to avoid unnecessary imports which can lead to confusion and slight performance issues.
Verification successful

The executed scripts have checked the usage of Optional and List from the typing module, as well as Neo4jError from the neo4j.exceptions module within main.py. The results show that these imports are indeed used in main.py.

However, the script output also includes matches from other files, which are not relevant to the verification of main.py. For the purpose of this task, we are only concerned with the usage of imports in main.py.

Based on the provided output:

  • Optional and List are used in function signatures and type hints within main.py.
  • Neo4jError is used in exception handling within main.py.

Given this information, the imports of Optional, List, and Neo4jError are justified and correctly used in main.py.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that all imported modules are used in the file.
rg --type python --files-with-matches 'from typing import Optional, List' main.py | xargs -L1 rg 'Optional|List'
rg --type python --files-with-matches 'from neo4j.exceptions import Neo4jError' main.py | xargs -L1 rg 'Neo4jError'
# ... continue for all imports

Length of output: 21390

* 45-45:
  • 71-71:

  • 256-256:

  • 278-278:

  • 331-453: The function user_context_enrichment is complex and integrates various memory systems. Ensure that the logic is thoroughly tested, especially the dynamic memory class management and the exception handling.

cognitive_architecture/database/graph_database/graph.py (12)
  • 8-9: The addition of AsyncSession and Neo4jError imports indicates the use of asynchronous database sessions and improved error handling. Ensure that the rest of the codebase is compatible with these changes.

  • 121-122: The GraphQLQuery class is a straightforward model for encapsulating GraphQL queries. This is a good use of Pydantic models for type safety and validation.

  • 194-212: The Neo4jGraphDB class now uses GraphDatabase.driver for database connections, which is a standard practice. The exception handling in the query method has been improved to log errors and re-raise them, which is good for debugging and control flow.

  • 277-290: The cypher_statement_correcting method uses an AI client to check and correct Cypher query syntax. Ensure that the AI client is properly configured and that this functionality is covered by tests to verify its effectiveness.

  • 439-445: The retrieve_public_memory method is well-defined for fetching public memory nodes. It's important to ensure that the method is used consistently across the codebase and that the necessary indexes are in place for performance.

  • 736-772: The create_memory_node method is designed to create or find a memory node with specific labels and a description. It's crucial to validate the input parameters and handle potential database errors as done here.

  • 790-803: The delete_memory_node method correctly uses a Cypher query to delete a memory node by ID. It's good practice to validate inputs and handle exceptions as implemented.

  • 805-834: The unlink_memory_from_user method provides functionality to unlink a memory node from a user, which is a necessary feature for managing user data. The input validation and error handling are appropriate.

  • 627-705: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [604-673]

The create_document_node_cypher method generates a Cypher query to create a Document node and link it to a memory node. Ensure that the method is used correctly and that the input parameters are validated before the query is executed.

Verification successful

The create_document_node_cypher method is used in main.py and is defined in graph.py. To verify its correct usage, we need to check if the method calls in main.py align with the updated method signature and intended functionality.


The method calls to create_document_node_cypher in main.py match the updated method signature in graph.py. The calls correctly pass the required parameters, including document_summary, user_id, and conditionally public_memory_id or memory_type based on the context.

  • The first call passes classification, user_id, and public_memory_id when memory_type is 'PUBLIC'.
  • The second call passes classification, user_id, and sets memory_type to 'SemanticMemory'.
  • The commented-out call at the bottom suggests a previous or alternative usage where memory_type is set to "PUBLIC".

Based on the provided context, the usage of create_document_node_cypher in main.py appears to be consistent with the method's signature and intended functionality.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that the `create_document_node_cypher` method is used correctly in the codebase.
fd '.*\.py' --exec grep -Hn 'create_document_node_cypher'

Length of output: 735



Script:

#!/bin/bash
# Check the usage of `create_document_node_cypher` in main.py and ensure it aligns with the updated method signature.
cat main.py | grep -A 5 -B 5 'create_document_node_cypher'

Length of output: 2126

* 675-704: The `update_document_node_with_db_ids` method updates the namespace of a Document node in the database. It's important to ensure that the document ID and namespace are properly sanitized before being included in the Cypher query to prevent injection attacks (see the parameterized-query sketch after this list).
  • 733-1010: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [705-734]

The get_namespaces_by_document_category method is a useful addition for retrieving Vectordb namespaces for documents of a specified category. Ensure that the method is used consistently and that the category parameter is validated.

  • 736-854: The methods for managing memory nodes and their relationships, such as create_memory_node, link_user_to_public, delete_memory_node, and unlink_memory_from_user, are well-implemented with proper error handling and input validation.
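On the sanitisation point for update_document_node_with_db_ids above, binding the values as Cypher parameters instead of interpolating them into the query string is the usual fix; a sketch assuming a synchronous neo4j driver session, with the property names taken from the surrounding discussion rather than verified against the module:

# Sketch: bind document id and namespace as query parameters to avoid injection.
from neo4j import GraphDatabase

def update_document_namespace_example(uri, user, password, document_id, namespace):
    query = (
        "MATCH (d:Document {d_id: $document_id}) "
        "SET d.vectordbNamespace = $namespace "
        "RETURN d"
    )
    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            # The driver handles escaping of $document_id and $namespace.
            result = session.run(query, document_id=document_id, namespace=namespace)
            return [record["d"] for record in result]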

main.py Outdated
Comment on lines 43 to 73

async def fetch_document_vectordb_namespace(session: AsyncSession, user_id: str, namespace_id:str, doc_id:str=None):
logging.info("user id is", user_id)
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, memory_label=namespace_id)


# Managing memory attributes
existing_user = await Memory.check_existing_user(user_id, session)
print("here is the existing user", existing_user)
await memory.manage_memory_attributes(existing_user)
print("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)
namespace_class = namespace_id + "_class"

dynamic_memory_class = getattr(memory, namespace_class.lower(), None)

methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]

if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
print(f"Memory method {method_name} has been added")
else:
print(f"No attribute named in memory.")

print("Available memory classes:", await memory.list_memory_classes())
result = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
observation="placeholder", search_type="summary_filter_by_object_name", params=doc_id)
logging.info("Result is", result)

return result, namespace_id

Consider replacing print statements with logging for better control over log levels and outputs, which is more suitable for production environments.

- print("here is the existing user", existing_user)
+ logging.debug("here is the existing user %s", existing_user)
# ... apply this change to all print statements

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
async def fetch_document_vectordb_namespace(session: AsyncSession, user_id: str, namespace_id:str, doc_id:str=None):
logging.info("user id is", user_id)
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, memory_label=namespace_id)
# Managing memory attributes
existing_user = await Memory.check_existing_user(user_id, session)
print("here is the existing user", existing_user)
await memory.manage_memory_attributes(existing_user)
print("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)
namespace_class = namespace_id + "_class"
dynamic_memory_class = getattr(memory, namespace_class.lower(), None)
methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]
if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
print(f"Memory method {method_name} has been added")
else:
print(f"No attribute named in memory.")
print("Available memory classes:", await memory.list_memory_classes())
result = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
observation="placeholder", search_type="summary_filter_by_object_name", params=doc_id)
logging.info("Result is", result)
return result, namespace_id
async def fetch_document_vectordb_namespace(session: AsyncSession, user_id: str, namespace_id:str, doc_id:str=None):
logging.info("user id is", user_id)
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, memory_label=namespace_id)
# Managing memory attributes
existing_user = await Memory.check_existing_user(user_id, session)
logging.debug("here is the existing user %s", existing_user)
await memory.manage_memory_attributes(existing_user)
logging.debug("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)
namespace_class = namespace_id + "_class"
dynamic_memory_class = getattr(memory, namespace_class.lower(), None)
methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]
if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
logging.debug(f"Memory method {method_name} has been added")
else:
logging.debug(f"No attribute named in memory.")
logging.debug("Available memory classes: %s", await memory.list_memory_classes())
result = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
observation="placeholder", search_type="summary_filter_by_object_name", params=doc_id)
logging.info("Result is %s", result)
return result, namespace_id

Comment on lines +69 to +91
@app.post("/add-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore

if 'settings' in decoded_payload and decoded_payload['settings'] is not None:
settings_for_loader = decoded_payload['settings']
else:
settings_for_loader = None

if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None

output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader)
return JSONResponse(content={"response": output}, status_code=200)

The /add-memory endpoint is well-structured. However, the error handling could be improved by providing more specific HTTP status codes and messages based on the exception type.

- return JSONResponse(content={"response": {"error": str(e)}}, status_code=503)
+ if isinstance(e, SomeSpecificException):
+     status_code = 400  # Bad Request or other appropriate status code
+ else:
+     status_code = 500  # Internal Server Error
+ return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
@app.post("/add-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'settings' in decoded_payload and decoded_payload['settings'] is not None:
settings_for_loader = decoded_payload['settings']
else:
settings_for_loader = None
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None
output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader)
return JSONResponse(content={"response": output}, status_code=200)
@app.post("/add-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'settings' in decoded_payload and decoded_payload['settings'] is not None:
settings_for_loader = decoded_payload['settings']
else:
settings_for_loader = None
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None
output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader)
return JSONResponse(content={"response": output}, status_code=200)
except Exception as e:
if isinstance(e, SomeSpecificException):
status_code = 400 # Bad Request or other appropriate status code
else:
status_code = 500 # Internal Server Error
return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code)

Comment on lines +9 to +31
class CogneeManager:
def __init__(self, embeddings: Embeddings = None,
vector_db: VectorDB = None,
vector_db_key: str = None,
embedding_api_key: str = None,
webhook_url: str = None,
lines_per_batch: int = 1000,
webhook_key: str = None,
document_id: str = None,
chunk_validation_url: str = None,
internal_api_key: str = "test123",
base_url="http://localhost:8000"):
self.embeddings = embeddings if embeddings else Embeddings()
self.vector_db = vector_db if vector_db else VectorDB()
self.webhook_url = webhook_url
self.lines_per_batch = lines_per_batch
self.webhook_key = webhook_key
self.document_id = document_id
self.chunk_validation_url = chunk_validation_url
self.vector_db_key = vector_db_key
self.embeddings_api_key = embedding_api_key
self.internal_api_key = internal_api_key
self.base_url = base_url

The constructor of CogneeManager class initializes several instance variables. It's important to ensure that the default values provided, such as internal_api_key and base_url, are appropriate and secure. The use of a default API key ("test123") in production code is a security risk and should be replaced with a secure method of configuration.

- internal_api_key: str = "test123",
+ internal_api_key: str = None,

[SECURITY]


Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
class CogneeManager:
def __init__(self, embeddings: Embeddings = None,
vector_db: VectorDB = None,
vector_db_key: str = None,
embedding_api_key: str = None,
webhook_url: str = None,
lines_per_batch: int = 1000,
webhook_key: str = None,
document_id: str = None,
chunk_validation_url: str = None,
internal_api_key: str = "test123",
base_url="http://localhost:8000"):
self.embeddings = embeddings if embeddings else Embeddings()
self.vector_db = vector_db if vector_db else VectorDB()
self.webhook_url = webhook_url
self.lines_per_batch = lines_per_batch
self.webhook_key = webhook_key
self.document_id = document_id
self.chunk_validation_url = chunk_validation_url
self.vector_db_key = vector_db_key
self.embeddings_api_key = embedding_api_key
self.internal_api_key = internal_api_key
self.base_url = base_url
class CogneeManager:
def __init__(self, embeddings: Embeddings = None,
vector_db: VectorDB = None,
vector_db_key: str = None,
embedding_api_key: str = None,
webhook_url: str = None,
lines_per_batch: int = 1000,
webhook_key: str = None,
document_id: str = None,
chunk_validation_url: str = None,
internal_api_key: str = None,
base_url="http://localhost:8000"):
self.embeddings = embeddings if embeddings else Embeddings()
self.vector_db = vector_db if vector_db else VectorDB()
self.webhook_url = webhook_url
self.lines_per_batch = lines_per_batch
self.webhook_key = webhook_key
self.document_id = document_id
self.chunk_validation_url = chunk_validation_url
self.vector_db_key = vector_db_key
self.embeddings_api_key = embedding_api_key
self.internal_api_key = internal_api_key
self.base_url = base_url
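A common alternative to a baked-in default is to resolve the key from the environment at construction time and fail fast when it is missing; a minimal sketch, assuming an INTERNAL_API_KEY variable (the name is an assumption):

# Sketch: pull the internal API key from the environment instead of a hard-coded default.
import os

def resolve_internal_api_key(explicit_key: str = None) -> str:
    key = explicit_key or os.getenv("INTERNAL_API_KEY")
    if not key:
        raise RuntimeError("INTERNAL_API_KEY is not configured")
    return key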

Comment on lines +4 to +18
ENV = docker
POSTGRES_USER = bla
POSTGRES_PASSWORD = bla
POSTGRES_DB = bubu
POSTGRES_HOST = localhost
POSTGRES_HOST_DOCKER = postgres
SEGMENT_KEY = Etl4WJwzOkeDPAjaOXOMgyU16hO7mV7B
COG_ARCH_DIR = cognitive_architecture
GRAPH_DB_URL =
GRAPH_DB_PW =
GRAPH_DB_USER =
GRAPH_DB_URL =
GRAPH_DB_PW =
GRAPH_DB_USER =
AWS_ACCESS_KEY_ID =
AWS_SECRET_ACCESS_KEY =
QDRANT_API_KEY
QDRANT_API_URL

💡 NOTE
Codebase verification is a beta feature.

The search results indicate that the ENVIRONMENT variable is still being used in several places within the codebase:

  • entrypoint.sh
  • README.md (twice, in instructions)
  • iterations/level_3/create_database.py
  • iterations/level_3/entrypoint.sh
  • iterations/level_3/database/database.py
  • iterations/level_2/entrypoint.sh

Given this information, the renaming of ENVIRONMENT to ENV in the .env.template file is not consistently applied across the entire project. The references to ENVIRONMENT in the README.md are likely instructional and may need to be updated to reflect the changes in the .env.template file. The occurrences in the scripts and Python files suggest that the code still expects an ENVIRONMENT variable to be set.

Based on these findings, the following issues have been identified:

  • The ENVIRONMENT variable is still being used in multiple places, which could lead to inconsistencies or errors if the environment variable is expected to be ENV now.
  • The README.md file contains outdated instructions regarding the ENVIRONMENT variable.
Analysis chain

Ensure that the new AWS and QDRANT API keys are handled securely and are not exposed in the codebase or version control. Verify that the renaming of ENVIRONMENT to ENV is consistently applied across the entire project.


Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Verify that ENV is used instead of ENVIRONMENT throughout the codebase
rg --type env 'ENVIRONMENT'
# Expecting no results for ENVIRONMENT, only ENV should be present

Length of output: 55



Script:

#!/bin/bash
# Verify that ENV is used instead of ENVIRONMENT throughout the codebase
# This time, search all text files without specifying a file type
rg 'ENVIRONMENT'
# Expecting no results for ENVIRONMENT, only ENV should be present

Length of output: 495

Comment on lines +44 to +66
def upload(self, file_paths: list[str], base_url=None):
if base_url:
url = base_url + "/jobs"
else:
url = self.base_url + "/jobs"

data = self.serialize()
headers = self.generate_headers()
multipart_form_data = [('file', (os.path.basename(filepath), open(filepath, 'rb'), 'application/octet-stream'))
for filepath in file_paths]

print(f"embedding {len(file_paths)} documents at {url}")
response = requests.post(url, files=multipart_form_data, headers=headers, stream=True, data=data)

if response.status_code == 500:
print(response.text)
return Response(error=response.text, status_code=response.status_code)

response_json = response.json()
if response.status_code >= 400 and response.status_code < 500:
print(f"Error: {response_json['error']}")

return Response.from_json(response_json, response.status_code)

The upload method opens files for reading without ensuring they are closed, which can lead to resource leaks. Use a context manager to ensure files are properly closed.

- multipart_form_data = [('file', (os.path.basename(filepath), open(filepath, 'rb'), 'application/octet-stream'))
+ multipart_form_data = [('file', (os.path.basename(filepath), open(filepath, 'rb'), 'application/octet-stream')) for filepath in file_paths]
+ with contextlib.ExitStack() as stack:
+     files = [stack.enter_context(open(filepath, 'rb')) for filepath in file_paths]
+     multipart_form_data = [('file', (os.path.basename(filepath), file, 'application/octet-stream')) for filepath, file in zip(file_paths, files)]

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def upload(self, file_paths: list[str], base_url=None):
if base_url:
url = base_url + "/jobs"
else:
url = self.base_url + "/jobs"
data = self.serialize()
headers = self.generate_headers()
multipart_form_data = [('file', (os.path.basename(filepath), open(filepath, 'rb'), 'application/octet-stream'))
for filepath in file_paths]
print(f"embedding {len(file_paths)} documents at {url}")
response = requests.post(url, files=multipart_form_data, headers=headers, stream=True, data=data)
if response.status_code == 500:
print(response.text)
return Response(error=response.text, status_code=response.status_code)
response_json = response.json()
if response.status_code >= 400 and response.status_code < 500:
print(f"Error: {response_json['error']}")
return Response.from_json(response_json, response.status_code)
def upload(self, file_paths: list[str], base_url=None):
if base_url:
url = base_url + "/jobs"
else:
url = self.base_url + "/jobs"
data = self.serialize()
headers = self.generate_headers()
with contextlib.ExitStack() as stack:
files = [stack.enter_context(open(filepath, 'rb')) for filepath in file_paths]
multipart_form_data = [('file', (os.path.basename(filepath), file, 'application/octet-stream')) for filepath, file in zip(file_paths, files)]
print(f"embedding {len(file_paths)} documents at {url}")
response = requests.post(url, files=multipart_form_data, headers=headers, stream=True, data=data)
if response.status_code == 500:
print(response.text)
return Response(error=response.text, status_code=response.status_code)
response_json = response.json()
if response.status_code >= 400 and response.status_code < 500:
print(f"Error: {response_json['error']}")
return Response.from_json(response_json, response.status_code)

Comment on lines +341 to 345


async def delete_memories(self, namespace:str, params: dict = None):
if namespace is None:
namespace = self.namespace

The delete_memories method should handle exceptions and provide appropriate logging or error messages.

+ try:
+     # existing deletion logic
+ except Exception as e:
+     logging.error(f"Failed to delete memories: {e}")
+     raise

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
async def delete_memories(self, namespace:str, params: dict = None):
if namespace is None:
namespace = self.namespace
async def delete_memories(self, namespace:str, params: dict = None):
try:
if namespace is None:
namespace = self.namespace
# existing deletion logic
except Exception as e:
logging.error(f"Failed to delete memories: {e}")
raise

main.py Outdated
Comment on lines 455 to 491
async def create_public_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]:
"""
Create a public memory node associated with a user in a Neo4j graph database.
If Public Memory exists, it will return the id of the memory.
This is intended as standalone node that can be attached to any user.
It is not attached to any user by default.

Args:
user_id (str): The unique identifier for the user.
session (AsyncSession): An asynchronous session for database operations.

Returns:
Optional[int]: The ID of the created public memory node or None if an error occurs.
:param labels: Label for the memory, to help filter for different countries
:param topic: Topic for the memory, to help provide a name

"""
# Validate input parameters
if not labels:
labels = ['sr'] # Labels for the memory node

if not topic:
topic = "PublicMemory"


try:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)

# Assuming the topic for public memory is predefined, e.g., "PublicMemory"
# Create the memory node
memory_id = await neo4j_graph_db.create_memory_node(labels=labels, topic=topic)
return memory_id
except Neo4jError as e:
logging.error(f"Error creating public memory node: {e}")
return None

The function create_public_memory returns None in case of a Neo4jError. Consider if this is the best error handling strategy for the context in which this function is used. It might be better to re-raise the exception or handle it differently.
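If re-raising is the preferred strategy, the smallest change is to log and propagate so the API layer can map the failure onto a response; a sketch of that shape, where graph_db stands in for the project's Neo4jGraphDB instance:

# Sketch: log the Neo4j failure and propagate it instead of returning None.
import logging
from neo4j.exceptions import Neo4jError

async def create_public_memory_example(graph_db, labels=None, topic="PublicMemory"):
    labels = labels or ["sr"]
    try:
        return await graph_db.create_memory_node(labels=labels, topic=topic)
    except Neo4jError as error:
        logging.error("Error creating public memory node: %s", error)
        raise  # let the caller decide whether to retry or surface an HTTP error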

main.py Outdated
Comment on lines 573 to 720
# CREATE (episodic)-[:HAS_EVENT]->(location5_5e43f4c45b3c44ea897c12220db4c051)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(location5_5e43f4c45b3c44ea897c12220db4c051)
# CREATE (episodic)-[:HAS_EVENT]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d)
# CREATE (episodic)-[:HAS_EVENT]->(information7_f756e3f3720c4fe5aeb01287badaf088)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(information7_f756e3f3720c4fe5aeb01287badaf088)
# CREATE (episodic)-[:HAS_EVENT]->(event8_da6334e744454264900296319e14b532)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(event8_da6334e744454264900296319e14b532)
# CREATE (episodic)-[:HAS_EVENT]->(action9_48e45419604e4d66b3e718ee1d6c095f)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action9_48e45419604e4d66b3e718ee1d6c095f)
# CREATE (episodic)-[:HAS_EVENT]->(action10_f48acb1db4da4934afbe17363e9e63a4)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action10_f48acb1db4da4934afbe17363e9e63a4)"""
#
# out = cypher_statement_correcting(query)
# print(out)
#
# out = await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
# print(out)
# load_doc_to_graph = await add_documents_to_graph_db(session, user_id)
# print(load_doc_to_graph)
# user_id = 'user'
# loader_settings = {
# "format": "PDF",
# "source": "DEVICE",
# "path": [".data"]
# }
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
# await create_public_memory(user_id=user_id, labels=['sr'], topic="PublicMemory")
# await add_documents_to_graph_db(session, user_id)
#
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)

# await attach_user_to_memory(user_id=user_id, labels=['sr'], topic="PublicMemory")

# return_ = await user_context_enrichment(user_id=user_id, query="Koja je minimalna visina ograde na balkonu na stambenom objektu", session=session)
# print(return_)

# document_summary = {
# 'DocumentCategory': 'Science',
# 'Title': 'The Future of AI',
# 'Summary': 'An insightful article about the advancements in AI.',
# 'd_id': 'doc123'
# }
#
# # Example user ID
# user_id = 'user'
#
# # value = await neo4j_graph_db.create_memory_node(labels=['sr'])
# # print(value)
# # neo4j_graph_db.close()
#
# await add_documents_to_graph_db(session, user_id)
# neo4j_graph_db.link_public_memory_to_user(memory_id = 17,user_id=user_id)
#
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="Document")
print(ids)

for id in ids:
print(id.get('memoryId'))

neo4j_graph_db.delete_memory_node(memory_id = id.get('memoryId'), topic="Document")
#
# neo4j_graph_db.delete_memory_node(memory_id=16, topic="PublicSerbianArchitecture")
# neo4j_graph_db.unlink_memory_from_user(memory_id = 17,user_id=user_id)
# cypher_query_public = neo4j_graph_db.create_document_node_cypher(document_summary, user_id, memory_type="PUBLIC")
# neo4j_graph_db.query(cypher_query_public)
# link_memory_to_user(user_id, session)

# neo4j_graph_db.create_memory_node(labels=['sr'])
# out = await get_vectordb_namespace(session, user_id)
# params = {
# "version": "1.0",
# "agreement_id": "AG123456",
# "privacy_policy": "https://example.com/privacy",
# "terms_of_service": "https://example.com/terms",
# "format": "json",
# "schema_version": "1.1",
# "checksum": "a1b2c3d4e5f6",
# "owner": "John Doe",
# "license": "MIT",
# "validity_start": "2023-08-01",
# "validity_end": "2024-07-31",
# }
# loader_settings = {
# "format": "PDF",
# "source": "DEVICE",
# "path": [".data"],
# "strategy": "SUMMARY",
# }
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
# await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
# await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings)
# await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday")


if __name__ == "__main__":
import asyncio

asyncio.run(main())

The main function is commented out. If this is intentional and the code is not needed, consider removing it to avoid confusion. If it's meant to be part of the application, uncomment and ensure it's being used correctly.

Comment on lines +15 to +46
def detect_language(text):
"""
Detect the language of the given text and return its ISO 639-1 language code.
If the detected language is Croatian ('hr'), it maps to Serbian ('sr').
The text is trimmed to the first 100 characters for efficient processing.

Parameters:
text (str): The text for language detection.

Returns:
str: The ISO 639-1 language code of the detected language, or 'None' in case of an error.
"""

# Trim the text to the first 100 characters
trimmed_text = text[:100]

try:
# Detect the language using langdetect
detected_lang_iso639_1 = detect(trimmed_text)
logging.info(f"Detected ISO 639-1 code: {detected_lang_iso639_1}")

# Special case: map 'hr' (Croatian) to 'sr' (Serbian ISO 639-2)
if detected_lang_iso639_1 == 'hr':
return 'sr'
return detected_lang_iso639_1

except LangDetectException as e:
logging.error(f"Language detection error: {e}")
except Exception as e:
logging.error(f"Unexpected error: {e}")

return -1

The detect_language function has an error path that returns -1, which is inconsistent with the documented return type of a string. Consider returning None or an empty string instead to maintain consistency with the return type.

-    return -1
+    return None  # or return an empty string if that's more appropriate

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def detect_language(text):
"""
Detect the language of the given text and return its ISO 639-1 language code.
If the detected language is Croatian ('hr'), it maps to Serbian ('sr').
The text is trimmed to the first 100 characters for efficient processing.
Parameters:
text (str): The text for language detection.
Returns:
str: The ISO 639-1 language code of the detected language, or 'None' in case of an error.
"""
# Trim the text to the first 100 characters
trimmed_text = text[:100]
try:
# Detect the language using langdetect
detected_lang_iso639_1 = detect(trimmed_text)
logging.info(f"Detected ISO 639-1 code: {detected_lang_iso639_1}")
# Special case: map 'hr' (Croatian) to 'sr' (Serbian ISO 639-2)
if detected_lang_iso639_1 == 'hr':
return 'sr'
return detected_lang_iso639_1
except LangDetectException as e:
logging.error(f"Language detection error: {e}")
except Exception as e:
logging.error(f"Unexpected error: {e}")
return -1
def detect_language(text):
"""
Detect the language of the given text and return its ISO 639-1 language code.
If the detected language is Croatian ('hr'), it maps to Serbian ('sr').
The text is trimmed to the first 100 characters for efficient processing.
Parameters:
text (str): The text for language detection.
Returns:
str: The ISO 639-1 language code of the detected language, or 'None' in case of an error.
"""
# Trim the text to the first 100 characters
trimmed_text = text[:100]
try:
# Detect the language using langdetect
detected_lang_iso639_1 = detect(trimmed_text)
logging.info(f"Detected ISO 639-1 code: {detected_lang_iso639_1}")
# Special case: map 'hr' (Croatian) to 'sr' (Serbian ISO 639-2)
if detected_lang_iso639_1 == 'hr':
return 'sr'
return detected_lang_iso639_1
except LangDetectException as e:
logging.error(f"Language detection error: {e}")
except Exception as e:
logging.error(f"Unexpected error: {e}")
return None # or return an empty string if that's more appropriate

Comment on lines +32 to 40
# try:
# pages = text_splitter.create_documents([source_data])
# except:
# try:
pages = text_splitter.create_documents([source_data])
# except:
# pages = text_splitter.create_documents(source_data.content)
# pages = source_data.load_and_split()
return pages

The commented-out exception handling block in the vanilla_chunker function should be removed or updated to reflect the current exception handling strategy.

-    # try:
-    #     pages = text_splitter.create_documents([source_data])
-    # except:
-    # try:
-    #     pages = text_splitter.create_documents(source_data.content)
-    # except:
-    #     pages = text_splitter.create_documents(source_data.content)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
# try:
# pages = text_splitter.create_documents([source_data])
# except:
# try:
pages = text_splitter.create_documents([source_data])
# except:
# pages = text_splitter.create_documents(source_data.content)
# pages = source_data.load_and_split()
return pages
pages = text_splitter.create_documents([source_data])
return pages

…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 2928f51 and be3f179.
Files selected for processing (1)
  • Dockerfile (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • Dockerfile

…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between be3f179 and 3632a85.
Files ignored due to filter (2)
  • poetry.lock
  • pyproject.toml
Files selected for processing (1)
  • entrypoint.sh (1 hunks)
Files skipped from review due to trivial changes (1)
  • entrypoint.sh

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 22

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 7afe9a7 and 0323ab6.
Files selected for processing (5)
  • cognitive_architecture/classifiers/classifier.py (1 hunks)
  • cognitive_architecture/database/create_database_tst.py (1 hunks)
  • cognitive_architecture/database/graph_database/graph.py (14 hunks)
  • cognitive_architecture/database/vectordb/basevectordb.py (1 hunks)
  • cognitive_architecture/database/vectordb/vectordb.py (6 hunks)
Additional comments: 8
cognitive_architecture/classifiers/classifier.py (1)
  • 76-109: The classify_call function has been modified to accept document_summaries instead of query, context, and document_types. The JSON structure and function invocation have been updated accordingly. The return value has been changed from the classifier value to the classifier ID. Ensure that all calls to this function throughout the codebase have been updated to match the new signature and that the change in return value is handled appropriately.
Verification successful

Based on the information gathered from the executed shell scripts, the classify_call function has been updated in cognitive_architecture/classifiers/classifier.py and is correctly used with the new signature in main.py. There are no other occurrences of the function in the codebase, which indicates that the changes have been properly reflected where necessary. The old function signature does not appear to be used anymore.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search to find files that import classifier module. Expecting list of dependent files.
rg -C3 --type python $'from cognitive_architecture.classifiers import classifier'
# Search old signature to find omissions. Expecting no results.
ast-grep --lang python --pattern $'classify_call($_, $_, $_)'
# Search new signature to validate usage. Expecting list of locations where the new signature is used.
ast-grep --lang python --pattern $'classify_call($_, $_)'

Length of output: 342



Script:

#!/bin/bash
# Correct the search to find files that import classifier module without specifying the file type.
rg -C3 'from cognitive_architecture.classifiers import classifier'
# Search for any other usages of the new `classify_call` function signature.
ast-grep --lang python --pattern $'classify_call($_, $_)'
# Search for any usages of the old `classify_call` function signature to ensure it's not used anymore.
ast-grep --lang python --pattern $'classify_call($_, $_, $_)'

Length of output: 297



Script:

#!/bin/bash
# Search for any file that might be using the `classify_call` function, regardless of the import statement.
rg 'classify_call' --type py

Length of output: 310

cognitive_architecture/database/vectordb/basevectordb.py (1)
  • 281-281: The change in the hunk modifies the logging statement for the search type by adding str() around the search_type variable. This is a good practice to ensure that the variable is converted to a string before being concatenated in the logging message.
cognitive_architecture/database/vectordb/vectordb.py (3)
  • 63-63: The init_weaviate method has been updated to include a retriever_type parameter. Ensure that this change is reflected wherever the method is called and that the new parameter is used correctly.

  • 152-179: The add_memories method has been updated to handle document loading and chunking. Ensure that the changes are compatible with the expected document structure and that the chunk_data function is implemented correctly.

  • 252-316: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [183-313]

The fetch_memories method has been modified to include different search types. Verify that the new search types are implemented correctly and that the method's callers are updated to handle the new functionality.

cognitive_architecture/database/graph_database/graph.py (3)
  • 8-9: New imports from neo4j have been added, which suggests that the application is interacting with a Neo4j database. Ensure that these new dependencies are included in the project's dependency management files (e.g., requirements.txt or Pipfile).

  • 361-364: The commented-out code for corrected_cypher_statements is related to a previous comment about uncommenting or removing the code. Since the comment is still valid, we can skip generating a new comment.

  • 805-834: The method unlink_memory_from_user is correctly using parameterized queries, which is a good practice for security. However, ensure that the memory_id is validated as an integer before using it in the query to prevent type-related issues.
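
A minimal sketch of the suggested validation, assuming the rest of the method body stays as-is:

def unlink_memory_from_user(self, memory_id, user_id: str):
    # Coerce/validate before the value reaches the parameterized query.
    try:
        memory_id = int(memory_id)
    except (TypeError, ValueError):
        raise ValueError(f"memory_id must be an integer, got {memory_id!r}")
    ...  # existing parameterized query logic, unchanged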

Comment on lines +1 to +5
# this is needed to import classes from other modules
# script_dir = os.path.dirname(os.path.abspath(__file__))
# # Get the parent directory of your script and add it to sys.path
# parent_dir = os.path.dirname(script_dir)
# sys.path.append(parent_dir)

The commented-out code for setting the script directory and appending the parent directory to sys.path is still present. As previously mentioned, if this code is no longer needed, it should be removed to avoid confusion.

- # script_dir = os.path.dirname(os.path.abspath(__file__))
- # # Get the parent directory of your script and add it to sys.path
- # parent_dir = os.path.dirname(script_dir)
- # sys.path.append(parent_dir)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
# this is needed to import classes from other modules
# script_dir = os.path.dirname(os.path.abspath(__file__))
# # Get the parent directory of your script and add it to sys.path
# parent_dir = os.path.dirname(script_dir)
# sys.path.append(parent_dir)
# this is needed to import classes from other modules

Comment on lines +12 to +14
# from cognitive_architecture.config import Config
# config = Config()
# config.load()

The commented-out code for loading configuration using a Config object is still present. As previously mentioned, if the new pattern is to use environment variables directly, this commented-out code should be removed to prevent confusion.

- # from cognitive_architecture.config import Config
- # config = Config()
- # config.load()

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
# from cognitive_architecture.config import Config
# config = Config()
# config.load()

Comment on lines +30 to +44
if os.environ.get('AWS_ENV') == 'prd' or os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
elif os.environ.get('AWS_ENV') == 'local':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')

The code for setting database connection parameters based on the AWS_ENV environment variable is still redundant across the prd, dev, and local cases. This can be simplified since the assignments are identical in each case.

- if os.environ.get('AWS_ENV') == 'prd' or os.environ.get('AWS_ENV') == 'dev':
-     host = os.environ.get('POSTGRES_HOST')
-     username = os.environ.get('POSTGRES_USER')
-     password = os.environ.get('POSTGRES_PASSWORD')
-     database_name = os.environ.get('POSTGRES_DB')
- elif os.environ.get('AWS_ENV') == 'local':
-     host = os.environ.get('POSTGRES_HOST')
-     username = os.environ.get('POSTGRES_USER')
-     password = os.environ.get('POSTGRES_PASSWORD')
-     database_name = os.environ.get('POSTGRES_DB')
- else:
-     host = os.environ.get('POSTGRES_HOST')
-     username = os.environ.get('POSTGRES_USER')
-     password = os.environ.get('POSTGRES_PASSWORD')
-     database_name = os.environ.get('POSTGRES_DB')
+ host = os.environ.get('POSTGRES_HOST')
+ username = os.environ.get('POSTGRES_USER')
+ password = os.environ.get('POSTGRES_PASSWORD')
+ database_name = os.environ.get('POSTGRES_DB')

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
if os.environ.get('AWS_ENV') == 'prd' or os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
elif os.environ.get('AWS_ENV') == 'local':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')

Comment on lines +53 to +60
def database_exists(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.connect()
query = text(f"SELECT 1 FROM pg_database WHERE datname='{db_name}'")
result = connection.execute(query).fetchone()
connection.close()
engine.dispose()
return result is not None

The database_exists function constructs a raw SQL query using string formatting, which could be vulnerable to SQL injection if the db_name variable is not properly sanitized. As previously mentioned, use query parameters to mitigate this risk.

- query = text(f"SELECT 1 FROM pg_database WHERE datname='{db_name}'")
+ query = text("SELECT 1 FROM pg_database WHERE datname=:db_name")
+ result = connection.execute(query, {'db_name': db_name}).fetchone()

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def database_exists(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.connect()
query = text(f"SELECT 1 FROM pg_database WHERE datname='{db_name}'")
result = connection.execute(query).fetchone()
connection.close()
engine.dispose()
return result is not None
def database_exists(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.connect()
query = text("SELECT 1 FROM pg_database WHERE datname=:db_name")
result = connection.execute(query, {'db_name': db_name}).fetchone()
connection.close()
engine.dispose()
return result is not None

Comment on lines +63 to +71
def create_database(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.raw_connection()
connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cursor = connection.cursor()
cursor.execute(f"CREATE DATABASE {db_name}")
cursor.close()
connection.close()
engine.dispose()

The create_database function interpolates db_name directly into a SQL command, which could be vulnerable to SQL injection. PostgreSQL does not allow identifiers such as database names to be bound as query parameters, so compose the statement with psycopg2.sql instead of an f-string.

- cursor.execute(f"CREATE DATABASE {db_name}")
+ from psycopg2 import sql
+ cursor.execute(sql.SQL("CREATE DATABASE {}").format(sql.Identifier(db_name)))

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def create_database(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.raw_connection()
connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cursor = connection.cursor()
cursor.execute(f"CREATE DATABASE {db_name}")
cursor.close()
connection.close()
engine.dispose()
def create_database(username, password, host, db_name):
engine = create_admin_engine(username, password, host, db_name)
connection = engine.raw_connection()
connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
cursor = connection.cursor()
cursor.execute("CREATE DATABASE %s", (db_name,))
cursor.close()
connection.close()
engine.dispose()

Comment on lines +194 to +196
# self.graph = Neo4jGraph(url=url, username=username, password=password)
from neo4j import GraphDatabase
self.driver = GraphDatabase.driver(url, auth=(username, password))

The commented-out line # self.graph = Neo4jGraph(url=url, username=username, password=password) suggests that there was a previous implementation that has been replaced. Ensure that this commented-out code is no longer needed and remove it to avoid confusion.

Comment on lines 372 to 378
def delete_all_user_memories(self, user_id):
try:
# Check if the user exists
user_exists = self.graph.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
if not user_exists:
return f"No user found with ID: {user_id}"


Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [375-389]

The method delete_all_user_memories uses string formatting to inject the user_id directly into the Cypher query, which could lead to Cypher injection if the user_id is not properly sanitized. Use parameterization to prevent potential security issues.

- user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
+ user_exists = self.query("MATCH (user:User {userId: $user_id}) RETURN user", params={"user_id": user_id})
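For context, a sketch of how a query helper can forward parameters to the neo4j driver (assuming the class holds a self.driver created via GraphDatabase.driver, as shown on lines 194-196 of graph.py):

from typing import Optional

def query(self, cypher: str, params: Optional[dict] = None):
    # The driver binds $-parameters itself, so values never touch the query string.
    with self.driver.session() as session:
        result = session.run(cypher, params or {})
        return [record.data() for record in result]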

Comment on lines 386 to 400
MATCH (user)-[:HAS_BUFFER]->(buffer)
DETACH DELETE semantic, episodic, buffer
"""
self.graph.query(delete_query)
self.query(delete_query)
return f"All memories deleted for user ID: {user_id}"
except Exception as e:
return f"An error occurred: {str(e)}"

def delete_specific_memory_type(self, user_id, memory_type):
try:
# Check if the user exists
user_exists = self.graph.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
if not user_exists:
return f"No user found with ID: {user_id}"


Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [397-410]

Similar to the previous comment, the delete_specific_memory_type method should use parameterized queries to prevent Cypher injection.

- user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
+ user_exists = self.query("MATCH (user:User {userId: $user_id}) RETURN user", params={"user_id": user_id})

Comment on lines 675 to 702
def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: str = None):
"""
Update the namespace of a Document node in the database. The document can be linked
either to a SemanticMemory node (if a user ID is provided) or to a PublicMemory node.

Parameters:
- vectordb_namespace (str): The namespace to set for the vectordb.
- document_id (str): The unique identifier of the document.
- user_id (str, optional): The unique identifier for the user. Default is None.

Returns:
- str: A Cypher query string to perform the update.
"""

if user_id:
# Update for a document linked to a SemanticMemory node
cypher_query = f'''
MATCH (user:User {{userId: '{user_id}' }})-[:HAS_SEMANTIC_MEMORY]->(:SemanticMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''
else:
# Update for a document linked to a PublicMemory node
cypher_query = f'''
MATCH (:PublicMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''

The method update_document_node_with_db_ids already defaults user_id to None, but the str annotation does not convey that the argument is optional. Annotate it as Optional[str] (imported from typing) so the signature matches the conditional logic inside the method.

- def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: str = None):
+ def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: Optional[str] = None):

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: str = None):
"""
Update the namespace of a Document node in the database. The document can be linked
either to a SemanticMemory node (if a user ID is provided) or to a PublicMemory node.
Parameters:
- vectordb_namespace (str): The namespace to set for the vectordb.
- document_id (str): The unique identifier of the document.
- user_id (str, optional): The unique identifier for the user. Default is None.
Returns:
- str: A Cypher query string to perform the update.
"""
if user_id:
# Update for a document linked to a SemanticMemory node
cypher_query = f'''
MATCH (user:User {{userId: '{user_id}' }})-[:HAS_SEMANTIC_MEMORY]->(:SemanticMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''
else:
# Update for a document linked to a PublicMemory node
cypher_query = f'''
MATCH (:PublicMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''
def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: Optional[str] = None):
"""
Update the namespace of a Document node in the database. The document can be linked
either to a SemanticMemory node (if a user ID is provided) or to a PublicMemory node.
Parameters:
- vectordb_namespace (str): The namespace to set for the vectordb.
- document_id (str): The unique identifier of the document.
- user_id (str, optional): The unique identifier for the user. Default is None.
Returns:
- str: A Cypher query string to perform the update.
"""
if user_id:
# Update for a document linked to a SemanticMemory node
cypher_query = f'''
MATCH (user:User {{userId: '{user_id}' }})-[:HAS_SEMANTIC_MEMORY]->(:SemanticMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''
else:
# Update for a document linked to a PublicMemory node
cypher_query = f'''
MATCH (:PublicMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
SET document.vectordbNamespace = '{vectordb_namespace}'
RETURN document
'''

Comment on lines +341 to +342



The delete_memories method allows for deletion of memories from the vector database. As previously mentioned, ensure that proper authorization checks are in place to prevent unauthorized deletions and that exceptions are handled appropriately.

+ # TODO: Add authorization checks before allowing deletion of memories
+ try:
+     # existing deletion logic
+ except Exception as e:
+     logging.error(f"Failed to delete memories: {e}")
+     raise

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
# TODO: Add authorization checks before allowing deletion of memories
try:
# existing deletion logic
except Exception as e:
logging.error(f"Failed to delete memories: {e}")
raise
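
A slightly fuller sketch of the same idea; is_authorized is a hypothetical helper, not existing code, and the deletion body is left as a placeholder:

async def delete_memories(self, namespace: str, params: dict = None):
    # Hypothetical guard; replace with whatever authorization model the project adopts.
    if not is_authorized(self.user_id, namespace):
        raise PermissionError(f"user {self.user_id} may not delete namespace {namespace}")
    try:
        ...  # existing deletion logic, unchanged
    except Exception as e:
        logging.error(f"Failed to delete memories: {e}")
        raise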

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 9

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 0323ab6 and 50e4d7c.
Files selected for processing (1)
  • main.py (1 hunks)
Additional comments: 1
main.py (1)
  • 26-26: The instantiation of aclient using instructor.patch(OpenAI()) is not explained. It's unclear what instructor.patch does. If it's a custom wrapper or patch for the OpenAI client, ensure that it's functioning as expected and that the OpenAI client is properly configured.

main.py Outdated
Comment on lines 44 to 73
async def fetch_document_vectordb_namespace(session: AsyncSession, user_id: str, namespace_id:str, doc_id:str=None):
logging.info("user id is", user_id)
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, memory_label=namespace_id)


# Managing memory attributes
existing_user = await Memory.check_existing_user(user_id, session)
print("here is the existing user", existing_user)
await memory.manage_memory_attributes(existing_user)
print("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)
namespace_class = namespace_id + "_class"

dynamic_memory_class = getattr(memory, namespace_class.lower(), None)

methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]

if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
print(f"Memory method {method_name} has been added")
else:
print(f"No attribute named in memory.")

print("Available memory classes:", await memory.list_memory_classes())
result = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
observation="placeholder", search_type="summary_filter_by_object_name", params=doc_id)
logging.info("Result is", result)

return result, namespace_id

The function fetch_document_vectordb_namespace has several print statements which are generally not recommended for production code as they can clutter logs and are not easily manageable. Consider using a logging framework with appropriate log levels.
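
A minimal sketch of the usual replacement: configure a module-level logger once and route the current print calls through it.

import logging

logger = logging.getLogger(__name__)

# instead of: print("here is the existing user", existing_user)
logger.debug("Existing user: %s", existing_user)
# instead of: print("Namespace id is %s", namespace_id)
logger.info("Namespace id is %s", namespace_id)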

Comment on lines +77 to +163
async def load_documents_to_vectorstore(session: AsyncSession, user_id: str, content:str=None, job_id:str=None, loader_settings:dict=None, memory_type:str="PRIVATE"):
namespace_id = str(generate_letter_uuid()) + "_" + "SEMANTICMEMORY"
namespace_class = namespace_id + "_class"

logging.info("Namespace created with id %s", namespace_id)
try:
new_user = User(id=user_id)
await add_entity(session, new_user)
except:
pass
if job_id is None:
job_id = str(uuid.uuid4())

await add_entity(
session,
Operation(
id=job_id,
user_id=user_id,
operation_status="RUNNING",
operation_type="DATA_LOAD",
),
)
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, job_id=job_id, memory_label=namespace_id)
if content is not None:
document_names = [content[:30]]
if loader_settings is not None:
document_source = loader_settings.get("document_names") if isinstance(loader_settings.get("document_names"),
list) else loader_settings.get("path", "None")
logging.info("Document source is %s", document_source)
# try:
document_names = get_document_names(document_source[0])
logging.info(str(document_names))
# except:
# document_names = document_source
for doc in document_names:
from cognitive_architecture.shared.language_processing import translate_text, detect_language
#translates doc titles to english
if loader_settings is not None:
logging.info("Detecting language of document %s", doc)
loader_settings["single_document_path"]= loader_settings.get("path", "None")[0] +"/"+doc
logging.info("Document path is %s", loader_settings.get("single_document_path", "None"))
memory_category = loader_settings.get("memory_category", "PUBLIC")
if loader_settings is None:
memory_category = "CUSTOM"
if detect_language(doc) != "en":
doc_ = doc.strip(".pdf").replace("-", " ")
doc_ = translate_text(doc_, "sr", "en")
else:
doc_=doc
doc_id = str(uuid.uuid4())

logging.info("Document name is %s", doc_)
await add_entity(
session,
DocsModel(
id=doc_id,
operation_id=job_id,
graph_summary= False,
memory_category= memory_category,
doc_name=doc_
)
)
# Managing memory attributes
existing_user = await Memory.check_existing_user(user_id, session)
await memory.manage_memory_attributes(existing_user)
params = {
"doc_id":doc_id
}
print("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)

dynamic_memory_class = getattr(memory, namespace_class.lower(), None)

methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]

if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
print(f"Memory method {method_name} has been added")
else:
print(f"No attribute named in memory.")

print("Available memory classes:", await memory.list_memory_classes())
result = await memory.dynamic_method_call(dynamic_memory_class, 'add_memories',
observation=content, params=params, loader_settings=loader_settings)
await update_entity(session, Operation, job_id, "SUCCESS")


The function load_documents_to_vectorstore is complex and does a lot of work. It could benefit from being broken down into smaller, more focused functions for better maintainability and readability. Additionally, there are commented-out try-except blocks and print statements that should be cleaned up or replaced with proper logging.
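
One possible decomposition, sketched with hypothetical helper names (none of these helpers exist yet); only Memory.create_memory and update_entity are taken from the current code:

async def load_documents_to_vectorstore(session, user_id, content=None, job_id=None,
                                        loader_settings=None, memory_type="PRIVATE"):
    namespace_id = make_semantic_namespace()                      # wraps the UUID/label logic
    job_id = await register_load_operation(session, user_id, job_id)
    memory = await Memory.create_memory(user_id, session, namespace=namespace_id,
                                        job_id=job_id, memory_label=namespace_id)
    for doc in resolve_document_names(content, loader_settings):  # path vs. raw-content handling
        doc_id = await register_document(session, job_id, doc, loader_settings)
        await add_document_to_memory(memory, doc_id, content, loader_settings)
    await update_entity(session, Operation, job_id, "SUCCESS")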

Comment on lines +164 to +191
async def user_query_to_graph_db(session: AsyncSession, user_id: str, query_input: str):

try:
new_user = User(id=user_id)
await add_entity(session, new_user)
except:
pass

job_id = str(uuid.uuid4())

await add_entity(
session,
Operation(
id=job_id,
user_id=user_id,
operation_status="RUNNING",
operation_type="USER_QUERY_TO_GRAPH_DB",
),
)

neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, password=config.graph_database_password)
cypher_query = await neo4j_graph_db.generate_cypher_query_for_user_prompt_decomposition(user_id,query_input)
result = neo4j_graph_db.query(cypher_query)
neo4j_graph_db.close()

await update_entity(session, Operation, job_id, "SUCCESS")

return result

The function user_query_to_graph_db performs its graph-database operations without any error handling; the only try/except is a bare except/pass around user creation. If the Neo4j calls fail, the exception propagates after the operation row has already been marked RUNNING. Add error handling so failures are logged and the operation status is updated accordingly.
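A sketch of what that could look like; the FAILURE status label is an assumption, since only RUNNING and SUCCESS appear in the current code:

neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
                              username=config.graph_database_username,
                              password=config.graph_database_password)
try:
    cypher_query = await neo4j_graph_db.generate_cypher_query_for_user_prompt_decomposition(user_id, query_input)
    result = neo4j_graph_db.query(cypher_query)
    await update_entity(session, Operation, job_id, "SUCCESS")
    return result
except Exception as e:
    logging.error("user_query_to_graph_db failed: %s", e)
    await update_entity(session, Operation, job_id, "FAILURE")  # assumed status label
    raise
finally:
    neo4j_graph_db.close()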

Comment on lines +247 to +322
async def add_documents_to_graph_db(session: AsyncSession, user_id: str= None, document_memory_types:list=None):
""""""
if document_memory_types is None:
document_memory_types = ['PUBLIC']

logging.info("Document memory types are", document_memory_types)
try:
# await update_document_vectordb_namespace(postgres_session, user_id)
memory_details, docs = await get_unsumarized_vector_db_namespace(session, user_id)

logging.info("Docs are", docs)
memory_details= [detail for detail in memory_details if detail[1] in document_memory_types]
logging.info("Memory details", memory_details)
for doc in docs:
logging.info("Memory names are", memory_details)
doc_name, doc_id = doc
logging.info("Doc id is", doc_id)
try:
classification_content = await fetch_document_vectordb_namespace(session, user_id, memory_details[0][0], doc_id)
retrieval_chunks = [item['text'] for item in
classification_content[0]['data']['Get'][memory_details[0][0]]]
logging.info("Classification content is", classification_content)
except:
classification_content = ""
retrieval_chunks = ""
# retrieval_chunks = [item['text'] for item in classification_content[0]['data']['Get'][memory_details[0]]]
# Concatenating the extracted text values
concatenated_retrievals = ' '.join(retrieval_chunks)
print(concatenated_retrievals)
logging.info("Retrieval chunks are", retrieval_chunks)
classification = await classify_documents(doc_name, document_id =doc_id, content=concatenated_retrievals)

logging.info("Classification is", str(classification))
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
if document_memory_types == ['PUBLIC']:
await create_public_memory(user_id=user_id, labels=['sr'], topic="PublicMemory")
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="PublicMemory")
neo4j_graph_db.close()
print(ids)
else:
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="SemanticMemory")
neo4j_graph_db.close()
print(ids)

for id in ids:
print(id.get('memoryId'))
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
if document_memory_types == ['PUBLIC']:

rs = neo4j_graph_db.create_document_node_cypher(classification, user_id, public_memory_id=id.get('memoryId'))
neo4j_graph_db.close()
else:
rs = neo4j_graph_db.create_document_node_cypher(classification, user_id, memory_type='SemanticMemory')
neo4j_graph_db.close()
logging.info("Cypher query is", rs)
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
neo4j_graph_db.query(rs)
neo4j_graph_db.close()
logging.info("WE GOT HERE")
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
if memory_details[0][1] == "PUBLIC":

neo4j_graph_db.update_document_node_with_db_ids( vectordb_namespace=memory_details[0][0],
document_id=doc_id)
neo4j_graph_db.close()
else:
neo4j_graph_db.update_document_node_with_db_ids( vectordb_namespace=memory_details[0][0],
document_id=doc_id, user_id=user_id)
neo4j_graph_db.close()
# await update_entity_graph_summary(session, DocsModel, doc_id, True)
except Exception as e:
return e

The function add_documents_to_graph_db wraps its entire body in a single try-except that catches every exception and returns the exception object as if it were a normal result. This obscures the source of errors, makes debugging difficult, and forces callers to type-check the return value. Handle the specific exceptions you expect, log them with context, and re-raise.
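For example, the broad handler could be replaced with something along these lines (the exception classes are a guess at the likely failure modes):

    except Neo4jError as e:
        logging.error("Graph update failed for user %s: %s", user_id, e)
        raise
    except (KeyError, IndexError) as e:
        logging.error("Unexpected document/memory structure for user %s: %s", user_id, e)
        raise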

main.py Outdated
Comment on lines 346 to 474
enriched_context = await user_context_enrichment(session, "user123", "How does cognitive architecture work?")
```
"""
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)

await user_query_to_graph_db(session, user_id, query)

semantic_mem = neo4j_graph_db.retrieve_semantic_memory(user_id=user_id)
neo4j_graph_db.close()
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
episodic_mem = neo4j_graph_db.retrieve_episodic_memory(user_id=user_id)
neo4j_graph_db.close()
# public_mem = neo4j_graph_db.retrieve_public_memory(user_id=user_id)



if detect_language(query) != "en":
query = translate_text(query, "sr", "en")
logging.info("Translated query is", query)

neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
summaries = await neo4j_graph_db.get_memory_linked_document_summaries(user_id=user_id, memory_type=memory_type)
neo4j_graph_db.close()
logging.info("Summaries are is %s", summaries)
# logging.info("Context from graphdb is %s", context)
# result = neo4j_graph_db.query(document_categories_query)
# summaries = [record.get("summary") for record in result]
# logging.info('Possible document categories are', str(result))
# logging.info('Possible document categories are', str(categories))
relevant_summary_id = await classify_call( query= query, document_summaries=str(summaries))


# logging.info("Relevant categories after the classifier are %s", relevant_categories)
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)
postgres_id = await neo4j_graph_db.get_memory_linked_document_ids(user_id, summary_id = relevant_summary_id, memory_type=memory_type)
neo4j_graph_db.close()
# postgres_id = neo4j_graph_db.query(get_doc_ids)
logging.info("Postgres ids are %s", postgres_id)
namespace_id = await get_memory_name_by_doc_id(session, postgres_id[0])
logging.info("Namespace ids are %s", namespace_id)
params= {"doc_id":postgres_id[0]}
namespace_id = namespace_id[0]
namespace_class = namespace_id + "_class"
if memory_type =='PublicMemory':
user_id = 'system_user'

memory = await Memory.create_memory(user_id, session, namespace=namespace_id, job_id="23232",
memory_label=namespace_id)

existing_user = await Memory.check_existing_user(user_id, session)
print("here is the existing user", existing_user)
await memory.manage_memory_attributes(existing_user)

print("Namespace id is %s", namespace_id)
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)

dynamic_memory_class = getattr(memory, namespace_class.lower(), None)

methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]

if dynamic_memory_class is not None:
for method_name in methods_to_add:
await memory.add_method_to_class(dynamic_memory_class, method_name)
print(f"Memory method {method_name} has been added")
else:
print(f"No attribute named in memory.")

print("Available memory classes:", await memory.list_memory_classes())
results = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
observation=query, params=postgres_id[0], search_type="summary_filter_by_object_name")
logging.info("Result is", str(results))

search_context = ""

for result in results['data']['Get'][namespace_id]:
# Assuming 'result' is a dictionary and has keys like 'source', 'text'
source = result['source']
text = result['text']
search_context += f"Document source: {source}, Document text: {text} \n"

context = f""" You are a memory system that uses cognitive architecture to enrich the
LLM context and provide better query response.
You have access to the following information:
EPISODIC MEMORY: {episodic_mem}
SEMANTIC MEMORY: {semantic_mem}
PROCEDURAL MEMORY: NULL
SEARCH CONTEXT: The following documents provided with sources they were
extracted from could be used to provide an answer {search_context}
The original user query: {query}
"""
if generative_response is not True:
return context
else:
generative_result = generate_graph(context)
translation_to_srb = translate_text(generative_result.response, "en", "sr")
return translation_to_srb

The function user_context_enrichment is very long and does multiple things. It should be refactored into smaller functions, each with a single responsibility. Also, the function uses print statements for debugging, which should be replaced with proper logging.
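
As a first step, the prompt assembly at the end of the function could be pulled into a small helper (build_context is a suggested name, not existing code):

def build_context(episodic_mem, semantic_mem, search_context, query):
    # Assembles the enrichment prompt from the retrieved memory fragments.
    return (
        "You are a memory system that uses cognitive architecture to enrich the "
        "LLM context and provide better query responses.\n"
        f"EPISODIC MEMORY: {episodic_mem}\n"
        f"SEMANTIC MEMORY: {semantic_mem}\n"
        "PROCEDURAL MEMORY: NULL\n"
        f"SEARCH CONTEXT: {search_context}\n"
        f"The original user query: {query}"
    )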

Comment on lines +477 to +514
async def create_public_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]:
"""
Create a public memory node associated with a user in a Neo4j graph database.
If Public Memory exists, it will return the id of the memory.
This is intended as standalone node that can be attached to any user.
It is not attached to any user by default.

Args:
user_id (str): The unique identifier for the user.
session (AsyncSession): An asynchronous session for database operations.

Returns:
Optional[int]: The ID of the created public memory node or None if an error occurs.
:param labels: Label for the memory, to help filter for different countries
:param topic: Topic for the memory, to help provide a name

"""
# Validate input parameters
if not labels:
labels = ['sr'] # Labels for the memory node

if not topic:
topic = "PublicMemory"


try:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)

# Assuming the topic for public memory is predefined, e.g., "PublicMemory"
# Create the memory node
memory_id = await neo4j_graph_db.create_memory_node(labels=labels, topic=topic)
neo4j_graph_db.close()
return memory_id
except Neo4jError as e:
logging.error(f"Error creating public memory node: {e}")
return None

The function create_public_memory has a try-except block that catches a Neo4jError but only logs the error without re-raising it or handling it in a way that the caller can react to. This could lead to silent failures where the caller is unaware that an error has occurred.
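
A minimal change that avoids the silent failure is to re-raise after logging (or return an explicit error value that the caller checks for):

    except Neo4jError as e:
        logging.error("Error creating public memory node: %s", e)
        raise  # let the caller decide how to recover instead of returning None silently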

Comment on lines +516 to +559
async def attach_user_to_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]:
"""
Link user to public memory

Args:
user_id (str): The unique identifier for the user.
topic (str): Memory name


Returns:
Optional[int]: The ID of the created public memory node or None if an error occurs.
:param labels: Label for the memory, to help filter for different countries
:param topic: Topic for the memory, to help provide a name

"""
# Validate input parameters
if not user_id:
raise ValueError("User ID is required.")
if not labels:
labels = ['sr'] # Labels for the memory node

if not topic:
topic = "PublicMemory"


try:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)

# Assuming the topic for public memory is predefined, e.g., "PublicMemory"
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic=topic)
neo4j_graph_db.close()

for id in ids:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)
linked_memory = neo4j_graph_db.link_public_memory_to_user(memory_id=id.get('memoryId'), user_id=user_id)
neo4j_graph_db.close()
return 1
except Neo4jError as e:
logging.error(f"Error creating public memory node: {e}")
return None

The function attach_user_to_memory raises a ValueError if the user_id is not provided, which is good for input validation. However, the rest of the function is wrapped in a try-except that catches Neo4jError and logs it, similar to the previous function. This could also lead to silent failures.

Comment on lines +561 to +603
async def unlink_user_from_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]:
"""
Unlink user from memory

Args:
user_id (str): The unique identifier for the user.
topic (str): Memory name

Returns:
Optional[int]: The ID of the created public memory node or None if an error occurs.
:param labels: Label for the memory, to help filter for different countries
:param topic: Topic for the memory, to help provide a name

"""
# Validate input parameters
if not user_id:
raise ValueError("User ID is required.")
if not labels:
labels = ['sr'] # Labels for the memory node

if not topic:
topic = "PublicMemory"


try:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)

# Assuming the topic for public memory is predefined, e.g., "PublicMemory"
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic=topic)
neo4j_graph_db.close()

for id in ids:
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
username=config.graph_database_username,
password=config.graph_database_password)
linked_memory = neo4j_graph_db.unlink_memory_from_user(memory_id=id.get('memoryId'), user_id=user_id)
neo4j_graph_db.close()
return 1
except Neo4jError as e:
logging.error(f"Error creating public memory node: {e}")
return None

The function unlink_user_from_memory has the same issue as the attach_user_to_memory function, where it catches and logs Neo4jError without proper error handling.

main.py Outdated
Comment on lines 607 to 754
# CREATE (episodic)-[:HAS_EVENT]->(location5_5e43f4c45b3c44ea897c12220db4c051)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(location5_5e43f4c45b3c44ea897c12220db4c051)
# CREATE (episodic)-[:HAS_EVENT]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d)
# CREATE (episodic)-[:HAS_EVENT]->(information7_f756e3f3720c4fe5aeb01287badaf088)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(information7_f756e3f3720c4fe5aeb01287badaf088)
# CREATE (episodic)-[:HAS_EVENT]->(event8_da6334e744454264900296319e14b532)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(event8_da6334e744454264900296319e14b532)
# CREATE (episodic)-[:HAS_EVENT]->(action9_48e45419604e4d66b3e718ee1d6c095f)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action9_48e45419604e4d66b3e718ee1d6c095f)
# CREATE (episodic)-[:HAS_EVENT]->(action10_f48acb1db4da4934afbe17363e9e63a4)
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action10_f48acb1db4da4934afbe17363e9e63a4)"""
#
# out = cypher_statement_correcting(query)
# print(out)
#
# out = await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
# print(out)
# load_doc_to_graph = await add_documents_to_graph_db(session, user_id)
# print(load_doc_to_graph)
user_id = 'test_user'
# loader_settings = {
# "format": "PDF",
# "source": "DEVICE",
# "path": [".data"]
# }
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
# await create_public_memory(user_id=user_id, labels=['sr'], topic="PublicMemory")
# await add_documents_to_graph_db(session, user_id)
#
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username,
password=config.graph_database_password)

# await attach_user_to_memory(user_id=user_id, labels=['sr'], topic="PublicMemory")

return_ = await user_context_enrichment(user_id=user_id, query="Koja je minimalna visina ograde na balkonu na stambenom objektu", session=session, memory_type="PublicMemory", generative_response=True)
print(return_)
# document_summary = {
# 'DocumentCategory': 'Science',
# 'Title': 'The Future of AI',
# 'Summary': 'An insightful article about the advancements in AI.',
# 'd_id': 'doc123'
# }
#
# # Example user ID
# user_id = 'user'
#
# # value = await neo4j_graph_db.create_memory_node(labels=['sr'])
# # print(value)
# # neo4j_graph_db.close()
#
# await add_documents_to_graph_db(session, user_id)
# neo4j_graph_db.link_public_memory_to_user(memory_id = 17,user_id=user_id)
#
# ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="Document")
# print(ids)
#
# for id in ids:
# print(id.get('memoryId'))
#
# neo4j_graph_db.delete_memory_node(memory_id = id.get('memoryId'), topic="Document")
#
# neo4j_graph_db.delete_memory_node(memory_id=16, topic="PublicSerbianArchitecture")
# neo4j_graph_db.unlink_memory_from_user(memory_id = 17,user_id=user_id)
# cypher_query_public = neo4j_graph_db.create_document_node_cypher(document_summary, user_id, memory_type="PUBLIC")
# neo4j_graph_db.query(cypher_query_public)
# link_memory_to_user(user_id, session)

# neo4j_graph_db.create_memory_node(labels=['sr'])
# out = await get_vectordb_namespace(session, user_id)
# params = {
# "version": "1.0",
# "agreement_id": "AG123456",
# "privacy_policy": "https://example.com/privacy",
# "terms_of_service": "https://example.com/terms",
# "format": "json",
# "schema_version": "1.1",
# "checksum": "a1b2c3d4e5f6",
# "owner": "John Doe",
# "license": "MIT",
# "validity_start": "2023-08-01",
# "validity_end": "2024-07-31",
# }
# loader_settings = {
# "format": "PDF",
# "source": "DEVICE",
# "path": [".data"],
# "strategy": "SUMMARY",
# }
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
# await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
# await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings)
# await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday")


if __name__ == "__main__":
import asyncio


asyncio.run(main())

The main function is mostly commented out, which suggests that it's either a work in progress or that the commented code is no longer needed. If it's the latter, the commented code should be removed to avoid confusion.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 5

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 50e4d7c and f41f091.
Files selected for processing (3)
  • api.py (1 hunks)
  • cognitive_architecture/classifiers/classifier.py (1 hunks)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • cognitive_architecture/classifiers/classifier.py
  • main.py
Additional comments: 1
api.py (1)
  • 325-337: The start_api_server function is a utility to start the FastAPI server using uvicorn. The exception handling within this function is appropriate, logging the exception and potentially allowing for cleanup or recovery code to be added.

Comment on lines +28 to +30
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
app = FastAPI(debug=True)

The environment variable OPENAI_API_KEY is being loaded with a default empty string. This contradicts the previous review comment which suggested adding error handling if the key is not set. The application should not start without this key.

- OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
+ if not OPENAI_API_KEY:
+     raise RuntimeError("OPENAI_API_KEY is not set in the environment variables.")

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
app = FastAPI(debug=True)
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if not OPENAI_API_KEY:
raise RuntimeError("OPENAI_API_KEY is not set in the environment variables.")
app = FastAPI(debug=True)

Comment on lines +69 to +96
@app.post("/add-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore

if 'settings' in decoded_payload and decoded_payload['settings'] is not None:
settings_for_loader = decoded_payload['settings']
else:
settings_for_loader = None

if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None

output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader)
return JSONResponse(content={"response": output}, status_code=200)

except Exception as e:
return JSONResponse(
content={"response": {"error": str(e)}}, status_code=503
)

The /add-memory endpoint's error handling could be improved by providing more specific HTTP status codes and messages based on the exception type, as previously suggested.

- return JSONResponse(content={"response": {"error": str(e)}}, status_code=503)
+ if isinstance(e, SomeSpecificException):
+     status_code = 400  # Bad Request or other appropriate status code
+ else:
+     status_code = 500  # Internal Server Error
+ return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code)

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
@app.post("/add-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'settings' in decoded_payload and decoded_payload['settings'] is not None:
settings_for_loader = decoded_payload['settings']
else:
settings_for_loader = None
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None
output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader)
return JSONResponse(content={"response": output}, status_code=200)
except Exception as e:
return JSONResponse(
content={"response": {"error": str(e)}}, status_code=503
)
except Exception as e:
if isinstance(e, SomeSpecificException):
status_code = 400 # Bad Request or other appropriate status code
else:
status_code = 500 # Internal Server Error
return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code)

Comment on lines +98 to +126
@app.post("/add-architecture-public-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None

user_id = 'system_user'
loader_settings = {
"format": "PDF",
"source": "DEVICE",
"path": [".data"]
}

output = await load_documents_to_vectorstore(session, user_id=user_id, content=content, loader_settings=loader_settings)
return JSONResponse(content={"response": output}, status_code=200)

except Exception as e:
return JSONResponse(
content={"response": {"error": str(e)}}, status_code=503
)

The /add-architecture-public-memory endpoint duplicates the /add-memory endpoint's functionality with only minor differences, and both handlers share the name add_memory. FastAPI registers each route when the decorator runs, so both routes still work, but the second definition shadows the first in the module namespace and makes the code confusing. The function name should be unique.

- async def add_memory(
+ async def add_architecture_public_memory(

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
@app.post("/add-architecture-public-memory", response_model=dict)
async def add_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None
user_id = 'system_user'
loader_settings = {
"format": "PDF",
"source": "DEVICE",
"path": [".data"]
}
output = await load_documents_to_vectorstore(session, user_id=user_id, content=content, loader_settings=loader_settings)
return JSONResponse(content={"response": output}, status_code=200)
except Exception as e:
return JSONResponse(
content={"response": {"error": str(e)}}, status_code=503
)
@app.post("/add-architecture-public-memory", response_model=dict)
async def add_architecture_public_memory(
payload: Payload,
# files: List[UploadFile] = File(...),
):
try:
logging.info(" Adding to Memory ")
decoded_payload = payload.payload
async with session_scope(session=AsyncSessionLocal()) as session:
from main import load_documents_to_vectorstore
if 'content' in decoded_payload and decoded_payload['content'] is not None:
content = decoded_payload['content']
else:
content = None
user_id = 'system_user'
loader_settings = {
"format": "PDF",
"source": "DEVICE",
"path": [".data"]
}
output = await load_documents_to_vectorstore(session, user_id=user_id, content=content, loader_settings=loader_settings)
return JSONResponse(content={"response": output}, status_code=200)
except Exception as e:
return JSONResponse(
content={"response": {"error": str(e)}}, status_code=503
)

Comment on lines +206 to +245
@app.post("/drop-db")
async def drop_db(payload: Payload):
try:
decoded_payload = payload.payload

if decoded_payload['operation'] == 'drop':

if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass

from cognitive_architecture.database.create_database_tst import drop_database, create_admin_engine

engine = create_admin_engine(username, password, host, database_name)
drop_database(engine)
return JSONResponse(content={"response": "DB dropped"}, status_code=200)
else:

if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass

from cognitive_architecture.database.create_database_tst import create_database, create_admin_engine

engine = create_admin_engine(username, password, host, database_name)
create_database(engine)
return JSONResponse(content={"response": " DB created"}, status_code=200)



except Exception as e:
return HTTPException(status_code=500, detail=str(e))

The /drop-db endpoint has been updated to include logic for both dropping and creating the database based on the operation specified in the payload. However, the exception handling at the end of the function uses HTTPException incorrectly. It should be raised, not returned.

- return HTTPException(status_code=500, detail=str(e))
+ raise HTTPException(status_code=500, detail=str(e))

Committable suggestion

IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
@app.post("/drop-db")
async def drop_db(payload: Payload):
try:
decoded_payload = payload.payload
if decoded_payload['operation'] == 'drop':
if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass
from cognitive_architecture.database.create_database_tst import drop_database, create_admin_engine
engine = create_admin_engine(username, password, host, database_name)
drop_database(engine)
return JSONResponse(content={"response": "DB dropped"}, status_code=200)
else:
if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass
from cognitive_architecture.database.create_database_tst import create_database, create_admin_engine
engine = create_admin_engine(username, password, host, database_name)
create_database(engine)
return JSONResponse(content={"response": " DB created"}, status_code=200)
except Exception as e:
return HTTPException(status_code=500, detail=str(e))
@app.post("/drop-db")
async def drop_db(payload: Payload):
try:
decoded_payload = payload.payload
if decoded_payload['operation'] == 'drop':
if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass
from cognitive_architecture.database.create_database_tst import drop_database, create_admin_engine
engine = create_admin_engine(username, password, host, database_name)
drop_database(engine)
return JSONResponse(content={"response": "DB dropped"}, status_code=200)
else:
if os.environ.get('AWS_ENV') == 'dev':
host = os.environ.get('POSTGRES_HOST')
username = os.environ.get('POSTGRES_USER')
password = os.environ.get('POSTGRES_PASSWORD')
database_name = os.environ.get('POSTGRES_DB')
else:
pass
from cognitive_architecture.database.create_database_tst import create_database, create_admin_engine
engine = create_admin_engine(username, password, host, database_name)
create_database(engine)
return JSONResponse(content={"response": " DB created"}, status_code=200)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
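
For context on why the change matters: FastAPI only turns an HTTPException into an error response when it is raised; a returned HTTPException is treated as an ordinary return value, so the client never sees the intended 500. A minimal, self-contained sketch (not the project's endpoint, just the pattern):

from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.post("/broken")
async def broken():
    # Returning the exception object: FastAPI serializes it like any other
    # return value, and the client does not get the intended 500 status.
    return HTTPException(status_code=500, detail="boom")

@app.post("/correct")
async def correct():
    # Raising it lets FastAPI's exception handling produce a real 500
    # response with {"detail": "boom"} as the body.
    raise HTTPException(status_code=500, detail="boom")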

Comment on lines +248 to +276
@app.post("/create-public-memory")
async def create_public_memory(payload: Payload):
try:
decoded_payload = payload.payload

if 'user_id' in decoded_payload and decoded_payload['user_id'] is not None:
user_id = decoded_payload['user_id']
else:
user_id = None

if 'labels' in decoded_payload and decoded_payload['labels'] is not None:
labels = decoded_payload['labels']
else:
labels = None

if 'topic' in decoded_payload and decoded_payload['topic'] is not None:
topic = decoded_payload['topic']
else:
topic = None

# Execute the query - replace this with the actual execution method
# async with session_scope(session=AsyncSessionLocal()) as session:
# from main import create_public_memory
# Assuming you have a method in Neo4jGraphDB to execute the query
result = await create_public_memory(user_id=user_id, labels=labels, topic=topic)
return JSONResponse(content={"response": result}, status_code=200)

except Exception as e:
raise HTTPException(status_code=500, detail=str(e))

The /create-public-memory endpoint seems to have been updated with new logic. However, the commented-out code from lines 259-261 should be removed if it is no longer needed, so the codebase stays clean.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between f41f091 and fb542a3.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between fb542a3 and 7b39813.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 7b39813 and 3d6d0a0.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 3d6d0a0 and db18bd8.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between db18bd8 and eee6049.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between eee6049 and 444d4b3.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 444d4b3 and d8f621f.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between d8f621f and 94ef465.
Files selected for processing (1)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • main.py

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 9

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 94ef465 and 0b1cefb.
Files selected for processing (3)
  • cognitive_architecture/database/graph_database/graph.py (1 hunks)
  • cognitive_architecture/database/vectordb/basevectordb.py (1 hunks)
  • cognitive_architecture/database/vectordb/vectordb.py (6 hunks)
Files not summarized due to errors (1)
  • cognitive_architecture/database/graph_database/graph.py: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (2)
  • cognitive_architecture/database/vectordb/basevectordb.py
  • cognitive_architecture/database/vectordb/vectordb.py


from neo4j import AsyncSession
from neo4j.exceptions import Neo4jError


The debugging print statement should be removed or replaced with a logging statement if necessary.

- print(os.getcwd())
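
If the working-directory information is still useful, the logging replacement the comment mentions is a one-liner, for example:

import logging
import os

logging.debug("Current working directory: %s", os.getcwd())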

Comment on lines +31 to +33
from typing import List, Dict, Optional
from ...utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \
create_node_variable_mapping, get_unsumarized_vector_db_namespace

The relative import from ...utils should be replaced with an absolute import for clarity and to avoid issues with changing directory structures.

- from ...utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \
-     create_node_variable_mapping, get_unsumarized_vector_db_namespace
+ from cognitive_architecture.utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \
+     create_node_variable_mapping, get_unsumarized_vector_db_namespace

Suggested change
from typing import List, Dict, Optional
from cognitive_architecture.utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \
create_node_variable_mapping, get_unsumarized_vector_db_namespace

Comment on lines +232 to +238
try:
with self.driver.session() as session:
result = session.run(query, params).data()
return result
except Exception as e:
logging.error(f"An error occurred while executing the query: {e}")
raise e

The query method in the Neo4jGraphDB class catches a generic Exception. It should catch more specific exceptions to avoid hiding unexpected errors.

-        except Exception as e:
+        except Neo4jError as e:

Suggested change
try:
with self.driver.session() as session:
result = session.run(query, params).data()
return result
except Neo4jError as e:
logging.error(f"An error occurred while executing the query: {e}")
raise e

Comment on lines +444 to +447
logging.info("User Cypher Query raw: %s", cypher_statements_joined)
# corrected_cypher_statements = self.cypher_statement_correcting(input = cypher_statements_joined)
# logging.info("User Cypher Query: %s", corrected_cypher_statements.query)
# return corrected_cypher_statements.query

The commented-out code for correcting Cypher statements should be either uncommented and tested or removed to avoid confusion.

-        # corrected_cypher_statements = self.cypher_statement_correcting(input = cypher_statements_joined)
-        # logging.info("User Cypher Query: %s", corrected_cypher_statements.query)
-        # return corrected_cypher_statements.query

Suggested change
logging.info("User Cypher Query raw: %s", cypher_statements_joined)

Comment on lines +889 to +925
async def create_memory_node(self, labels, topic=None):
"""
Create or find a memory node of the specified type with labels and a description.

Args:
labels (List[str]): A list of labels for the node.
topic (str, optional): The type of memory node to create or find. Defaults to "PublicMemory".

Returns:
int: The ID of the created or found memory node.

Raises:
ValueError: If input parameters are invalid.
Neo4jError: If an error occurs during the database operation.
"""
if topic is None:
topic = "PublicMemory"

# Prepare labels as a string
label_list = ', '.join(f"'{label}'" for label in labels)

# Cypher query to find or create the memory node with the given description and labels
memory_cypher = f"""
MERGE (memory:{topic} {{description: '{topic}', label: [{label_list}]}})
SET memory.memoryId = ID(memory)
RETURN id(memory) AS memoryId
"""

try:
result = self.query(memory_cypher)
# Assuming the result is a list of records, where each record contains 'memoryId'
memory_id = result[0]['memoryId'] if result else None
self.close()
return memory_id
except Neo4jError as e:
logging.error(f"Error creating or finding memory node: {e}")
raise

The method create_memory_node uses string formatting to create a Cypher query, which could be vulnerable to injection attacks. It should be parameterized to ensure safety.

- MERGE (memory:{topic} {{description: '{topic}', label: [{label_list}]}})
+ MERGE (memory:$topic {description: $description, label: $labels})

And update the query call to use parameters:

self.query(memory_cypher, params={"topic": topic, "description": topic, "labels": labels})
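
One caveat about the suggested rewrite: Cypher parameters can only replace property values, not labels, so MERGE (memory:$topic ...) will not parse. The property values should be parameterized as shown, but the label itself still has to be validated, for example against a whitelist, before it is interpolated. A rough sketch; the whitelist contents and helper name are illustrative, not taken from the codebase:

from typing import List, Tuple

ALLOWED_TOPICS = {"PublicMemory", "SemanticMemory", "EpisodicMemory"}

def build_memory_node_query(topic: str, labels: List[str]) -> Tuple[str, dict]:
    # Labels cannot be bound as Cypher parameters, so the topic is checked
    # against a whitelist and then interpolated; the property values stay
    # parameterized and are passed to the driver separately.
    if topic not in ALLOWED_TOPICS:
        raise ValueError(f"Unsupported memory topic: {topic}")
    cypher = (
        f"MERGE (memory:{topic} {{description: $description, label: $labels}}) "
        "RETURN id(memory) AS memoryId"
    )
    return cypher, {"description": topic, "labels": labels}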

Comment on lines +990 to +998
def link_public_memory_to_user(self, memory_id, user_id):
# Link an existing Public Memory node to a User node
link_cypher = f"""
MATCH (user:User {{userId: '{user_id}'}})
MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {memory_id}
MERGE (user)-[:HAS_PUBLIC_MEMORY]->(publicMemory)
"""
self.query(link_cypher)


The method link_public_memory_to_user does not validate the memory_id before using it in the Cypher query. It should ensure that memory_id is an integer to prevent potential issues.

- MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {memory_id}
+ MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = $memory_id

And update the query call to use parameters:

self.query(link_cypher, params={"memory_id": memory_id})
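
To cover the integer check the comment asks for, the value can be coerced before it is bound (illustrative):

link_params = {"memory_id": int(memory_id)}  # raises ValueError/TypeError early if not an integer
self.query(link_cypher, params=link_params)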

Comment on lines +1000 to +1006
link_cypher = f""" MATCH(publicMemory: {topic})
RETURN
id(publicMemory)
AS
memoryId """
node_ids = self.query(link_cypher)
return node_ids

The method retrieve_node_id_for_memory_type is using a Cypher query without parameterization. It should be updated to use parameters for topic to prevent potential injection issues.

- MATCH(publicMemory: {topic})
+ MATCH(publicMemory: $topic)

And update the query call to use parameters:

self.query(link_cypher, params={"topic": topic})
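
The same label caveat applies here: $topic sits in the label position of the MATCH clause, which Cypher does not allow. As with create_memory_node above, the topic needs to be validated (for example against a whitelist) and then interpolated into the label, while any property values stay parameterized.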

Comment on lines 1018 to 1095
#
# async def link_memory_to_user(self, memory_id: int, user_id: str, relationship_type: str = 'HAS_MEMORY') -> None:
# """
# Link a memory node to a user with a specified relationship type.
#
# Args:
# memory_id (int): The ID of the memory node.
# user_id (str): The user ID to link the memory to.
# relationship_type (str): The type of relationship.
#
# Raises:
# ValueError: If input parameters are invalid.
# Neo4jError: If an error occurs during the database operation.
# """
# if not user_id or not memory_id:
# raise ValueError("User ID and Memory ID are required for linking.")
#
# try:
# link_cypher = f"""
# MATCH (user:User {{userId: '{user_id}'}})
# MATCH (memory) WHERE id(memory) = {memory_id}
# MERGE (user)-[:{relationship_type}]->(memory)
# """
# await self.query(link_cypher)
# except Neo4jError as e:
# logging.error(f"Error linking memory to user: {e}")
# raise
#
# async def delete_memory_node(self, memory_id: int, memory_type: str) -> None:
# """
# Delete a memory node of a specified type.
#
# Args:
# memory_id (int): The ID of the memory node to delete.
# memory_type (str): The type of the memory node.
#
# Raises:
# ValueError: If input parameters are invalid.
# Neo4jError: If an error occurs during the database operation.
# """
# if not memory_id or not memory_type:
# raise ValueError("Memory ID and Memory Type are required for deletion.")
#
# try:
# delete_cypher = f"""
# MATCH (memory:{memory_type}) WHERE id(memory) = {memory_id}
# DETACH DELETE memory
# """
# await self.query(delete_cypher)
# except Neo4jError as e:
# logging.error(f"Error deleting memory node: {e}")
# raise
#
# async def unlink_memory_from_user(self, memory_id: int, user_id: str, relationship_type: str = 'HAS_MEMORY') -> None:
# """
# Unlink a memory node from a user.
#
# Args:
# memory_id (int): The ID of the memory node.
# user_id (str): The user ID to unlink from the memory.
# relationship_type (str): The type of relationship.
#
# Raises:
# ValueError: If input parameters are invalid.
# Neo4jError: If an error occurs during the database operation.
# """
# if not user_id or not memory_id:
# raise ValueError("User ID and Memory ID are required for unlinking.")
#
# try:
# unlink_cypher = f"""
# MATCH (user:User {{userId: '{user_id}'}})-[r:{relationship_type}]->(memory) WHERE id(memory) = {memory_id}
# DELETE r
# """
# await self.query(unlink_cypher)
# except Neo4jError as e:
# logging.error(f"Error unlinking memory from user: {e}")
# raise

There are large blocks of commented-out code that appear to be previous versions of methods or alternative implementations. These blocks should be cleaned up to avoid confusion and keep the code clear.

Comment on lines 1099 to 1160
# def create_public_memory(self, labels, topic=None):
# if topic is None:
# topic = "SerbianArchitecture"
# topicMemoryId = topic + "MemoryId"
# # Create an independent Architecture Memory node with countries as properties
# label_list = ', '.join(f"'{label}'" for label in labels) # Prepare countries list as a string
# architecture_memory_cypher = f"""
# CREATE ({topic.lower()}:{topic} {{description: '{topic}', label: [{label_list}]}})
# RETURN id({topic.lower()}) AS {topicMemoryId}
# """
# return self.query(architecture_memory_cypher)
#
# def link_public_memory_to_user(self, public_memory_id, user_id):
# # Link an existing Public Memory node to a User node
# link_cypher = f"""
# MATCH (user:User {{userId: '{user_id}'}})
# MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id}
# MERGE (user)-[:HAS_PUBLIC_MEMORY]->(publicMemory)
# """
# self.query(link_cypher)
#
# def link_public_memory_to_architecture(self, public_memory_id):
# # Link the Public Memory node to the Architecture Memory node
# link_cypher = f"""
# MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id}
# MATCH (architecture:Architecture {{description: 'Architecture'}})
# MERGE (publicMemory)-[:INCLUDES]->(architecture)
# """
# self.query(link_cypher)
#
# def delete_public_memory(self, public_memory_id):
# # Delete a Public Memory node by its ID
# delete_cypher = f"""
# MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id}
# DETACH DELETE publicMemory
# """
# self.query(delete_cypher)
#
# def delete_architecture_memory(self, architecture_memory_id):
# # Delete an Architecture Memory node by its ID
# delete_cypher = f"""
# MATCH (architecture:Architecture) WHERE id(architecture) = {architecture_memory_id}
# DETACH DELETE architecture
# """
# self.query(delete_cypher)
#
# def unlink_public_memory_from_user(self, public_memory_id, user_id):
# # Unlink a Public Memory node from a User node
# unlink_cypher = f"""
# MATCH (user:User {{userId: '{user_id}'}})-[r:HAS_PUBLIC_MEMORY]->(publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id}
# DELETE r
# """
# self.query(unlink_cypher)
#
# def unlink_public_memory_from_architecture(self, public_memory_id):
# # Unlink the Public Memory node from the Architecture Memory node
# unlink_cypher = f"""
# MATCH (publicMemory:PublicMemory)-[r:INCLUDES]->(architecture:Architecture) WHERE id(publicMemory) = {public_memory_id}
# DELETE r
# """
# self.query(unlink_cypher)


More commented-out code is present at the end of the file. If it is not needed, it should be removed; if it is meant to be used in the future, a comment should explain why it is being retained.

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 6

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 0b1cefb and dab514f.
Files selected for processing (7)
  • cognitive_architecture/database/graph_database/graph.py (1 hunks)
  • cognitive_architecture/database/graph_database/networkx_graph.py (1 hunks)
  • cognitive_architecture/database/vectordb/basevectordb.py (2 hunks)
  • cognitive_architecture/llm/prompts/generate_graph_prompt.txt (1 hunks)
  • cognitive_architecture/llm/queries.py (1 hunks)
  • cognitive_architecture/shared/data_models.py (1 hunks)
  • main.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
  • cognitive_architecture/database/vectordb/basevectordb.py
  • main.py

id: int
description: str
category: str
color: str ="blue"

The color attribute in the Node class is given a bare inline default. A str default is immutable, so this is not the classic mutable-default pitfall, but declaring it with Pydantic's Field(default=...) makes the intent explicit and keeps the field declarations consistent.

- color: str ="blue"
+ color: str = Field(default="blue")

Suggested change
color: str = Field(default="blue")

source: int
target: int
description: str
color: str= "blue"

The Edge class declares its color default the same way, so the same Field(default=...) style applies there.

- color: str= "blue"
+ color: str = Field(default="blue")

Suggested change
color: str = Field(default="blue")
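
Both suggestions assume Field is available in the module; if data_models.py does not already import it, the import needs to be added as well:

from pydantic import Field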



def generate_graph(input) -> KnowledgeGraph:
model = "gpt-4-1106-preview" # Define the model here

Hardcoding the model name within the generate_graph function limits flexibility and maintainability. Consider using a configuration setting for the model name.

- model = "gpt-4-1106-preview"  # Define the model here
+ model = config.model  # Use the model from the configuration

Suggested change
model = config.model # Use the model from the configuration
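
The config object referenced in the suggestion is not shown in this diff; if no such setting exists yet, even an environment variable with a default keeps the model name out of the function body. A minimal sketch (the variable name is illustrative):

import os

# Falls back to the currently hard-coded value when the variable is unset.
OPENAI_MODEL = os.environ.get("OPENAI_MODEL", "gpt-4-1106-preview")

generate_graph would then read model = OPENAI_MODEL (or config.model once a configuration object is in place).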




async def generate_summary(input) -> MemorySummary:

The generate_summary function is defined as async but does not perform any await operations. If the aclient.chat.completions.create method is not asynchronous, the async keyword should be removed.

- async def generate_summary(input) -> MemorySummary:
+ def generate_summary(input) -> MemorySummary:

Suggested change
def generate_summary(input) -> MemorySummary:
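
The opposite fix is also plausible: if aclient is an AsyncOpenAI client, chat.completions.create returns a coroutine and the missing piece is the await, not the async keyword. A sketch under that assumption (the function name and prompt are illustrative, not the project's generate_summary):

from openai import AsyncOpenAI

aclient = AsyncOpenAI()  # assumes OPENAI_API_KEY is set in the environment

async def summarize(text: str) -> str:
    # The async client's create() returns a coroutine, so it must be awaited
    # inside an async function for the request to actually run.
    completion = await aclient.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content": f"Summarize: {text}"}],
    )
    return completion.choices[0].message.content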

Comment on lines +11 to +12
except (FileNotFoundError, EOFError, pickle.UnpicklingError):
self.graph = nx.Graph() # Create a new graph if loading failed

The exceptions caught here are already specific, but the failure is silent: a load error quietly falls back to an empty graph. Logging the error before creating the new graph makes that failure visible.

- except (FileNotFoundError, EOFError, pickle.UnpicklingError):
+ except (FileNotFoundError, EOFError, pickle.UnpicklingError) as e:
+     logging.error(f"Error loading graph: {e}")

Suggested change
except (FileNotFoundError, EOFError, pickle.UnpicklingError) as e:
logging.error(f"Error loading graph: {e}")
self.graph = nx.Graph() # Create a new graph if loading failed

Comment on lines +63 to +67
def generate_graph_semantic_memory_document_summary(self, document_summary, unique_graphdb_mapping_values, document_namespace, user_id):
for node, attributes in unique_graphdb_mapping_values.items():
self.graph.add_node(node, **attributes)
self.graph.add_edge(f"{user_id}_semantic", node, relation='HAS_KNOWLEDGE')
self.save_graph()

The method generate_graph_semantic_memory_document_summary does not use the document_summary parameter. If it's not needed, consider removing it.

- def generate_graph_semantic_memory_document_summary(self, document_summary, unique_graphdb_mapping_values, document_namespace, user_id):
+ def generate_graph_semantic_memory_document_summary(self, unique_graphdb_mapping_values, document_namespace, user_id):

Suggested change
def generate_graph_semantic_memory_document_summary(self, unique_graphdb_mapping_values, document_namespace, user_id):
for node, attributes in unique_graphdb_mapping_values.items():
self.graph.add_node(node, **attributes)
self.graph.add_edge(f"{user_id}_semantic", node, relation='HAS_KNOWLEDGE')
self.save_graph()

@Vasilije1990 Vasilije1990 merged commit e189213 into main Jan 23, 2024
3 of 4 checks passed
@Vasilije1990 Vasilije1990 deleted the feature/test_deployment_of_docker branch March 25, 2024 13:43