Build the docker and push #33
Conversation
Walkthrough
The changes reflect a significant update to a software project, enhancing its CI/CD pipelines, refining database interactions, and expanding its cognitive architecture. The project shifts to the Apache License and introduces new functionalities for language processing, memory management, and vector database operations. It also improves Docker configurations and adds a FastAPI-based web service, signaling a broad enhancement of the system's capabilities and infrastructure.
Review Status
Actionable comments generated: 5
Configuration used: CodeRabbit UI
Files ignored due to filter (6)
- assets/Dashboard_example.png
- assets/User_graph.png
- assets/infographic_final.png
- docker-compose.yml
- poetry.lock
- pyproject.toml
Files selected for processing (7)
- .github/actions/image_builder/action.yaml (1 hunks)
- .github/workflows/cd.yaml (2 hunks)
- .github/workflows/cd_prd.yaml (2 hunks)
- .github/workflows/ci.yaml (1 hunks)
- LICENSE (1 hunks)
- bin/dockerize (2 hunks)
- cognitive_architecture/openai_tools.py (1 hunks)
Files skipped from review due to trivial changes (1)
- .github/actions/image_builder/action.yaml
Additional comments: 13
.github/workflows/cd.yaml (2)
- 50-51: The ecr_image_repo_name and dockerfile_location have been updated. Verify that the new Docker image repository name and Dockerfile location are consistent with the intended Docker configurations and naming conventions used across the project.
- 61-76: The new apply_tf job is introduced to trigger a Terraform apply workflow. Ensure that the PAT_FOR_CROSS_REPOS_CICD_TRIGGERING secret is properly configured for cross-repository workflow dispatch and that the workflow terraform.apply.yml exists and is set up to be triggered from this repository.
.github/workflows/cd_prd.yaml (3)
- 23-57: The steps added to the publish_docker_to_ecr job for setting environment variables, configuring AWS credentials, and creating and pushing Docker images appear to be correctly implemented. However, it's important to verify that the .github/actions/image_builder action exists and is configured properly, as it is being used to build and push the Docker image.
Verification successful: The .github/actions/image_builder action exists; the directory is present under .github/actions/, as confirmed by the output of the shell script.
- 81-82: The output promethai_docker_tag_backend is correctly set to be used by downstream jobs or steps. This is a good practice for passing data between jobs in a GitHub Actions workflow.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify that the image_builder GitHub action exists and check its configuration.
fd --exec ls -la '{}'/image_builder '*/.github/actions/'
Length of output: 34680
Script:
#!/bin/bash
# Search for the 'image_builder' directory within the '.github/actions/' path.
fd 'image_builder' '.github/actions/'
Length of output: 66
- 84-99: The apply_tf job is configured to trigger a Terraform apply workflow after the Docker image is published. The use of actions/github-script to programmatically dispatch a workflow is correct. Ensure that the PAT_FOR_CROSS_REPOS_CICD_TRIGGERING secret is set with the appropriate permissions and that the PromethAI-Infra repository and terraform.apply.yml workflow exist and are configured to accept dispatch events.
.github/workflows/ci.yaml (2)
- 23-23: The docker_login and publish flags are set to "false" in the dockerize command. Confirm that this is the intended behavior for the CI workflow and that there are no scenarios where these should be set to "true".
- 24-24: The Docker image version is now being read from /tmp/.DOCKER_IMAGE_VERSION. Verify that the dockerize script writes the Docker image version to this new path and that there are no conflicts with other processes that may use the /tmp directory.
LICENSE (1)
- 1-201: The change from the MIT License to the Apache License, Version 2.0, is a significant alteration to the terms under which the software is provided. Ensure that this change is intentional and that all contributors are aware of the implications of this change. Additionally, verify that all necessary documentation and code headers are updated to reflect the new licensing terms.
bin/dockerize (2)
- 11-11: The change to the IMAGE_NAME variable removes the "-memory" suffix from the Docker image name. Ensure that this new naming convention is consistent with the intended Docker image naming strategy across the project.
- 15-15: The APP_DIR variable has been changed from "level_2/" to the current directory ".". Verify that this change aligns with the directory structure of the application and that the Docker build context is correctly set to include all necessary files.
cognitive_architecture/openai_tools.py (3)
- 108-110: The use of the asynchronous retry decorator @aretry_with_exponential_backoff on acompletions_with_backoff is correct, but ensure that all calls to acompletions_with_backoff have been updated to be awaited, as this change affects the outward-facing interface of the codebase (see the sketch after this list).
- 119-125: The function async_get_embedding_with_backoff is correctly implemented as an asynchronous function. It's important to verify that this function is awaited wherever it's called, and that the introduction of this function does not conflict with any existing synchronous code paths.
- 128-137: The synchronous functions create_embedding_with_backoff and get_embedding_with_backoff remain unchanged and should continue to work as before. However, it's important to ensure that the introduction of their asynchronous counterparts does not lead to confusion or misuse in the codebase.
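To make the awaiting point concrete, here is a minimal, hypothetical caller sketch; the helper name comes from the review, but the import path, model, and message arguments are assumptions rather than code from this PR.

import asyncio

from cognitive_architecture.openai_tools import acompletions_with_backoff  # assumed import path

async def summarize(prompt: str):
    # The decorated helper is now a coroutine, so it must be awaited;
    # calling it without await would only produce a coroutine object.
    return await acompletions_with_backoff(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )

# Synchronous entry points would need to drive it explicitly, e.g.
# asyncio.run(summarize("Summarize the release notes")).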
fi
docker push "${FULL_IMAGE_NAME}" &&
echo "Successfully pushed docker image ${FULL_IMAGE_NAME} to ECR repository"
fi
fi
The nested if statement checking the value of PUBLISH is redundant and can be removed for clarity and simplicity.
  if [ "${PUBLISH}" = true ]; then
    echo "logging in"
    aws ecr get-login-password --region "${AWS_REGION}" | docker login --username AWS --password-stdin "${AWS_REPOSITORY}"
-   if [ "${PUBLISH}" = true ]; then
-     echo "logging in"
-     aws ecr get-login-password --region "${AWS_REGION}" | docker login --username AWS --password-stdin "${AWS_REPOSITORY}"
-   fi
  fi
SHA_SHORT="$(git rev-parse --short HEAD)"
CUR_DATE="$(date +%Y%m%d%H%M%S)"
VERSION="$STAGE-$CUR_DATE-$SHA_SHORT"
-IMAGE_NAME=${image_name:-promethai-${STAGE}-promethai-backend-memory}
+IMAGE_NAME=${image_name:-promethai-${STAGE}-promethai-backend}

REPO_NAME="${AWS_REPOSITORY}/${IMAGE_NAME}"
FULL_IMAGE_NAME="${REPO_NAME}:${VERSION}"
-APP_DIR=${app_dir:-"level_2/"} # Updated this line
+APP_DIR=${app_dir:-"."}

PUBLISH=${publish:-false}
Note: This review was outside the patches, and no patch overlapping with it was found. Original lines [19-19]
The echo statement should use the APP_DIR variable instead of app_dir to reflect the directory being used in the build process.
- echo "Building docker image ${FULL_IMAGE_NAME} located in dir ${app_dir}"
+ echo "Building docker image ${FULL_IMAGE_NAME} located in dir ${APP_DIR}"
#    with:
#      tag_name: ${{ steps.bump_version_and_push_tag.outputs.tag }}
#      release_name: Release ${{ steps.bump_version_and_push_tag.outputs.tag }}
#  outputs:
#    promethai_docker_tag_backend: ${{ steps.export-promethai-docker-tag.outputs.promethai_docker_tag_backend }}
    outputs:
      promethai_docker_tag_backend: ${{ steps.export-promethai-docker-tag.outputs.promethai_docker_tag_backend }}

#  apply_tf:
#    name: Trigger terraform apply workflow
#    runs-on: ubuntu-latest
#    needs: publish_docker_to_ecr
#    steps:
#      - name: TF apply workflow triggers step
#        uses: actions/github-script@v6
#        with:
#          github-token: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
#          script: |
#            await github.rest.actions.createWorkflowDispatch({
#              owner: 'topoteretes',
#              repo: 'PromethAI-Infra',
#              workflow_id: 'terraform.apply.yml',
#              ref: 'main'
#            })
  apply_tf:
    name: Trigger terraform apply workflow
    runs-on: ubuntu-latest
    needs: publish_docker_to_ecr
    steps:
      - name: TF apply workflow triggers step
        uses: actions/github-script@v6
        with:
          github-token: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
          script: |
            await github.rest.actions.createWorkflowDispatch({
              owner: 'topoteretes',
              repo: 'PromethAI-Infra',
              workflow_id: 'terraform.apply.yml',
              ref: 'main'
            })
Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [59-80]
The commented-out code for creating tags and releases is still present in the workflow file. If these steps have been implemented elsewhere, this commented-out code should be removed to avoid confusion and maintain cleanliness of the workflow file.
- # - name: Create Tag and Release
- # runs-on: ubuntu-latest
- # uses: actions/checkout@v3
- # needs: publish_docker_to_ecr # ensure this job runs after Docker image is pushed
- # steps:
- # - name: Check out code
- # uses: actions/checkout@v3
- # - name: Bump version and push tag
- # id: bump_version_and_push_tag
- # uses: anothrNick/[email protected]
- # env:
- # GITHUB_TOKEN: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
- # WITH_V: true
- # DEFAULT_BUMP: 'minor' # or 'minor' or 'major'
- # - name: Create Release
- # id: create_release
- # uses: actions/create-release@v1
- # env:
- # GITHUB_TOKEN: ${{ secrets.PAT_FOR_CROSS_REPOS_CICD_TRIGGERING }}
- # with:
- # tag_name: ${{ steps.bump_version_and_push_tag.outputs.tag }}
- # release_name: Release ${{ steps.bump_version_and_push_tag.outputs.tag }}
def aretry_with_exponential_backoff(
    func,
    initial_delay: float = 1,
    exponential_base: float = 2,
    jitter: bool = True,
    max_retries: int = 20,
    errors: tuple = (openai.error.RateLimitError,),
):
    """Retry a function with exponential backoff."""

    async def wrapper(*args, **kwargs):
        # Initialize variables
        num_retries = 0
        delay = initial_delay

        # Loop until a successful response or max_retries is hit or an exception is raised
        while True:
            try:
                return await func(*args, **kwargs)

            # Retry on specified errors
            except errors as e:
                print(f"acreate (backoff): caught error: {e}")
                # Increment retries
                num_retries += 1

                # Check if max retries has been reached
                if num_retries > max_retries:
                    raise Exception(f"Maximum number of retries ({max_retries}) exceeded.")

                # Increment the delay
                delay *= exponential_base * (1 + jitter * random.random())

                # Sleep for the delay
                await asyncio.sleep(delay)

            # Raise exceptions for any errors not specified
            except Exception as e:
                raise e

    return wrapper
The new asynchronous retry function aretry_with_exponential_backoff seems correctly implemented, mirroring the logic of the synchronous version but using async and await keywords. However, the exception handling prints an error message for caught RateLimitError but does not log or handle other exceptions, which could be inconsistent with the synchronous version's behavior. Consider adding logging for all exceptions or re-raising them to maintain consistency with the synchronous version.
- except errors as e:
+ except errors as rate_limit_error:
- print(f"acreate (backoff): caught error: {e}")
+ logging.error(f"Retrying due to rate limit: {rate_limit_error}")
- except Exception as e:
+ except Exception as general_error:
- raise e
+ logging.error(f"An unexpected error occurred: {general_error}")
+ raise
@aretry_with_exponential_backoff
async def acreate_embedding_with_backoff(**kwargs):
    """Wrapper around Embedding.acreate w/ backoff"""
    return await openai.Embedding.acreate(**kwargs)
💡 NOTE
Codebase verification is a beta feature.
The verification process has identified that the asynchronous function acreate_embedding_with_backoff is used correctly with the await keyword in at least one instance within the codebase. However, there is also a synchronous call to create_embedding_with_backoff present, which may indicate that not all parts of the code have been updated to use the new asynchronous function. It is recommended to verify whether the synchronous call is intentional or if it requires updating to the asynchronous pattern to maintain consistency and prevent potential issues.
- The asynchronous call is found at ./cognitive_architecture/openai_tools.py:123.
- The synchronous call is found at ./cognitive_architecture/openai_tools.py:135.
Analysis chain
The function acreate_embedding_with_backoff is correctly implemented as an asynchronous function with the retry decorator. Ensure that all calls to this function are awaited and that the function's introduction does not break any existing code that relies on the synchronous version.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for calls to `acreate_embedding_with_backoff` to ensure they are awaited.
ast-grep --lang python --pattern $'await acreate_embedding_with_backoff($$$)'
Length of output: 202
Script:
#!/bin/bash
# Search for all instances of `acreate_embedding_with_backoff` to ensure they are all awaited.
ast-grep --lang python --pattern $'await acreate_embedding_with_backoff($$$)'
Length of output: 202
Script:
#!/bin/bash
# Search for any calls to a synchronous version of the function, which might be named without the 'a' prefix.
ast-grep --lang python --pattern $'create_embedding_with_backoff($$$)'
Length of output: 188
Review Status
Actionable comments generated: 13
Configuration used: CodeRabbit UI
Files ignored due to filter (2)
- poetry.lock
- pyproject.toml
Files selected for processing (5)
- .env.template (1 hunks)
- cognitive_architecture/database/postgres/database_crud.py (1 hunks)
- cognitive_architecture/database/vectordb/vectordb.py (5 hunks)
- cognitive_architecture/vectorstore_manager.py (1 hunks)
- main.py (6 hunks)
Files skipped from review due to trivial changes (2)
- .env.template
- cognitive_architecture/vectorstore_manager.py
Additional comments: 9
cognitive_architecture/database/postgres/database_crud.py (1)
- 54-54: Explicit commit inside the update_entity function is correct and aligns with the async nature of the function.
cognitive_architecture/database/vectordb/vectordb.py (5)
- 154-154: The addition of params['user_id'] ensures that the user ID is included in the parameters for memory operations, which is likely important for associating memories with the correct user.
- 176-176: Incrementing params['chunk_order'] within the loop correctly assigns an order to each chunk, which is important for maintaining the sequence of the data.
- 258-289: The addition of a new search_type == 'summary' condition with a specific filter object is a significant change. Ensure that this new search type and its logic are documented and tested to verify that it behaves as expected.
- 314-315: The return statement for the fetch_memories method is correctly placed after all conditional logic, ensuring that the method returns the query output regardless of the search_type.
- 320-321: The check for namespace being None and the subsequent assignment to self.namespace is consistent with the rest of the methods in the class, ensuring that the default namespace is used when none is provided.
main.py (3)
- 8-8: The addition of update_entity_graph_summary to imports aligns with the PR objectives and summary.
- 180-209: The signature of the add_documents_to_graph_db function has been updated by removing the stupid_local_testing_flag parameter, which aligns with the PR objectives and summary.
- 332-332: The update operation for DocsModel in the main function aligns with the PR objectives and summary.
def update_entity_graph_summary(session, model, entity_id, new_value):
    with session_scope(session) as s:
        # Retrieve the entity from the database
        entity = s.query(model).filter_by(id=entity_id).first()
        if entity:
            entity.graph_summary = new_value
            s.commit()
            return "Successfully updated entity"
        else:
            return "Entity not found"
The update_entity_graph_summary function is synchronous, which is inconsistent with the rest of the database functions that are designed to be asynchronous. Consider refactoring this function to be asynchronous for consistency and to avoid potential integration issues with other parts of the system.
- def update_entity_graph_summary(session, model, entity_id, new_value):
+ async def update_entity_graph_summary(session, model, entity_id, new_value):
- with session_scope(session) as s:
+ async with session_scope(session) as s:
- s.commit()
+ await s.commit()
@@ -34,16 +34,24 @@ async def add_entity(session, entity):
    return "Successfully added entity"


def update_entity_graph_summary(session, model, entity_id, new_value):
    with session_scope(session) as s:
The session_scope context manager is asynchronous and should be used with async with, not with. This will cause a runtime error as it stands.
- with session_scope(session) as s:
+ async with session_scope(session) as s:
entity = s.query(model).filter_by(id=entity_id).first()
if entity:
    entity.graph_summary = new_value
    s.commit()
The session.commit() call inside update_entity_graph_summary is redundant because the session_scope context manager already takes care of committing the session. This line should be removed to avoid confusion and redundancy.
- s.commit()
# neo4j_graph_db.update_document_node_with_namespace(user_id, vectordb_namespace=memory_name, document_id=doc_id)
# else:
try:
    logging.info("Docs are", docs)
Incorrect logging statement syntax; it will cause a runtime error. Use proper string formatting.
- logging.info("Docs are", docs)
+ logging.info("Docs are %s", docs)
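For background on why these calls are flagged: the logging module interpolates its arguments lazily with %-style placeholders, so extra positional arguments need a matching %s. A small, self-contained illustration (not code from the PR):

import logging

logging.basicConfig(level=logging.INFO)
docs = ["a.pdf", "b.pdf"]

# Without a placeholder the extra argument cannot be interpolated, and the
# logging module reports an internal formatting error instead of the value.
logging.info("Docs are", docs)

# With %s the argument is formatted lazily, only if the record is emitted.
logging.info("Docs are %s", docs)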
main.py
try:
    logging.info("Docs are", docs)
    for doc, memory_name in zip(docs, memory_names):
        logging.info("Memory names are", memory_name)
Incorrect logging statement syntax; it will cause a runtime error. Use proper string formatting.
- logging.info("Memory names are", memory_name)
+ logging.info("Memory names are %s", memory_name)
main.py
await update_entity(session, DocsModel, doc_id, True)
# except:
#     pass
logging.info("Classification content is", classification_content)
Incorrect logging statement syntax; it will cause a runtime error. Use proper string formatting.
- logging.info("Classification content is", classification_content)
+ logging.info("Classification content is %s", classification_content)
doc_name, doc_id = doc
classification = await classify_documents(doc_name, document_id =doc_id, content=concatenated_retrievals)

logging.info("Classification is", str(classification))
Incorrect logging statement syntax; it will cause a runtime error. Use proper string formatting.
- logging.info("Classification is", str(classification))
+ logging.info("Classification is %s", str(classification))
main.py
neo4j_graph_db.query(rs, classification)
neo4j_graph_db.update_document_node_with_namespace(user_id, vectordb_namespace=memory_name,
                                                   document_id=doc_id)
logging.info("hERE IS THE OUT2323", doc_id)
Incorrect logging statement syntax; it will cause a runtime error. Use proper string formatting.
- logging.info("hERE IS THE OUT2323", doc_id)
+ logging.info("hERE IS THE OUT2323 %s", doc_id)
main.py
await update_entity(session, DocsModel, "8cd9a022-5a7a-4af5-815a-f988415536ae", True)
# out = await get_vectordb_namespace(session, user_id)
params = {
    "version": "1.0",
    "agreement_id": "AG123456",
    "privacy_policy": "https://example.com/privacy",
    "terms_of_service": "https://example.com/terms",
    "format": "json",
    "schema_version": "1.1",
    "checksum": "a1b2c3d4e5f6",
    "owner": "John Doe",
    "license": "MIT",
    "validity_start": "2023-08-01",
    "validity_end": "2024-07-31",
}
loader_settings = {
    "format": "PDF",
    "source": "DEVICE",
    "path": [".data"],
    "strategy": "SUMMARY",
}
await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings)
await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday")
# params = {
#     "version": "1.0",
#     "agreement_id": "AG123456",
#     "privacy_policy": "https://example.com/privacy",
#     "terms_of_service": "https://example.com/terms",
#     "format": "json",
#     "schema_version": "1.1",
#     "checksum": "a1b2c3d4e5f6",
#     "owner": "John Doe",
#     "license": "MIT",
#     "validity_start": "2023-08-01",
#     "validity_end": "2024-07-31",
# }
# loader_settings = {
#     "format": "PDF",
#     "source": "DEVICE",
#     "path": [".data"],
#     "strategy": "SUMMARY",
# }
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings)
# await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday")
# await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings)
# await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday")
Commented-out code should be removed or clarified with a comment explaining why it's retained.
search_type = 'hybrid'
logging.info("The search type is s%", search_type)

if search_type == 'summary':
    from weaviate.classes import Filter
    client = weaviate.connect_to_wcs(
        cluster_url=config.weaviate_url,
        auth_credentials=weaviate.AuthApiKey(config.weaviate_api_key)
    )

    summary_collection = client.collections.get(self.namespace)
    response = summary_collection.query.fetch_objects(
        filters=Filter("user_id").equal(self.user_id) &
                Filter("chunk_order").less_than(25),
        limit=15
    )
    return response

if not namespace:
    namespace = self.namespace
The check for search_type being None and the subsequent assignment to 'hybrid' should be moved outside of the fetch_memories method to ensure that the default value is set before the method is called. This will prevent the method from being called with an undefined search_type.
- if search_type is None:
- search_type = 'hybrid'
+ # Default search_type should be set before calling fetch_memories
Review Status
Actionable comments generated: 6
Configuration used: CodeRabbit UI
Files selected for processing (7)
- .env.template (1 hunks)
- cognitive_architecture/config.py (1 hunks)
- cognitive_architecture/database/create_database.py (2 hunks)
- cognitive_architecture/database/postgres/database.py (1 hunks)
- cognitive_architecture/database/vectordb/vectordb.py (5 hunks)
- entrypoint.sh (1 hunks)
- main.py (7 hunks)
Files skipped from review due to trivial changes (1)
- .env.template
Additional comments: 8
entrypoint.sh (1)
- 1-6: The addition of the ENV environment variable should be verified to ensure it's utilized within the Python scripts or the Gunicorn service that this script is starting. If ENV is not used, consider removing it to avoid unnecessary environment variables.
cognitive_architecture/config.py (1)
- 34-53: The conditional assignments for configuration parameters based on environment variables are correctly implemented. Ensure that the environment variables used here are documented and that their values are set in the environment where the application runs.
cognitive_architecture/database/vectordb/vectordb.py (3)
- 154-154: The addition of user_id to params within the add_memories method should be verified to ensure that it aligns with the method's expected behavior and that the user_id is indeed required for the operation.
- 172-177: The change in chunk_size to 300 should be verified to ensure it aligns with the expected behavior of the chunk_data function and that it does not negatively impact the performance or accuracy of the memory addition process.
- 258-289: The addition of the 'summary' search type in the fetch_memories method should be verified to ensure that it aligns with the expected behavior and that the logic for fetching summaries is correctly implemented.
main.py (3)
- 8-8: The addition of update_entity_graph_summary to the import statement should be verified to ensure that it is used within the file and that the function is correctly implemented where it is called.
- 180-209: The modifications in the add_documents_to_graph_db function, including the addition of a try-except block and changes in the classification process, should be verified to ensure that they are correctly implemented and that they align with the expected behavior of the function.
- 332-332: The addition of the update_entity call within the main function should be verified to ensure that it is correctly implemented and that it aligns with the expected behavior of the function.
from cognitive_architecture.config import Config
config = Config()
config.load()


# in seconds
MAX_RETRIES = 3
RETRY_DELAY = 5

username = os.getenv('POSTGRES_USER')
password = os.getenv('POSTGRES_PASSWORD')
database_name = os.getenv('POSTGRES_DB')
import os

environment = os.environ.get("ENVIRONMENT")

if environment == "local":
    host= os.getenv('POSTGRES_HOST')
host = config.postgres_host
username = config.postgres_user
password = config.postgres_password
database_name = config.postgres_db
The refactoring to use a Config object for database configuration parameters is a good practice for maintainability and centralization of configuration. Ensure that all instances in the codebase that previously accessed these environment variables directly are updated to use the Config object.
from cognitive_architecture.config import Config
config = Config()
config.load()




from postgres.database import Base
The use of the Config object to load and assign database connection parameters is a good practice for maintainability and centralization of configuration. Ensure that all instances in the codebase that previously accessed these environment variables directly are updated to use the Config object.
#
# username = os.getenv('POSTGRES_USER')
# password = os.getenv('POSTGRES_PASSWORD')
# database_name = os.getenv('POSTGRES_DB')
# environment = os.environ.get("ENV")
host = config.postgres_host
username = config.postgres_user
password = config.postgres_password
database_name = config.postgres_db

engine = create_admin_engine(username, password, host, database_name)
Ensure that the create_admin_engine function is updated to use the new config object for database connection parameters, as the old environment variable access has been replaced.
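For orientation, an admin engine factory driven by those config values might look roughly like the sketch below; the URL scheme, port, and keyword arguments are assumptions, and the real create_admin_engine in the PR may differ.

from sqlalchemy import create_engine

def create_admin_engine(username, password, host, database_name):
    # Hypothetical sketch: build a synchronous SQLAlchemy engine from the
    # values that now come from the Config object instead of os.getenv.
    url = f"postgresql://{username}:{password}@{host}:5432/{database_name}"
    return create_engine(url, echo=False)

# Usage mirroring the reviewed call site:
# engine = create_admin_engine(config.postgres_user, config.postgres_password,
#                              config.postgres_host, config.postgres_db)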
chunk_overlap=20)]
for doc in documents[0]:
    chunk_count += 1
    params['chunk_order'] = chunk_count

    # document_to_load = self._stuct(observation, params, metadata_schema_class)

    logging.info("Loading document with defautl loader settings %s", str(doc))
There is a typo in the log message: "defautl" should be corrected to "default".
- logging.info("Loading document with defautl loader settings %s", str(doc))
+ logging.info("Loading document with default loader settings %s", str(doc))
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (2)
- cognitive_architecture/database/create_database.py (2 hunks)
- cognitive_architecture/database/postgres/database.py (1 hunks)
Additional comments: 7
cognitive_architecture/database/postgres/database.py (4)
- 18-20: The introduction of the Config object and the call to config.load() are good practices for centralizing configuration management.
- 29-32: The assignment of database connection parameters from the Config object is correct and improves maintainability by centralizing configuration settings.
- 37-37: Verify that the SQLALCHEMY_DATABASE_URL is correctly formatted and that the Config object is loaded before this line to ensure database connectivity.
- 36-37: The retry mechanism in safe_db_operation appears to be correctly implemented, using the MAX_RETRIES and RETRY_DELAY constants (a sketch of such a wrapper follows this list).
cognitive_architecture/database/create_database.py (3)
- 12-14: The introduction of the Config object and the call to config.load() are good practices for centralizing configuration management.
- 66-69: The assignment of database connection parameters from the Config object is correct and improves maintainability by centralizing configuration settings.
- 61-71: Ensure that the create_admin_engine, database_exists, create_database, and create_tables functions are updated to use the new Config object for database connection parameters, as the old environment variable access has been replaced.
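Since safe_db_operation itself is not shown in this thread, the following is only a sketch of what a retry wrapper built around MAX_RETRIES and RETRY_DELAY typically looks like; the actual implementation in the PR may differ.

import time

MAX_RETRIES = 3
RETRY_DELAY = 5  # in seconds

def safe_db_operation(operation, *args, **kwargs):
    # Retry the callable up to MAX_RETRIES times, sleeping RETRY_DELAY
    # seconds between attempts, and re-raise the last failure.
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return operation(*args, **kwargs)
        except Exception:
            if attempt == MAX_RETRIES:
                raise
            time.sleep(RETRY_DELAY)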
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- cognitive_architecture/database/postgres/database.py (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- cognitive_architecture/database/postgres/database.py
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- cognitive_architecture/database/postgres/database.py (1 hunks)
Files skipped from review due to trivial changes (1)
- cognitive_architecture/database/postgres/database.py
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- cognitive_architecture/database/create_database.py (3 hunks)
Files skipped from review as they are similar to previous changes (1)
- cognitive_architecture/database/create_database.py
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (1)
- api.py (2 hunks)
Additional comments: 1
api.py (1)
- 157-158: The exception handling in the user_query_classfier function has been changed to raise an HTTPException directly instead of returning a JSONResponse. This is a positive change as it leverages FastAPI's built-in exception handling to return the appropriate response.
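To illustrate the difference the review is praising, here is a hypothetical FastAPI handler; the route, signature, and classification logic are stand-ins, not the actual user_query_classfier code.

from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.post("/classify")
async def user_query_classfier(query: str):
    try:
        result = {"label": "placeholder"}  # classification logic elided
        return result
    except Exception as e:
        # Raising HTTPException lets FastAPI build the error response
        # (status code, JSON body, middleware hooks) instead of the handler
        # returning an ad-hoc JSONResponse that bypasses that machinery.
        raise HTTPException(status_code=500, detail=str(e))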
api.py
@app.post("/drop-db")
async def drop_db(payload: Payload):
    try:
        decoded_payload = payload.payload
        return JSONResponse(content={"response": "dropped"}, status_code=200)

    except Exception as e:
        return HTTPException(status_code=500, detail=str(e))
💡 NOTE
Codebase verification is a beta feature.
The drop_db function in api.py does not appear to have any active authorization checks associated with it. The references to authentication are commented out, indicating that the function could be executed without any security checks, which is a serious security concern.
- The drop_db function lacks active authorization checks, potentially allowing unauthorized database operations.
Analysis chain
The drop_db function currently returns a static JSONResponse with the message "dropped" and does not perform any action. This could be a placeholder, but it's important to ensure that the actual database drop logic is implemented securely and with proper authorization checks to prevent accidental or malicious data loss.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify if there are any authorization checks for sensitive operations like dropping the database.
rg --type py 'drop_db'
Length of output: 66
Script:
#!/bin/bash
# Search for any references to authorization or authentication in the api.py file.
rg --type py 'auth|Auth|authorize|Authorize' api.py
Length of output: 154
    except Exception as e:
        return HTTPException(status_code=500, detail=str(e))
Consider adding more detailed error handling for the drop_db function. Returning a generic 500 error for all exceptions may not be informative for the client. It would be beneficial to distinguish between different types of errors and handle them accordingly.
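A hedged sketch of what more granular handling could look like for an endpoint of this shape; the admin-token dependency and the specific exception classes are illustrative assumptions, not part of the PR.

from fastapi import Depends, FastAPI, HTTPException
from fastapi.responses import JSONResponse
from sqlalchemy.exc import SQLAlchemyError

app = FastAPI()

def require_admin_token():
    # Hypothetical authorization dependency; the reviewed endpoint has no
    # active check, which is the concern raised above.
    ...

@app.post("/drop-db")
async def drop_db(_: None = Depends(require_admin_token)):
    try:
        # actual drop logic would go here
        return JSONResponse(content={"response": "dropped"}, status_code=200)
    except SQLAlchemyError as db_error:
        # Database-level failures map to 503 so the client knows to retry later.
        raise HTTPException(status_code=503, detail=str(db_error))
    except Exception as unexpected:
        # Anything else is still a 500, but raised rather than returned so
        # FastAPI renders it as a proper error response.
        raise HTTPException(status_code=500, detail=str(unexpected))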
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (1)
- cognitive_architecture/config.py (1 hunks)
Additional comments: 3
cognitive_architecture/config.py (3)
- 34-37: The environment variable GRAPH_DB_PW_PROD is used for the production password, but there is no equivalent for the development password. Verify if this is intentional or an oversight.
- 57-57: The load method overrides class attributes with environment variables and then with config file values. Ensure that the order of precedence (environment variables over config file) is intentional and documented.
- 57-57: The to_dict and from_dict methods are correctly implemented for converting the Config object to and from a dictionary representation. This is good for interoperability with other parts of the system that may require configuration data in dictionary format.
cognitive_architecture/config.py
if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or os.getenv('AWS_ENV') == 'prd':
    graph_database_url: str = os.getenv('GRAPH_DB_URL_PROD')
    graph_database_username: str = os.getenv('GRAPH_DB_USER')
    graph_database_password: str = os.getenv('GRAPH_DB_PW_PROD')
else:
    graph_database_url: str = os.getenv('GRAPH_DB_URL')
    graph_database_username: str = os.getenv('GRAPH_DB_USER')
    graph_database_password: str = os.getenv('GRAPH_DB_PW')
weaviate_url: str = os.getenv('WEAVIATE_URL')
weaviate_api_key: str = os.getenv('WEAVIATE_API_KEY')
postgres_user: str = os.getenv('POSTGRES_USER')
postgres_password: str = os.getenv('POSTGRES_PASSWORD')
postgres_db: str = os.getenv('POSTGRES_DB')
if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or os.getenv('AWS_ENV') == 'prd':
    postgres_host: str = os.getenv('POSTGRES_PROD_HOST')
elif os.getenv('ENV') == 'docker':
    postgres_host: str = os.getenv('POSTGRES_HOST_DOCKER')
elif os.getenv('ENV') == 'local':
    postgres_host: str = os.getenv('POSTGRES_HOST_LOCAL')
The conditional logic for setting database parameters based on environment variables is complex and may be error-prone. Consider refactoring to improve readability and maintainability.
- if os.getenv('ENV') == 'prod' or os.getenv('ENV') == 'dev' or os.getenv('AWS_ENV') == 'dev' or os.getenv('AWS_ENV') == 'prd':
+ env = os.getenv('ENV')
+ aws_env = os.getenv('AWS_ENV')
+ is_prod_or_dev = env in ['prod', 'dev'] or aws_env in ['dev', 'prd']
+ if is_prod_or_dev:
Committable suggestion (the full block after applying the refactor):
env = os.getenv('ENV')
aws_env = os.getenv('AWS_ENV')
is_prod_or_dev = env in ['prod', 'dev'] or aws_env in ['dev', 'prd']
if is_prod_or_dev:
    graph_database_url: str = os.getenv('GRAPH_DB_URL_PROD')
    graph_database_username: str = os.getenv('GRAPH_DB_USER')
    graph_database_password: str = os.getenv('GRAPH_DB_PW_PROD')
else:
    graph_database_url: str = os.getenv('GRAPH_DB_URL')
    graph_database_username: str = os.getenv('GRAPH_DB_USER')
    graph_database_password: str = os.getenv('GRAPH_DB_PW')
weaviate_url: str = os.getenv('WEAVIATE_URL')
weaviate_api_key: str = os.getenv('WEAVIATE_API_KEY')
postgres_user: str = os.getenv('POSTGRES_USER')
postgres_password: str = os.getenv('POSTGRES_PASSWORD')
postgres_db: str = os.getenv('POSTGRES_DB')
if is_prod_or_dev:
    postgres_host: str = os.getenv('POSTGRES_PROD_HOST')
elif env == 'docker':
    postgres_host: str = os.getenv('POSTGRES_HOST_DOCKER')
elif env == 'local':
    postgres_host: str = os.getenv('POSTGRES_HOST_LOCAL')
elif os.getenv('ENV') == 'local':
    postgres_host: str = os.getenv('POSTGRES_HOST_LOCAL')
The save method splits attribute names on the first underscore to determine the section name for the config file. This could lead to unexpected behavior if attribute names contain more than one underscore. Consider a more robust approach to determine section names.
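One hypothetical way to make the section lookup explicit rather than relying on the first underscore; the prefix-to-section mapping below is an assumption about how the settings group, not code from the PR.

SECTION_PREFIXES = {
    "graph_database": "graph_database",
    "postgres": "postgres",
    "weaviate": "weaviate",
}

def section_for(attribute_name: str, default: str = "general") -> str:
    # Longest-prefix match against an explicit table avoids mis-splitting
    # names like graph_database_url at their first underscore.
    for prefix, section in sorted(SECTION_PREFIXES.items(), key=lambda kv: -len(kv[0])):
        if attribute_name == prefix or attribute_name.startswith(prefix + "_"):
            return section
    return default

# e.g. section_for("graph_database_url") -> "graph_database"
#      section_for("postgres_host")      -> "postgres"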
…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
Review Status
Actionable comments generated: 21
Configuration used: CodeRabbit UI
Files ignored due to filter (2)
- poetry.lock
- pyproject.toml
Files selected for processing (21)
- .env.template (1 hunks)
- Dockerfile (1 hunks)
- api.py (1 hunks)
- cognitive_architecture/classifiers/classifier.py (1 hunks)
- cognitive_architecture/database/create_database.py (3 hunks)
- cognitive_architecture/database/graph_database/graph.py (14 hunks)
- cognitive_architecture/database/postgres/models/docs.py (1 hunks)
- cognitive_architecture/database/postgres/models/memory.py (1 hunks)
- cognitive_architecture/database/vectordb/chunkers/chunkers.py (2 hunks)
- cognitive_architecture/database/vectordb/cognee_manager.py (1 hunks)
- cognitive_architecture/database/vectordb/embeddings/embeddings.py (1 hunks)
- cognitive_architecture/database/vectordb/embeddings/embeddings_type.py (1 hunks)
- cognitive_architecture/database/vectordb/job.py (1 hunks)
- cognitive_architecture/database/vectordb/loaders/loaders.py (1 hunks)
- cognitive_architecture/database/vectordb/response.py (1 hunks)
- cognitive_architecture/database/vectordb/vector_db_type.py (1 hunks)
- cognitive_architecture/database/vectordb/vectordb.py (5 hunks)
- cognitive_architecture/shared/language_processing.py (1 hunks)
- cognitive_architecture/utils.py (3 hunks)
- cognitive_architecture/vectorstore_manager.py (3 hunks)
- main.py (1 hunks)
Files skipped from review due to trivial changes (1)
- Dockerfile
Additional comments: 56
cognitive_architecture/database/vectordb/embeddings/embeddings_type.py (1)
- 1-8: The EmbeddingsType enum is well-defined and covers a variety of embedding sources.
cognitive_architecture/database/vectordb/vector_db_type.py (1)
- 1-13: The VectorDBType enum is well-defined and covers a variety of vector database types.
cognitive_architecture/database/vectordb/job.py (1)
- 1-18: The Job class is well-defined with clear string representation methods.
cognitive_architecture/database/postgres/models/memory.py (1)
- 12-18: The addition of the memory_category field to the MemoryModel is correctly implemented. Ensure that corresponding database migration scripts are in place to handle this schema change.
cognitive_architecture/database/vectordb/embeddings/embeddings.py (1)
- 5-29: The Embeddings class is updated with new parameters and a serialize method. Ensure that the serialization logic correctly handles all new parameters and that the serialize method is tested for various embedding configurations.
cognitive_architecture/database/vectordb/response.py (1)
- 4-72: The Response class is updated with new fields and class methods for JSON conversion. Ensure that the JSON conversion logic correctly handles all new fields and that the class methods are tested for various response scenarios.
cognitive_architecture/shared/language_processing.py (1)
- 50-81: The translate_text function uses AWS Translate and has error handling for BotoCoreError and ClientError. Ensure that the error messages are logged and that the function is tested for various error scenarios.
cognitive_architecture/database/create_database.py (2)
- 27-44: The environment variable handling has been updated to use AWS_ENV. Ensure that this change is reflected throughout the codebase and that the AWS_ENV variable is set in all required environments.
- 74-83: The drop_database function has been added to handle database deletion. Ensure that this function is protected against accidental execution in production environments and that it is tested thoroughly.
+ # Add a safety check to prevent accidental drops in production
+ if os.environ.get('AWS_ENV') != 'local':
+     raise RuntimeError("Database dropping is not allowed in non-local environments")
cognitive_architecture/database/vectordb/chunkers/chunkers.py (1)
- 1-1: The import path for ChunkStrategy has been updated. Verify that this new path is correct and that the ChunkStrategy is accessible from this new location.
Verification successful
The shell script output confirms the presence of the ChunkStrategy class within the chunk_strategy.py file. This verifies that the updated import path in chunkers.py is correct and that the ChunkStrategy class is indeed accessible from the new location.
- The import path for ChunkStrategy in chunkers.py is correct and the class is accessible.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify that the ChunkStrategy import is correct and accessible
rg --type py 'from cognitive_architecture.database.vectordb.chunkers.chunk_strategy import ChunkStrategy'
# Expecting to find the ChunkStrategy definition in the specified path
Length of output: 260
Script:
#!/bin/bash
# Check for the presence of the ChunkStrategy class within the chunk_strategy.py file
rg --type py 'class ChunkStrategy' cognitive_architecture/database/vectordb/chunkers/chunk_strategy.py
# Expecting to find the class definition to confirm the import is correct
Length of output: 131
cognitive_architecture/classifiers/classifier.py (1)
- 76-104: The classify_call function has been updated with new parameters. Verify that all calls to this function have been updated to pass the correct arguments and that the function's logic correctly handles the new document_summaries parameter.
Verification successful
The verification process has confirmed that the calls to the classify_call function within the codebase have been updated to use the new document_summaries parameter. The function's logic in classifier.py and its invocation in main.py are consistent with the changes.
- The classify_call function update has been properly handled in the codebase.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify that all calls to classify_call have been updated
rg --type py 'classify_call' --files-with-matches | xargs rg 'document_summaries'
# Expecting to find the updated calls with the new parameter
Length of output: 759
cognitive_architecture/database/vectordb/cognee_manager.py (4)
- 33-42: The serialize method correctly converts instance attributes to a dictionary, excluding any attributes with None values. This is a clean and efficient way to prepare data for serialization.
- 68-93: The get_job_statuses method is well-structured and handles different HTTP status codes appropriately. It uses the requests library to make a POST request and constructs the request payload and headers correctly.
- 121-142: The get_job_status method correctly handles HTTP status codes and uses the requests library to make a GET request. It constructs the URL and headers appropriately and returns a Response object based on the JSON response.
- 144-151: The generate_headers function constructs request headers and filters out any None values. Ensure that API keys and other sensitive information are securely managed and not hardcoded or exposed.
cognitive_architecture/database/vectordb/loaders/loaders.py (3)
- 23-25: The fetch_text_content function uses the UnstructuredURLLoader to load text content from a URL. This is a straightforward use of an external loader class.
- 27-46: The process_content function chunks data and translates it if necessary. It's important to ensure that the language detection and translation services are secure and efficient. Additionally, consider handling exceptions that may arise from these operations.
- 48-92: The _document_loader function is an asynchronous function that handles document loading and processing. It correctly awaits other async functions and raises a ValueError for unsupported formats. Ensure that the error handling is consistent and that all resources are managed correctly.
cognitive_architecture/utils.py (2)
- 10-13: The addition of AsyncSessionLocal and load_dotenv imports followed by the invocation of load_dotenv() is a common pattern for loading environment variables. Ensure that .env files are not included in version control for security reasons.
- 222-229: The modification in the get_unsumarized_vector_db_namespace function to return memory_details and docs instead of just memory_names is a significant change. Ensure that all callers of this function are updated to handle the new return type.
api.py (12)
- 1-17: The imports and initial setup of the FastAPI application look standard and correct. The use of environment variables for configuration is a good practice.
- 19-23: The logging configuration is set globally, which is a common practice. However, it's important to ensure that this configuration does not override other logging configurations elsewhere in the application.
- 48-55: The root endpoint is correctly implemented and returns a simple welcome message.
58-63: The health check endpoint is implemented correctly and follows best practices by providing a simple status check.
128-141: The
/user-query-to-graph
endpoint's error handling raises an HTTPException directly, which is consistent with FastAPI's practices.144-162: The
/document-to-graph-db
endpoint is implemented correctly, but ensure that theadd_documents_to_graph_db
function handles exceptions appropriately.164-176: The
/cognitive-context-enrichment
endpoint is implemented correctly, but ensure that theuser_context_enrichment
function handles exceptions appropriately.179-192: The
/user-query-classifier
endpoint is implemented correctly, but ensure that theclassify_user_query
function handles exceptions appropriately.238-266: The
/create-public-memory
endpoint is implemented correctly, but ensure that thecreate_public_memory
function handles exceptions appropriately.269-292: The
/attach-user-to-public-memory
endpoint is implemented correctly, but ensure that theattach_user_to_memory
function handles exceptions appropriately.295-313: The
/unlink-user-from-public-memory
endpoint is implemented correctly, but ensure that theunlink_user_from_memory
function handles exceptions appropriately.315-329: The
start_api_server
function is implemented correctly, and the logging of exceptions is a good practice.cognitive_architecture/database/vectordb/vectordb.py (3)
149-181: The
add_memories
method has been updated with new logic for chunk processing and document loading. Ensure that theOpenAIEmbeddings
and_document_loader
functions are tested with the new logic.201-207: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [183-206]
The
fetch_memories
method has been extended to support different search types. The default assignment ofsearch_type
to'hybrid'
should be moved outside the method to ensure that the default value is set before the method is called, as previously commented.
- 255-313: New search types 'summary' and 'summary_filter_by_object_name' have been added. Ensure that these new search types are properly tested and that the filtering logic is correct.
cognitive_architecture/vectorstore_manager.py (3)
173-173: The addition of default values for
job_id
andmemory_label
in thecreate_memory
method signature is a good practice for optional parameters. Ensure that all invocations of this method are updated to reflect these changes if they rely on the previous signature.254-254: The new
memory_category
parameter with a default value of 'PUBLIC' in thehandle_new_memory
method is a sensible addition for categorization purposes. Confirm that this default aligns with the business logic and that all calls to this method are compatible with the new parameter.Verification successful
The search with
rg
has provided us with the locations where thehandle_new_memory
method is called within thevectorstore_manager.py
file. From the output, we can see that the method is called twice, and in both instances, it is called with four parameters:user_id
,session
,job_id
, andmemory_name
. The calls do not explicitly pass the newmemory_category
parameter, which means they rely on the default value of'PUBLIC'
as specified in the method's definition.Given this information, we can conclude the following:
- The
handle_new_memory
method has been updated to include a new parametermemory_category
with a default value of'PUBLIC'
.- The method is called twice in the codebase, and both calls do not specify the
memory_category
parameter, thus defaulting to'PUBLIC'
.- There is no evidence from the provided output that the new parameter is causing any issues with existing calls to the method.
Based on the above points, the addition of the
memory_category
parameter with a default value seems to be correctly handled in the current codebase.* 278-278: Adding a new field `memory_category` to the `MemoryModel` instantiation within the `handle_new_memory` method. Ensure that the database schema has been updated to include this new field and that it is properly indexed if necessary for performance.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the `handle_new_memory` method is called correctly with the new parameter. ast-grep --lang python --pattern $'handle_new_memory($_, $_, $_, $_, $_)'Length of output: 74
Script:
#!/bin/bash # Search for calls to the `handle_new_memory` method to check if it's being called with the correct number of parameters. ast-grep --lang python --pattern $'handle_new_memory($_, $_, $$$)'Length of output: 67
Script:
#!/bin/bash # Use ripgrep to search for the string "handle_new_memory" to find any calls to the method. rg "handle_new_memory"Length of output: 554
Verification successful
I will generate a new script to verify the inclusion of the
memory_category
field in theMemoryModel
class usingast-grep
with the correct language specification and pattern syntax.
I will generate a new script to search for the
memory_category
field directly within the files that define theMemoryModel
class usingrg
.
The verification process has confirmed the presence of the
memory_category
field within theMemoryModel
class definition in the filecognitive_architecture/database/postgres/models/memory.py
. This aligns with the changes mentioned in the pull request.
- The
memory_category
field is a column in theMemoryModel
with the typeString
and is nullable.Based on this information, the database schema has indeed been updated to include the new
memory_category
field as required by the changes in thevectorstore_manager.py
file.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the database schema includes the new `memory_category` field. rg --type python 'memory_category' --files-with-matches | xargs cat | ast-grep --lang python --pattern $'class MemoryModel { $$$ memory_category: str $$$ }'Length of output: 531
Script:
#!/bin/bash # Verify that the database schema includes the new `memory_category` field. rg --type py 'class MemoryModel' --files-with-matches | xargs cat | ast-grep --lang python --pattern $'class MemoryModel { $$$ memory_category: str $$$ }'Length of output: 500
Script:
#!/bin/bash # Search for the 'memory_category' field within the 'MemoryModel' class definition. rg --type py 'class MemoryModel' -A 10 | rg 'memory_category'Length of output: 171
main.py (6)
- 1-40: The import section is generally clean, but it's important to ensure that all imported modules and functions are used within the file to avoid unnecessary imports which can lead to confusion and slight performance issues.
Verification successful
The executed scripts have checked the usage of
Optional
andList
from thetyping
module, as well asNeo4jError
from theneo4j.exceptions
module withinmain.py
. The results show that these imports are indeed used inmain.py
.However, the script output also includes matches from other files, which are not relevant to the verification of
main.py
. For the purpose of this task, we are only concerned with the usage of imports inmain.py
.Based on the provided output:
Optional
andList
are used in function signatures and type hints withinmain.py
.Neo4jError
is used in exception handling withinmain.py
.Given this information, the imports of
Optional
,List
, andNeo4jError
are justified and correctly used inmain.py
.* 45-45:Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that all imported modules are used in the file. rg --type python --files-with-matches 'from typing import Optional, List' main.py | xargs -L1 rg 'Optional|List' rg --type python --files-with-matches 'from neo4j.exceptions import Neo4jError' main.py | xargs -L1 rg 'Neo4jError' # ... continue for all importsLength of output: 21390
71-71:
256-256:
278-278:
331-453: The function
user_context_enrichment
is complex and integrates various memory systems. Ensure that the logic is thoroughly tested, especially the dynamic memory class management and the exception handling.cognitive_architecture/database/graph_database/graph.py (12)
8-9: The addition of
AsyncSession
andNeo4jError
imports indicates the use of asynchronous database sessions and improved error handling. Ensure that the rest of the codebase is compatible with these changes.121-122: The
GraphQLQuery
class is a straightforward model for encapsulating GraphQL queries. This is a good use of Pydantic models for type safety and validation.194-212: The
Neo4jGraphDB
class now usesGraphDatabase.driver
for database connections, which is a standard practice. The exception handling in thequery
method has been improved to log errors and re-raise them, which is good for debugging and control flow.277-290: The
cypher_statement_correcting
method uses an AI client to check and correct Cypher query syntax. Ensure that the AI client is properly configured and that this functionality is covered by tests to verify its effectiveness.439-445: The
retrieve_public_memory
method is well-defined for fetching public memory nodes. It's important to ensure that the method is used consistently across the codebase and that the necessary indexes are in place for performance.736-772: The
create_memory_node
method is designed to create or find a memory node with specific labels and a description. It's crucial to validate the input parameters and handle potential database errors as done here.790-803: The
delete_memory_node
method correctly uses a Cypher query to delete a memory node by ID. It's good practice to validate inputs and handle exceptions as implemented.805-834: The
unlink_memory_from_user
method provides functionality to unlink a memory node from a user, which is a necessary feature for managing user data. The input validation and error handling are appropriate.627-705: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [604-673]
The
create_document_node_cypher
method generates a Cypher query to create a Document node and link it to a memory node. Ensure that the method is used correctly and that the input parameters are validated before the query is executed.Verification successful
The
create_document_node_cypher
method is used inmain.py
and is defined ingraph.py
. To verify its correct usage, we need to check if the method calls inmain.py
align with the updated method signature and intended functionality.
The method calls to
create_document_node_cypher
inmain.py
match the updated method signature ingraph.py
. The calls correctly pass the required parameters, includingdocument_summary
,user_id
, and conditionallypublic_memory_id
ormemory_type
based on the context.
- The first call passes
classification
,user_id
, andpublic_memory_id
whenmemory_type
is 'PUBLIC'.- The second call passes
classification
,user_id
, and setsmemory_type
to 'SemanticMemory'.- The commented-out call at the bottom suggests a previous or alternative usage where
memory_type
is set to "PUBLIC".Based on the provided context, the usage of
create_document_node_cypher
inmain.py
appears to be consistent with the method's signature and intended functionality.* 675-704: The `update_document_node_with_db_ids` method updates the namespace of a Document node in the database. It's important to ensure that the document ID and namespace are properly sanitized before being included in the Cypher query to prevent injection attacks.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the `create_document_node_cypher` method is used correctly in the codebase. fd '.*\.py' --exec grep -Hn 'create_document_node_cypher'Length of output: 735
Script:
#!/bin/bash # Check the usage of `create_document_node_cypher` in main.py and ensure it aligns with the updated method signature. cat main.py | grep -A 5 -B 5 'create_document_node_cypher'Length of output: 2126
- 733-1010: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [705-734]
The
get_namespaces_by_document_category
method is a useful addition for retrieving Vectordb namespaces for documents of a specified category. Ensure that the method is used consistently and that the category parameter is validated.
- 736-854: The methods for managing memory nodes and their relationships, such as
create_memory_node
,link_user_to_public
,delete_memory_node
, andunlink_memory_from_user
, are well-implemented with proper error handling and input validation.
main.py
Outdated
async def fetch_document_vectordb_namespace(session: AsyncSession, user_id: str, namespace_id:str, doc_id:str=None):
    logging.info("user id is", user_id)
    memory = await Memory.create_memory(user_id, session, namespace=namespace_id, memory_label=namespace_id)

    # Managing memory attributes
    existing_user = await Memory.check_existing_user(user_id, session)
    print("here is the existing user", existing_user)
    await memory.manage_memory_attributes(existing_user)
    print("Namespace id is %s", namespace_id)
    await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)
    namespace_class = namespace_id + "_class"

    dynamic_memory_class = getattr(memory, namespace_class.lower(), None)

    methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]

    if dynamic_memory_class is not None:
        for method_name in methods_to_add:
            await memory.add_method_to_class(dynamic_memory_class, method_name)
            print(f"Memory method {method_name} has been added")
    else:
        print(f"No attribute named in memory.")

    print("Available memory classes:", await memory.list_memory_classes())
    result = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
                                              observation="placeholder", search_type="summary_filter_by_object_name", params=doc_id)
    logging.info("Result is", result)

    return result, namespace_id
Consider replacing print statements with logging for better control over log levels and outputs, which is more suitable for production environments.
- print("here is the existing user", existing_user)
+ logging.debug("here is the existing user %s", existing_user)
# ... apply this change to all print statements
Committable suggestion
async def fetch_document_vectordb_namespace(session: AsyncSession, user_id: str, namespace_id:str, doc_id:str=None):
    logging.info("user id is", user_id)
    memory = await Memory.create_memory(user_id, session, namespace=namespace_id, memory_label=namespace_id)

    # Managing memory attributes
    existing_user = await Memory.check_existing_user(user_id, session)
    logging.debug("here is the existing user %s", existing_user)
    await memory.manage_memory_attributes(existing_user)
    logging.debug("Namespace id is %s", namespace_id)
    await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)
    namespace_class = namespace_id + "_class"

    dynamic_memory_class = getattr(memory, namespace_class.lower(), None)

    methods_to_add = ["add_memories", "fetch_memories", "delete_memories"]

    if dynamic_memory_class is not None:
        for method_name in methods_to_add:
            await memory.add_method_to_class(dynamic_memory_class, method_name)
            logging.debug(f"Memory method {method_name} has been added")
    else:
        logging.debug(f"No attribute named in memory.")

    logging.debug("Available memory classes: %s", await memory.list_memory_classes())
    result = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories',
                                              observation="placeholder", search_type="summary_filter_by_object_name", params=doc_id)
    logging.info("Result is %s", result)

    return result, namespace_id
@app.post("/add-memory", response_model=dict) | ||
async def add_memory( | ||
payload: Payload, | ||
# files: List[UploadFile] = File(...), | ||
): | ||
try: | ||
logging.info(" Adding to Memory ") | ||
decoded_payload = payload.payload | ||
async with session_scope(session=AsyncSessionLocal()) as session: | ||
from main import load_documents_to_vectorstore | ||
|
||
if 'settings' in decoded_payload and decoded_payload['settings'] is not None: | ||
settings_for_loader = decoded_payload['settings'] | ||
else: | ||
settings_for_loader = None | ||
|
||
if 'content' in decoded_payload and decoded_payload['content'] is not None: | ||
content = decoded_payload['content'] | ||
else: | ||
content = None | ||
|
||
output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader) | ||
return JSONResponse(content={"response": output}, status_code=200) |
The /add-memory endpoint is well-structured. However, the error handling could be improved by providing more specific HTTP status codes and messages based on the exception type.
- return JSONResponse(content={"response": {"error": str(e)}}, status_code=503)
+ if isinstance(e, SomeSpecificException):
+ status_code = 400 # Bad Request or other appropriate status code
+ else:
+ status_code = 500 # Internal Server Error
+ return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code)
Committable suggestion
@app.post("/add-memory", response_model=dict) | |
async def add_memory( | |
payload: Payload, | |
# files: List[UploadFile] = File(...), | |
): | |
try: | |
logging.info(" Adding to Memory ") | |
decoded_payload = payload.payload | |
async with session_scope(session=AsyncSessionLocal()) as session: | |
from main import load_documents_to_vectorstore | |
if 'settings' in decoded_payload and decoded_payload['settings'] is not None: | |
settings_for_loader = decoded_payload['settings'] | |
else: | |
settings_for_loader = None | |
if 'content' in decoded_payload and decoded_payload['content'] is not None: | |
content = decoded_payload['content'] | |
else: | |
content = None | |
output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader) | |
return JSONResponse(content={"response": output}, status_code=200) | |
@app.post("/add-memory", response_model=dict) | |
async def add_memory( | |
payload: Payload, | |
# files: List[UploadFile] = File(...), | |
): | |
try: | |
logging.info(" Adding to Memory ") | |
decoded_payload = payload.payload | |
async with session_scope(session=AsyncSessionLocal()) as session: | |
from main import load_documents_to_vectorstore | |
if 'settings' in decoded_payload and decoded_payload['settings'] is not None: | |
settings_for_loader = decoded_payload['settings'] | |
else: | |
settings_for_loader = None | |
if 'content' in decoded_payload and decoded_payload['content'] is not None: | |
content = decoded_payload['content'] | |
else: | |
content = None | |
output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader) | |
return JSONResponse(content={"response": output}, status_code=200) | |
except Exception as e: | |
if isinstance(e, SomeSpecificException): | |
status_code = 400 # Bad Request or other appropriate status code | |
else: | |
status_code = 500 # Internal Server Error | |
return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code) |
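Note that SomeSpecificException in the suggestion above is a placeholder, not a class defined in this project. A minimal, self-contained version of the same idea, with ValueError and KeyError standing in for whatever client-input errors the loader actually raises, could be factored into a small helper:

import logging
from fastapi.responses import JSONResponse

# Sketch only: map known client-input errors to 400, everything else to 500.
def error_response(e: Exception) -> JSONResponse:
    client_errors = (ValueError, KeyError)  # hypothetical stand-ins
    status_code = 400 if isinstance(e, client_errors) else 500
    logging.error("add-memory failed: %s", e)
    return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code)

The except branch of the endpoint would then reduce to return error_response(e).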
class CogneeManager:
    def __init__(self, embeddings: Embeddings = None,
                 vector_db: VectorDB = None,
                 vector_db_key: str = None,
                 embedding_api_key: str = None,
                 webhook_url: str = None,
                 lines_per_batch: int = 1000,
                 webhook_key: str = None,
                 document_id: str = None,
                 chunk_validation_url: str = None,
                 internal_api_key: str = "test123",
                 base_url="http://localhost:8000"):
        self.embeddings = embeddings if embeddings else Embeddings()
        self.vector_db = vector_db if vector_db else VectorDB()
        self.webhook_url = webhook_url
        self.lines_per_batch = lines_per_batch
        self.webhook_key = webhook_key
        self.document_id = document_id
        self.chunk_validation_url = chunk_validation_url
        self.vector_db_key = vector_db_key
        self.embeddings_api_key = embedding_api_key
        self.internal_api_key = internal_api_key
        self.base_url = base_url
The constructor of the CogneeManager class initializes several instance variables. It's important to ensure that the default values provided, such as internal_api_key and base_url, are appropriate and secure. The use of a default API key ("test123") in production code is a security risk and should be replaced with a secure method of configuration.
- internal_api_key: str = "test123",
+ internal_api_key: str = None,
[SECURITY]
Committable suggestion
class CogneeManager:
    def __init__(self, embeddings: Embeddings = None,
                 vector_db: VectorDB = None,
                 vector_db_key: str = None,
                 embedding_api_key: str = None,
                 webhook_url: str = None,
                 lines_per_batch: int = 1000,
                 webhook_key: str = None,
                 document_id: str = None,
                 chunk_validation_url: str = None,
                 internal_api_key: str = None,
                 base_url="http://localhost:8000"):
        self.embeddings = embeddings if embeddings else Embeddings()
        self.vector_db = vector_db if vector_db else VectorDB()
        self.webhook_url = webhook_url
        self.lines_per_batch = lines_per_batch
        self.webhook_key = webhook_key
        self.document_id = document_id
        self.chunk_validation_url = chunk_validation_url
        self.vector_db_key = vector_db_key
        self.embeddings_api_key = embedding_api_key
        self.internal_api_key = internal_api_key
        self.base_url = base_url
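As a complement to dropping the hard-coded default, the key could be read from the environment at construction time. A rough sketch (the variable name COGNEE_INTERNAL_API_KEY is an assumption, not something defined in this repository):

import os

# Hypothetical variable name; fail fast if the key is missing rather than
# silently falling back to a test credential.
internal_api_key = os.environ.get("COGNEE_INTERNAL_API_KEY")
if internal_api_key is None:
    raise RuntimeError("COGNEE_INTERNAL_API_KEY is not set")

manager = CogneeManager(internal_api_key=internal_api_key)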
ENV = docker
POSTGRES_USER = bla
POSTGRES_PASSWORD = bla
POSTGRES_DB = bubu
POSTGRES_HOST = localhost
POSTGRES_HOST_DOCKER = postgres
SEGMENT_KEY = Etl4WJwzOkeDPAjaOXOMgyU16hO7mV7B
COG_ARCH_DIR = cognitive_architecture
GRAPH_DB_URL =
GRAPH_DB_PW =
GRAPH_DB_USER =
GRAPH_DB_URL =
GRAPH_DB_PW =
GRAPH_DB_USER =
AWS_ACCESS_KEY_ID =
AWS_SECRET_ACCESS_KEY =
QDRANT_API_KEY
QDRANT_API_URL
The search results indicate that the ENVIRONMENT
variable is still being used in several places within the codebase:
entrypoint.sh
README.md
(twice, in instructions)iterations/level_3/create_database.py
iterations/level_3/entrypoint.sh
iterations/level_3/database/database.py
iterations/level_2/entrypoint.sh
Given this information, the renaming of ENVIRONMENT
to ENV
in the .env.template
file is not consistently applied across the entire project. The references to ENVIRONMENT
in the README.md
are likely instructional and may need to be updated to reflect the changes in the .env.template
file. The occurrences in the scripts and Python files suggest that the code still expects an ENVIRONMENT
variable to be set.
Based on these findings, the following issues have been identified:
- The
ENVIRONMENT
variable is still being used in multiple places, which could lead to inconsistencies or errors if the environment variable is expected to beENV
now. - The
README.md
file contains outdated instructions regarding theENVIRONMENT
variable.
Analysis chain
Ensure that the new AWS and QDRANT API keys are handled securely and are not exposed in the codebase or version control. Verify that the renaming of ENVIRONMENT
to ENV
is consistently applied across the entire project.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Verify that ENV is used instead of ENVIRONMENT throughout the codebase
rg --type env 'ENVIRONMENT'
# Expecting no results for ENVIRONMENT, only ENV should be present
Length of output: 55
Script:
#!/bin/bash
# Verify that ENV is used instead of ENVIRONMENT throughout the codebase
# This time, search all text files without specifying a file type
rg 'ENVIRONMENT'
# Expecting no results for ENVIRONMENT, only ENV should be present
Length of output: 495
def upload(self, file_paths: list[str], base_url=None):
    if base_url:
        url = base_url + "/jobs"
    else:
        url = self.base_url + "/jobs"

    data = self.serialize()
    headers = self.generate_headers()
    multipart_form_data = [('file', (os.path.basename(filepath), open(filepath, 'rb'), 'application/octet-stream'))
                           for filepath in file_paths]

    print(f"embedding {len(file_paths)} documents at {url}")
    response = requests.post(url, files=multipart_form_data, headers=headers, stream=True, data=data)

    if response.status_code == 500:
        print(response.text)
        return Response(error=response.text, status_code=response.status_code)

    response_json = response.json()
    if response.status_code >= 400 and response.status_code < 500:
        print(f"Error: {response_json['error']}")

    return Response.from_json(response_json, response.status_code)
The upload method opens files for reading without ensuring they are closed, which can lead to resource leaks. Use a context manager to ensure files are properly closed.
- multipart_form_data = [('file', (os.path.basename(filepath), open(filepath, 'rb'), 'application/octet-stream'))
+ multipart_form_data = [('file', (os.path.basename(filepath), open(filepath, 'rb'), 'application/octet-stream')) for filepath in file_paths]
+ with contextlib.ExitStack() as stack:
+ files = [stack.enter_context(open(filepath, 'rb')) for filepath in file_paths]
+ multipart_form_data = [('file', (os.path.basename(filepath), file, 'application/octet-stream')) for filepath, file in zip(file_paths, files)]
Committable suggestion
def upload(self, file_paths: list[str], base_url=None):
    if base_url:
        url = base_url + "/jobs"
    else:
        url = self.base_url + "/jobs"

    data = self.serialize()
    headers = self.generate_headers()
    with contextlib.ExitStack() as stack:
        files = [stack.enter_context(open(filepath, 'rb')) for filepath in file_paths]
        multipart_form_data = [('file', (os.path.basename(filepath), file, 'application/octet-stream')) for filepath, file in zip(file_paths, files)]

        print(f"embedding {len(file_paths)} documents at {url}")
        response = requests.post(url, files=multipart_form_data, headers=headers, stream=True, data=data)

        if response.status_code == 500:
            print(response.text)
            return Response(error=response.text, status_code=response.status_code)

        response_json = response.json()
        if response.status_code >= 400 and response.status_code < 500:
            print(f"Error: {response_json['error']}")

        return Response.from_json(response_json, response.status_code)
async def delete_memories(self, namespace:str, params: dict = None):
    if namespace is None:
        namespace = self.namespace
The delete_memories method should handle exceptions and provide appropriate logging or error messages.
+ try:
+ # existing deletion logic
+ except Exception as e:
+ logging.error(f"Failed to delete memories: {e}")
+ raise
Committable suggestion
async def delete_memories(self, namespace:str, params: dict = None):
    try:
        if namespace is None:
            namespace = self.namespace
        # existing deletion logic
    except Exception as e:
        logging.error(f"Failed to delete memories: {e}")
        raise
main.py
Outdated
async def create_public_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]:
    """
    Create a public memory node associated with a user in a Neo4j graph database.
    If Public Memory exists, it will return the id of the memory.
    This is intended as standalone node that can be attached to any user.
    It is not attached to any user by default.

    Args:
        user_id (str): The unique identifier for the user.
        session (AsyncSession): An asynchronous session for database operations.

    Returns:
        Optional[int]: The ID of the created public memory node or None if an error occurs.
    :param labels: Label for the memory, to help filter for different countries
    :param topic: Topic for the memory, to help provide a name
    """
    # Validate input parameters
    if not labels:
        labels = ['sr']  # Labels for the memory node

    if not topic:
        topic = "PublicMemory"

    try:
        neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
                                      username=config.graph_database_username,
                                      password=config.graph_database_password)

        # Assuming the topic for public memory is predefined, e.g., "PublicMemory"
        # Create the memory node
        memory_id = await neo4j_graph_db.create_memory_node(labels=labels, topic=topic)
        return memory_id
    except Neo4jError as e:
        logging.error(f"Error creating public memory node: {e}")
        return None
The function create_public_memory returns None in case of a Neo4jError. Consider if this is the best error handling strategy for the context in which this function is used. It might be better to re-raise the exception or handle it differently.
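A minimal sketch of the re-raise alternative, reusing the existing log line; whether callers want an exception or the current None sentinel depends on how the API layer reports failures:

try:
    memory_id = await neo4j_graph_db.create_memory_node(labels=labels, topic=topic)
    return memory_id
except Neo4jError as e:
    logging.error(f"Error creating public memory node: {e}")
    raise  # propagate so the endpoint can translate this into an HTTP error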
main.py
Outdated
# CREATE (episodic)-[:HAS_EVENT]->(location5_5e43f4c45b3c44ea897c12220db4c051) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(location5_5e43f4c45b3c44ea897c12220db4c051) | ||
# CREATE (episodic)-[:HAS_EVENT]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d) | ||
# CREATE (episodic)-[:HAS_EVENT]->(information7_f756e3f3720c4fe5aeb01287badaf088) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(information7_f756e3f3720c4fe5aeb01287badaf088) | ||
# CREATE (episodic)-[:HAS_EVENT]->(event8_da6334e744454264900296319e14b532) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(event8_da6334e744454264900296319e14b532) | ||
# CREATE (episodic)-[:HAS_EVENT]->(action9_48e45419604e4d66b3e718ee1d6c095f) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action9_48e45419604e4d66b3e718ee1d6c095f) | ||
# CREATE (episodic)-[:HAS_EVENT]->(action10_f48acb1db4da4934afbe17363e9e63a4) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action10_f48acb1db4da4934afbe17363e9e63a4)""" | ||
# | ||
# out = cypher_statement_correcting(query) | ||
# print(out) | ||
# | ||
# out = await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday") | ||
# print(out) | ||
# load_doc_to_graph = await add_documents_to_graph_db(session, user_id) | ||
# print(load_doc_to_graph) | ||
# user_id = 'user' | ||
# loader_settings = { | ||
# "format": "PDF", | ||
# "source": "DEVICE", | ||
# "path": [".data"] | ||
# } | ||
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings) | ||
# await create_public_memory(user_id=user_id, labels=['sr'], topic="PublicMemory") | ||
# await add_documents_to_graph_db(session, user_id) | ||
# | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
|
||
# await attach_user_to_memory(user_id=user_id, labels=['sr'], topic="PublicMemory") | ||
|
||
# return_ = await user_context_enrichment(user_id=user_id, query="Koja je minimalna visina ograde na balkonu na stambenom objektu", session=session) | ||
# print(return_) | ||
|
||
# document_summary = { | ||
# 'DocumentCategory': 'Science', | ||
# 'Title': 'The Future of AI', | ||
# 'Summary': 'An insightful article about the advancements in AI.', | ||
# 'd_id': 'doc123' | ||
# } | ||
# | ||
# # Example user ID | ||
# user_id = 'user' | ||
# | ||
# # value = await neo4j_graph_db.create_memory_node(labels=['sr']) | ||
# # print(value) | ||
# # neo4j_graph_db.close() | ||
# | ||
# await add_documents_to_graph_db(session, user_id) | ||
# neo4j_graph_db.link_public_memory_to_user(memory_id = 17,user_id=user_id) | ||
# | ||
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="Document") | ||
print(ids) | ||
|
||
for id in ids: | ||
print(id.get('memoryId')) | ||
|
||
neo4j_graph_db.delete_memory_node(memory_id = id.get('memoryId'), topic="Document") | ||
# | ||
# neo4j_graph_db.delete_memory_node(memory_id=16, topic="PublicSerbianArchitecture") | ||
# neo4j_graph_db.unlink_memory_from_user(memory_id = 17,user_id=user_id) | ||
# cypher_query_public = neo4j_graph_db.create_document_node_cypher(document_summary, user_id, memory_type="PUBLIC") | ||
# neo4j_graph_db.query(cypher_query_public) | ||
# link_memory_to_user(user_id, session) | ||
|
||
# neo4j_graph_db.create_memory_node(labels=['sr']) | ||
# out = await get_vectordb_namespace(session, user_id) | ||
# params = { | ||
# "version": "1.0", | ||
# "agreement_id": "AG123456", | ||
# "privacy_policy": "https://example.com/privacy", | ||
# "terms_of_service": "https://example.com/terms", | ||
# "format": "json", | ||
# "schema_version": "1.1", | ||
# "checksum": "a1b2c3d4e5f6", | ||
# "owner": "John Doe", | ||
# "license": "MIT", | ||
# "validity_start": "2023-08-01", | ||
# "validity_end": "2024-07-31", | ||
# } | ||
# loader_settings = { | ||
# "format": "PDF", | ||
# "source": "DEVICE", | ||
# "path": [".data"], | ||
# "strategy": "SUMMARY", | ||
# } | ||
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings) | ||
# await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday") | ||
# await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings) | ||
# await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday") | ||
|
||
|
||
if __name__ == "__main__": | ||
import asyncio | ||
|
||
asyncio.run(main()) |
The main function is commented out. If this is intentional and the code is not needed, consider removing it to avoid confusion. If it's meant to be part of the application, uncomment and ensure it's being used correctly.
def detect_language(text):
    """
    Detect the language of the given text and return its ISO 639-1 language code.
    If the detected language is Croatian ('hr'), it maps to Serbian ('sr').
    The text is trimmed to the first 100 characters for efficient processing.

    Parameters:
    text (str): The text for language detection.

    Returns:
    str: The ISO 639-1 language code of the detected language, or 'None' in case of an error.
    """

    # Trim the text to the first 100 characters
    trimmed_text = text[:100]

    try:
        # Detect the language using langdetect
        detected_lang_iso639_1 = detect(trimmed_text)
        logging.info(f"Detected ISO 639-1 code: {detected_lang_iso639_1}")

        # Special case: map 'hr' (Croatian) to 'sr' (Serbian ISO 639-2)
        if detected_lang_iso639_1 == 'hr':
            return 'sr'
        return detected_lang_iso639_1

    except LangDetectException as e:
        logging.error(f"Language detection error: {e}")
    except Exception as e:
        logging.error(f"Unexpected error: {e}")

    return -1
The detect_language function has an error path that returns -1, which is inconsistent with the documented return type of a string. Consider returning None or an empty string instead to maintain consistency with the return type.
- return -1
+ return None # or return an empty string if that's more appropriate
Committable suggestion
def detect_language(text):
    """
    Detect the language of the given text and return its ISO 639-1 language code.
    If the detected language is Croatian ('hr'), it maps to Serbian ('sr').
    The text is trimmed to the first 100 characters for efficient processing.

    Parameters:
    text (str): The text for language detection.

    Returns:
    str: The ISO 639-1 language code of the detected language, or 'None' in case of an error.
    """

    # Trim the text to the first 100 characters
    trimmed_text = text[:100]

    try:
        # Detect the language using langdetect
        detected_lang_iso639_1 = detect(trimmed_text)
        logging.info(f"Detected ISO 639-1 code: {detected_lang_iso639_1}")

        # Special case: map 'hr' (Croatian) to 'sr' (Serbian ISO 639-2)
        if detected_lang_iso639_1 == 'hr':
            return 'sr'
        return detected_lang_iso639_1

    except LangDetectException as e:
        logging.error(f"Language detection error: {e}")
    except Exception as e:
        logging.error(f"Unexpected error: {e}")

    return None  # or return an empty string if that's more appropriate
# try:
#     pages = text_splitter.create_documents([source_data])
# except:
#     try:
pages = text_splitter.create_documents([source_data])
# except:
#     pages = text_splitter.create_documents(source_data.content)
# pages = source_data.load_and_split()
return pages
The commented-out exception handling block in the vanilla_chunker function should be removed or updated to reflect the current exception handling strategy.
- # try:
- # pages = text_splitter.create_documents([source_data])
- # except:
- # try:
- # pages = text_splitter.create_documents(source_data.content)
- # except:
- # pages = text_splitter.create_documents(source_data.content)
Committable suggestion
pages = text_splitter.create_documents([source_data])
return pages
…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (1)
- Dockerfile (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- Dockerfile
…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
…refactor and reformat issues, add boilerplate for loading jobs logic and extend search functionalities.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files ignored due to filter (2)
- poetry.lock
- pyproject.toml
Files selected for processing (1)
- entrypoint.sh (1 hunks)
Files skipped from review due to trivial changes (1)
- entrypoint.sh
Review Status
Actionable comments generated: 22
Configuration used: CodeRabbit UI
Files selected for processing (5)
- cognitive_architecture/classifiers/classifier.py (1 hunks)
- cognitive_architecture/database/create_database_tst.py (1 hunks)
- cognitive_architecture/database/graph_database/graph.py (14 hunks)
- cognitive_architecture/database/vectordb/basevectordb.py (1 hunks)
- cognitive_architecture/database/vectordb/vectordb.py (6 hunks)
Additional comments: 8
cognitive_architecture/classifiers/classifier.py (1)
- 76-109: The
classify_call
function has been modified to acceptdocument_summaries
instead ofquery
,context
, anddocument_types
. The JSON structure and function invocation have been updated accordingly. The return value has been changed from the classifier value to the classifier ID. Ensure that all calls to this function throughout the codebase have been updated to match the new signature and that the change in return value is handled appropriately.Verification successful
Based on the information gathered from the executed shell scripts, the
classify_call
function has been updated incognitive_architecture/classifiers/classifier.py
and is correctly used with the new signature inmain.py
. There are no other occurrences of the function in the codebase, which indicates that the changes have been properly reflected where necessary. The old function signature does not appear to be used anymore.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search to find files that import classifier module. Expecting list of dependent files. rg -C3 --type python $'from cognitive_architecture.classifiers import classifier' # Search old signature to find omissions. Expecting no results. ast-grep --lang python --pattern $'classify_call($_, $_, $_)' # Search new signature to validate usage. Expecting list of locations where the new signature is used. ast-grep --lang python --pattern $'classify_call($_, $_)'Length of output: 342
Script:
#!/bin/bash # Correct the search to find files that import classifier module without specifying the file type. rg -C3 'from cognitive_architecture.classifiers import classifier' # Search for any other usages of the new `classify_call` function signature. ast-grep --lang python --pattern $'classify_call($_, $_)' # Search for any usages of the old `classify_call` function signature to ensure it's not used anymore. ast-grep --lang python --pattern $'classify_call($_, $_, $_)'Length of output: 297
Script:
#!/bin/bash # Search for any file that might be using the `classify_call` function, regardless of the import statement. rg 'classify_call' --type pyLength of output: 310
cognitive_architecture/database/vectordb/basevectordb.py (1)
- 281-281: The change in the hunk modifies the logging statement for the search type by adding
str()
around thesearch_type
variable. This is a good practice to ensure that the variable is converted to a string before being concatenated in the logging message.cognitive_architecture/database/vectordb/vectordb.py (3)
63-63: The
init_weaviate
method has been updated to include aretriever_type
parameter. Ensure that this change is reflected wherever the method is called and that the new parameter is used correctly.152-179: The
add_memories
method has been updated to handle document loading and chunking. Ensure that the changes are compatible with the expected document structure and that thechunk_data
function is implemented correctly.252-316: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [183-313]
The
fetch_memories
method has been modified to include different search types. Verify that the new search types are implemented correctly and that the method's callers are updated to handle the new functionality.cognitive_architecture/database/graph_database/graph.py (3)
8-9: New imports from
neo4j
have been added, which suggests that the application is interacting with a Neo4j database. Ensure that these new dependencies are included in the project's dependency management files (e.g.,requirements.txt
orPipfile
).361-364: The commented-out code for
corrected_cypher_statements
is related to a previous comment about uncommenting or removing the code. Since the comment is still valid, we can skip generating a new comment.805-834: The method
unlink_memory_from_user
is correctly using parameterized queries, which is a good practice for security. However, ensure that thememory_id
is validated as an integer before using it in the query to prevent type-related issues.
# this is needed to import classes from other modules
# script_dir = os.path.dirname(os.path.abspath(__file__))
# # Get the parent directory of your script and add it to sys.path
# parent_dir = os.path.dirname(script_dir)
# sys.path.append(parent_dir)
The commented-out code for setting the script directory and appending the parent directory to sys.path is still present. As previously mentioned, if this code is no longer needed, it should be removed to avoid confusion.
- # script_dir = os.path.dirname(os.path.abspath(__file__))
- # # Get the parent directory of your script and add it to sys.path
- # parent_dir = os.path.dirname(script_dir)
- # sys.path.append(parent_dir)
Committable suggestion
# this is needed to import classes from other modules
# from cognitive_architecture.config import Config
# config = Config()
# config.load()
The commented-out code for loading configuration using a Config object is still present. As previously mentioned, if the new pattern is to use environment variables directly, this commented-out code should be removed to prevent confusion.
- # from cognitive_architecture.config import Config
- # config = Config()
- # config.load()
Committable suggestion
# from cognitive_architecture.config import Config
# config = Config()
# config.load()
if os.environ.get('AWS_ENV') == 'prd' or os.environ.get('AWS_ENV') == 'dev':
    host = os.environ.get('POSTGRES_HOST')
    username = os.environ.get('POSTGRES_USER')
    password = os.environ.get('POSTGRES_PASSWORD')
    database_name = os.environ.get('POSTGRES_DB')
elif os.environ.get('AWS_ENV') == 'local':
    host = os.environ.get('POSTGRES_HOST')
    username = os.environ.get('POSTGRES_USER')
    password = os.environ.get('POSTGRES_PASSWORD')
    database_name = os.environ.get('POSTGRES_DB')
else:
    host = os.environ.get('POSTGRES_HOST')
    username = os.environ.get('POSTGRES_USER')
    password = os.environ.get('POSTGRES_PASSWORD')
    database_name = os.environ.get('POSTGRES_DB')
The code for setting database connection parameters based on the AWS_ENV environment variable is still redundant across the prd, dev, and local cases. This can be simplified since the assignments are identical in each case.
- if os.environ.get('AWS_ENV') == 'prd' or os.environ.get('AWS_ENV') == 'dev':
- host = os.environ.get('POSTGRES_HOST')
- username = os.environ.get('POSTGRES_USER')
- password = os.environ.get('POSTGRES_PASSWORD')
- database_name = os.environ.get('POSTGRES_DB')
- elif os.environ.get('AWS_ENV') == 'local':
- host = os.environ.get('POSTGRES_HOST')
- username = os.environ.get('POSTGRES_USER')
- password = os.environ.get('POSTGRES_PASSWORD')
- database_name = os.environ.get('POSTGRES_DB')
- else:
- host = os.environ.get('POSTGRES_HOST')
- username = os.environ.get('POSTGRES_USER')
- password = os.environ.get('POSTGRES_PASSWORD')
- database_name = os.environ.get('POSTGRES_DB')
+ host = os.environ.get('POSTGRES_HOST')
+ username = os.environ.get('POSTGRES_USER')
+ password = os.environ.get('POSTGRES_PASSWORD')
+ database_name = os.environ.get('POSTGRES_DB')
def database_exists(username, password, host, db_name):
    engine = create_admin_engine(username, password, host, db_name)
    connection = engine.connect()
    query = text(f"SELECT 1 FROM pg_database WHERE datname='{db_name}'")
    result = connection.execute(query).fetchone()
    connection.close()
    engine.dispose()
    return result is not None
The database_exists function constructs a raw SQL query using string formatting, which could be vulnerable to SQL injection if the db_name variable is not properly sanitized. As previously mentioned, use query parameters to mitigate this risk.
- query = text(f"SELECT 1 FROM pg_database WHERE datname='{db_name}'")
+ query = text("SELECT 1 FROM pg_database WHERE datname=:db_name")
+ result = connection.execute(query, {'db_name': db_name}).fetchone()
def create_database(username, password, host, db_name):
    engine = create_admin_engine(username, password, host, db_name)
    connection = engine.raw_connection()
    connection.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cursor = connection.cursor()
    cursor.execute(f"CREATE DATABASE {db_name}")
    cursor.close()
    connection.close()
    engine.dispose()
The create_database function uses string formatting to construct a SQL command, which could be vulnerable to SQL injection. As previously mentioned, consider using safer methods to execute SQL commands. Note that a database name is an identifier, so it cannot be bound as an ordinary %s query parameter in psycopg2; compose it with psycopg2.sql instead.
- cursor.execute(f"CREATE DATABASE {db_name}")
+ cursor.execute(sql.SQL("CREATE DATABASE {}").format(sql.Identifier(db_name)))
# self.graph = Neo4jGraph(url=url, username=username, password=password)
from neo4j import GraphDatabase
self.driver = GraphDatabase.driver(url, auth=(username, password))
The commented-out line # self.graph = Neo4jGraph(url=url, username=username, password=password) suggests that there was a previous implementation that has been replaced. Ensure that this commented-out code is no longer needed and remove it to avoid confusion.
def delete_all_user_memories(self, user_id):
    try:
        # Check if the user exists
        user_exists = self.graph.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
        user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
        if not user_exists:
            return f"No user found with ID: {user_id}"
|
Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [375-389]
The method delete_all_user_memories uses string formatting to inject the user_id directly into the Cypher query, which could lead to Cypher injection if the user_id is not properly sanitized. Use parameterization to prevent potential security issues.
- user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
+ user_exists = self.query("MATCH (user:User {userId: $user_id}) RETURN user", params={"user_id": user_id})
        MATCH (user)-[:HAS_BUFFER]->(buffer)
        DETACH DELETE semantic, episodic, buffer
        """
        self.graph.query(delete_query)
        self.query(delete_query)
        return f"All memories deleted for user ID: {user_id}"
    except Exception as e:
        return f"An error occurred: {str(e)}"
|

def delete_specific_memory_type(self, user_id, memory_type):
    try:
        # Check if the user exists
        user_exists = self.graph.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
        user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
        if not user_exists:
            return f"No user found with ID: {user_id}"
|
Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [397-410]
Similar to the previous comment, the delete_specific_memory_type method should use parameterized queries to prevent Cypher injection.
- user_exists = self.query(f"MATCH (user:User {{userId: '{user_id}'}}) RETURN user")
+ user_exists = self.query("MATCH (user:User {userId: $user_id}) RETURN user", params={"user_id": user_id})
def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: str = None):
    """
    Update the namespace of a Document node in the database. The document can be linked
    either to a SemanticMemory node (if a user ID is provided) or to a PublicMemory node.

    Parameters:
    - vectordb_namespace (str): The namespace to set for the vectordb.
    - document_id (str): The unique identifier of the document.
    - user_id (str, optional): The unique identifier for the user. Default is None.

    Returns:
    - str: A Cypher query string to perform the update.
    """

    if user_id:
        # Update for a document linked to a SemanticMemory node
        cypher_query = f'''
        MATCH (user:User {{userId: '{user_id}' }})-[:HAS_SEMANTIC_MEMORY]->(:SemanticMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
        SET document.vectordbNamespace = '{vectordb_namespace}'
        RETURN document
        '''
    else:
        # Update for a document linked to a PublicMemory node
        cypher_query = f'''
        MATCH (:PublicMemory)-[:HAS_DOCUMENT]->(document:Document {{d_id: '{document_id}'}})
        SET document.vectordbNamespace = '{vectordb_namespace}'
        RETURN document
        '''
The method update_document_node_with_db_ids treats user_id as optional (it defaults to None and is only used when provided), but the type hint declares it as a plain str. Annotate it as Optional[str] so the signature accurately reflects that None is an accepted value.
- def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: str = None):
+ def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str, user_id: Optional[str] = None):
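If the module does not already import it, Optional comes from the typing module; shown here only as a reminder:

```python
from typing import Optional

def update_document_node_with_db_ids(self, vectordb_namespace: str, document_id: str,
                                     user_id: Optional[str] = None):
    ...
```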
The delete_memories method allows for deletion of memories from the vector database. As previously mentioned, ensure that proper authorization checks are in place to prevent unauthorized deletions and that exceptions are handled appropriately.
+ # TODO: Add authorization checks before allowing deletion of memories
+ try:
+ # existing deletion logic
+ except Exception as e:
+ logging.error(f"Failed to delete memories: {e}")
+ raise
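A more concrete, self-contained sketch of that shape; the ownership rule and the store.delete_memories call are hypothetical, not the project's actual API:

```python
import logging

logger = logging.getLogger(__name__)

class UnauthorizedDeletion(Exception):
    """Raised when a caller tries to delete memories it does not own."""

async def delete_memories_safely(store, namespace: str, caller_id: str, owner_id: str) -> None:
    # Hypothetical authorization rule: only the namespace owner may delete from it.
    if caller_id != owner_id:
        raise UnauthorizedDeletion(f"user {caller_id} may not delete from {namespace}")
    try:
        await store.delete_memories(namespace)  # assumed deletion call
    except Exception as e:
        logger.error("Failed to delete memories from %s: %s", namespace, e)
        raise
```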
Review Status
Actionable comments generated: 9
Configuration used: CodeRabbit UI
Files selected for processing (1)
- main.py (1 hunks)
Additional comments: 1
main.py (1)
- 26-26: The instantiation of aclient using instructor.patch(OpenAI()) is not explained. It's unclear what instructor.patch does. If it's a custom wrapper or patch for the OpenAI client, ensure that it's functioning as expected and that the OpenAI client is properly configured.
main.py (Outdated)
async def fetch_document_vectordb_namespace(session: AsyncSession, user_id: str, namespace_id:str, doc_id:str=None): | ||
logging.info("user id is", user_id) | ||
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, memory_label=namespace_id) | ||
|
||
|
||
# Managing memory attributes | ||
existing_user = await Memory.check_existing_user(user_id, session) | ||
print("here is the existing user", existing_user) | ||
await memory.manage_memory_attributes(existing_user) | ||
print("Namespace id is %s", namespace_id) | ||
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id) | ||
namespace_class = namespace_id + "_class" | ||
|
||
dynamic_memory_class = getattr(memory, namespace_class.lower(), None) | ||
|
||
methods_to_add = ["add_memories", "fetch_memories", "delete_memories"] | ||
|
||
if dynamic_memory_class is not None: | ||
for method_name in methods_to_add: | ||
await memory.add_method_to_class(dynamic_memory_class, method_name) | ||
print(f"Memory method {method_name} has been added") | ||
else: | ||
print(f"No attribute named in memory.") | ||
|
||
print("Available memory classes:", await memory.list_memory_classes()) | ||
result = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories', | ||
observation="placeholder", search_type="summary_filter_by_object_name", params=doc_id) | ||
logging.info("Result is", result) | ||
|
||
return result, namespace_id |
The function fetch_document_vectordb_namespace has several print statements which are generally not recommended for production code as they can clutter logs and are not easily manageable. Consider using a logging framework with appropriate log levels.
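A small illustration of the logging-based version (function and variable names are placeholders). Note also that calls like logging.info("user id is", user_id) and logging.info("Result is", result) in the excerpt above pass the value as a formatting argument without a matching %s placeholder, so the logging module reports a string-formatting error instead of the value; the %s style below avoids that.

```python
import logging

logging.basicConfig(level=logging.INFO)  # configure once, at application startup
logger = logging.getLogger(__name__)

def log_namespace_setup(existing_user, namespace_id: str) -> None:
    # instead of: print("here is the existing user", existing_user)
    logger.debug("Existing user: %s", existing_user)
    # instead of: print("Namespace id is %s", namespace_id)
    logger.info("Namespace id is %s", namespace_id)
```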
async def load_documents_to_vectorstore(session: AsyncSession, user_id: str, content:str=None, job_id:str=None, loader_settings:dict=None, memory_type:str="PRIVATE"): | ||
namespace_id = str(generate_letter_uuid()) + "_" + "SEMANTICMEMORY" | ||
namespace_class = namespace_id + "_class" | ||
|
||
logging.info("Namespace created with id %s", namespace_id) | ||
try: | ||
new_user = User(id=user_id) | ||
await add_entity(session, new_user) | ||
except: | ||
pass | ||
if job_id is None: | ||
job_id = str(uuid.uuid4()) | ||
|
||
await add_entity( | ||
session, | ||
Operation( | ||
id=job_id, | ||
user_id=user_id, | ||
operation_status="RUNNING", | ||
operation_type="DATA_LOAD", | ||
), | ||
) | ||
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, job_id=job_id, memory_label=namespace_id) | ||
if content is not None: | ||
document_names = [content[:30]] | ||
if loader_settings is not None: | ||
document_source = loader_settings.get("document_names") if isinstance(loader_settings.get("document_names"), | ||
list) else loader_settings.get("path", "None") | ||
logging.info("Document source is %s", document_source) | ||
# try: | ||
document_names = get_document_names(document_source[0]) | ||
logging.info(str(document_names)) | ||
# except: | ||
# document_names = document_source | ||
for doc in document_names: | ||
from cognitive_architecture.shared.language_processing import translate_text, detect_language | ||
#translates doc titles to english | ||
if loader_settings is not None: | ||
logging.info("Detecting language of document %s", doc) | ||
loader_settings["single_document_path"]= loader_settings.get("path", "None")[0] +"/"+doc | ||
logging.info("Document path is %s", loader_settings.get("single_document_path", "None")) | ||
memory_category = loader_settings.get("memory_category", "PUBLIC") | ||
if loader_settings is None: | ||
memory_category = "CUSTOM" | ||
if detect_language(doc) != "en": | ||
doc_ = doc.strip(".pdf").replace("-", " ") | ||
doc_ = translate_text(doc_, "sr", "en") | ||
else: | ||
doc_=doc | ||
doc_id = str(uuid.uuid4()) | ||
|
||
logging.info("Document name is %s", doc_) | ||
await add_entity( | ||
session, | ||
DocsModel( | ||
id=doc_id, | ||
operation_id=job_id, | ||
graph_summary= False, | ||
memory_category= memory_category, | ||
doc_name=doc_ | ||
) | ||
) | ||
# Managing memory attributes | ||
existing_user = await Memory.check_existing_user(user_id, session) | ||
await memory.manage_memory_attributes(existing_user) | ||
params = { | ||
"doc_id":doc_id | ||
} | ||
print("Namespace id is %s", namespace_id) | ||
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id) | ||
|
||
dynamic_memory_class = getattr(memory, namespace_class.lower(), None) | ||
|
||
methods_to_add = ["add_memories", "fetch_memories", "delete_memories"] | ||
|
||
if dynamic_memory_class is not None: | ||
for method_name in methods_to_add: | ||
await memory.add_method_to_class(dynamic_memory_class, method_name) | ||
print(f"Memory method {method_name} has been added") | ||
else: | ||
print(f"No attribute named in memory.") | ||
|
||
print("Available memory classes:", await memory.list_memory_classes()) | ||
result = await memory.dynamic_method_call(dynamic_memory_class, 'add_memories', | ||
observation=content, params=params, loader_settings=loader_settings) | ||
await update_entity(session, Operation, job_id, "SUCCESS") | ||
|
The function load_documents_to_vectorstore is complex and does a lot of work. It could benefit from being broken down into smaller, more focused functions for better maintainability and readability. Additionally, there are commented-out try-except blocks and print statements that should be cleaned up or replaced with proper logging.
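One natural extraction is the dynamic-memory setup that this function shares with fetch_document_vectordb_namespace in the excerpts above. A rough sketch of such a helper, reusing the method names visible in the excerpt (illustrative only, not the project's actual refactor):

```python
async def prepare_dynamic_memory(memory, namespace_id: str):
    """Attach the namespace class and its standard methods to a Memory instance."""
    await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id)
    dynamic_memory_class = getattr(memory, f"{namespace_id}_class".lower(), None)
    if dynamic_memory_class is None:
        raise AttributeError(f"No dynamic memory class registered for {namespace_id}")
    for method_name in ("add_memories", "fetch_memories", "delete_memories"):
        await memory.add_method_to_class(dynamic_memory_class, method_name)
    return dynamic_memory_class
```

Both call sites could then reduce to a single await prepare_dynamic_memory(memory, namespace_id).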
async def user_query_to_graph_db(session: AsyncSession, user_id: str, query_input: str): | ||
|
||
try: | ||
new_user = User(id=user_id) | ||
await add_entity(session, new_user) | ||
except: | ||
pass | ||
|
||
job_id = str(uuid.uuid4()) | ||
|
||
await add_entity( | ||
session, | ||
Operation( | ||
id=job_id, | ||
user_id=user_id, | ||
operation_status="RUNNING", | ||
operation_type="USER_QUERY_TO_GRAPH_DB", | ||
), | ||
) | ||
|
||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, password=config.graph_database_password) | ||
cypher_query = await neo4j_graph_db.generate_cypher_query_for_user_prompt_decomposition(user_id,query_input) | ||
result = neo4j_graph_db.query(cypher_query) | ||
neo4j_graph_db.close() | ||
|
||
await update_entity(session, Operation, job_id, "SUCCESS") | ||
|
||
return result |
The function user_query_to_graph_db seems to be handling database operations without any error handling. It's important to include error handling to manage any exceptions that might occur during the database operations.
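A minimal sketch of what that could look like, reusing the names from the excerpt above; the "FAILED" status value and the exact cleanup are assumptions:

```python
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url,
                              username=config.graph_database_username,
                              password=config.graph_database_password)
try:
    cypher_query = await neo4j_graph_db.generate_cypher_query_for_user_prompt_decomposition(user_id, query_input)
    result = neo4j_graph_db.query(cypher_query)
    await update_entity(session, Operation, job_id, "SUCCESS")
    return result
except Exception as e:
    logging.error("user_query_to_graph_db failed for job %s: %s", job_id, e)
    await update_entity(session, Operation, job_id, "FAILED")  # assumed status value
    raise
finally:
    neo4j_graph_db.close()
```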
async def add_documents_to_graph_db(session: AsyncSession, user_id: str= None, document_memory_types:list=None): | ||
"""""" | ||
if document_memory_types is None: | ||
document_memory_types = ['PUBLIC'] | ||
|
||
logging.info("Document memory types are", document_memory_types) | ||
try: | ||
# await update_document_vectordb_namespace(postgres_session, user_id) | ||
memory_details, docs = await get_unsumarized_vector_db_namespace(session, user_id) | ||
|
||
logging.info("Docs are", docs) | ||
memory_details= [detail for detail in memory_details if detail[1] in document_memory_types] | ||
logging.info("Memory details", memory_details) | ||
for doc in docs: | ||
logging.info("Memory names are", memory_details) | ||
doc_name, doc_id = doc | ||
logging.info("Doc id is", doc_id) | ||
try: | ||
classification_content = await fetch_document_vectordb_namespace(session, user_id, memory_details[0][0], doc_id) | ||
retrieval_chunks = [item['text'] for item in | ||
classification_content[0]['data']['Get'][memory_details[0][0]]] | ||
logging.info("Classification content is", classification_content) | ||
except: | ||
classification_content = "" | ||
retrieval_chunks = "" | ||
# retrieval_chunks = [item['text'] for item in classification_content[0]['data']['Get'][memory_details[0]]] | ||
# Concatenating the extracted text values | ||
concatenated_retrievals = ' '.join(retrieval_chunks) | ||
print(concatenated_retrievals) | ||
logging.info("Retrieval chunks are", retrieval_chunks) | ||
classification = await classify_documents(doc_name, document_id =doc_id, content=concatenated_retrievals) | ||
|
||
logging.info("Classification is", str(classification)) | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
if document_memory_types == ['PUBLIC']: | ||
await create_public_memory(user_id=user_id, labels=['sr'], topic="PublicMemory") | ||
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="PublicMemory") | ||
neo4j_graph_db.close() | ||
print(ids) | ||
else: | ||
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="SemanticMemory") | ||
neo4j_graph_db.close() | ||
print(ids) | ||
|
||
for id in ids: | ||
print(id.get('memoryId')) | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
if document_memory_types == ['PUBLIC']: | ||
|
||
rs = neo4j_graph_db.create_document_node_cypher(classification, user_id, public_memory_id=id.get('memoryId')) | ||
neo4j_graph_db.close() | ||
else: | ||
rs = neo4j_graph_db.create_document_node_cypher(classification, user_id, memory_type='SemanticMemory') | ||
neo4j_graph_db.close() | ||
logging.info("Cypher query is", rs) | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
neo4j_graph_db.query(rs) | ||
neo4j_graph_db.close() | ||
logging.info("WE GOT HERE") | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
if memory_details[0][1] == "PUBLIC": | ||
|
||
neo4j_graph_db.update_document_node_with_db_ids( vectordb_namespace=memory_details[0][0], | ||
document_id=doc_id) | ||
neo4j_graph_db.close() | ||
else: | ||
neo4j_graph_db.update_document_node_with_db_ids( vectordb_namespace=memory_details[0][0], | ||
document_id=doc_id, user_id=user_id) | ||
neo4j_graph_db.close() | ||
# await update_entity_graph_summary(session, DocsModel, doc_id, True) | ||
except Exception as e: | ||
return e |
The function add_documents_to_graph_db has a large try-except block that catches all exceptions and returns the exception itself. This is not a good practice as it can obscure the source of errors and make debugging difficult. It's better to handle specific exceptions and log them appropriately.
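A rough shape of the narrower handling, as a sketch only (which exception types are worth distinguishing here is an assumption):

```python
from neo4j.exceptions import Neo4jError

try:
    ...  # graph updates for the current document
except Neo4jError as e:
    logging.error("Graph update failed for user %s: %s", user_id, e)
    raise
except (KeyError, IndexError) as e:
    logging.error("Malformed classification payload for user %s: %s", user_id, e)
    raise
```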
main.py (Outdated)
enriched_context = await user_context_enrichment(session, "user123", "How does cognitive architecture work?") | ||
``` | ||
""" | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
|
||
await user_query_to_graph_db(session, user_id, query) | ||
|
||
semantic_mem = neo4j_graph_db.retrieve_semantic_memory(user_id=user_id) | ||
neo4j_graph_db.close() | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
episodic_mem = neo4j_graph_db.retrieve_episodic_memory(user_id=user_id) | ||
neo4j_graph_db.close() | ||
# public_mem = neo4j_graph_db.retrieve_public_memory(user_id=user_id) | ||
|
||
|
||
|
||
if detect_language(query) != "en": | ||
query = translate_text(query, "sr", "en") | ||
logging.info("Translated query is", query) | ||
|
||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
summaries = await neo4j_graph_db.get_memory_linked_document_summaries(user_id=user_id, memory_type=memory_type) | ||
neo4j_graph_db.close() | ||
logging.info("Summaries are is %s", summaries) | ||
# logging.info("Context from graphdb is %s", context) | ||
# result = neo4j_graph_db.query(document_categories_query) | ||
# summaries = [record.get("summary") for record in result] | ||
# logging.info('Possible document categories are', str(result)) | ||
# logging.info('Possible document categories are', str(categories)) | ||
relevant_summary_id = await classify_call( query= query, document_summaries=str(summaries)) | ||
|
||
|
||
# logging.info("Relevant categories after the classifier are %s", relevant_categories) | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
postgres_id = await neo4j_graph_db.get_memory_linked_document_ids(user_id, summary_id = relevant_summary_id, memory_type=memory_type) | ||
neo4j_graph_db.close() | ||
# postgres_id = neo4j_graph_db.query(get_doc_ids) | ||
logging.info("Postgres ids are %s", postgres_id) | ||
namespace_id = await get_memory_name_by_doc_id(session, postgres_id[0]) | ||
logging.info("Namespace ids are %s", namespace_id) | ||
params= {"doc_id":postgres_id[0]} | ||
namespace_id = namespace_id[0] | ||
namespace_class = namespace_id + "_class" | ||
if memory_type =='PublicMemory': | ||
user_id = 'system_user' | ||
|
||
memory = await Memory.create_memory(user_id, session, namespace=namespace_id, job_id="23232", | ||
memory_label=namespace_id) | ||
|
||
existing_user = await Memory.check_existing_user(user_id, session) | ||
print("here is the existing user", existing_user) | ||
await memory.manage_memory_attributes(existing_user) | ||
|
||
print("Namespace id is %s", namespace_id) | ||
await memory.add_dynamic_memory_class(namespace_id.lower(), namespace_id) | ||
|
||
dynamic_memory_class = getattr(memory, namespace_class.lower(), None) | ||
|
||
methods_to_add = ["add_memories", "fetch_memories", "delete_memories"] | ||
|
||
if dynamic_memory_class is not None: | ||
for method_name in methods_to_add: | ||
await memory.add_method_to_class(dynamic_memory_class, method_name) | ||
print(f"Memory method {method_name} has been added") | ||
else: | ||
print(f"No attribute named in memory.") | ||
|
||
print("Available memory classes:", await memory.list_memory_classes()) | ||
results = await memory.dynamic_method_call(dynamic_memory_class, 'fetch_memories', | ||
observation=query, params=postgres_id[0], search_type="summary_filter_by_object_name") | ||
logging.info("Result is", str(results)) | ||
|
||
search_context = "" | ||
|
||
for result in results['data']['Get'][namespace_id]: | ||
# Assuming 'result' is a dictionary and has keys like 'source', 'text' | ||
source = result['source'] | ||
text = result['text'] | ||
search_context += f"Document source: {source}, Document text: {text} \n" | ||
|
||
context = f""" You are a memory system that uses cognitive architecture to enrich the | ||
LLM context and provide better query response. | ||
You have access to the following information: | ||
EPISODIC MEMORY: {episodic_mem} | ||
SEMANTIC MEMORY: {semantic_mem} | ||
PROCEDURAL MEMORY: NULL | ||
SEARCH CONTEXT: The following documents provided with sources they were | ||
extracted from could be used to provide an answer {search_context} | ||
The original user query: {query} | ||
""" | ||
if generative_response is not True: | ||
return context | ||
else: | ||
generative_result = generate_graph(context) | ||
translation_to_srb = translate_text(generative_result.response, "en", "sr") | ||
return translation_to_srb |
The function user_context_enrichment is very long and does multiple things. It should be refactored into smaller functions, each with a single responsibility. Also, the function uses print statements for debugging, which should be replaced with proper logging.
async def create_public_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]: | ||
""" | ||
Create a public memory node associated with a user in a Neo4j graph database. | ||
If Public Memory exists, it will return the id of the memory. | ||
This is intended as standalone node that can be attached to any user. | ||
It is not attached to any user by default. | ||
|
||
Args: | ||
user_id (str): The unique identifier for the user. | ||
session (AsyncSession): An asynchronous session for database operations. | ||
|
||
Returns: | ||
Optional[int]: The ID of the created public memory node or None if an error occurs. | ||
:param labels: Label for the memory, to help filter for different countries | ||
:param topic: Topic for the memory, to help provide a name | ||
|
||
""" | ||
# Validate input parameters | ||
if not labels: | ||
labels = ['sr'] # Labels for the memory node | ||
|
||
if not topic: | ||
topic = "PublicMemory" | ||
|
||
|
||
try: | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, | ||
username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
|
||
# Assuming the topic for public memory is predefined, e.g., "PublicMemory" | ||
# Create the memory node | ||
memory_id = await neo4j_graph_db.create_memory_node(labels=labels, topic=topic) | ||
neo4j_graph_db.close() | ||
return memory_id | ||
except Neo4jError as e: | ||
logging.error(f"Error creating public memory node: {e}") | ||
return None |
The function create_public_memory has a try-except block that catches a Neo4jError but only logs the error without re-raising it or handling it in a way that the caller can react to. This could lead to silent failures where the caller is unaware that an error has occurred.
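One option, sketched against the excerpt above, is to log and re-raise so the caller can react; whether to propagate the original Neo4jError or wrap it in a domain exception is a design choice. Moving close() into a finally block also releases the driver on the error path:

```python
try:
    memory_id = await neo4j_graph_db.create_memory_node(labels=labels, topic=topic)
    return memory_id
except Neo4jError as e:
    logging.error(f"Error creating public memory node: {e}")
    raise  # propagate so callers do not silently continue with None
finally:
    neo4j_graph_db.close()
```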
async def attach_user_to_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]: | ||
""" | ||
Link user to public memory | ||
|
||
Args: | ||
user_id (str): The unique identifier for the user. | ||
topic (str): Memory name | ||
|
||
|
||
Returns: | ||
Optional[int]: The ID of the created public memory node or None if an error occurs. | ||
:param labels: Label for the memory, to help filter for different countries | ||
:param topic: Topic for the memory, to help provide a name | ||
|
||
""" | ||
# Validate input parameters | ||
if not user_id: | ||
raise ValueError("User ID is required.") | ||
if not labels: | ||
labels = ['sr'] # Labels for the memory node | ||
|
||
if not topic: | ||
topic = "PublicMemory" | ||
|
||
|
||
try: | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, | ||
username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
|
||
# Assuming the topic for public memory is predefined, e.g., "PublicMemory" | ||
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic=topic) | ||
neo4j_graph_db.close() | ||
|
||
for id in ids: | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, | ||
username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
linked_memory = neo4j_graph_db.link_public_memory_to_user(memory_id=id.get('memoryId'), user_id=user_id) | ||
neo4j_graph_db.close() | ||
return 1 | ||
except Neo4jError as e: | ||
logging.error(f"Error creating public memory node: {e}") | ||
return None |
The function attach_user_to_memory raises a ValueError if the user_id is not provided, which is good for input validation. However, the rest of the function is wrapped in a try-except that catches Neo4jError and logs it, similar to the previous function. This could also lead to silent failures.
async def unlink_user_from_memory(user_id: str=None, labels:list=None, topic:str=None) -> Optional[int]: | ||
""" | ||
Unlink user from memory | ||
|
||
Args: | ||
user_id (str): The unique identifier for the user. | ||
topic (str): Memory name | ||
|
||
Returns: | ||
Optional[int]: The ID of the created public memory node or None if an error occurs. | ||
:param labels: Label for the memory, to help filter for different countries | ||
:param topic: Topic for the memory, to help provide a name | ||
|
||
""" | ||
# Validate input parameters | ||
if not user_id: | ||
raise ValueError("User ID is required.") | ||
if not labels: | ||
labels = ['sr'] # Labels for the memory node | ||
|
||
if not topic: | ||
topic = "PublicMemory" | ||
|
||
|
||
try: | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, | ||
username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
|
||
# Assuming the topic for public memory is predefined, e.g., "PublicMemory" | ||
ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic=topic) | ||
neo4j_graph_db.close() | ||
|
||
for id in ids: | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, | ||
username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
linked_memory = neo4j_graph_db.unlink_memory_from_user(memory_id=id.get('memoryId'), user_id=user_id) | ||
neo4j_graph_db.close() | ||
return 1 | ||
except Neo4jError as e: | ||
logging.error(f"Error creating public memory node: {e}") | ||
return None |
The function unlink_user_from_memory has the same issue as the attach_user_to_memory function, where it catches and logs Neo4jError without proper error handling.
main.py (Outdated)
# CREATE (episodic)-[:HAS_EVENT]->(location5_5e43f4c45b3c44ea897c12220db4c051) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(location5_5e43f4c45b3c44ea897c12220db4c051) | ||
# CREATE (episodic)-[:HAS_EVENT]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(object6_5cdb87ad488c450c9dbce07b7daf3d8d) | ||
# CREATE (episodic)-[:HAS_EVENT]->(information7_f756e3f3720c4fe5aeb01287badaf088) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(information7_f756e3f3720c4fe5aeb01287badaf088) | ||
# CREATE (episodic)-[:HAS_EVENT]->(event8_da6334e744454264900296319e14b532) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(event8_da6334e744454264900296319e14b532) | ||
# CREATE (episodic)-[:HAS_EVENT]->(action9_48e45419604e4d66b3e718ee1d6c095f) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action9_48e45419604e4d66b3e718ee1d6c095f) | ||
# CREATE (episodic)-[:HAS_EVENT]->(action10_f48acb1db4da4934afbe17363e9e63a4) | ||
# CREATE (buffer)-[:CURRENTLY_HOLDING]->(action10_f48acb1db4da4934afbe17363e9e63a4)""" | ||
# | ||
# out = cypher_statement_correcting(query) | ||
# print(out) | ||
# | ||
# out = await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday") | ||
# print(out) | ||
# load_doc_to_graph = await add_documents_to_graph_db(session, user_id) | ||
# print(load_doc_to_graph) | ||
user_id = 'test_user' | ||
# loader_settings = { | ||
# "format": "PDF", | ||
# "source": "DEVICE", | ||
# "path": [".data"] | ||
# } | ||
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings) | ||
# await create_public_memory(user_id=user_id, labels=['sr'], topic="PublicMemory") | ||
# await add_documents_to_graph_db(session, user_id) | ||
# | ||
neo4j_graph_db = Neo4jGraphDB(url=config.graph_database_url, username=config.graph_database_username, | ||
password=config.graph_database_password) | ||
|
||
# await attach_user_to_memory(user_id=user_id, labels=['sr'], topic="PublicMemory") | ||
|
||
return_ = await user_context_enrichment(user_id=user_id, query="Koja je minimalna visina ograde na balkonu na stambenom objektu", session=session, memory_type="PublicMemory", generative_response=True) | ||
print(return_) | ||
# document_summary = { | ||
# 'DocumentCategory': 'Science', | ||
# 'Title': 'The Future of AI', | ||
# 'Summary': 'An insightful article about the advancements in AI.', | ||
# 'd_id': 'doc123' | ||
# } | ||
# | ||
# # Example user ID | ||
# user_id = 'user' | ||
# | ||
# # value = await neo4j_graph_db.create_memory_node(labels=['sr']) | ||
# # print(value) | ||
# # neo4j_graph_db.close() | ||
# | ||
# await add_documents_to_graph_db(session, user_id) | ||
# neo4j_graph_db.link_public_memory_to_user(memory_id = 17,user_id=user_id) | ||
# | ||
# ids = neo4j_graph_db.retrieve_node_id_for_memory_type(topic="Document") | ||
# print(ids) | ||
# | ||
# for id in ids: | ||
# print(id.get('memoryId')) | ||
# | ||
# neo4j_graph_db.delete_memory_node(memory_id = id.get('memoryId'), topic="Document") | ||
# | ||
# neo4j_graph_db.delete_memory_node(memory_id=16, topic="PublicSerbianArchitecture") | ||
# neo4j_graph_db.unlink_memory_from_user(memory_id = 17,user_id=user_id) | ||
# cypher_query_public = neo4j_graph_db.create_document_node_cypher(document_summary, user_id, memory_type="PUBLIC") | ||
# neo4j_graph_db.query(cypher_query_public) | ||
# link_memory_to_user(user_id, session) | ||
|
||
# neo4j_graph_db.create_memory_node(labels=['sr']) | ||
# out = await get_vectordb_namespace(session, user_id) | ||
# params = { | ||
# "version": "1.0", | ||
# "agreement_id": "AG123456", | ||
# "privacy_policy": "https://example.com/privacy", | ||
# "terms_of_service": "https://example.com/terms", | ||
# "format": "json", | ||
# "schema_version": "1.1", | ||
# "checksum": "a1b2c3d4e5f6", | ||
# "owner": "John Doe", | ||
# "license": "MIT", | ||
# "validity_start": "2023-08-01", | ||
# "validity_end": "2024-07-31", | ||
# } | ||
# loader_settings = { | ||
# "format": "PDF", | ||
# "source": "DEVICE", | ||
# "path": [".data"], | ||
# "strategy": "SUMMARY", | ||
# } | ||
# await load_documents_to_vectorstore(session, user_id, loader_settings=loader_settings) | ||
# await user_query_to_graph_db(session, user_id, "I walked in the forest yesterday and added to my list I need to buy some milk in the store and get a summary from a classical book i read yesterday") | ||
# await add_documents_to_graph_db(session, user_id, loader_settings=loader_settings) | ||
# await user_context_enrichment(session, user_id, query="Tell me about the book I read yesterday") | ||
|
||
|
||
if __name__ == "__main__": | ||
import asyncio | ||
|
||
|
||
asyncio.run(main()) |
The main function is mostly commented out, which suggests that it's either a work in progress or that the commented code is no longer needed. If it's the latter, the commented code should be removed to avoid confusion.
Review Status
Actionable comments generated: 5
Configuration used: CodeRabbit UI
Files selected for processing (3)
- api.py (1 hunks)
- cognitive_architecture/classifiers/classifier.py (1 hunks)
- main.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- cognitive_architecture/classifiers/classifier.py
- main.py
Additional comments: 1
api.py (1)
- 325-337: The start_api_server function is a utility to start the FastAPI server using uvicorn. The exception handling within this function is appropriate, logging the exception and potentially allowing for cleanup or recovery code to be added.
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
app = FastAPI(debug=True)
The environment variable OPENAI_API_KEY is being loaded with a default empty string. This contradicts the previous review comment which suggested adding error handling if the key is not set. The application should not start without this key.
- OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
+ OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
+ if not OPENAI_API_KEY:
+ raise RuntimeError("OPENAI_API_KEY is not set in the environment variables.")
@app.post("/add-memory", response_model=dict) | ||
async def add_memory( | ||
payload: Payload, | ||
# files: List[UploadFile] = File(...), | ||
): | ||
try: | ||
logging.info(" Adding to Memory ") | ||
decoded_payload = payload.payload | ||
async with session_scope(session=AsyncSessionLocal()) as session: | ||
from main import load_documents_to_vectorstore | ||
|
||
if 'settings' in decoded_payload and decoded_payload['settings'] is not None: | ||
settings_for_loader = decoded_payload['settings'] | ||
else: | ||
settings_for_loader = None | ||
|
||
if 'content' in decoded_payload and decoded_payload['content'] is not None: | ||
content = decoded_payload['content'] | ||
else: | ||
content = None | ||
|
||
output = await load_documents_to_vectorstore(session, decoded_payload['user_id'], content=content, loader_settings=settings_for_loader) | ||
return JSONResponse(content={"response": output}, status_code=200) | ||
|
||
except Exception as e: | ||
return JSONResponse( | ||
content={"response": {"error": str(e)}}, status_code=503 | ||
) |
The /add-memory endpoint's error handling could be improved by providing more specific HTTP status codes and messages based on the exception type, as previously suggested.
- return JSONResponse(content={"response": {"error": str(e)}}, status_code=503)
+ if isinstance(e, SomeSpecificException):
+ status_code = 400 # Bad Request or other appropriate status code
+ else:
+ status_code = 500 # Internal Server Error
+ return JSONResponse(content={"response": {"error": str(e)}}, status_code=status_code)
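An alternative that avoids repeating this branching in every endpoint is FastAPI's exception handlers. A minimal, self-contained sketch with a hypothetical domain exception (the exception type and status-code mapping are assumptions):

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

class DocumentLoadError(Exception):
    """Hypothetical domain error raised when a document cannot be loaded."""

@app.exception_handler(DocumentLoadError)
async def document_load_error_handler(request: Request, exc: DocumentLoadError):
    # Client-side problem (bad input), so report 400 rather than 503.
    return JSONResponse(status_code=400, content={"response": {"error": str(exc)}})

@app.exception_handler(Exception)
async def unhandled_error_handler(request: Request, exc: Exception):
    return JSONResponse(status_code=500, content={"response": {"error": "internal server error"}})
```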
@app.post("/add-architecture-public-memory", response_model=dict) | ||
async def add_memory( | ||
payload: Payload, | ||
# files: List[UploadFile] = File(...), | ||
): | ||
try: | ||
logging.info(" Adding to Memory ") | ||
decoded_payload = payload.payload | ||
async with session_scope(session=AsyncSessionLocal()) as session: | ||
from main import load_documents_to_vectorstore | ||
if 'content' in decoded_payload and decoded_payload['content'] is not None: | ||
content = decoded_payload['content'] | ||
else: | ||
content = None | ||
|
||
user_id = 'system_user' | ||
loader_settings = { | ||
"format": "PDF", | ||
"source": "DEVICE", | ||
"path": [".data"] | ||
} | ||
|
||
output = await load_documents_to_vectorstore(session, user_id=user_id, content=content, loader_settings=loader_settings) | ||
return JSONResponse(content={"response": output}, status_code=200) | ||
|
||
except Exception as e: | ||
return JSONResponse( | ||
content={"response": {"error": str(e)}}, status_code=503 | ||
) |
The /add-architecture-public-memory endpoint seems to duplicate the /add-memory endpoint's functionality with minor differences. The function name add_memory is used for both endpoints, which will cause a conflict. The function name should be unique to avoid collision.
- async def add_memory(
+ async def add_architecture_public_memory(
@app.post("/drop-db") | ||
async def drop_db(payload: Payload): | ||
try: | ||
decoded_payload = payload.payload | ||
|
||
if decoded_payload['operation'] == 'drop': | ||
|
||
if os.environ.get('AWS_ENV') == 'dev': | ||
host = os.environ.get('POSTGRES_HOST') | ||
username = os.environ.get('POSTGRES_USER') | ||
password = os.environ.get('POSTGRES_PASSWORD') | ||
database_name = os.environ.get('POSTGRES_DB') | ||
else: | ||
pass | ||
|
||
from cognitive_architecture.database.create_database_tst import drop_database, create_admin_engine | ||
|
||
engine = create_admin_engine(username, password, host, database_name) | ||
drop_database(engine) | ||
return JSONResponse(content={"response": "DB dropped"}, status_code=200) | ||
else: | ||
|
||
if os.environ.get('AWS_ENV') == 'dev': | ||
host = os.environ.get('POSTGRES_HOST') | ||
username = os.environ.get('POSTGRES_USER') | ||
password = os.environ.get('POSTGRES_PASSWORD') | ||
database_name = os.environ.get('POSTGRES_DB') | ||
else: | ||
pass | ||
|
||
from cognitive_architecture.database.create_database_tst import create_database, create_admin_engine | ||
|
||
engine = create_admin_engine(username, password, host, database_name) | ||
create_database(engine) | ||
return JSONResponse(content={"response": " DB created"}, status_code=200) | ||
|
||
|
||
|
||
except Exception as e: | ||
return HTTPException(status_code=500, detail=str(e)) |
The /drop-db endpoint has been updated to include logic for both dropping and creating the database based on the operation specified in the payload. However, the exception handling at the end of the function uses HTTPException incorrectly. It should be raised, not returned.
- return HTTPException(status_code=500, detail=str(e))
+ raise HTTPException(status_code=500, detail=str(e))
@app.post("/create-public-memory") | ||
async def create_public_memory(payload: Payload): | ||
try: | ||
decoded_payload = payload.payload | ||
|
||
if 'user_id' in decoded_payload and decoded_payload['user_id'] is not None: | ||
user_id = decoded_payload['user_id'] | ||
else: | ||
user_id = None | ||
|
||
if 'labels' in decoded_payload and decoded_payload['labels'] is not None: | ||
labels = decoded_payload['labels'] | ||
else: | ||
labels = None | ||
|
||
if 'topic' in decoded_payload and decoded_payload['topic'] is not None: | ||
topic = decoded_payload['topic'] | ||
else: | ||
topic = None | ||
|
||
# Execute the query - replace this with the actual execution method | ||
# async with session_scope(session=AsyncSessionLocal()) as session: | ||
# from main import create_public_memory | ||
# Assuming you have a method in Neo4jGraphDB to execute the query | ||
result = await create_public_memory(user_id=user_id, labels=labels, topic=topic) | ||
return JSONResponse(content={"response": result}, status_code=200) | ||
|
||
except Exception as e: | ||
raise HTTPException(status_code=500, detail=str(e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The /create-public-memory
endpoint appears to have been updated with new logic. However, if the commented-out code from lines 259-261 is no longer needed, it should be removed to keep the codebase clean.
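If the endpoint is kept, the three repeated "present and not None" checks can also be condensed. A minimal sketch, assuming decoded_payload is a plain dict as in the hunk above:
# dict.get() already returns None for missing keys, so the explicit checks are redundant.
user_id = decoded_payload.get('user_id')
labels = decoded_payload.get('labels')
topic = decoded_payload.get('topic')

result = await create_public_memory(user_id=user_id, labels=labels, topic=topic)
return JSONResponse(content={"response": result}, status_code=200)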
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 9
Configuration used: CodeRabbit UI
Files selected for processing (3)
- cognitive_architecture/database/graph_database/graph.py (1 hunks)
- cognitive_architecture/database/vectordb/basevectordb.py (1 hunks)
- cognitive_architecture/database/vectordb/vectordb.py (6 hunks)
Files not summarized due to errors (1)
- cognitive_architecture/database/graph_database/graph.py: Error: Message exceeds token limit
Files skipped from review as they are similar to previous changes (2)
- cognitive_architecture/database/vectordb/basevectordb.py
- cognitive_architecture/database/vectordb/vectordb.py
|
||
from neo4j import AsyncSession | ||
from neo4j.exceptions import Neo4jError | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The debugging print statement should be removed or replaced with a logging statement if necessary.
- print(os.getcwd())
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
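If the working directory is still useful for debugging, a logging call keeps the information without writing to stdout. A minimal sketch:
import logging
import os

logger = logging.getLogger(__name__)
# DEBUG level keeps the message out of production logs unless explicitly enabled.
logger.debug("Current working directory: %s", os.getcwd())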
from typing import List, Dict, Optional | ||
from ...utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \ | ||
create_node_variable_mapping, get_unsumarized_vector_db_namespace |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relative import from ...utils
should be replaced with an absolute import for clarity and to avoid issues with changing directory structures.
- from ...utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \
- create_node_variable_mapping, get_unsumarized_vector_db_namespace
+ from cognitive_architecture.utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \
+ create_node_variable_mapping, get_unsumarized_vector_db_namespace
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
from typing import List, Dict, Optional | |
from ...utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \ | |
create_node_variable_mapping, get_unsumarized_vector_db_namespace | |
from typing import List, Dict, Optional | |
from cognitive_architecture.utils import format_dict, append_uuid_to_variable_names, create_edge_variable_mapping, \ | |
create_node_variable_mapping, get_unsumarized_vector_db_namespace |
try: | ||
with self.driver.session() as session: | ||
result = session.run(query, params).data() | ||
return result | ||
except Exception as e: | ||
logging.error(f"An error occurred while executing the query: {e}") | ||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query
method in the Neo4jGraphDB
class catches a generic Exception
. It should catch more specific exceptions to avoid hiding unexpected errors.
- except Exception as e:
+ except Neo4jError as e:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
try: | |
with self.driver.session() as session: | |
result = session.run(query, params).data() | |
return result | |
except Exception as e: | |
logging.error(f"An error occurred while executing the query: {e}") | |
raise e | |
try: | |
with self.driver.session() as session: | |
result = session.run(query, params).data() | |
return result | |
except Neo4jError as e: | |
logging.error(f"An error occurred while executing the query: {e}") | |
raise e |
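One caveat, assuming the official neo4j Python driver: connectivity failures surface as ServiceUnavailable, which is a driver-side error rather than a Neo4jError, so catching both keeps the error logging for network problems as well. A sketch:
from neo4j.exceptions import Neo4jError, ServiceUnavailable

try:
    with self.driver.session() as session:
        result = session.run(query, params).data()
        return result
except (Neo4jError, ServiceUnavailable) as e:
    logging.error(f"An error occurred while executing the query: {e}")
    raise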
logging.info("User Cypher Query raw: %s", cypher_statements_joined) | ||
# corrected_cypher_statements = self.cypher_statement_correcting(input = cypher_statements_joined) | ||
# logging.info("User Cypher Query: %s", corrected_cypher_statements.query) | ||
# return corrected_cypher_statements.query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commented-out code for correcting Cypher statements should be either uncommented and tested or removed to avoid confusion.
- # corrected_cypher_statements = self.cypher_statement_correcting(input = cypher_statements_joined)
- # logging.info("User Cypher Query: %s", corrected_cypher_statements.query)
- # return corrected_cypher_statements.query
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
logging.info("User Cypher Query raw: %s", cypher_statements_joined) | |
# corrected_cypher_statements = self.cypher_statement_correcting(input = cypher_statements_joined) | |
# logging.info("User Cypher Query: %s", corrected_cypher_statements.query) | |
# return corrected_cypher_statements.query | |
logging.info("User Cypher Query raw: %s", cypher_statements_joined) |
async def create_memory_node(self, labels, topic=None): | ||
""" | ||
Create or find a memory node of the specified type with labels and a description. | ||
|
||
Args: | ||
labels (List[str]): A list of labels for the node. | ||
topic (str, optional): The type of memory node to create or find. Defaults to "PublicMemory". | ||
|
||
Returns: | ||
int: The ID of the created or found memory node. | ||
|
||
Raises: | ||
ValueError: If input parameters are invalid. | ||
Neo4jError: If an error occurs during the database operation. | ||
""" | ||
if topic is None: | ||
topic = "PublicMemory" | ||
|
||
# Prepare labels as a string | ||
label_list = ', '.join(f"'{label}'" for label in labels) | ||
|
||
# Cypher query to find or create the memory node with the given description and labels | ||
memory_cypher = f""" | ||
MERGE (memory:{topic} {{description: '{topic}', label: [{label_list}]}}) | ||
SET memory.memoryId = ID(memory) | ||
RETURN id(memory) AS memoryId | ||
""" | ||
|
||
try: | ||
result = self.query(memory_cypher) | ||
# Assuming the result is a list of records, where each record contains 'memoryId' | ||
memory_id = result[0]['memoryId'] if result else None | ||
self.close() | ||
return memory_id | ||
except Neo4jError as e: | ||
logging.error(f"Error creating or finding memory node: {e}") | ||
raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method create_memory_node
uses string formatting to create a Cypher query, which could be vulnerable to injection attacks. It should be parameterized to ensure safety.
- MERGE (memory:{topic} {{description: '{topic}', label: [{label_list}]}})
+ MERGE (memory:$topic {description: $description, label: $labels})
And update the query call to use parameters:
self.query(memory_cypher, params={"topic": topic, "description": topic, "labels": labels})
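Note that Cypher does not accept node labels as query parameters, so $topic cannot be used in the label position. A minimal sketch that parameterizes the property values and whitelists the label instead (the validation pattern is illustrative, and the synchronous self.query(query, params) helper shown earlier is assumed):
import re

def create_memory_node(self, labels, topic=None):
    topic = topic or "PublicMemory"
    # Labels cannot be bound as parameters, so validate the label text before interpolating it.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", topic):
        raise ValueError(f"Invalid memory label: {topic!r}")
    memory_cypher = f"""
    MERGE (memory:{topic} {{description: $description, label: $labels}})
    SET memory.memoryId = id(memory)
    RETURN id(memory) AS memoryId
    """
    result = self.query(memory_cypher, params={"description": topic, "labels": labels})
    return result[0]["memoryId"] if result else None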
def link_public_memory_to_user(self, memory_id, user_id): | ||
# Link an existing Public Memory node to a User node | ||
link_cypher = f""" | ||
MATCH (user:User {{userId: '{user_id}'}}) | ||
MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {memory_id} | ||
MERGE (user)-[:HAS_PUBLIC_MEMORY]->(publicMemory) | ||
""" | ||
self.query(link_cypher) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method link_public_memory_to_user
does not validate the memory_id
before using it in the Cypher query. It should ensure that memory_id
is an integer to prevent potential issues.
- MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {memory_id}
+ MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = $memory_id
And update the query call to use parameters:
self.query(link_cypher, params={"memory_id": memory_id})
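A sketch of the fully parameterized version, with memory_id coerced to an integer first (assuming the same self.query(query, params) helper):
def link_public_memory_to_user(self, memory_id, user_id):
    memory_id = int(memory_id)  # reject non-numeric IDs early
    link_cypher = """
    MATCH (user:User {userId: $user_id})
    MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = $memory_id
    MERGE (user)-[:HAS_PUBLIC_MEMORY]->(publicMemory)
    """
    self.query(link_cypher, params={"user_id": user_id, "memory_id": memory_id})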
link_cypher = f""" MATCH(publicMemory: {topic}) | ||
RETURN | ||
id(publicMemory) | ||
AS | ||
memoryId """ | ||
node_ids = self.query(link_cypher) | ||
return node_ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method retrieve_node_id_for_memory_type
builds its Cypher query by string interpolation rather than parameterization. It should be updated to use parameters for topic
to prevent potential injection issues.
- MATCH(publicMemory: {topic})
+ MATCH(publicMemory: $topic)
And update the query call to use parameters:
self.query(link_cypher, params={"topic": topic})
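As with create_memory_node above, a node label cannot be supplied as a Cypher parameter, so the practical alternative is to validate the label before interpolating it. A sketch (the signature and validation pattern are illustrative):
import re

def retrieve_node_id_for_memory_type(self, topic="PublicMemory"):
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", topic):
        raise ValueError(f"Invalid label: {topic!r}")
    link_cypher = f"MATCH (publicMemory:{topic}) RETURN id(publicMemory) AS memoryId"
    return self.query(link_cypher)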
# | ||
# async def link_memory_to_user(self, memory_id: int, user_id: str, relationship_type: str = 'HAS_MEMORY') -> None: | ||
# """ | ||
# Link a memory node to a user with a specified relationship type. | ||
# | ||
# Args: | ||
# memory_id (int): The ID of the memory node. | ||
# user_id (str): The user ID to link the memory to. | ||
# relationship_type (str): The type of relationship. | ||
# | ||
# Raises: | ||
# ValueError: If input parameters are invalid. | ||
# Neo4jError: If an error occurs during the database operation. | ||
# """ | ||
# if not user_id or not memory_id: | ||
# raise ValueError("User ID and Memory ID are required for linking.") | ||
# | ||
# try: | ||
# link_cypher = f""" | ||
# MATCH (user:User {{userId: '{user_id}'}}) | ||
# MATCH (memory) WHERE id(memory) = {memory_id} | ||
# MERGE (user)-[:{relationship_type}]->(memory) | ||
# """ | ||
# await self.query(link_cypher) | ||
# except Neo4jError as e: | ||
# logging.error(f"Error linking memory to user: {e}") | ||
# raise | ||
# | ||
# async def delete_memory_node(self, memory_id: int, memory_type: str) -> None: | ||
# """ | ||
# Delete a memory node of a specified type. | ||
# | ||
# Args: | ||
# memory_id (int): The ID of the memory node to delete. | ||
# memory_type (str): The type of the memory node. | ||
# | ||
# Raises: | ||
# ValueError: If input parameters are invalid. | ||
# Neo4jError: If an error occurs during the database operation. | ||
# """ | ||
# if not memory_id or not memory_type: | ||
# raise ValueError("Memory ID and Memory Type are required for deletion.") | ||
# | ||
# try: | ||
# delete_cypher = f""" | ||
# MATCH (memory:{memory_type}) WHERE id(memory) = {memory_id} | ||
# DETACH DELETE memory | ||
# """ | ||
# await self.query(delete_cypher) | ||
# except Neo4jError as e: | ||
# logging.error(f"Error deleting memory node: {e}") | ||
# raise | ||
# | ||
# async def unlink_memory_from_user(self, memory_id: int, user_id: str, relationship_type: str = 'HAS_MEMORY') -> None: | ||
# """ | ||
# Unlink a memory node from a user. | ||
# | ||
# Args: | ||
# memory_id (int): The ID of the memory node. | ||
# user_id (str): The user ID to unlink from the memory. | ||
# relationship_type (str): The type of relationship. | ||
# | ||
# Raises: | ||
# ValueError: If input parameters are invalid. | ||
# Neo4jError: If an error occurs during the database operation. | ||
# """ | ||
# if not user_id or not memory_id: | ||
# raise ValueError("User ID and Memory ID are required for unlinking.") | ||
# | ||
# try: | ||
# unlink_cypher = f""" | ||
# MATCH (user:User {{userId: '{user_id}'}})-[r:{relationship_type}]->(memory) WHERE id(memory) = {memory_id} | ||
# DELETE r | ||
# """ | ||
# await self.query(unlink_cypher) | ||
# except Neo4jError as e: | ||
# logging.error(f"Error unlinking memory from user: {e}") | ||
# raise |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are large blocks of commented-out code that appear to be previous versions of methods or alternative implementations. These blocks should be removed to avoid confusion and keep the code clear.
# def create_public_memory(self, labels, topic=None): | ||
# if topic is None: | ||
# topic = "SerbianArchitecture" | ||
# topicMemoryId = topic + "MemoryId" | ||
# # Create an independent Architecture Memory node with countries as properties | ||
# label_list = ', '.join(f"'{label}'" for label in labels) # Prepare countries list as a string | ||
# architecture_memory_cypher = f""" | ||
# CREATE ({topic.lower()}:{topic} {{description: '{topic}', label: [{label_list}]}}) | ||
# RETURN id({topic.lower()}) AS {topicMemoryId} | ||
# """ | ||
# return self.query(architecture_memory_cypher) | ||
# | ||
# def link_public_memory_to_user(self, public_memory_id, user_id): | ||
# # Link an existing Public Memory node to a User node | ||
# link_cypher = f""" | ||
# MATCH (user:User {{userId: '{user_id}'}}) | ||
# MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id} | ||
# MERGE (user)-[:HAS_PUBLIC_MEMORY]->(publicMemory) | ||
# """ | ||
# self.query(link_cypher) | ||
# | ||
# def link_public_memory_to_architecture(self, public_memory_id): | ||
# # Link the Public Memory node to the Architecture Memory node | ||
# link_cypher = f""" | ||
# MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id} | ||
# MATCH (architecture:Architecture {{description: 'Architecture'}}) | ||
# MERGE (publicMemory)-[:INCLUDES]->(architecture) | ||
# """ | ||
# self.query(link_cypher) | ||
# | ||
# def delete_public_memory(self, public_memory_id): | ||
# # Delete a Public Memory node by its ID | ||
# delete_cypher = f""" | ||
# MATCH (publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id} | ||
# DETACH DELETE publicMemory | ||
# """ | ||
# self.query(delete_cypher) | ||
# | ||
# def delete_architecture_memory(self, architecture_memory_id): | ||
# # Delete an Architecture Memory node by its ID | ||
# delete_cypher = f""" | ||
# MATCH (architecture:Architecture) WHERE id(architecture) = {architecture_memory_id} | ||
# DETACH DELETE architecture | ||
# """ | ||
# self.query(delete_cypher) | ||
# | ||
# def unlink_public_memory_from_user(self, public_memory_id, user_id): | ||
# # Unlink a Public Memory node from a User node | ||
# unlink_cypher = f""" | ||
# MATCH (user:User {{userId: '{user_id}'}})-[r:HAS_PUBLIC_MEMORY]->(publicMemory:PublicMemory) WHERE id(publicMemory) = {public_memory_id} | ||
# DELETE r | ||
# """ | ||
# self.query(unlink_cypher) | ||
# | ||
# def unlink_public_memory_from_architecture(self, public_memory_id): | ||
# # Unlink the Public Memory node from the Architecture Memory node | ||
# unlink_cypher = f""" | ||
# MATCH (publicMemory:PublicMemory)-[r:INCLUDES]->(architecture:Architecture) WHERE id(publicMemory) = {public_memory_id} | ||
# DELETE r | ||
# """ | ||
# self.query(unlink_cypher) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More commented-out code is present at the end of the file. If this code is not needed, it should be removed; if it's meant to be used in the future, the reason for retaining it should be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 6
Configuration used: CodeRabbit UI
Files selected for processing (7)
- cognitive_architecture/database/graph_database/graph.py (1 hunks)
- cognitive_architecture/database/graph_database/networkx_graph.py (1 hunks)
- cognitive_architecture/database/vectordb/basevectordb.py (2 hunks)
- cognitive_architecture/llm/prompts/generate_graph_prompt.txt (1 hunks)
- cognitive_architecture/llm/queries.py (1 hunks)
- cognitive_architecture/shared/data_models.py (1 hunks)
- main.py (1 hunks)
Files skipped from review as they are similar to previous changes (2)
- cognitive_architecture/database/vectordb/basevectordb.py
- main.py
id: int | ||
description: str | ||
category: str | ||
color: str ="blue" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value for the color
attribute in the Node
class is set as a mutable default argument. This can lead to unexpected behavior if the object is mutated.
- color: str ="blue"
+ color: str = Field(default="blue")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
color: str ="blue" | |
color: str = Field(default="blue") |
source: int | ||
target: int | ||
description: str | ||
color: str= "blue" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same issue with mutable default arguments is present in the Edge
class for the color
attribute.
- color: str= "blue"
+ color: str = Field(default="blue")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
color: str= "blue" | |
color: str = Field(default="blue") |
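A sketch of both models with explicit defaults, assuming they are Pydantic BaseModel classes as the Field suggestion implies:
from pydantic import BaseModel, Field

class Node(BaseModel):
    id: int
    description: str
    category: str
    color: str = Field(default="blue")

class Edge(BaseModel):
    source: int
    target: int
    description: str
    color: str = Field(default="blue")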
|
||
|
||
def generate_graph(input) -> KnowledgeGraph: | ||
model = "gpt-4-1106-preview" # Define the model here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hardcoding the model name within the generate_graph
function limits flexibility and maintainability. Consider using a configuration setting for the model name.
- model = "gpt-4-1106-preview" # Define the model here
+ model = config.model # Use the model from the configuration
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
model = "gpt-4-1106-preview" # Define the model here | |
model = config.model # Use the model from the configuration |
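A minimal sketch of sourcing the model name from configuration with an environment-variable fallback (the config attribute and the OPENAI_MODEL variable are assumptions, not taken from the original code):
import os

# Fall back to the current hardcoded value only if nothing is configured.
model = getattr(config, "model", None) or os.environ.get("OPENAI_MODEL", "gpt-4-1106-preview")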
|
||
|
||
|
||
async def generate_summary(input) -> MemorySummary: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The generate_summary
function is defined as async
but does not perform any await
operations. If the aclient.chat.completions.create
method is not asynchronous, the async
keyword should be removed.
- async def generate_summary(input) -> MemorySummary:
+ def generate_summary(input) -> MemorySummary:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
async def generate_summary(input) -> MemorySummary: | |
def generate_summary(input) -> MemorySummary: |
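If the call is meant to stay asynchronous, the alternative is to await an async client rather than dropping the async keyword. A sketch assuming the OpenAI SDK's AsyncOpenAI client; the MemorySummary field name is an assumption:
from openai import AsyncOpenAI

aclient = AsyncOpenAI()

async def generate_summary(input) -> MemorySummary:
    # Awaiting the request is what justifies keeping the function async.
    response = await aclient.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content": str(input)}],
    )
    # 'summary' is a hypothetical field name for illustration.
    return MemorySummary(summary=response.choices[0].message.content)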
except (FileNotFoundError, EOFError, pickle.UnpicklingError): | ||
self.graph = nx.Graph() # Create a new graph if loading failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching broad exceptions can mask other unexpected issues. Specify the exceptions that you expect to handle.
- except (FileNotFoundError, EOFError, pickle.UnpicklingError):
+ except (FileNotFoundError, EOFError, pickle.UnpicklingError) as e:
+ logging.error(f"Error loading graph: {e}")
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
except (FileNotFoundError, EOFError, pickle.UnpicklingError): | |
self.graph = nx.Graph() # Create a new graph if loading failed | |
except (FileNotFoundError, EOFError, pickle.UnpicklingError) as e: | |
logging.error(f"Error loading graph: {e}") | |
self.graph = nx.Graph() # Create a new graph if loading failed |
def generate_graph_semantic_memory_document_summary(self, document_summary, unique_graphdb_mapping_values, document_namespace, user_id): | ||
for node, attributes in unique_graphdb_mapping_values.items(): | ||
self.graph.add_node(node, **attributes) | ||
self.graph.add_edge(f"{user_id}_semantic", node, relation='HAS_KNOWLEDGE') | ||
self.save_graph() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method generate_graph_semantic_memory_document_summary
does not use the document_summary
parameter. If it's not needed, consider removing it.
- def generate_graph_semantic_memory_document_summary(self, document_summary, unique_graphdb_mapping_values, document_namespace, user_id):
+ def generate_graph_semantic_memory_document_summary(self, unique_graphdb_mapping_values, document_namespace, user_id):
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
def generate_graph_semantic_memory_document_summary(self, document_summary, unique_graphdb_mapping_values, document_namespace, user_id): | |
for node, attributes in unique_graphdb_mapping_values.items(): | |
self.graph.add_node(node, **attributes) | |
self.graph.add_edge(f"{user_id}_semantic", node, relation='HAS_KNOWLEDGE') | |
self.save_graph() | |
def generate_graph_semantic_memory_document_summary(self, unique_graphdb_mapping_values, document_namespace, user_id): | |
for node, attributes in unique_graphdb_mapping_values.items(): | |
self.graph.add_node(node, **attributes) | |
self.graph.add_edge(f"{user_id}_semantic", node, relation='HAS_KNOWLEDGE') | |
self.save_graph() |
Summary by CodeRabbit
New Features
Improvements
- … the libgl1-mesa-glx package.
- … the Config object.
- … memory_category field to database models for better memory categorization.
Documentation
Bug Fixes
Refactor
Style
Chores
- … ENV variable and commented out the execution of the create_database.py script.