Cog 656 deployment state #368

Merged
merged 19 commits into dev from COG-656-deployment-state
Dec 13, 2024
Conversation

dexters1
Collaborator

@dexters1 dexters1 commented Dec 13, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new search type, COMPLETION, enhancing search functionality.
    • Added a new asynchronous function query_completion for processing queries.
    • Implemented a new function resolve_data_directories to manage data paths.
  • Enhancements

    • Improved data retrieval and error handling in dataset management API.
    • Enhanced permission management with user-specific checks and error reporting.
    • Added a new exception class UnauthorizedDataAccessError for better security.
  • Documentation

    • Updated prompts to emphasize brevity in responses.
  • Chores

    • Expanded module imports to include new classes and functions for better accessibility.

Add ability to send data directories to cognee
Feature COG-656

Resolve issue with UUID concat by casting to string
Fix COG-656

Remove code comments that are not needed
Chore COG-656

Added directory resolution as step in cognee add function
Feature COG-656

Add support for text data to resolving data directory task
Fix COG-656

Add resolving of directories as task for the add pipeline
Feature COG-656

No need to handle different data types in resolving directories, focus on just handling case when it's a directory
Fix COG-656

Rewrote endpoint which adds users to groups
Fix COG-656

Resolve issue with adding permissions to groups
Fix COG-656

… permission already given to group
Added error handling in case permission is already given to group and user is already part of group
Feature COG-656

Verify user has access to data before returning it
Feature COG-656

Add compute search to cognee which makes searches human readable
Feature COG-656

@dexters1 dexters1 self-assigned this Dec 13, 2024
Contributor

coderabbitai bot commented Dec 13, 2024

Warning

Rate limit exceeded

@dexters1 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 10 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 924759a and 35b1f7d.

📒 Files selected for processing (1)
  • cognee/modules/data/exceptions/exceptions.py (1 hunks)

Walkthrough

This pull request changes several parts of the Cognee API. The add function gains a new asynchronous task, resolve_data_directories. Dataset management now retrieves data per user and reports errors more consistently, and get_data takes a user_id and raises an error on unauthorized access. New exception classes (UnauthorizedDataAccessError, NoRelevantDataFound) are added, along with a GroupPermission model, a COMPLETION search type backed by query_completion, and permission endpoints that accept group_id and user_id as strings.

Changes

File | Change Summary
cognee/api/v1/add/add_v2.py | Modified add function to include a new task using resolve_data_directories.
cognee/api/v1/datasets/routers/get_datasets_router.py | Updated delete_data and get_raw_data to use user.id, refined error handling in get_raw_data, and modified get_dataset_data to return JSON error responses.
cognee/api/v1/permissions/routers/get_permissions_router.py | Updated give_permission_to_group and add_user_to_group to accept group_id and user_id as strings, implemented asynchronous database queries.
cognee/api/v1/search/search_v2.py | Added COMPLETION to SearchType enum and updated specific_search function to handle this new type.
cognee/infrastructure/llm/prompts/answer_simple_question.txt | Added instruction for brevity in responses.
cognee/modules/data/methods/get_data.py | Updated get_data function to include user_id parameter and added unauthorized access error handling.
cognee/modules/users/models/GroupPermission.py | Introduced GroupPermission class with relevant columns.
cognee/modules/users/models/__init__.py | Added imports for UserGroup and GroupPermission.
cognee/tasks/ingestion/__init__.py | Imported resolve_data_directories function.
cognee/tasks/ingestion/resolve_data_directories.py | Introduced resolve_data_directories function for resolving data directories.
cognee/modules/data/exceptions/__init__.py | Added UnauthorizedDataAccessError exception.
cognee/modules/data/exceptions/exceptions.py | Defined UnauthorizedDataAccessError class and modified UnstructuredLibraryImportError.
cognee/tasks/completion/__init__.py | Imported query_completion function.
cognee/tasks/completion/exceptions/__init__.py | Added module docstring and imported NoRelevantDataFound exception.
cognee/tasks/completion/exceptions/exceptions.py | Introduced NoRelevantDataFound exception class.
cognee/tasks/completion/query_completion.py | Added query_completion function for processing queries.

Possibly related PRs

Suggested reviewers

  • 0xideas
  • borisarzentar

Poem

🐰 In the code where changes hop,
New tasks and functions never stop.
With permissions tight and data bright,
The API dances in the light!
So let us cheer for every line,
As Cognee grows, it’s simply divine! 🌟


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@dexters1 dexters1 marked this pull request as ready for review December 13, 2024 14:34
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 10

🧹 Outside diff range and nitpick comments (8)
cognee/tasks/compute/exceptions/__init__.py (1)

1-5: Enhance module docstring with more details

Consider expanding the docstring to include:

  • List of available exceptions
  • Example usage
  • Common scenarios where these exceptions are raised

Example enhancement:

"""
Custom exceptions for the Cognee API.

This module defines a set of exceptions for handling various compute errors
+ 
+ Available Exceptions:
+ - NoRelevantDataFound: Raised when search operations yield no results
+
+ Example Usage:
+     from cognee.tasks.compute.exceptions import NoRelevantDataFound
+     if not results:
+         raise NoRelevantDataFound
"""
cognee/tasks/compute/exceptions/exceptions.py (1)

4-11: LGTM! Consider adding type hints for status_code

The exception class is well-structured with appropriate default values. Consider adding type hints for the status_code parameter for consistency.

    def __init__(
            self,
            message: str = "Search did not find any data.",
            name: str = "NoRelevantDataFound",
-           status_code=status.HTTP_404_NOT_FOUND,
+           status_code: int = status.HTTP_404_NOT_FOUND,
    ):
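
A minimal, self-contained sketch of the suggested signature follows. `CogneeApiError` is a hypothetical stand-in for the project's base exception (its real definition is not shown in this review), and `http.HTTPStatus` replaces the `status` import so the snippet runs without a web framework installed.

```python
from http import HTTPStatus


class CogneeApiError(Exception):
    """Hypothetical stand-in for the project's base API exception."""

    def __init__(self, message: str, name: str, status_code: int):
        super().__init__(message)
        self.message = message
        self.name = name
        self.status_code = status_code


class NoRelevantDataFound(CogneeApiError):
    """Raised when a search finds nothing to build an answer from."""

    def __init__(
        self,
        message: str = "Search did not find any data.",
        name: str = "NoRelevantDataFound",
        status_code: int = HTTPStatus.NOT_FOUND.value,  # plain int, 404
    ):
        super().__init__(message, name, status_code)
```

With the annotation in place, a type checker can flag callers that pass a non-integer status code.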
cognee/tasks/compute/query_compute.py (3)

17-17: Consider making the chunk limit configurable

Hard-coding limit = 1 might be too restrictive. Consider making this a parameter with a reasonable default value to allow for more comprehensive context when needed.

-    found_chunks = await vector_engine.search("document_chunk_text", query, limit = 1)
+    found_chunks = await vector_engine.search(
+        "document_chunk_text",
+        query,
+        limit=config.DEFAULT_CHUNK_LIMIT
+    )
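
One way to read this suggestion is to expose the limit as a keyword argument with a default, rather than reading it from a config object. The sketch below assumes that approach; `FakeVectorEngine`, `Chunk`, `find_chunks`, and `DEFAULT_CHUNK_LIMIT` are illustrative names, not part of the PR, and the substring match only imitates a real similarity search.

```python
import asyncio
from dataclasses import dataclass
from typing import List

DEFAULT_CHUNK_LIMIT = 1  # assumed default, mirroring the hard-coded value


@dataclass
class Chunk:
    payload: dict


class FakeVectorEngine:
    """Hypothetical in-memory stand-in for the vector engine."""

    def __init__(self, texts: List[str]):
        self._texts = texts

    async def search(self, collection: str, query: str, limit: int) -> List[Chunk]:
        # Naive substring match; a real engine would rank by similarity.
        hits = [t for t in self._texts if query.lower() in t.lower()]
        return [Chunk(payload={"text": t}) for t in hits[:limit]]


async def find_chunks(engine, query: str, limit: int = DEFAULT_CHUNK_LIMIT) -> List[Chunk]:
    """Search for chunks, letting the caller widen the context when needed."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return await engine.search("document_chunk_text", query, limit=limit)
```

Callers that are happy with a single chunk keep the old behaviour by omitting `limit`; callers that need more context pass a larger value.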

26-27: Consider externalizing prompt template paths

The hardcoded template paths should be moved to a configuration file for better maintainability.

-    user_prompt = render_prompt("context_for_question.txt", args)
-    system_prompt = read_query_prompt("answer_simple_question.txt")
+    user_prompt = render_prompt(config.CONTEXT_QUESTION_TEMPLATE, args)
+    system_prompt = read_query_prompt(config.ANSWER_QUESTION_TEMPLATE)

7-7: Consider using a structured response model

Using str as the response model might be too simple. Consider defining a structured response model for better type safety and validation.

Example:

from pydantic import BaseModel

class ComputeResponse(BaseModel):
    answer: str
    confidence: float
    source_chunks: list[str]

async def query_compute(query: str) -> list[ComputeResponse]:
    # ... existing code ...
    return [ComputeResponse(
        answer=computed_answer,
        confidence=0.95,  # from LLM if available
        source_chunks=[chunk.payload["text"] for chunk in found_chunks]
    )]

Also applies to: 36-36

cognee/api/v1/search/search_v2.py (1)

Line range hint 17-55: Implementation looks good with some architectural considerations

The integration of the COMPUTE search type follows the existing patterns and maintains important cross-cutting concerns:

  • User permission filtering
  • Query logging
  • Telemetry
  • Error handling

Consider documenting the following aspects:

  1. Expected response format from query_compute to maintain consistency
  2. Performance characteristics of compute operations
  3. Rate limiting requirements if compute operations are resource-intensive
cognee/tasks/ingestion/resolve_data_directories.py (2)

4-14: Documentation looks good, but consider adding more details

The function documentation is clear but could benefit from:

  • Example usage
  • Possible exceptions that might be raised
  • Limitations on directory size/depth

19-37: Consider adding file filtering and progress tracking

For better control and monitoring:

  1. Add file extension filtering
  2. Add progress tracking for large directories
  3. Consider implementing batch processing for memory efficiency

Here's a suggested implementation:

+from typing import Optional, Set
+
 async def resolve_data_directories(
     data: Union[BinaryIO, List[BinaryIO], str, List[str]],
-    include_subdirectories: bool = True
+    include_subdirectories: bool = True,
+    allowed_extensions: Optional[Set[str]] = None,
+    batch_size: int = 1000
 ):
     # ... existing docstring ...
     resolved_data = []
+    processed_count = 0
 
     for item in data:
         if isinstance(item, str):  # Check if the item is a path
             if os.path.isdir(item):  # If it's a directory
                 if include_subdirectories:
                     for root, _, files in os.walk(item):
-                        resolved_data.extend([os.path.join(root, f) for f in files])
+                        for f in files:
+                            if processed_count >= batch_size:
+                                yield resolved_data
+                                resolved_data = []
+                                processed_count = 0
+                            
+                            file_path = os.path.join(root, f)
+                            if allowed_extensions and not any(
+                                file_path.lower().endswith(ext.lower())
+                                for ext in allowed_extensions
+                            ):
+                                continue
+                            
+                            resolved_data.append(file_path)
+                            processed_count += 1
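
Note that adding `yield` as in the diff above would turn the function into an async generator, which changes how callers consume it. A simpler variant that keeps a plain return value and adds only the extension filter might look like the sketch below. It is synchronous for brevity, and `resolve_paths` and `allowed_extensions` are illustrative names rather than the PR's API.

```python
import os
from typing import Iterable, List, Optional, Set


def resolve_paths(
    data: Iterable[object],
    include_subdirectories: bool = True,
    allowed_extensions: Optional[Set[str]] = None,
) -> List[object]:
    """Expand directory paths into file paths; pass other items through."""
    # Normalise once so ".TXT" and ".txt" compare equal.
    allowed = {ext.lower() for ext in allowed_extensions} if allowed_extensions else None

    def is_allowed(name: str) -> bool:
        return allowed is None or os.path.splitext(name)[1].lower() in allowed

    resolved: List[object] = []
    for item in data:
        if isinstance(item, str) and os.path.isdir(item):
            if include_subdirectories:
                for root, _, files in os.walk(item):
                    resolved.extend(
                        os.path.join(root, f) for f in sorted(files) if is_allowed(f)
                    )
            else:
                for name in sorted(os.listdir(item)):
                    path = os.path.join(item, name)
                    if os.path.isfile(path) and is_allowed(name):
                        resolved.append(path)
        else:
            # File paths, raw text, and binary streams are kept as-is.
            resolved.append(item)
    return resolved
```

Batching and progress reporting are left out here; they could be layered on top by the caller without changing the return type.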
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8dc1ec5 and 67585d0.

📒 Files selected for processing (14)
  • cognee/api/v1/add/add_v2.py (2 hunks)
  • cognee/api/v1/datasets/routers/get_datasets_router.py (3 hunks)
  • cognee/api/v1/permissions/routers/get_permissions_router.py (1 hunks)
  • cognee/api/v1/search/search_v2.py (2 hunks)
  • cognee/infrastructure/llm/prompts/answer_simple_question.txt (1 hunks)
  • cognee/modules/data/methods/get_data.py (2 hunks)
  • cognee/modules/users/models/GroupPermission.py (1 hunks)
  • cognee/modules/users/models/__init__.py (1 hunks)
  • cognee/tasks/compute/__init__.py (1 hunks)
  • cognee/tasks/compute/exceptions/__init__.py (1 hunks)
  • cognee/tasks/compute/exceptions/exceptions.py (1 hunks)
  • cognee/tasks/compute/query_compute.py (1 hunks)
  • cognee/tasks/ingestion/__init__.py (1 hunks)
  • cognee/tasks/ingestion/resolve_data_directories.py (1 hunks)
✅ Files skipped from review due to trivial changes (4)
  • cognee/tasks/ingestion/__init__.py
  • cognee/infrastructure/llm/prompts/answer_simple_question.txt
  • cognee/tasks/compute/__init__.py
  • cognee/modules/users/models/__init__.py
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/api/v1/permissions/routers/get_permissions_router.py

36-36: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)


57-57: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

🔇 Additional comments (10)
cognee/tasks/compute/query_compute.py (1)

1-5: Verify vector engine and LLM client imports

Let's ensure these infrastructure components are properly initialized and configured.

✅ Verification successful

Vector engine and LLM client imports are properly used

The imports are correctly used throughout the codebase:

  • get_vector_engine() is properly initialized and used in multiple tasks for searching, indexing, and testing.
  • get_llm_client() is correctly utilized for text generation, transcription, and structured output creation.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for proper initialization of dependencies
rg -A 5 "get_vector_engine\(\)" --type py
rg -A 5 "get_llm_client\(\)" --type py

# Check for configuration patterns
rg -l "DEFAULT_CHUNK_LIMIT|CONTEXT_QUESTION_TEMPLATE|ANSWER_QUESTION_TEMPLATE" --type py

Length of output: 17927

cognee/api/v1/search/search_v2.py (3)

23-23: Verify existing SearchType enum usage

The addition of a new enum value could affect existing code that handles SearchType values.

✅ Verification successful

Let me analyze the search results. The code shows that:

  1. The SearchType enum is used in a dictionary mapping in search_v2.py where all enum values are handled exhaustively in the search_tasks dictionary
  2. The new COMPUTE value is already properly mapped to query_compute function in this dictionary
  3. There are no direct if/else comparisons with SearchType values in the codebase
  4. All usages of SearchType in the codebase are through the search function parameter, passing one of the enum values
  5. The implementation follows the existing pattern where each search type maps to a specific query function

The addition of COMPUTE search type is properly integrated

The new enum value is safely added with its corresponding query handler and follows the existing pattern. The code uses a dictionary-based dispatch pattern rather than if/else statements, making it resilient to enum additions.
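
The dictionary-based dispatch described above can be sketched as follows. The handler bodies are placeholders, and the `SEARCH_TASKS` name and the `ValueError` fallback are assumptions made for illustration; only the enum values and the idea of mapping each one to a query function come from the review.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, Dict


class SearchType(Enum):
    SUMMARIES = "SUMMARIES"
    INSIGHTS = "INSIGHTS"
    CHUNKS = "CHUNKS"
    COMPUTE = "COMPUTE"


# Placeholder handlers standing in for the real query functions.
async def query_summaries(query: str) -> list:
    return [f"summary for {query}"]


async def query_graph_connections(query: str) -> list:
    return [f"insight for {query}"]


async def query_chunks(query: str) -> list:
    return [f"chunk for {query}"]


async def query_compute(query: str) -> list:
    return [f"answer for {query}"]


SEARCH_TASKS: Dict[SearchType, Callable[[str], Awaitable[list]]] = {
    SearchType.SUMMARIES: query_summaries,
    SearchType.INSIGHTS: query_graph_connections,
    SearchType.CHUNKS: query_chunks,
    SearchType.COMPUTE: query_compute,
}


async def specific_search(search_type: SearchType, query: str) -> list:
    """Look up the handler for a search type and run it."""
    handler = SEARCH_TASKS.get(search_type)
    if handler is None:
        raise ValueError(f"Unsupported search type: {search_type}")
    return await handler(query)
```

Because the mapping is data rather than branching code, adding a search type means adding one enum member and one dictionary entry. A check such as `set(SEARCH_TASKS) == set(SearchType)` in a test would catch a member without a handler.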

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for exhaustive SearchType handling in switch/if statements

# Look for switch/if statements handling SearchType
rg -B 2 -A 5 "SearchType\." --type py

# Look for explicit enum value checks
ast-grep --pattern 'if $_ == SearchType.$_:'

Length of output: 15497


55-55: Verify query_compute signature and error handling

Please ensure that the query_compute function follows the same contract as other search tasks:

  1. It should be an async function
  2. It should accept a query string parameter
  3. It should return a list of results
  4. It should handle errors consistently with other search tasks
✅ Verification successful

Based on the search results, I can now generate the final response as we have found the implementation details of query_compute and other query functions for comparison.

Function signature and error handling are correctly implemented

The query_compute function follows the same contract as other search tasks in the codebase:

  • It's an async function
  • It accepts a query string parameter with proper type annotation
  • It returns a list as specified in the return type annotation
  • It follows the same pattern as other query functions (query_summaries, query_chunks, query_graph_connections) by:
    • Using the vector engine
    • Having proper docstring documentation
    • Following the same parameter and return type structure
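
The contract listed above (async, takes a query string, returns a list) can be illustrated with a small sketch. The real function takes only `query`; here the search and LLM calls are injected as the hypothetical parameters `search` and `complete` so the example runs on its own, and raising `NoRelevantDataFound` on an empty result is an assumption based on the exception's name and default message.

```python
import asyncio
from typing import Awaitable, Callable, List


class NoRelevantDataFound(Exception):
    """Simplified stand-in for the exception defined in this PR."""


async def query_compute(
    query: str,
    search: Callable[[str], Awaitable[List[str]]],
    complete: Callable[[str, str], Awaitable[str]],
) -> list:
    """Find context for the query, then ask the LLM to answer with it."""
    found_chunks = await search(query)
    if len(found_chunks) == 0:
        raise NoRelevantDataFound("Search did not find any data.")

    # Use the best-matching chunk as context for the answer.
    answer = await complete(query, found_chunks[0])
    return [answer]
```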
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify query_compute implementation details

# Check function signature and error handling
ast-grep --pattern 'async def query_compute($$$) {
  $$$
}'

# Look for error handling patterns
rg -B 2 -A 5 "raise.*Error" "cognee/tasks/compute"

Length of output: 112


Script:

#!/bin/bash
# Let's try a different approach to locate and analyze the query_compute function

# First, let's find where query_compute is defined
rg -l "def query_compute"

# Then get its implementation with context
rg -B 2 -A 10 "def query_compute"

# Also search for other query functions to compare signatures
rg -B 2 -A 10 "def query_\w+"

# Look for error handling patterns in the entire codebase
rg -B 2 -A 5 "raise.*Error" --type python

Length of output: 6513


17-17: Verify the query_compute module structure

Let's ensure the compute module is properly structured and exported.

✅ Verification successful

Import of query_compute module is correctly structured

The module is properly implemented and exported:

  • query_compute function exists in cognee/tasks/compute/query_compute.py
  • It's correctly exported in cognee/tasks/compute/__init__.py
  • The import path used in search_v2.py matches the actual implementation
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the compute module structure and exports

# Check if the module exists and has proper exports
fd "^(query_compute|__init__)\.py$" -p "cognee/tasks/compute"

# Check the module's interface
rg -A 5 "^def query_compute" "cognee/tasks/compute"

Length of output: 113


Script:

#!/bin/bash
# Let's try a broader search to locate the query_compute module
# First, find all Python files in the tasks directory
echo "=== Python files in tasks directory ==="
fd ".py$" "cognee/tasks"

# Then, search for any mentions of query_compute
echo -e "\n=== Mentions of query_compute ==="
rg "query_compute" --type py

# Check if there's a compute directory or file anywhere in the codebase
echo -e "\n=== Files/directories with 'compute' ==="
fd "compute"

Length of output: 3296


Script:

#!/bin/bash
# Let's verify the implementation and exports of query_compute
echo "=== query_compute implementation ==="
cat cognee/tasks/compute/query_compute.py

echo -e "\n=== __init__.py exports ==="
cat cognee/tasks/compute/__init__.py

Length of output: 1392

cognee/modules/data/methods/get_data.py (1)

6-6: LGTM: Well-documented parameter addition

The addition of user_id parameter is properly documented in the function signature and docstring.

Also applies to: 10-10

cognee/api/v1/datasets/routers/get_datasets_router.py (2)

79-79: LGTM: Proper user context propagation

The get_data call correctly includes the user context for permission checking.


Line range hint 144-171: Optimize data retrieval and standardize error handling

The current implementation has several areas for improvement:

  1. Multiple database calls that could be optimized
  2. Inconsistent error response formats (JSONResponse vs EntityNotFoundError)
  3. Redundant data validation

Consider this refactoring:

@router.get("/{dataset_id}/data/{data_id}/raw", response_class=FileResponse)
async def get_raw_data(dataset_id: str, data_id: str, user: User = Depends(get_authenticated_user)):
    from cognee.modules.data.methods import get_data
    from cognee.modules.data.methods import get_dataset, get_dataset_data

    # First verify dataset access
    dataset = await get_dataset(user.id, dataset_id)
    if dataset is None:
        raise EntityNotFoundError(message=f"Dataset ({dataset_id}) not found.")

-    dataset_data = await get_dataset_data(dataset.id)
-    if dataset_data is None:
-        raise EntityNotFoundError(message=f"No data found in dataset ({dataset_id}).")
-
-    matching_data = [data for data in dataset_data if str(data.id) == data_id]
-    if len(matching_data) == 0:
-        raise EntityNotFoundError(message= f"Data ({data_id}) not found in dataset ({dataset_id}).")

    # Get and verify data access
    data = await get_data(user.id, data_id)
    if data is None:
        raise EntityNotFoundError(message=f"Data ({data_id}) not found.")

    return data.raw_data_location

This refactoring:

  1. Removes redundant data validation
  2. Standardizes on EntityNotFoundError
  3. Reduces database calls

Let's verify the relationship between datasets and data:

cognee/api/v1/add/add_v2.py (2)

17-17: Consider adding error handling for the new task

The new resolve_data_directories task is added to the pipeline, but there's no explicit error handling if the resolution fails. Consider adding try/except blocks or another error-handling mechanism.

Let's check if there's any error handling in the pipeline implementation:


16-19: Verify pipeline changes impact

The addition of resolve_data_directories task before ingest_data_with_metadata modifies the existing pipeline. Please ensure this change doesn't break existing functionality.

Let's check for existing usages:

✅ Verification successful

Let me gather more context about the pipeline and task usage patterns.


Let me check the implementation of these specific functions to understand their relationship and dependencies.


Pipeline modification is safe and logically correct

The addition of resolve_data_directories task before ingest_data_with_metadata is a safe change because:

  1. ingest_data_with_metadata is only used in this specific pipeline (add_v2.py)
  2. resolve_data_directories prepares the input by resolving directories into their contained files, which is a logical prerequisite for data ingestion
  3. The function signatures show that resolve_data_directories accepts the same input types that ingest_data_with_metadata can process (files and binary streams)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other pipeline configurations using ingest_data_with_metadata
rg "Task\(ingest_data_with_metadata" --type py

Length of output: 136


Script:

#!/bin/bash
# Search for other pipeline configurations and task definitions
rg "Task\(" -A 3 --type py

# Look for the implementation of ingest_data_with_metadata
ast-grep --pattern 'def ingest_data_with_metadata($$$)'

# Look for resolve_data_directories implementation
ast-grep --pattern 'def resolve_data_directories($$$)'

Length of output: 5718


Script:

#!/bin/bash
# Search for the implementation and usages of both functions with context
rg "def (ingest_data_with_metadata|resolve_data_directories)" -A 5 --type py

# Look for any imports or references to these functions
rg "(ingest_data_with_metadata|resolve_data_directories)" --type py

Length of output: 2086

cognee/api/v1/permissions/routers/get_permissions_router.py (1)

18-18: Validate permission parameter before database operations

Ensure that the permission string is valid and properly formatted to prevent SQL injection or unexpected behavior.
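
A common way to do this is an allow-list check before the value reaches the database layer. The sketch below assumes the permission names `read`, `write`, and `delete`; the actual set used by the project is not shown in this review, and `validate_permission` is an illustrative helper.

```python
# Assumed permission names; replace with the project's real set.
ALLOWED_PERMISSIONS = frozenset({"read", "write", "delete"})


def validate_permission(permission: str) -> str:
    """Return the normalised permission name or raise ValueError."""
    normalized = permission.strip().lower()
    if normalized not in ALLOWED_PERMISSIONS:
        raise ValueError(f"Unknown permission: {permission!r}")
    return normalized
```

Rejecting anything outside a fixed set means malformed or hostile input never reaches a query, independent of how the query itself is built.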


class SearchType(Enum):
    SUMMARIES = "SUMMARIES"
    INSIGHTS = "INSIGHTS"
    CHUNKS = "CHUNKS"
    COMPUTE = "COMPUTE"
Contributor

I meant completion like it usually exists in LLM APIs -> https://llm-engine.scale.com/guides/completions/

Collaborator Author

Will make the change

cognee/modules/data/methods/get_data.py (outdated; review thread resolved)
Raise unauth access error when trying to read data without access

Feature COG-656
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
cognee/modules/data/methods/get_data.py (1)

Line range hint 7-16: Enhance docstring with exception documentation

The docstring should document the possible exceptions that can be raised.

Apply this enhancement:

 async def get_data(user_id: UUID, data_id: UUID) -> Optional[Data]:
     """Retrieve data by ID.
 
         Args:
             user_id (UUID): user ID
             data_id (UUID): ID of the data to retrieve
 
         Returns:
             Optional[Data]: The requested data object if found, None otherwise
+
+        Raises:
+            UnauthorizedDataAccessError: If the user does not own the requested data
     """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67585d0 and 11634cb.

📒 Files selected for processing (3)
  • cognee/modules/data/exceptions/__init__.py (1 hunks)
  • cognee/modules/data/exceptions/exceptions.py (1 hunks)
  • cognee/modules/data/methods/get_data.py (2 hunks)
🔇 Additional comments (3)
cognee/modules/data/exceptions/__init__.py (1)

9-9: LGTM: Clean import addition

The import statement is properly placed and follows Python's standard practices for exception handling.

cognee/modules/data/exceptions/exceptions.py (1)

10-11: LGTM: Proper super() initialization added

The addition of the super() call ensures proper initialization of the parent class.

cognee/modules/data/methods/get_data.py (1)

22-24: Consider explicit handling of "not found" case

The current implementation implicitly returns None for not found cases. Consider making this more explicit for better maintainability.

Consider this approach:

-        if data and data.owner_id != user_id:
+        if not data:
+            return None
+            
+        if data.owner_id != user_id:
             raise UnauthorizedDataAccessError(message=f"User {user_id} is not authorized to access data {data_id}")
 
         return data
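
The three outcomes (not found, not authorized, found) can be exercised in isolation with a sketch like the one below. The dictionary `store` stands in for the database lookup, and the `Data` dataclass and the simplified exception are stand-ins for the project's own classes.

```python
import asyncio
from dataclasses import dataclass
from typing import Dict, Optional
from uuid import UUID, uuid4


class UnauthorizedDataAccessError(Exception):
    """Simplified stand-in for the exception added in this PR."""


@dataclass
class Data:
    id: UUID
    owner_id: UUID


async def get_data(store: Dict[UUID, Data], user_id: UUID, data_id: UUID) -> Optional[Data]:
    """Return the data object, None if missing, or raise if owned by someone else."""
    data = store.get(data_id)  # stands in for the database query

    if data is None:
        return None

    if data.owner_id != user_id:
        raise UnauthorizedDataAccessError(
            f"User {user_id} is not authorized to access data {data_id}"
        )

    return data
```

Separating the two checks makes it obvious to a reader that a missing record is a normal result while an ownership mismatch is an error.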

cognee/modules/data/exceptions/exceptions.py (outdated; review thread resolved)
Rename searching type from compute to completion

Refactor COG-656
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
cognee/tasks/completion/exceptions/__init__.py (1)

7-9: Consider future-proofing the import statement.

While the current import is clean, consider using parentheses for imports from the start, as it makes adding future exceptions easier without changing the syntax.

cognee/tasks/completion/exceptions/exceptions.py (1)

4-11: Consider enhancing the error message for better debugging.

While the implementation is solid, the default error message could be more informative for debugging purposes.

Consider this enhancement:

     def __init__(
             self,
-            message: str = "Search did not find any data.",
+            message: str = "Search operation completed but no relevant data was found in the index.",
             name: str = "NoRelevantDataFound",
             status_code=status.HTTP_404_NOT_FOUND,
     ):
         super().__init__(message, name, status_code)

Also, consider adding a docstring to the class to document its specific use case and when it should be raised.

cognee/tasks/completion/query_completion.py (1)

7-14: Function signature needs improvement

The function signature could be enhanced with:

  1. Documentation of the exceptions that may be raised (Python has no type hints for exceptions, so this belongs in the docstring)
  2. Additional parameters for customization (e.g., limit, threshold)

Consider updating the docstring:

 async def query_completion(query: str) -> list:
     """
+    Computes an answer to a query by searching relevant chunks and using LLM.
+
     Parameters:
     - query (str): The query string to compute.
 
     Returns:
     - list: Answer to the query.
+
+    Raises:
+    - NoRelevantDataFound: When no relevant chunks are found
     """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11634cb and 924759a.

📒 Files selected for processing (5)
  • cognee/api/v1/search/search_v2.py (2 hunks)
  • cognee/tasks/completion/__init__.py (1 hunks)
  • cognee/tasks/completion/exceptions/__init__.py (1 hunks)
  • cognee/tasks/completion/exceptions/exceptions.py (1 hunks)
  • cognee/tasks/completion/query_completion.py (1 hunks)
🔇 Additional comments (5)
cognee/tasks/completion/__init__.py (1)

1-1: LGTM! Clean package-level import.

The import statement follows Python conventions and provides clean package-level access to the query_completion function.

cognee/tasks/completion/exceptions/__init__.py (1)

1-5: LGTM! Clear and concise module documentation.

The docstring effectively communicates the module's purpose and scope.

cognee/tasks/completion/exceptions/exceptions.py (1)

1-2: LGTM! Appropriate imports.

The imports are minimal and correctly bring in the required dependencies.

cognee/tasks/completion/query_completion.py (1)

22-28: Verify prompt template paths

The code assumes the existence of template files but doesn't handle missing templates.

Let's verify the template files exist:

✅ Verification successful

Template files are present and correctly located

Both template files exist at the expected location in cognee/infrastructure/llm/prompts/:

  • context_for_question.txt
  • answer_simple_question.txt
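
The same check could also run at startup or in a test, so that a missing template fails early instead of at query time. The sketch below is one way to do that with `pathlib`; `missing_templates` is an illustrative helper, not a function from the codebase.

```python
from pathlib import Path
from typing import Iterable, List


def missing_templates(prompt_dir: Path, names: Iterable[str]) -> List[str]:
    """Return the template names that do not exist as files in prompt_dir."""
    return [name for name in names if not (prompt_dir / name).is_file()]
```

For example, calling it with the prompts directory and the two file names listed above should return an empty list when both templates are present.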
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if prompt template files exist
fd -t f "context_for_question.txt|answer_simple_question.txt"

Length of output: 183

cognee/api/v1/search/search_v2.py (1)

23-23: Consider renaming COMPLETION based on previous feedback

Based on the previous review discussion about LLM API conventions, consider a more standard name.

Consider renaming to align with standard LLM API terminology:

-    COMPLETION = "COMPLETION"
+    COMPLETIONS = "COMPLETIONS"

cognee/tasks/completion/query_completion.py (review thread resolved)
cognee/tasks/completion/query_completion.py (review thread resolved)
cognee/api/v1/search/search_v2.py (review thread resolved)
Update typo in string in code

Chore COG-656
@dexters1 dexters1 merged commit 2508727 into dev Dec 13, 2024
24 checks passed
@dexters1 dexters1 deleted the COG-656-deployment-state branch December 13, 2024 16:41
2 participants