Cog 656 deployment state #368
Conversation
Add ability to send data directories to cognee Feature COG-656
Resolve issue with UUID concat by casting to string Fix COG-656
Remove code comments that are not needed Chore COG-656
Added directory resolution as step in cognee add function Feature COG-656
Add support for text data to resolving data directory task Fix COG-656
…nto COG-656-deployment-state
Add resolving of directories as task for the add pipeline Feature COG-656
No need to handle different data types in resolving directories, focus on just handling case when it's a directory Fix COG-656
Rewrote endpoint which adds users to groups Fix COG-656
Resolve issue with adding permissions to groups Fix COG-656
… permission already given to group Added error handling in case permission is already given to group and user is already part of group Feature COG-656
Verify user has access to data before returning it Feature COG-656
Add compute search to cognee which makes searches human readable Feature COG-656
Walkthrough
The changes in this pull request involve multiple modifications across various files in the Cognee API. Key updates include the introduction of a new asynchronous task, resolve_data_directories, in the ingestion pipeline.
Add simple instruction for system prompt Feature COG-656
Actionable comments posted: 10
🧹 Outside diff range and nitpick comments (8)
cognee/tasks/compute/exceptions/__init__.py (1)
1-5: Enhance module docstring with more details
Consider expanding the docstring to include:
- List of available exceptions
- Example usage
- Common scenarios where these exceptions are raised
Example enhancement:
""" Custom exceptions for the Cognee API. This module defines a set of exceptions for handling various compute errors + + Available Exceptions: + - NoRelevantDataFound: Raised when search operations yield no results + + Example Usage: + from cognee.tasks.compute.exceptions import NoRelevantDataFound + if not results: + raise NoRelevantDataFound """cognee/tasks/compute/exceptions/exceptions.py (1)
4-11: LGTM! Consider adding type hints for status_code
The exception class is well-structured with appropriate default values. Consider adding type hints for the status_code parameter for consistency.
 def __init__(
     self,
     message: str = "Search did not find any data.",
     name: str = "NoRelevantDataFound",
-    status_code=status.HTTP_404_NOT_FOUND,
+    status_code: int = status.HTTP_404_NOT_FOUND,
 ):

cognee/tasks/compute/query_compute.py (3)
17-17: Consider making the chunk limit configurable
Hard-coding limit = 1 might be too restrictive. Consider making this a parameter with a reasonable default value to allow for more comprehensive context when needed.

- found_chunks = await vector_engine.search("document_chunk_text", query, limit = 1)
+ found_chunks = await vector_engine.search(
+     "document_chunk_text",
+     query,
+     limit=config.DEFAULT_CHUNK_LIMIT
+ )
26-27: Consider externalizing prompt template paths
The hardcoded template paths should be moved to a configuration file for better maintainability.

- user_prompt = render_prompt("context_for_question.txt", args)
- system_prompt = read_query_prompt("answer_simple_question.txt")
+ user_prompt = render_prompt(config.CONTEXT_QUESTION_TEMPLATE, args)
+ system_prompt = read_query_prompt(config.ANSWER_QUESTION_TEMPLATE)
7-7: Consider using a structured response model
Using str as the response model might be too simple. Consider defining a structured response model for better type safety and validation. Example:

from pydantic import BaseModel

class ComputeResponse(BaseModel):
    answer: str
    confidence: float
    source_chunks: list[str]

async def query_compute(query: str) -> list[ComputeResponse]:
    # ... existing code ...
    return [ComputeResponse(
        answer=computed_answer,
        confidence=0.95,  # from LLM if available
        source_chunks=[chunk.payload["text"] for chunk in found_chunks]
    )]

Also applies to: 36-36
cognee/api/v1/search/search_v2.py (1)
Line range hint 17-55: Implementation looks good with some architectural considerations
The integration of the COMPUTE search type follows the existing patterns and maintains important cross-cutting concerns:
- User permission filtering
- Query logging
- Telemetry
- Error handling
Consider documenting the following aspects:
- Expected response format from query_compute to maintain consistency
- Performance characteristics of compute operations
- Rate limiting requirements if compute operations are resource-intensive
cognee/tasks/ingestion/resolve_data_directories.py (2)
4-14: Documentation looks good, but consider adding more details
The function documentation is clear but could benefit from:
- Example usage
- Possible exceptions that might be raised
- Limitations on directory size/depth
19-37: Consider adding file filtering and progress tracking
For better control and monitoring:
- Add file extension filtering
- Add progress tracking for large directories
- Consider implementing batch processing for memory efficiency
Here's a suggested implementation:
+from typing import Optional, Set
+
 async def resolve_data_directories(
     data: Union[BinaryIO, List[BinaryIO], str, List[str]],
-    include_subdirectories: bool = True
+    include_subdirectories: bool = True,
+    allowed_extensions: Optional[Set[str]] = None,
+    batch_size: int = 1000
 ):
     # ... existing docstring ...
     resolved_data = []
+    processed_count = 0
     for item in data:
         if isinstance(item, str):  # Check if the item is a path
             if os.path.isdir(item):  # If it's a directory
                 if include_subdirectories:
                     for root, _, files in os.walk(item):
-                        resolved_data.extend([os.path.join(root, f) for f in files])
+                        for f in files:
+                            if processed_count >= batch_size:
+                                yield resolved_data
+                                resolved_data = []
+                                processed_count = 0
+
+                            file_path = os.path.join(root, f)
+                            if allowed_extensions and not any(
+                                file_path.lower().endswith(ext.lower())
+                                for ext in allowed_extensions
+                            ):
+                                continue
+
+                            resolved_data.append(file_path)
+                            processed_count += 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- cognee/api/v1/add/add_v2.py (2 hunks)
- cognee/api/v1/datasets/routers/get_datasets_router.py (3 hunks)
- cognee/api/v1/permissions/routers/get_permissions_router.py (1 hunks)
- cognee/api/v1/search/search_v2.py (2 hunks)
- cognee/infrastructure/llm/prompts/answer_simple_question.txt (1 hunks)
- cognee/modules/data/methods/get_data.py (2 hunks)
- cognee/modules/users/models/GroupPermission.py (1 hunks)
- cognee/modules/users/models/__init__.py (1 hunks)
- cognee/tasks/compute/__init__.py (1 hunks)
- cognee/tasks/compute/exceptions/__init__.py (1 hunks)
- cognee/tasks/compute/exceptions/exceptions.py (1 hunks)
- cognee/tasks/compute/query_compute.py (1 hunks)
- cognee/tasks/ingestion/__init__.py (1 hunks)
- cognee/tasks/ingestion/resolve_data_directories.py (1 hunks)
✅ Files skipped from review due to trivial changes (4)
- cognee/tasks/ingestion/__init__.py
- cognee/infrastructure/llm/prompts/answer_simple_question.txt
- cognee/tasks/compute/__init__.py
- cognee/modules/users/models/__init__.py
🧰 Additional context used
🪛 Ruff (0.8.2)
cognee/api/v1/permissions/routers/get_permissions_router.py
36-36: Local variable e is assigned to but never used. Remove assignment to unused variable e (F841)
57-57: Local variable e is assigned to but never used. Remove assignment to unused variable e (F841)
🔇 Additional comments (10)
cognee/tasks/compute/query_compute.py (1)
1-5
: Verify vector engine and LLM client imports
Let's ensure these infrastructure components are properly initialized and configured.
✅ Verification successful
Vector engine and LLM client imports are properly used
The imports are correctly used throughout the codebase:
get_vector_engine()
is properly initialized and used in multiple tasks for searching, indexing, and testing.get_llm_client()
is correctly utilized for text generation, transcription, and structured output creation.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for proper initialization of dependencies
rg -A 5 "get_vector_engine\(\)" --type py
rg -A 5 "get_llm_client\(\)" --type py
# Check for configuration patterns
rg -l "DEFAULT_CHUNK_LIMIT|CONTEXT_QUESTION_TEMPLATE|ANSWER_QUESTION_TEMPLATE" --type py
Length of output: 17927
cognee/api/v1/search/search_v2.py (3)
23-23
: Verify existing SearchType enum usage
The addition of a new enum value could affect existing code that handles SearchType values.
✅ Verification successful
Let me analyze the search results. The code shows that:
- The SearchType enum is used in a dictionary mapping in search_v2.py, where all enum values are handled exhaustively in the search_tasks dictionary
- The new COMPUTE value is already properly mapped to the query_compute function in this dictionary
- There are no direct if/else comparisons with SearchType values in the codebase
- All usages of SearchType in the codebase are through the search function parameter, passing one of the enum values
- The implementation follows the existing pattern where each search type maps to a specific query function
The addition of COMPUTE search type is properly integrated
The new enum value is safely added with its corresponding query handler and follows the existing pattern. The code uses a dictionary-based dispatch pattern rather than if/else statements, making it resilient to enum additions.
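The dictionary-based dispatch described above can be sketched as follows. This is an illustrative stand-in, not the PR's actual code: the handler body and the search wrapper are simplified assumptions, while the enum member names mirror the review.

```python
import asyncio
from enum import Enum

# Enum members mirror the PR; handler bodies are stand-ins for illustration.
class SearchType(Enum):
    SUMMARIES = "SUMMARIES"
    INSIGHTS = "INSIGHTS"
    CHUNKS = "CHUNKS"
    COMPUTE = "COMPUTE"

async def query_compute(query: str) -> list:
    # Stand-in handler; the real one queries the vector engine and an LLM.
    return [f"answer to: {query}"]

search_tasks = {
    SearchType.COMPUTE: query_compute,
    # ...the remaining SearchType members map to their own query functions
}

async def search(search_type: SearchType, query: str) -> list:
    # Dictionary lookup replaces an if/else chain, so adding a new enum
    # member only requires registering one more handler.
    handler = search_tasks.get(search_type)
    if handler is None:
        raise ValueError(f"No handler registered for {search_type}")
    return await handler(query)

results = asyncio.run(search(SearchType.COMPUTE, "what is cognee?"))
```

Because unknown members fail fast at lookup time, this pattern stays resilient as new search types are added.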
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for exhaustive SearchType handling in switch/if statements
# Look for switch/if statements handling SearchType
rg -B 2 -A 5 "SearchType\." --type py
# Look for explicit enum value checks
ast-grep --pattern 'if $_ == SearchType.$_:'
Length of output: 15497
55-55: Verify query_compute signature and error handling
Please ensure that the query_compute function follows the same contract as other search tasks:
- It should be an async function
- It should accept a query string parameter
- It should return a list of results
- It should handle errors consistently with other search tasks
✅ Verification successful
Based on the search results, I can now generate the final response, as we have found the implementation details of query_compute and other query functions for comparison.
Function signature and error handling are correctly implemented
The query_compute function follows the same contract as other search tasks in the codebase:
- It's an async function
- It accepts a query string parameter with proper type annotation
- It returns a list as specified in the return type annotation
- It follows the same pattern as other query functions (query_summaries, query_chunks, query_graph_connections) by:
- Using the vector engine
- Having proper docstring documentation
- Following the same parameter and return type structure
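The contract verified above (async, string in, list out, raising when nothing matches) can be sketched like this. The vector search and the exception are toy stand-ins, assumed for illustration; the real task uses get_vector_engine() and get_llm_client().

```python
import asyncio

class NoRelevantDataFound(Exception):
    """Stand-in for the PR's exception; raised when no chunks match."""

async def fake_vector_search(collection: str, query: str, limit: int = 1) -> list:
    # Stand-in for vector_engine.search(); serves one canned chunk,
    # or nothing when the toy corpus has no match.
    if "cognee" not in query.lower():
        return []
    return [{"text": "cognee stores knowledge graphs"}][:limit]

async def query_compute(query: str) -> list:
    # Mirrors the verified contract: async, accepts a query string,
    # returns a list, and raises when the search comes back empty.
    found_chunks = await fake_vector_search("document_chunk_text", query, limit=1)
    if len(found_chunks) == 0:
        raise NoRelevantDataFound()
    # The real task renders prompts and calls the LLM here; we echo instead.
    return [f"answer derived from: {found_chunks[0]['text']}"]

answer = asyncio.run(query_compute("what does cognee store?"))
```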
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify query_compute implementation details
# Check function signature and error handling
ast-grep --pattern 'async def query_compute($$$) {
$$$
}'
# Look for error handling patterns
rg -B 2 -A 5 "raise.*Error" "cognee/tasks/compute"
Length of output: 112
Script:
#!/bin/bash
# Let's try a different approach to locate and analyze the query_compute function
# First, let's find where query_compute is defined
rg -l "def query_compute"
# Then get its implementation with context
rg -B 2 -A 10 "def query_compute"
# Also search for other query functions to compare signatures
rg -B 2 -A 10 "def query_\w+"
# Look for error handling patterns in the entire codebase
rg -B 2 -A 5 "raise.*Error" --type python
Length of output: 6513
17-17
: Verify the query_compute module structure
Let's ensure the compute module is properly structured and exported.
✅ Verification successful
Import of query_compute module is correctly structured
The module is properly implemented and exported:
- The query_compute function exists in cognee/tasks/compute/query_compute.py
- It's correctly exported in cognee/tasks/compute/__init__.py
- The import path used in search_v2.py matches the actual implementation
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the compute module structure and exports
# Check if the module exists and has proper exports
fd "^(query_compute|__init__)\.py$" -p "cognee/tasks/compute"
# Check the module's interface
rg -A 5 "^def query_compute" "cognee/tasks/compute"
Length of output: 113
Script:
#!/bin/bash
# Let's try a broader search to locate the query_compute module
# First, find all Python files in the tasks directory
echo "=== Python files in tasks directory ==="
fd ".py$" "cognee/tasks"
# Then, search for any mentions of query_compute
echo -e "\n=== Mentions of query_compute ==="
rg "query_compute" --type py
# Check if there's a compute directory or file anywhere in the codebase
echo -e "\n=== Files/directories with 'compute' ==="
fd "compute"
Length of output: 3296
Script:
#!/bin/bash
# Let's verify the implementation and exports of query_compute
echo "=== query_compute implementation ==="
cat cognee/tasks/compute/query_compute.py
echo -e "\n=== __init__.py exports ==="
cat cognee/tasks/compute/__init__.py
Length of output: 1392
cognee/modules/data/methods/get_data.py (1)
6-6: LGTM: Well-documented parameter addition
The addition of the user_id parameter is properly documented in the function signature and docstring.
Also applies to: 10-10
cognee/api/v1/datasets/routers/get_datasets_router.py (2)
79-79: LGTM: Proper user context propagation
The get_data call correctly includes the user context for permission checking.
Line range hint 144-171: Optimize data retrieval and standardize error handling
The current implementation has several areas for improvement:
- Multiple database calls that could be optimized
- Inconsistent error response formats (JSONResponse vs EntityNotFoundError)
- Redundant data validation
Consider this refactoring:
 @router.get("/{dataset_id}/data/{data_id}/raw", response_class=FileResponse)
 async def get_raw_data(dataset_id: str, data_id: str, user: User = Depends(get_authenticated_user)):
     from cognee.modules.data.methods import get_data
     from cognee.modules.data.methods import get_dataset, get_dataset_data

     # First verify dataset access
     dataset = await get_dataset(user.id, dataset_id)
     if dataset is None:
         raise EntityNotFoundError(message=f"Dataset ({dataset_id}) not found.")

-    dataset_data = await get_dataset_data(dataset.id)
-    if dataset_data is None:
-        raise EntityNotFoundError(message=f"No data found in dataset ({dataset_id}).")
-
-    matching_data = [data for data in dataset_data if str(data.id) == data_id]
-    if len(matching_data) == 0:
-        raise EntityNotFoundError(message=f"Data ({data_id}) not found in dataset ({dataset_id}).")

     # Get and verify data access
     data = await get_data(user.id, data_id)
     if data is None:
         raise EntityNotFoundError(message=f"Data ({data_id}) not found.")

     return data.raw_data_location
This refactoring:
- Removes redundant data validation
- Standardizes on EntityNotFoundError
- Reduces database calls
Let's verify the relationship between datasets and data:
cognee/api/v1/add/add_v2.py (2)
17-17: Consider adding error handling for the new task
The new resolve_data_directories task is added to the pipeline, but there's no explicit error handling if the resolution fails. Consider adding try-catch blocks or error handling mechanisms.
Let's check if there's any error handling in the pipeline implementation:
16-19: Verify pipeline changes impact
The addition of the resolve_data_directories task before ingest_data_with_metadata modifies the existing pipeline. Please ensure this change doesn't break existing functionality.
Let's check for existing usages:
✅ Verification successful
Let me gather more context about the pipeline and task usage patterns.
Let me check the implementation of these specific functions to understand their relationship and dependencies.
Pipeline modification is safe and logically correct
The addition of the resolve_data_directories task before ingest_data_with_metadata is a safe change because:
- ingest_data_with_metadata is only used in this specific pipeline (add_v2.py)
- resolve_data_directories prepares the input by resolving directories into their contained files, which is a logical prerequisite for data ingestion
- The function signatures show that resolve_data_directories accepts the same input types that ingest_data_with_metadata can process (files and binary streams)
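The directory-expansion behavior described in these bullets can be sketched as follows. This is an illustrative simplification, not the PR's code: it returns the full list at once (no batching) and only handles strings, to show why resolution must run before ingestion.

```python
import asyncio
import os
import tempfile
from typing import List, Union

async def resolve_data_directories(
    data: Union[str, List[str]], include_subdirectories: bool = True
) -> List[str]:
    # Expand any directory paths into the files they contain, so the
    # downstream ingestion task only ever sees concrete items.
    items = data if isinstance(data, list) else [data]
    resolved: List[str] = []
    for item in items:
        if isinstance(item, str) and os.path.isdir(item):
            if include_subdirectories:
                for root, _, files in os.walk(item):
                    resolved.extend(os.path.join(root, f) for f in sorted(files))
            else:
                resolved.extend(
                    os.path.join(item, f)
                    for f in sorted(os.listdir(item))
                    if os.path.isfile(os.path.join(item, f))
                )
        else:
            resolved.append(item)  # non-directories pass through untouched
    return resolved

# Demonstrate on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "a.txt"), "w").close()
    os.makedirs(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "b.txt"), "w").close()
    paths = asyncio.run(resolve_data_directories([tmp, "inline text"]))
    names = sorted(os.path.basename(p) for p in paths if p != "inline text")
```

After this step, the pipeline's next task receives only file paths and pass-through items, which matches the "logical prerequisite" argument above.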
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for other pipeline configurations using ingest_data_with_metadata
rg "Task\(ingest_data_with_metadata" --type py
Length of output: 136
Script:
#!/bin/bash
# Search for other pipeline configurations and task definitions
rg "Task\(" -A 3 --type py
# Look for the implementation of ingest_data_with_metadata
ast-grep --pattern 'def ingest_data_with_metadata($$$)'
# Look for resolve_data_directories implementation
ast-grep --pattern 'def resolve_data_directories($$$)'
Length of output: 5718
Script:
#!/bin/bash
# Search for the implementation and usages of both functions with context
rg "def (ingest_data_with_metadata|resolve_data_directories)" -A 5 --type py
# Look for any imports or references to these functions
rg "(ingest_data_with_metadata|resolve_data_directories)" --type py
Length of output: 2086
cognee/api/v1/permissions/routers/get_permissions_router.py (1)
18-18: Validate the permission parameter before database operations
Ensure that the permission string is valid and properly formatted to prevent SQL injection or unexpected behavior.
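One way to act on this suggestion is an allowlist check before the value reaches the database. The permission names below are assumptions for illustration only; the PR does not define the valid set.

```python
import re

# Hypothetical allowlist; the real permission names are not shown in the PR.
ALLOWED_PERMISSIONS = {"read", "write", "delete", "share"}

def validate_permission(permission: str) -> str:
    # Normalize, then reject anything that is not a plain lowercase word.
    normalized = permission.strip().lower()
    if not re.fullmatch(r"[a-z_]+", normalized):
        raise ValueError(f"Malformed permission string: {permission!r}")
    if normalized not in ALLOWED_PERMISSIONS:
        raise ValueError(f"Unknown permission: {permission!r}")
    return normalized
```

Validating against a closed set (rather than only escaping) keeps injection payloads and typos out of the permissions table entirely.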
cognee/api/v1/search/search_v2.py (outdated)

class SearchType(Enum):
    SUMMARIES = "SUMMARIES"
    INSIGHTS = "INSIGHTS"
    CHUNKS = "CHUNKS"
    COMPUTE = "COMPUTE"
I meant completion
like it usually exists in LLM APIs -> https://llm-engine.scale.com/guides/completions/
Will make the change
Raise unauth access error when trying to read data without access Feature COG-656
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
cognee/modules/data/methods/get_data.py (1)
Line range hint 7-16: Enhance docstring with exception documentation
The docstring should document the possible exceptions that can be raised.
Apply this enhancement:
 async def get_data(user_id: UUID, data_id: UUID) -> Optional[Data]:
     """Retrieve data by ID.

     Args:
         user_id (UUID): user ID
         data_id (UUID): ID of the data to retrieve

     Returns:
         Optional[Data]: The requested data object if found, None otherwise
+
+    Raises:
+        UnauthorizedDataAccessError: If the user does not own the requested data
     """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- cognee/modules/data/exceptions/__init__.py (1 hunks)
- cognee/modules/data/exceptions/exceptions.py (1 hunks)
- cognee/modules/data/methods/get_data.py (2 hunks)
🔇 Additional comments (3)
cognee/modules/data/exceptions/__init__.py (1)
9-9
: LGTM: Clean import addition
The import statement is properly placed and follows Python's standard practices for exception handling.
cognee/modules/data/exceptions/exceptions.py (1)
10-11
: LGTM: Proper super() initialization added
The addition of the super() call ensures proper initialization of the parent class.
cognee/modules/data/methods/get_data.py (1)
22-24
: Consider explicit handling of "not found" case
The current implementation implicitly returns None for not found cases. Consider making this more explicit for better maintainability.
Consider this approach:
- if data and data.owner_id != user_id:
+ if not data:
+ return None
+
+ if data.owner_id != user_id:
raise UnauthorizedDataAccessError(message=f"User {user_id} is not authorized to access data {data_id}")
return data
Rename searching type from compute to completion Refactor COG-656
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
cognee/tasks/completion/exceptions/__init__.py (1)
7-9: Consider future-proofing the import statement
While the current import is clean, consider using parentheses for imports from the start, as it makes adding future exceptions easier without changing the syntax.
cognee/tasks/completion/exceptions/exceptions.py (1)
4-11: Consider enhancing the error message for better debugging
While the implementation is solid, the default error message could be more informative for debugging purposes.
Consider this enhancement:
 def __init__(
     self,
-    message: str = "Search did not find any data.",
+    message: str = "Search operation completed but no relevant data was found in the index.",
     name: str = "NoRelevantDataFound",
     status_code=status.HTTP_404_NOT_FOUND,
 ):
     super().__init__(message, name, status_code)

Also, consider adding a docstring to the class to document its specific use case and when it should be raised.
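Putting the suggested message and docstring together yields something like the sketch below. The base-class signature is an assumption (the PR does not show it), and stdlib HTTPStatus stands in for the starlette status constants used in the real code.

```python
from http import HTTPStatus

class CogneeApiError(Exception):
    # Assumed base class shape; the PR's actual base class is not shown here.
    def __init__(self, message: str, name: str, status_code: int):
        self.message = message
        self.name = name
        self.status_code = status_code
        super().__init__(message)

class NoRelevantDataFound(CogneeApiError):
    """Raised when a completion search runs successfully but the vector
    index contains no chunks relevant to the query."""

    def __init__(
        self,
        message: str = "Search operation completed but no relevant data was found in the index.",
        name: str = "NoRelevantDataFound",
        status_code: int = HTTPStatus.NOT_FOUND,
    ):
        super().__init__(message, name, status_code)
```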
cognee/tasks/completion/query_completion.py (1)
7-14: Function signature needs improvement
The function signature could be enhanced with:
- Type hints for exceptions that may be raised
- Additional parameters for customization (e.g., limit, threshold)
Consider updating the docstring:
 async def query_completion(query: str) -> list:
     """
+    Computes an answer to a query by searching relevant chunks and using LLM.
+
     Parameters:
     - query (str): The query string to compute.

     Returns:
     - list: Answer to the query.
+
+    Raises:
+    - NoRelevantDataFound: When no relevant chunks are found
     """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- cognee/api/v1/search/search_v2.py (2 hunks)
- cognee/tasks/completion/__init__.py (1 hunks)
- cognee/tasks/completion/exceptions/__init__.py (1 hunks)
- cognee/tasks/completion/exceptions/exceptions.py (1 hunks)
- cognee/tasks/completion/query_completion.py (1 hunks)
🔇 Additional comments (5)
cognee/tasks/completion/__init__.py (1)
1-1
: LGTM! Clean package-level import.
The import statement follows Python conventions and provides clean package-level access to the query_completion function.
cognee/tasks/completion/exceptions/__init__.py (1)
1-5
: LGTM! Clear and concise module documentation.
The docstring effectively communicates the module's purpose and scope.
cognee/tasks/completion/exceptions/exceptions.py (1)
1-2
: LGTM! Appropriate imports.
The imports are minimal and correctly bring in the required dependencies.
cognee/tasks/completion/query_completion.py (1)
22-28
: Verify prompt template paths
The code assumes the existence of template files but doesn't handle missing templates.
Let's verify the template files exist:
✅ Verification successful
Template files are present and correctly located
Both template files exist at the expected location in cognee/infrastructure/llm/prompts/:
- context_for_question.txt
- answer_simple_question.txt
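If a guard for missing templates were added, it could look like the sketch below. The function name and the prompts directory default are assumptions based on this review comment, not the library's actual API.

```python
import os

# Assumed default; the real prompts directory is resolved by cognee itself.
PROMPTS_DIR = "cognee/infrastructure/llm/prompts"

def read_query_prompt_safe(template_name: str, prompts_dir: str = PROMPTS_DIR) -> str:
    # Fail with a clear error instead of an opaque open() failure
    # when a template file is missing.
    path = os.path.join(prompts_dir, template_name)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"Prompt template {template_name!r} not found in {prompts_dir!r}"
        )
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```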
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check if prompt template files exist
fd -t f "context_for_question.txt|answer_simple_question.txt"
Length of output: 183
cognee/api/v1/search/search_v2.py (1)
23-23: Consider renaming COMPLETION based on previous feedback
Based on the previous review discussion about LLM API conventions, consider a more standard name.
Based on the previous review discussion about LLM API conventions, consider a more standard name.
Consider renaming to align with standard LLM API terminology:
- COMPLETION = "COMPLETION"
+ COMPLETIONS = "COMPLETIONS"
Update typo in string in code Chore COG-656
Summary by CodeRabbit

Release Notes

New Features
- COMPLETION, enhancing search functionality.
- query_completion for processing queries.
- resolve_data_directories to manage data paths.

Enhancements
- UnauthorizedDataAccessError for better security.

Documentation

Chores