
Cog 533 pydantic unit tests #230

Merged
merged 20 commits into main from COG-533-pydantic-unit-tests on Nov 19, 2024

Conversation

0xideas
Contributor

@0xideas 0xideas commented Nov 15, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a benchmarking script to evaluate the performance of graph generation based on recursive organizational structures.
    • Added new utility functions for creating recursive society structures and counting entities within them.
    • Implemented a new test suite to validate the functionality of the graph model generation.
  • Bug Fixes

    • Enhanced type safety in handling graph edges and nodes.
  • Refactor

    • Simplified logic in the get_graph_from_model and get_model_instance_from_graph functions for better readability and maintainability.
    • Updated test functions to improve structure and focus on specific relationships within the graph model.
    • Ensured models are rebuilt before being returned in the storage utility functions.
  • Documentation

    • Improved test assertions and messages for better debugging and clarity.

Contributor

coderabbitai bot commented Nov 15, 2024

Walkthrough

The pull request introduces significant modifications to the get_graph_from_model and get_model_instance_from_graph functions, including parameter updates and the addition of new helper functions. The handling of nodes and edges is streamlined, enhancing code readability and efficiency. New test cases are added to validate these functionalities, and existing tests are refactored for clarity. Additionally, new utility functions and classes are introduced to support the testing of organizational structures, alongside updates to test fixtures.
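
For orientation, here is a condensed sketch of the round trip these changes and tests exercise, using the call shapes that appear in this PR (the society model built by the test utilities is assumed to exist):

from cognee.modules.graph.utils import get_graph_from_model, get_model_instance_from_graph

# model -> graph: nodes plus (source_id, target_id, label, properties) edge tuples
nodes, edges = get_graph_from_model(society)

# graph -> model: rebuild the root entity ("society") from its nodes and edges
parsed_society = get_model_instance_from_graph(nodes, edges, "society")

assert str(parsed_society) == str(society)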

Changes

  • cognee/modules/graph/utils/get_graph_from_model.py: Updated get_graph_from_model method signature; added add_nodes_and_edges method; refactored logic for processing DataPoint instances.
  • cognee/modules/graph/utils/get_model_instance_from_graph.py: Updated get_model_instance_from_graph method signature for type safety; optimized node_map construction; modified edge processing logic.
  • cognee/tests/unit/interfaces/graph/conftest.py: Removed unused imports; replaced graph_outputs fixture with boris, changing scope to function.
  • cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py: Introduced a new parameterized test for get_graph_from_model to validate graph structure generation.
  • cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py: Added new constants for edge ground truth; introduced new test functions for validating car node extraction.
  • cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py: Added a new parameterized test for validating society model instance integrity from graph.
  • cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_test.py: Updated test_parsed_person to utilize the new input parameter boris and streamline graph handling.
  • cognee/tests/unit/interfaces/graph/util.py: Introduced Organization and SocietyPerson classes; added utility functions for recursive creation and counting of society structures.
  • profiling/graph_pydantic_conversion/benchmark_function.py: Added benchmark_function for performance measurement of callable functions.
  • profiling/graph_pydantic_conversion/profile_graph_pydantic_conversion.py: Introduced a script for benchmarking graph generation performance based on recursive structures.
  • cognee/modules/storage/utils/__init__.py: Updated copy_model function to include a call to model.model_rebuild() after model creation.

Suggested reviewers

  • Vasilije1990
  • borisarzentar

Poem

🐇 In the garden of code, changes bloom bright,
Functions refactored, oh what a sight!
Nodes and edges dance in a new way,
Testing the models, come join the play!
With boris now ready, let’s hop and explore,
In the world of graphs, there’s always more! 🌼



@0xideas 0xideas self-assigned this Nov 15, 2024
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Outside diff range and nitpick comments (19)
cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py (3)

14-14: Document the rationale for chosen recursive depths

The parametrization tests depths 1-3, but it's not clear why these specific values were chosen or what they represent in terms of test coverage.

Add a comment explaining the significance of each depth value:

+# Test different complexity levels:
+# 1: Simple organization with one level
+# 2: Organization with sub-organizations (medium complexity)
+# 3: Deep nested structure (high complexity)
 @pytest.mark.parametrize("recursive_depth", [1, 2, 3])

15-24: Enhance test robustness with additional assertions

While comparing string representations is useful, it might miss structural issues. Consider adding more specific assertions.

Add these assertions before the string comparison:

     parsed_society = get_model_instance_from_graph(nodes, edges, "society")
 
+    # Verify graph structure
+    assert len(nodes) > 0, "Graph should contain nodes"
+    assert len(edges) > 0, "Graph should contain edges"
+    
+    # Verify basic properties
+    assert parsed_society.name == society.name, "Organization name should match"
+    assert type(parsed_society) == type(society), "Organization type should match"
+
     assert str(society) == (str(parsed_society)), show_first_difference(
         str(society), str(parsed_society), "society", "parsed_society"
     )

15-15: Add docstring to explain test purpose

The test function would benefit from documentation explaining its purpose and approach.

Add a descriptive docstring:

 def test_society_nodes_and_edges(recursive_depth):
+    """
+    Test bidirectional conversion between organization models and graph representations.
+    
+    Verifies that an organization structure can be:
+    1. Converted to a graph (nodes and edges)
+    2. Reconstructed back to an equivalent organization model
+    
+    Args:
+        recursive_depth: The depth of nested organizations to test
+    """
     society = create_organization_recursive(
cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py (3)

11-15: Add docstring to explain test purpose and parameters.

While the test name is descriptive, adding a docstring would improve documentation by explaining:

  • The purpose of testing different recursive depths
  • Expected behavior for each depth
  • The relationship between society structure and graph representation
 @pytest.mark.parametrize("recursive_depth", [1, 2, 3])
 def test_society_nodes_and_edges(recursive_depth):
+    """Test graph generation for recursive organizational structure.
+    
+    Args:
+        recursive_depth: Depth of the recursive organization structure (1-3 levels)
+            1: Single level hierarchy
+            2: Two-level hierarchy
+            3: Three-level hierarchy
+            
+    Validates:
+        - All organizations and persons are represented as nodes
+        - Parent-child relationships are correctly represented as edges
+    """
     society = create_organization_recursive(

17-20: Add validation for the generated society structure.

Consider adding assertions to validate the society structure before graph generation to ensure the test data is correctly set up.

     society = create_organization_recursive(
         "society", "Society", PERSON_NAMES, recursive_depth
     )

+    # Validate society structure
+    assert society.name == "Society", "Root organization name should be 'Society'"
+
     n_organizations, n_persons = count_society(society)
+    assert n_organizations > 0, "Society should have at least one organization"
+    assert n_persons > 0, "Society should have at least one person"
+
     society_counts_total = n_organizations + n_persons

11-11: Add test cases for edge cases.

Consider adding test cases for:

  • Empty organization (no persons)
  • Maximum depth scenarios
  • Invalid recursive depths
-@pytest.mark.parametrize("recursive_depth", [1, 2, 3])
+@pytest.mark.parametrize("recursive_depth", [
+    pytest.param(0, marks=pytest.mark.xfail(reason="Depth 0 is invalid")),
+    1,
+    2,
+    3,
+    pytest.param(10, marks=pytest.mark.xfail(reason="Exceeds maximum depth")),
+])
cognee/tests/unit/interfaces/graph/conftest.py (3)

Line range hint 45-68: LGTM! Well-structured fixture with appropriate scope.

The fixture is well-designed with proper test isolation using function scope, and it provides a comprehensive test object with nested relationships.

Consider adding a docstring to document the fixture's purpose and structure:

 @pytest.fixture(scope="function")
 def boris():
+    """
+    Creates a test Person instance named Boris with a linked Car and CarType.
+    
+    Returns:
+        Person: A test person with predefined attributes including a car and driving license.
+    """
     boris = Person(

Line range hint 60-65: Consider using dynamic dates for the driving license.

The hardcoded dates (2025-11-06) might cause tests to fail after 2025. Consider using dynamic dates relative to the current date.

Here's a suggested improvement:

+from datetime import datetime, timedelta
+
+# In the fixture
+future_date = (datetime.now() + timedelta(days=365)).strftime("%Y-%m-%d")
 driving_license={
     "issued_by": "PU Vrsac",
-    "issued_on": "2025-11-06",
+    "issued_on": datetime.now().strftime("%Y-%m-%d"),
     "number": "1234567890",
-    "expires_on": "2025-11-06",
+    "expires_on": future_date,
 },

45-46: Consider adding parameterization for different test scenarios.

The fixture could be made more versatile by supporting different test scenarios through parameterization.

Example implementation:

@pytest.fixture(scope="function")
def person(request):
    """
    Creates a test Person instance with configurable attributes.
    
    Args:
        scenario (str): The test scenario to create data for ('boris', 'alice', etc.)
    """
    scenarios = {
        'boris': {
            'name': 'Boris',
            'age': 30,
            'car_type': CarTypeName.Sedan
        },
        'alice': {
            'name': 'Alice',
            'age': 25,
            'car_type': CarTypeName.SUV
        }
    }
    
    scenario = getattr(request, 'param', 'boris')
    data = scenarios[scenario]
    
    return Person(
        id=data['name'].lower(),
        name=data['name'],
        age=data['age'],
        owns_car=[
            Car(
                id="car1",
                brand="Toyota",
                model="Camry",
                year=2020,
                color="Blue",
                is_type=CarType(id="type1", name=data['car_type']),
            )
        ],
        driving_license={...}  # Same as before
    )

Usage:

@pytest.mark.parametrize('person', ['boris', 'alice'], indirect=True)
def test_person_data(person):
    assert person.name in ['Boris', 'Alice']
cognee/modules/graph/utils/get_model_instance_from_graph.py (1)

18-19: Add docstring to explain the complex graph-to-model conversion logic.

This function performs complex operations involving Pydantic models and graph structures. A detailed docstring would help future maintainers understand the purpose and behavior.

Add a docstring like:

def get_model_instance_from_graph(nodes: list[DataPoint], edges: list[tuple[str, str, str, dict[str, str]]], entity_id: str):
    """Convert a graph structure into a Pydantic model instance.

    Args:
        nodes: List of DataPoint objects representing graph nodes
        edges: List of tuples (source_id, target_id, label, properties) representing graph edges
        entity_id: ID of the root entity to return

    Returns:
        A Pydantic model instance representing the graph structure

    Raises:
        KeyError: If node_id or edge metadata is not found
    """
cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py (4)

4-13: Add metadata field for consistency with other edge definitions

The CAR_SEDAN_EDGE is missing the metadata field that is present in BORIS_CAR_EDGE_GROUND_TRUTH. Consider adding it for consistency.

 CAR_SEDAN_EDGE = (
     "car1",
     "sedan",
     "is_type",
     {
         "source_node_id": "car1",
         "target_node_id": "sedan",
         "relationship_name": "is_type",
+        "metadata": {},
     },
 )

Line range hint 16-27: Consider adding type hints to ground truth constants

Adding type hints to these constants would improve IDE support and make the expected structure more explicit.

+ from typing import Dict, Tuple, Any
+
+ CAR_SEDAN_EDGE: Tuple[str, str, str, Dict[str, Any]] = (
    "car1",
    # ...
)

+ CAR_TYPE_GROUND_TRUTH: Dict[str, str] = {
    "id": "sedan"
}

Also applies to: 28-50


Line range hint 72-90: Refactor duplicate edge comparison logic

The edge comparison logic is duplicated between test_extracted_car_sedan_edge and test_extracted_boris_car_edge. Consider extracting this into a helper function.

+ def assert_edge_matches_ground_truth(edge, ground_truth):
+     """Helper function to compare an edge with its ground truth."""
+     assert ground_truth[:3] == edge[:3], f"{ground_truth[:3] = } != {edge[:3] = }"
+     for key, truth_value in ground_truth[3].items():
+         assert truth_value == edge[3][key], f"{truth_value = } != {edge[3][key] = }"

 def test_extracted_car_sedan_edge(boris):
     _, edges = get_graph_from_model(boris)
     edge = edges[0]
-    assert CAR_SEDAN_EDGE[:3] == edge[:3], f"{CAR_SEDAN_EDGE[:3] = } != {edge[:3] = }"
-    for key, ground_truth in CAR_SEDAN_EDGE[3].items():
-        assert ground_truth == edge[3][key], f"{ground_truth = } != {edge[3][key] = }"
+    assert_edge_matches_ground_truth(edge, CAR_SEDAN_EDGE)

Line range hint 51-90: Add docstrings to test functions

The test functions lack docstrings explaining their purpose and expected behavior. Consider adding descriptive docstrings to improve maintainability.

Example:

def test_extracted_car_type(boris):
    """
    Test that car type node is correctly extracted from the model.
    
    Verifies:
    - The correct number of nodes is extracted
    - The car type node matches the expected ground truth
    """
cognee/modules/graph/utils/get_graph_from_model.py (2)

75-76: Potential Collision in edge_key Generation

Concatenating IDs and strings to generate edge_key may lead to key collisions if IDs and relationship names overlap. To prevent collisions and improve robustness, consider using a tuple as the key.

Apply this change:

-    edge_key = str(edge[0]) + str(edge[1]) + edge[2]
+    edge_key = (edge[0], edge[1], edge[2])

Ensure that the added_edges dictionary uses these tuple keys.


82-83: Potential Collision in edge_key Generation

Using string concatenation for edge_key may lead to collisions. Using a tuple of IDs and the field name ensures uniqueness and avoids potential errors.

Apply this change:

-    edge_key = str(data_point.id) + str(property_node.id) + field_name
+    edge_key = (data_point.id, property_node.id, field_name)
cognee/tests/unit/interfaces/graph/util.py (3)

28-29: Consider implementing list handling in run_test_against_ground_truth

Currently, when ground_truth is a list, the function raises a NotImplementedError. Implementing support for lists would enhance the utility of this function and allow for comprehensive testing against ground truth data structures.
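
A rough, hypothetical sketch of what list support could look like, written as a standalone helper because the internals of run_test_against_ground_truth are not shown here (all names below are illustrative, not the actual signature):

def assert_list_matches_ground_truth(test_target_item, key, ground_truth_list):
    # Compare a list-valued attribute element by element against the ground truth.
    actual = getattr(test_target_item, key)
    assert len(actual) == len(ground_truth_list), f"{key}: expected {len(ground_truth_list)} items, got {len(actual)}"
    for index, (expected_item, actual_item) in enumerate(zip(ground_truth_list, actual)):
        assert expected_item == actual_item, f"{key}[{index}]: {expected_item!r} != {actual_item!r}"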


70-88: Add docstring to create_society_person_recursive function

Consider adding a docstring to explain the purpose, parameters, and return value of this recursive function for better maintainability and readability.


90-108: Add docstring to create_organization_recursive function

Adding a docstring will help clarify the functionality and usage of this recursive function, aiding future developers in understanding and maintaining the code.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8c0c0b9 and f3f0bca.

📒 Files selected for processing (8)
  • cognee/modules/graph/utils/get_graph_from_model.py (2 hunks)
  • cognee/modules/graph/utils/get_model_instance_from_graph.py (1 hunks)
  • cognee/tests/unit/interfaces/graph/conftest.py (2 hunks)
  • cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py (1 hunks)
  • cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py (3 hunks)
  • cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py (1 hunks)
  • cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_test.py (2 hunks)
  • cognee/tests/unit/interfaces/graph/util.py (2 hunks)
🔇 Additional comments (7)
cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py (1)

1-11: LGTM! Well-organized imports

The imports are logically grouped and all are utilized in the test implementation.

cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_test.py (2)

1-4: LGTM! Clean and focused imports

The imports are well-organized, importing only the specific functions needed for the test.


28-30: Consider renaming the test to better reflect its purpose

While the implementation looks good, the test name test_parsed_person could be more descriptive. Consider renaming it to test_person_model_graph_roundtrip to better reflect that it's testing the round-trip conversion from model to graph and back.

Let's verify the boris fixture exists and is properly typed:

✅ Verification successful

The test implementation and fixture are well-typed and appropriate

The test name suggestion can be considered, but the current implementation is solid:

  • The boris fixture is properly defined and typed, inheriting from DataPoint via the Person class
  • All required models (Person, Car) are properly defined with clear type hints
  • The test correctly verifies the round-trip conversion of a complex object graph (Person with Car relationships)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the boris fixture and its type hints
# Look for the fixture definition and its type hints
rg -A 5 "def boris" --type py

Length of output: 423


Script:

#!/bin/bash
# Let's verify the Person model definition and its type hints
rg -A 10 "class Person" --type py

Length of output: 1568


Script:

#!/bin/bash
# Let's verify the Car model definition and check if both Person and Car inherit from DataPoint
rg -A 5 "class Car" --type py && rg "class DataPoint" --type py

Length of output: 1699

cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py (1)

1-9: LGTM! Well-organized imports.

The imports are cleanly structured with appropriate separation between external and internal dependencies.

cognee/modules/graph/utils/get_model_instance_from_graph.py (1)

29-34: Verify model compatibility before merging.

When merging models or creating new ones, it's important to ensure type compatibility between source and target nodes.

Let's verify the model compatibility in the codebase:

cognee/modules/graph/utils/get_graph_from_model.py (1)

6-14: Proper Handling of Default Mutable Arguments

Replacing the default values of added_nodes and added_edges from empty dictionaries to None and initializing them inside the function prevents unintended side effects caused by mutable default arguments. This is a good practice in Python.
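
For readers unfamiliar with the pitfall, a minimal generic illustration of the pattern (not the actual function body from this PR):

def get_graph(data_point, added_nodes=None, added_edges=None):
    # A dict literal in the signature would be created once at definition time and
    # shared across calls; defaulting to None and initializing here avoids that.
    added_nodes = added_nodes if added_nodes is not None else {}
    added_edges = added_edges if added_edges is not None else {}
    return added_nodes, added_edges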

cognee/tests/unit/interfaces/graph/util.py (1)

39-52: LGTM: Class definitions for Organization and SocietyPerson

The new classes Organization and SocietyPerson are correctly defined with appropriate type annotations and forward references. The use of model_rebuild() ensures that forward references are resolved properly.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Outside diff range and nitpick comments (3)
cognee/tests/unit/interfaces/graph/util.py (3)

28-29: Enhance the NotImplementedError message

The error message should provide more context about what list type is not supported and potential workarounds.

-            raise NotImplementedError("Currently not implemented for 'list'")
+            raise NotImplementedError(f"Validation for list type attributes is not yet implemented. Key: {key}, Value: {ground_truth}")

54-66: Consider expanding test data to include edge cases

The current test data is good but could be enhanced to cover more scenarios (a small sketch follows the list):

  • Names with special characters
  • Unicode characters
  • Very long names
  • Empty strings
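
A small hypothetical sketch of such additions (the constant name is illustrative, not part of the PR):

EDGE_CASE_PERSON_NAMES = [
    "O'Brien-Smith",  # special characters
    "Zoë",            # non-ASCII accents
    "名前",            # non-Latin script
    "A" * 256,        # very long name
    "",               # empty string
]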

69-86: Consider making test data generation deterministic

Random ID generation could make tests non-deterministic and harder to debug. Consider:

  1. Using a seeded random generator (sketched below)
  2. Or using predictable ID patterns based on depth and position
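
A minimal sketch of the seeded approach (option 1), assuming the current implementation builds ID suffixes from random lowercase characters:

import random
import string

rng = random.Random(42)  # module-level seeded generator keeps IDs reproducible across runs

def make_id_suffix(length: int = 10) -> str:
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))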
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f3f0bca and a1f7272.

📒 Files selected for processing (1)
  • cognee/tests/unit/interfaces/graph/util.py (2 hunks)
🔇 Additional comments (1)
cognee/tests/unit/interfaces/graph/util.py (1)

39-51: LGTM! Well-structured data models with proper type hints

The circular reference between Organization and SocietyPerson is handled correctly using forward references in type hints. The model_rebuild() calls ensure proper initialization.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (6)
profiling/graph_pydantic_conversion/profile_graph_pydantic_conversion.py (2)

1-10: Remove unused import and consider reorganizing imports

The time module is imported but never used in the code. Additionally, consider using an absolute import path for benchmark_function to improve reliability.

Apply this diff:

-import time

+from cognee.profiling.benchmark_function import benchmark_function
-from benchmark_function import benchmark_function

from cognee.modules.graph.utils import get_graph_from_model
from cognee.tests.unit.interfaces.graph.util import (
    PERSON_NAMES,
    create_organization_recursive,
)

1-43: Consider integrating with a comprehensive performance testing framework

This script could be enhanced by:

  1. Integration with CI/CD pipelines for automated performance tracking
  2. Comparison with historical benchmark results
  3. Setting performance budgets and alerting on regressions
  4. Generating performance trend visualizations

Would you like assistance in setting up a more comprehensive performance testing framework?

profiling/graph_pydantic_conversion/benchmark_function.py (4)

9-20: Enhance docstring with return value details

While the docstring is well-structured, it would be helpful to specify the metrics included in the returned dictionary.

     Returns:
-        Dictionary containing benchmark metrics
+        Dictionary containing benchmark metrics:
+            - mean_execution_time: Average execution time in seconds
+            - mean_peak_memory_mb: Average peak memory usage in MB
+            - mean_cpu_percent: Average CPU utilization percentage
+            - num_runs: Number of benchmark iterations performed
+            - std_execution_time: Standard deviation of execution times (if num_runs > 1)

48-48: Consider including memory delta in metrics

The memory_used variable calculates the memory delta but isn't included in the final metrics. Consider adding it to provide insights into memory allocation patterns.

     analysis = {
         "mean_execution_time": statistics.mean(execution_times),
         "mean_peak_memory_mb": statistics.mean(peak_memory_usages),
         "mean_cpu_percent": statistics.mean(cpu_percentages),
         "num_runs": num_runs,
+        "memory_delta_bytes": memory_used,  # Add memory allocation delta
     }
🧰 Tools
🪛 Ruff

48-48: Local variable memory_used is assigned to but never used

Remove assignment to unused variable memory_used

(F841)


9-9: Add input validation for num_runs parameter

Consider adding validation to ensure num_runs is positive to prevent potential issues.

 def benchmark_function(func: Callable, *args, num_runs: int = 5) -> Dict[str, Any]:
+    if num_runs <= 0:
+        raise ValueError("num_runs must be positive")

25-56: Consider adding error handling for system-level operations

The benchmarking code interacts with system-level metrics through psutil and tracemalloc. Consider adding try-except blocks to handle potential system-related errors gracefully.

For example:

  • Handle psutil.NoSuchProcess or psutil.AccessDenied
  • Ensure tracemalloc is properly stopped even if an error occurs
  • Consider using a context manager pattern for the benchmarking lifecycle (a rough sketch follows)
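
A rough sketch of such a guard around one measured run; measured_run is a hypothetical helper name, and only standard tracemalloc/psutil calls are used:

import tracemalloc
import psutil

def measured_run(func, *args):
    process = psutil.Process()
    tracemalloc.start()
    try:
        func(*args)
        _, peak_memory = tracemalloc.get_traced_memory()
        cpu_percent = process.cpu_percent()
        return peak_memory, cpu_percent
    except (psutil.NoSuchProcess, psutil.AccessDenied) as error:
        raise RuntimeError(f"Could not collect process metrics: {error}") from error
    finally:
        tracemalloc.stop()  # always stop tracing, even if the benchmarked call fails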
🧰 Tools
🪛 Ruff

36-36: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


48-48: Local variable memory_used is assigned to but never used

Remove assignment to unused variable memory_used

(F841)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a1f7272 and 5b420eb.

📒 Files selected for processing (2)
  • profiling/graph_pydantic_conversion/benchmark_function.py (1 hunks)
  • profiling/graph_pydantic_conversion/profile_graph_pydantic_conversion.py (1 hunks)
🧰 Additional context used
🪛 Ruff
profiling/graph_pydantic_conversion/benchmark_function.py

36-36: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)


48-48: Local variable memory_used is assigned to but never used

Remove assignment to unused variable memory_used

(F841)

🔇 Additional comments (3)
profiling/graph_pydantic_conversion/profile_graph_pydantic_conversion.py (1)

14-26: LGTM! Well-structured argument parsing

The argument parser is well-implemented with clear descriptions and sensible defaults.

profiling/graph_pydantic_conversion/benchmark_function.py (2)

1-8: LGTM! Well-organized imports

The imports are properly organized and include all necessary dependencies for the benchmarking functionality.


36-36: Keep the result assignment

While the result variable is flagged as unused, keeping the assignment documents that the benchmarked call's return value is intentionally discarded. Note that CPython does not skip an unassigned call, so the function executes either way; the binding is about clarity, not correctness, and can be paired with a lint suppression.
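
If the warning itself is unwanted, two equivalent options (a sketch, not necessarily what this PR does):

func(*args)           # drop the binding; the call still executes
result = func(*args)  # noqa: F841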

🧰 Tools
🪛 Ruff

36-36: Local variable result is assigned to but never used

Remove assignment to unused variable result

(F841)

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (8)
cognee/modules/graph/utils/get_model_instance_from_graph.py (2)

9-20: Add input validation and documentation.

The function implementation is clean and well-typed. Consider these improvements:

  1. Add input validation for the aggregation function
  2. Add a docstring explaining the purpose, parameters, and return value
 def merge_dicts(dict1: dict, dict2: dict, agg_fn: Callable) -> dict:
+    """Merge two dictionaries using an aggregation function for common keys.
+    
+    Args:
+        dict1: The first dictionary
+        dict2: The second dictionary
+        agg_fn: Callable that takes two values and returns the aggregated result
+    
+    Returns:
+        dict: Merged dictionary with aggregated values for common keys
+    
+    Raises:
+        TypeError: If agg_fn is not callable
+    """
+    if not callable(agg_fn):
+        raise TypeError("agg_fn must be callable")
+
     merged_dict = {}

37-54: Consider extracting model creation logic.

The model creation logic for both edge types is similar but duplicated. Consider extracting it into a helper function to improve maintainability and reduce duplication.

+def create_new_model(
+    source_node: DataPoint,
+    target_node: DataPoint,
+    edge_label: str,
+    is_list: bool = False
+) -> tuple[type, dict]:
+    """Create a new model type and data dictionary based on edge type."""
+    model_type = copy_model(
+        type(source_node),
+        {edge_label: (list[type(target_node)] if is_list else type(target_node), PydanticUndefined)}
+    )
+    
+    if is_list:
+        model_data = merge_dicts(
+            source_node.model_dump(),
+            {edge_label: [target_node]},
+            lambda a, b: a + b,
+        )
+    else:
+        model_data = {**source_node.model_dump(), **{edge_label: target_node}}
+    
+    return model_type, model_data

 def get_model_instance_from_graph(...):
     # ... existing code ...
     
     if edge_type == "list":
-        NewModel = copy_model(
-            type(source_node),
-            {edge_label: (list[type(target_node)], PydanticUndefined)},
-        )
-        new_model_dict = merge_dicts(
-            source_node.model_dump(),
-            {edge_label: [target_node]},
-            lambda a, b: a + b,
-        )
-        node_map[source_node_id] = NewModel(**new_model_dict)
+        model_type, model_data = create_new_model(
+            source_node, target_node, edge_label, is_list=True
+        )
+        node_map[source_node_id] = model_type(**model_data)
     else:
-        NewModel = copy_model(
-            type(source_node),
-            {edge_label: (type(target_node), PydanticUndefined)}
-        )
-        node_map[target_node_id] = NewModel(
-            **source_node.model_dump(), **{edge_label: target_node}
+        model_type, model_data = create_new_model(
+            source_node, target_node, edge_label
         )
+        node_map[target_node_id] = model_type(**model_data)
cognee/modules/graph/utils/get_graph_from_model.py (1)

100-102: Consider making the datetime format configurable

The datetime format is currently hardcoded. Consider:

  1. Moving it to a configuration constant
  2. Ensuring the format includes timezone information
  3. Aligning it with any database requirements
+# At the top of the file
+DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S%z"
+
 # In the edge creation
-                        "updated_at": datetime.now(timezone.utc).strftime(
-                            "%Y-%m-%d %H:%M:%S"
-                        ),
+                        "updated_at": datetime.now(timezone.utc).strftime(DATETIME_FORMAT),
cognee/tests/unit/interfaces/graph/util.py (5)

39-51: Add docstrings to document the data models.

The class structure and type hints look good, but adding docstrings would improve code documentation.

Add docstrings to describe the purpose and attributes of each class:

 class Organization(DataPoint):
+    """Represents an organization in the society structure.
+
+    Attributes:
+        id: Unique identifier for the organization
+        name: Name of the organization
+        members: Optional list of SocietyPerson members
+    """
     id: str
     name: str
     members: Optional[list["SocietyPerson"]]

 class SocietyPerson(DataPoint):
+    """Represents a person in the society structure.
+
+    Attributes:
+        id: Unique identifier for the person
+        name: Name of the person
+        memberships: Optional list of Organization memberships
+    """
     id: str
     name: str
     memberships: Optional[list[Organization]]

54-66: Add type hints to constants.

The constants are well-organized but would benefit from explicit type hints.

-ORGANIZATION_NAMES = [
+ORGANIZATION_NAMES: list[str] = [
     "ChessClub",
     # ...
 ]
-PERSON_NAMES = ["Sarah", "Anna", "John", "Sam"]
+PERSON_NAMES: list[str] = ["Sarah", "Anna", "John", "Sam"]

69-106: Add return type hints to recursive functions.

The recursive functions would benefit from explicit return type hints for better type safety.

-def create_society_person_recursive(id, name, organization_names, max_depth, depth=0):
+def create_society_person_recursive(id: str, name: str, organization_names: list[str], max_depth: int, depth: int = 0) -> SocietyPerson:
     # ...

-def create_organization_recursive(id, name, member_names, max_depth, depth=0):
+def create_organization_recursive(id: str, name: str, member_names: list[str], max_depth: int, depth: int = 0) -> Organization:
     # ...

109-131: Add type hints and docstring to count_society function.

The function would benefit from explicit type hints and documentation.

-def count_society(obj):
+def count_society(obj: Union[Organization, SocietyPerson]) -> tuple[int, int]:
+    """Count the number of organizations and society persons in a society structure.
+
+    Args:
+        obj: An Organization or SocietyPerson object to count
+
+    Returns:
+        A tuple of (organization_count, society_person_count)
+
+    Raises:
+        TypeError: If obj is neither Organization nor SocietyPerson
+    """
     # ...

Don't forget to add the Union import:

-from typing import Any, Dict, Optional
+from typing import Any, Dict, Optional, Union

134-149: Add type hints and error handling to show_first_difference function.

The function would benefit from type hints and proper error handling for empty strings.

-def show_first_difference(str1, str2, str1_name, str2_name, context=30):
+def show_first_difference(str1: str, str2: str, str1_name: str, str2_name: str, context: int = 30) -> str:
+    """Show where two strings first diverge, with surrounding context.
+
+    Args:
+        str1: First string to compare
+        str2: Second string to compare
+        str1_name: Name/identifier for the first string
+        str2_name: Name/identifier for the second string
+        context: Number of characters to show before and after the difference
+
+    Returns:
+        A string describing where the first difference occurs
+    """
+    if not str1 or not str2:
+        return f"One or both strings are empty: {str1_name}({len(str1)}) vs {str2_name}({len(str2)})"
     # ...
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 5b420eb and a334291.

📒 Files selected for processing (3)
  • cognee/modules/graph/utils/get_graph_from_model.py (2 hunks)
  • cognee/modules/graph/utils/get_model_instance_from_graph.py (1 hunks)
  • cognee/tests/unit/interfaces/graph/util.py (2 hunks)
🔇 Additional comments (7)
cognee/modules/graph/utils/get_model_instance_from_graph.py (2)

1-6: LGTM! Imports and type hints are well structured.

The imports are correctly organized and the type hints are properly implemented.


23-27: LGTM! Improved type safety and edge handling.

The function signature now has proper type hints for the edges parameter, making the expected structure explicit. The default value for edge_type provides better error handling.

cognee/modules/graph/utils/get_graph_from_model.py (4)

7-12: LGTM! Improved parameter defaults and initialization

Using None as the default instead of mutable empty dictionaries is a good practice that prevents the shared-state issues common with mutable default arguments. The initialization checks are clear and safe.


23-33: LGTM! Improved code modularity

The extraction of node and edge handling logic into add_nodes_and_edges improves code organization and readability while maintaining the same functionality.


43-50: LGTM! Efficient edge metadata handling

The edge metadata updates are now handled efficiently by tracking the number of edges before the update and only modifying the newly added edges.


71-73: ⚠️ Potential issue

Issue with dict copies persists

The previous review comment about potential duplicate nodes and edges due to dict copies remains valid.

cognee/tests/unit/interfaces/graph/util.py (1)

1-2: LGTM! Good error handling improvement.

The addition of necessary imports and the explicit NotImplementedError for list ground truth cases improves the code's robustness and maintainability.

Also applies to: 4-4, 28-29

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
cognee/modules/storage/utils/__init__.py (2)

32-34: Add error handling for model rebuilding

While adding model_rebuild() is good for ensuring model integrity, consider adding error handling to gracefully handle potential rebuild failures.

-    model = create_model(model.__name__, **final_fields)
-    model.model_rebuild()
-    return model
+    try:
+        model = create_model(model.__name__, **final_fields)
+        model.model_rebuild()
+        return model
+    except Exception as e:
+        raise ValueError(f"Failed to rebuild model {model.__name__}: {str(e)}")

32-34: Consider caching rebuilt models for performance

Since model rebuilding can be computationally expensive, consider caching rebuilt models using a class-level cache for frequently used model configurations.

Example implementation:

# A module-level cache keyed on a hashable fingerprint of the fields; the field specs
# themselves (type annotations, PydanticUndefined) are not JSON serializable, so the
# key is built from their reprs rather than json.dumps.
_COPIED_MODEL_CACHE: dict = {}

def copy_model(model: DataPoint, include_fields: dict = {}, exclude_fields: list = []):
    fields = {
        name: (field.annotation, field.default if field.default is not None else PydanticUndefined)
        for name, field in model.model_fields.items()
        if name not in exclude_fields
    }
    final_fields = {**fields, **include_fields}

    cache_key = (model.__name__, tuple(sorted((name, repr(spec)) for name, spec in final_fields.items())))
    if cache_key not in _COPIED_MODEL_CACHE:
        copied_model = create_model(model.__name__, **final_fields)
        copied_model.model_rebuild()
        _COPIED_MODEL_CACHE[cache_key] = copied_model
    return _COPIED_MODEL_CACHE[cache_key]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a334291 and 8a2cf20.

📒 Files selected for processing (1)
  • cognee/modules/storage/utils/__init__.py (1 hunks)


gitguardian bot commented Nov 18, 2024

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
  • GitGuardian id: 9573981
  • GitGuardian status: Triggered
  • Secret: Generic Password
  • Commit: fde56f0
  • Filename: notebooks/cognee_llama_index.ipynb
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.



🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
cognee/tests/unit/interfaces/graph/util.py (1)

70-87: Add depth validation and consider UUID for IDs

While the implementation is functional, consider these improvements:

  1. Add depth validation to prevent stack overflow (as suggested in past review)
  2. Consider using UUID for IDs instead of random strings to prevent potential collisions
 def create_society_person_recursive(id, name, organization_names, max_depth, depth=0):
+    if max_depth < 0:
+        raise ValueError("max_depth must be non-negative")
+    if depth > 100:  # reasonable limit to prevent stack overflow
+        raise ValueError("Maximum recursion depth exceeded")
-    id_suffix = "".join(random.choice(string.ascii_lowercase) for _ in range(10))
+    id_suffix = str(uuid.uuid4())

(The uuid option also requires adding import uuid to the module's imports.)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 8a2cf20 and fde56f0.

📒 Files selected for processing (3)
  • cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py (1 hunks)
  • cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py (1 hunks)
  • cognee/tests/unit/interfaces/graph/util.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py
  • cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py
🔇 Additional comments (5)
cognee/tests/unit/interfaces/graph/util.py (5)

1-2: LGTM! Good practice using NotImplementedError

The addition of required imports and explicit handling of unsupported list type in ground truth is a good practice.

Also applies to: 4-4, 28-29


39-52: Well-structured classes with proper type hints

Good implementation of:

  • Forward references for circular dependencies using string literals
  • Optional type hints for nullable fields
  • Proper model rebuilding for circular references

55-67: LGTM! Well-defined test data constants

Good variety of test data with clear naming conventions.


110-132: Add type hints and improve error handling

The error handling improvement was noted in a past review. Additionally:

  1. Add type hints to improve code clarity and enable better IDE support
-def count_society(obj):
+def count_society(obj: Union[Organization, SocietyPerson]) -> Tuple[int, int]:

135-150: Add docstring and fix index handling

The index handling issues were noted in a past review. Additionally:

  1. Add a docstring explaining the return format and the context parameter
 def show_first_difference(str1, str2, str1_name, str2_name, context=30):
+    """Compare two strings and show their first difference with context.
+
+    Args:
+        str1: First string to compare
+        str2: Second string to compare
+        str1_name: Name/identifier for the first string
+        str2_name: Name/identifier for the second string
+        context: Number of characters to show before and after the difference
+
+    Returns:
+        A formatted string showing the first difference with context, or a message
+        if the strings are identical or one is a prefix of the other.
+    """

@0xideas
Contributor Author

0xideas commented Nov 18, 2024

Hey @borisarzentar,

I created two pydantic Classes, Organization and SocietyPerson, that contain each other. A nested pydantic structure of this kind is created by calling create_organization_recursive in the file get_model_instance_from_graph_generative_test.py.

Weirdly, the get_model_instance_from_graph call fails in Python 3.9 and 3.10, but succeeds in 3.11. The issue seems to be that the pydantic class SocietyPerson is undefined in the call to model = create_model(model.__name__, **final_fields) for reconstructing Organization. Python 3.11 seems to have solved this issue, although I'm unsure why/how. For now, I'm just skipping these tests for Python 3.9 and 3.10 and I have added a bug ticket.

Thanks!
Leon
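
For reference, a minimal way to express that skip with pytest (a sketch of the standard skipif marker, not necessarily the exact marker added in the PR):

import sys

import pytest

@pytest.mark.skipif(
    sys.version_info < (3, 11),
    reason="recursive SocietyPerson/Organization models fail to rebuild on Python 3.9/3.10",
)
@pytest.mark.parametrize("recursive_depth", [1, 2, 3])
def test_society_nodes_and_edges(recursive_depth):
    ...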

type(source_node),
{edge_label: (list[type(target_node)], PydanticUndefined)},
)
new_model_dict = merge_dicts(
Contributor


merge_dicts seems like overkill here; we only want to merge a single property. It will iterate over the properties of both objects just for that one property, right?

Contributor Author


Yes, good point, I changed it to merge the key values directly.
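
For illustration, one way to merge the single list-valued key directly instead of calling merge_dicts (a sketch based on the names in the diff above, not necessarily the exact code that landed):

new_model_dict = source_node.model_dump()
new_model_dict[edge_label] = new_model_dict.get(edge_label, []) + [target_node]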

Contributor Author

@0xideas 0xideas left a comment


Thanks for the comments! I removed the merge_dicts call.

@0xideas 0xideas merged commit c3757cc into main Nov 19, 2024
27 of 28 checks passed
@0xideas 0xideas deleted the COG-533-pydantic-unit-tests branch November 19, 2024 15:08