-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cog 533 pydantic unit tests #230
Conversation
WalkthroughThe pull request introduces significant modifications to the Changes
Possibly related PRs
Suggested reviewers
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (19)
cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py (3)
14-14
: Document the rationale for chosen recursive depthsThe parametrization tests depths 1-3, but it's not clear why these specific values were chosen or what they represent in terms of test coverage.
Add a comment explaining the significance of each depth value:
+# Test different complexity levels: +# 1: Simple organization with one level +# 2: Organization with sub-organizations (medium complexity) +# 3: Deep nested structure (high complexity) @pytest.mark.parametrize("recursive_depth", [1, 2, 3])
15-24
: Enhance test robustness with additional assertionsWhile comparing string representations is useful, it might miss structural issues. Consider adding more specific assertions.
Add these assertions before the string comparison:
parsed_society = get_model_instance_from_graph(nodes, edges, "society") + # Verify graph structure + assert len(nodes) > 0, "Graph should contain nodes" + assert len(edges) > 0, "Graph should contain edges" + + # Verify basic properties + assert parsed_society.name == society.name, "Organization name should match" + assert type(parsed_society) == type(society), "Organization type should match" + assert str(society) == (str(parsed_society)), show_first_difference( str(society), str(parsed_society), "society", "parsed_society" )
15-15
: Add docstring to explain test purposeThe test function would benefit from documentation explaining its purpose and approach.
Add a descriptive docstring:
def test_society_nodes_and_edges(recursive_depth): + """ + Test bidirectional conversion between organization models and graph representations. + + Verifies that an organization structure can be: + 1. Converted to a graph (nodes and edges) + 2. Reconstructed back to an equivalent organization model + + Args: + recursive_depth: The depth of nested organizations to test + """ society = create_organization_recursive(cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py (3)
11-15
: Add docstring to explain test purpose and parameters.While the test name is descriptive, adding a docstring would improve documentation by explaining:
- The purpose of testing different recursive depths
- Expected behavior for each depth
- The relationship between society structure and graph representation
@pytest.mark.parametrize("recursive_depth", [1, 2, 3]) def test_society_nodes_and_edges(recursive_depth): + """Test graph generation for recursive organizational structure. + + Args: + recursive_depth: Depth of the recursive organization structure (1-3 levels) + 1: Single level hierarchy + 2: Two-level hierarchy + 3: Three-level hierarchy + + Validates: + - All organizations and persons are represented as nodes + - Parent-child relationships are correctly represented as edges + """ society = create_organization_recursive(
17-20
: Add validation for the generated society structure.Consider adding assertions to validate the society structure before graph generation to ensure the test data is correctly set up.
society = create_organization_recursive( "society", "Society", PERSON_NAMES, recursive_depth ) + # Validate society structure + assert society.name == "Society", "Root organization name should be 'Society'" + assert n_organizations > 0, "Society should have at least one organization" + assert n_persons > 0, "Society should have at least one person" + n_organizations, n_persons = count_society(society) society_counts_total = n_organizations + n_persons
11-11
: Add test cases for edge cases.Consider adding test cases for:
- Empty organization (no persons)
- Maximum depth scenarios
- Invalid recursive depths
-@pytest.mark.parametrize("recursive_depth", [1, 2, 3]) +@pytest.mark.parametrize("recursive_depth", [ + pytest.param(0, marks=pytest.mark.xfail(reason="Depth 0 is invalid")), + 1, + 2, + 3, + pytest.param(10, marks=pytest.mark.xfail(reason="Exceeds maximum depth")), +])cognee/tests/unit/interfaces/graph/conftest.py (3)
Line range hint
45-68
: LGTM! Well-structured fixture with appropriate scope.The fixture is well-designed with proper test isolation using function scope, and it provides a comprehensive test object with nested relationships.
Consider adding a docstring to document the fixture's purpose and structure:
@pytest.fixture(scope="function") def boris(): + """ + Creates a test Person instance named Boris with a linked Car and CarType. + + Returns: + Person: A test person with predefined attributes including a car and driving license. + """ boris = Person(
Line range hint
60-65
: Consider using dynamic dates for the driving license.The hardcoded dates (2025-11-06) might cause tests to fail after 2025. Consider using dynamic dates relative to the current date.
Here's a suggested improvement:
+from datetime import datetime, timedelta + +# In the fixture +future_date = (datetime.now() + timedelta(days=365)).strftime("%Y-%m-%d") driving_license={ "issued_by": "PU Vrsac", - "issued_on": "2025-11-06", + "issued_on": datetime.now().strftime("%Y-%m-%d"), "number": "1234567890", - "expires_on": "2025-11-06", + "expires_on": future_date, },
45-46
: Consider adding parameterization for different test scenarios.The fixture could be made more versatile by supporting different test scenarios through parameterization.
Example implementation:
@pytest.fixture(scope="function") def person(request): """ Creates a test Person instance with configurable attributes. Args: scenario (str): The test scenario to create data for ('boris', 'alice', etc.) """ scenarios = { 'boris': { 'name': 'Boris', 'age': 30, 'car_type': CarTypeName.Sedan }, 'alice': { 'name': 'Alice', 'age': 25, 'car_type': CarTypeName.SUV } } scenario = getattr(request, 'param', 'boris') data = scenarios[scenario] return Person( id=data['name'].lower(), name=data['name'], age=data['age'], owns_car=[ Car( id="car1", brand="Toyota", model="Camry", year=2020, color="Blue", is_type=CarType(id="type1", name=data['car_type']), ) ], driving_license={...} # Same as before )Usage:
@pytest.mark.parametrize('person', ['boris', 'alice'], indirect=True) def test_person_data(person): assert person.name in ['Boris', 'Alice']cognee/modules/graph/utils/get_model_instance_from_graph.py (1)
18-19
: Add docstring to explain the complex graph-to-model conversion logic.This function performs complex operations involving Pydantic models and graph structures. A detailed docstring would help future maintainers understand the purpose and behavior.
Add a docstring like:
def get_model_instance_from_graph(nodes: list[DataPoint], edges: list[tuple[str, str, str, dict[str, str]]], entity_id: str): """Convert a graph structure into a Pydantic model instance. Args: nodes: List of DataPoint objects representing graph nodes edges: List of tuples (source_id, target_id, label, properties) representing graph edges entity_id: ID of the root entity to return Returns: A Pydantic model instance representing the graph structure Raises: KeyError: If node_id or edge metadata is not found """cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py (4)
4-13
: Add metadata field for consistency with other edge definitionsThe
CAR_SEDAN_EDGE
is missing themetadata
field that is present inBORIS_CAR_EDGE_GROUND_TRUTH
. Consider adding it for consistency.CAR_SEDAN_EDGE = ( "car1", "sedan", "is_type", { "source_node_id": "car1", "target_node_id": "sedan", "relationship_name": "is_type", + "metadata": {}, }, )
Line range hint
16-27
: Consider adding type hints to ground truth constantsAdding type hints to these constants would improve IDE support and make the expected structure more explicit.
+ from typing import Dict, Tuple, Any + + CAR_SEDAN_EDGE: Tuple[str, str, str, Dict[str, Any]] = ( "car1", # ... ) + CAR_TYPE_GROUND_TRUTH: Dict[str, str] = { "id": "sedan" }Also applies to: 28-50
Line range hint
72-90
: Refactor duplicate edge comparison logicThe edge comparison logic is duplicated between
test_extracted_car_sedan_edge
andtest_extracted_boris_car_edge
. Consider extracting this into a helper function.+ def assert_edge_matches_ground_truth(edge, ground_truth): + """Helper function to compare an edge with its ground truth.""" + assert ground_truth[:3] == edge[:3], f"{ground_truth[:3] = } != {edge[:3] = }" + for key, truth_value in ground_truth[3].items(): + assert truth_value == edge[3][key], f"{truth_value = } != {edge[3][key] = }" def test_extracted_car_sedan_edge(boris): _, edges = get_graph_from_model(boris) edge = edges[0] - assert CAR_SEDAN_EDGE[:3] == edge[:3], f"{CAR_SEDAN_EDGE[:3] = } != {edge[:3] = }" - for key, ground_truth in CAR_SEDAN_EDGE[3].items(): - assert ground_truth == edge[3][key], f"{ground_truth = } != {edge[3][key] = }" + assert_edge_matches_ground_truth(edge, CAR_SEDAN_EDGE)
Line range hint
51-90
: Add docstrings to test functionsThe test functions lack docstrings explaining their purpose and expected behavior. Consider adding descriptive docstrings to improve maintainability.
Example:
def test_extracted_car_type(boris): """ Test that car type node is correctly extracted from the model. Verifies: - The correct number of nodes is extracted - The car type node matches the expected ground truth """cognee/modules/graph/utils/get_graph_from_model.py (2)
75-76
: Potential Collision inedge_key
GenerationConcatenating IDs and strings to generate
edge_key
may lead to key collisions if IDs and relationship names overlap. To prevent collisions and improve robustness, consider using a tuple as the key.Apply this change:
- edge_key = str(edge[0]) + str(edge[1]) + edge[2] + edge_key = (edge[0], edge[1], edge[2])Ensure that the
added_edges
dictionary uses these tuple keys.
82-83
: Potential Collision inedge_key
GenerationUsing string concatenation for
edge_key
may lead to collisions. Using a tuple of IDs and the field name ensures uniqueness and avoids potential errors.Apply this change:
- edge_key = str(data_point.id) + str(property_node.id) + field_name + edge_key = (data_point.id, property_node.id, field_name)cognee/tests/unit/interfaces/graph/util.py (3)
28-29
: Consider implementing list handling inrun_test_against_ground_truth
Currently, when
ground_truth
is a list, the function raises aNotImplementedError
. Implementing support for lists would enhance the utility of this function and allow for comprehensive testing against ground truth data structures.
70-88
: Add docstring tocreate_society_person_recursive
functionConsider adding a docstring to explain the purpose, parameters, and return value of this recursive function for better maintainability and readability.
90-108
: Add docstring tocreate_organization_recursive
functionAdding a docstring will help clarify the functionality and usage of this recursive function, aiding future developers in understanding and maintaining the code.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
cognee/modules/graph/utils/get_graph_from_model.py
(2 hunks)cognee/modules/graph/utils/get_model_instance_from_graph.py
(1 hunks)cognee/tests/unit/interfaces/graph/conftest.py
(2 hunks)cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py
(1 hunks)cognee/tests/unit/interfaces/graph/get_graph_from_model_test.py
(3 hunks)cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py
(1 hunks)cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_test.py
(2 hunks)cognee/tests/unit/interfaces/graph/util.py
(2 hunks)
🔇 Additional comments (7)
cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py (1)
1-11
: LGTM! Well-organized imports
The imports are logically grouped and all are utilized in the test implementation.
cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_test.py (2)
1-4
: LGTM! Clean and focused imports
The imports are well-organized, importing only the specific functions needed for the test.
28-30
: Consider renaming the test to better reflect its purpose
While the implementation looks good, the test name test_parsed_person
could be more descriptive. Consider renaming it to test_person_model_graph_roundtrip
to better reflect that it's testing the round-trip conversion from model to graph and back.
Let's verify the boris fixture exists and is properly typed:
✅ Verification successful
The test implementation and fixture are well-typed and appropriate
The test name suggestion can be considered, but the current implementation is solid:
- The
boris
fixture is properly defined and typed, inheriting fromDataPoint
via thePerson
class - All required models (
Person
,Car
) are properly defined with clear type hints - The test correctly verifies the round-trip conversion of a complex object graph (Person with Car relationships)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the boris fixture and its type hints
# Look for the fixture definition and its type hints
rg -A 5 "def boris" --type py
Length of output: 423
Script:
#!/bin/bash
# Let's verify the Person model definition and its type hints
rg -A 10 "class Person" --type py
Length of output: 1568
Script:
#!/bin/bash
# Let's verify the Car model definition and check if both Person and Car inherit from DataPoint
rg -A 5 "class Car" --type py && rg "class DataPoint" --type py
Length of output: 1699
cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py (1)
1-9
: LGTM! Well-organized imports.
The imports are cleanly structured with appropriate separation between external and internal dependencies.
cognee/modules/graph/utils/get_model_instance_from_graph.py (1)
29-34
: Verify model compatibility before merging.
When merging models or creating new ones, it's important to ensure type compatibility between source and target nodes.
Let's verify the model compatibility in the codebase:
cognee/modules/graph/utils/get_graph_from_model.py (1)
6-14
: Proper Handling of Default Mutable Arguments
Replacing the default values of added_nodes
and added_edges
from empty dictionaries to None
and initializing them inside the function prevents unintended side effects caused by mutable default arguments. This is a good practice in Python.
cognee/tests/unit/interfaces/graph/util.py (1)
39-52
: LGTM: Class definitions for Organization
and SocietyPerson
The new classes Organization
and SocietyPerson
are correctly defined with appropriate type annotations and forward references. The use of model_rebuild()
ensures that forward references are resolved properly.
cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py
Outdated
Show resolved
Hide resolved
cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
cognee/tests/unit/interfaces/graph/util.py (3)
28-29
: Enhance the NotImplementedError messageThe error message should provide more context about what list type is not supported and potential workarounds.
- raise NotImplementedError("Currently not implemented for 'list'") + raise NotImplementedError(f"Validation for list type attributes is not yet implemented. Key: {key}, Value: {ground_truth}")
54-66
: Consider expanding test data to include edge casesThe current test data is good but could be enhanced to cover more scenarios:
- Names with special characters
- Unicode characters
- Very long names
- Empty strings
69-86
: Consider making test data generation deterministicRandom ID generation could make tests non-deterministic and harder to debug. Consider:
- Using a seeded random generator
- Or using predictable ID patterns based on depth and position
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
cognee/tests/unit/interfaces/graph/util.py
(2 hunks)
🔇 Additional comments (1)
cognee/tests/unit/interfaces/graph/util.py (1)
39-51
: LGTM! Well-structured data models with proper type hints
The circular reference between Organization and SocietyPerson is handled correctly using forward references in type hints. The model_rebuild() calls ensure proper initialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (6)
profiling/graph_pydantic_conversion/profile_graph_pydantic_conversion.py (2)
1-10
: Remove unused import and consider reorganizing importsThe
time
module is imported but never used in the code. Additionally, consider using an absolute import path forbenchmark_function
to improve reliability.Apply this diff:
-import time +from cognee.profiling.benchmark_function import benchmark_function -from benchmark_function import benchmark_function from cognee.modules.graph.utils import get_graph_from_model from cognee.tests.unit.interfaces.graph.util import ( PERSON_NAMES, create_organization_recursive, )
1-43
: Consider integrating with a comprehensive performance testing frameworkThis script could be enhanced by:
- Integration with CI/CD pipelines for automated performance tracking
- Comparison with historical benchmark results
- Setting performance budgets and alerting on regressions
- Generating performance trend visualizations
Would you like assistance in setting up a more comprehensive performance testing framework?
profiling/graph_pydantic_conversion/benchmark_function.py (4)
9-20
: Enhance docstring with return value detailsWhile the docstring is well-structured, it would be helpful to specify the metrics included in the returned dictionary.
Returns: - Dictionary containing benchmark metrics + Dictionary containing benchmark metrics: + - mean_execution_time: Average execution time in seconds + - mean_peak_memory_mb: Average peak memory usage in MB + - mean_cpu_percent: Average CPU utilization percentage + - num_runs: Number of benchmark iterations performed + - std_execution_time: Standard deviation of execution times (if num_runs > 1)
48-48
: Consider including memory delta in metricsThe
memory_used
variable calculates the memory delta but isn't included in the final metrics. Consider adding it to provide insights into memory allocation patterns.analysis = { "mean_execution_time": statistics.mean(execution_times), "mean_peak_memory_mb": statistics.mean(peak_memory_usages), "mean_cpu_percent": statistics.mean(cpu_percentages), "num_runs": num_runs, + "memory_delta_bytes": memory_used, # Add memory allocation delta }
🧰 Tools
🪛 Ruff
48-48: Local variable
memory_used
is assigned to but never usedRemove assignment to unused variable
memory_used
(F841)
9-9
: Add input validation for num_runs parameterConsider adding validation to ensure
num_runs
is positive to prevent potential issues.def benchmark_function(func: Callable, *args, num_runs: int = 5) -> Dict[str, Any]: + if num_runs <= 0: + raise ValueError("num_runs must be positive")
25-56
: Consider adding error handling for system-level operationsThe benchmarking code interacts with system-level metrics through
psutil
andtracemalloc
. Consider adding try-except blocks to handle potential system-related errors gracefully.For example:
- Handle
psutil.NoSuchProcess
orpsutil.AccessDenied
- Ensure
tracemalloc
is properly stopped even if an error occurs- Consider using a context manager pattern for the benchmarking lifecycle
🧰 Tools
🪛 Ruff
36-36: Local variable
result
is assigned to but never usedRemove assignment to unused variable
result
(F841)
48-48: Local variable
memory_used
is assigned to but never usedRemove assignment to unused variable
memory_used
(F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
profiling/graph_pydantic_conversion/benchmark_function.py
(1 hunks)profiling/graph_pydantic_conversion/profile_graph_pydantic_conversion.py
(1 hunks)
🧰 Additional context used
🪛 Ruff
profiling/graph_pydantic_conversion/benchmark_function.py
36-36: Local variable result
is assigned to but never used
Remove assignment to unused variable result
(F841)
48-48: Local variable memory_used
is assigned to but never used
Remove assignment to unused variable memory_used
(F841)
🔇 Additional comments (3)
profiling/graph_pydantic_conversion/profile_graph_pydantic_conversion.py (1)
14-26
: LGTM! Well-structured argument parsing
The argument parser is well-implemented with clear descriptions and sensible defaults.
profiling/graph_pydantic_conversion/benchmark_function.py (2)
1-8
: LGTM! Well-organized imports
The imports are properly organized and include all necessary dependencies for the benchmarking functionality.
36-36
: Keep the result assignment
While the result
variable is flagged as unused, it's necessary to ensure the function actually executes and doesn't get optimized away by the interpreter.
🧰 Tools
🪛 Ruff
36-36: Local variable result
is assigned to but never used
Remove assignment to unused variable result
(F841)
profiling/graph_pydantic_conversion/profile_graph_pydantic_conversion.py
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (8)
cognee/modules/graph/utils/get_model_instance_from_graph.py (2)
9-20
: Add input validation and documentation.The function implementation is clean and well-typed. Consider these improvements:
- Add input validation for the aggregation function
- Add a docstring explaining the purpose, parameters, and return value
def merge_dicts(dict1: dict, dict2: dict, agg_fn: Callable) -> dict: + """Merge two dictionaries using an aggregation function for common keys. + + Args: + dict1: The first dictionary + dict2: The second dictionary + agg_fn: Callable that takes two values and returns the aggregated result + + Returns: + dict: Merged dictionary with aggregated values for common keys + + Raises: + TypeError: If agg_fn is not callable + """ + if not callable(agg_fn): + raise TypeError("agg_fn must be callable") + merged_dict = {}
37-54
: Consider extracting model creation logic.The model creation logic for both edge types is similar but duplicated. Consider extracting it into a helper function to improve maintainability and reduce duplication.
+def create_new_model( + source_node: DataPoint, + target_node: DataPoint, + edge_label: str, + is_list: bool = False +) -> tuple[type, dict]: + """Create a new model type and data dictionary based on edge type.""" + model_type = copy_model( + type(source_node), + {edge_label: (list[type(target_node)] if is_list else type(target_node), PydanticUndefined)} + ) + + if is_list: + model_data = merge_dicts( + source_node.model_dump(), + {edge_label: [target_node]}, + lambda a, b: a + b, + ) + else: + model_data = {**source_node.model_dump(), **{edge_label: target_node}} + + return model_type, model_data def get_model_instance_from_graph(...): # ... existing code ... if edge_type == "list": - NewModel = copy_model( - type(source_node), - {edge_label: (list[type(target_node)], PydanticUndefined)}, - ) - new_model_dict = merge_dicts( - source_node.model_dump(), - {edge_label: [target_node]}, - lambda a, b: a + b, - ) - node_map[source_node_id] = NewModel(**new_model_dict) + model_type, model_data = create_new_model( + source_node, target_node, edge_label, is_list=True + ) + node_map[source_node_id] = model_type(**model_data) else: - NewModel = copy_model( - type(source_node), - {edge_label: (type(target_node), PydanticUndefined)} - ) - node_map[target_node_id] = NewModel( - **source_node.model_dump(), **{edge_label: target_node} + model_type, model_data = create_new_model( + source_node, target_node, edge_label ) + node_map[target_node_id] = model_type(**model_data)cognee/modules/graph/utils/get_graph_from_model.py (1)
100-102
: Consider making the datetime format configurableThe datetime format is currently hardcoded. Consider:
- Moving it to a configuration constant
- Ensuring the format includes timezone information
- Aligning it with any database requirements
+# At the top of the file +DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S%z" + # In the edge creation - "updated_at": datetime.now(timezone.utc).strftime( - "%Y-%m-%d %H:%M:%S" - ), + "updated_at": datetime.now(timezone.utc).strftime(DATETIME_FORMAT),cognee/tests/unit/interfaces/graph/util.py (5)
39-51
: Add docstrings to document the data models.The class structure and type hints look good, but adding docstrings would improve code documentation.
Add docstrings to describe the purpose and attributes of each class:
class Organization(DataPoint): + """Represents an organization in the society structure. + + Attributes: + id: Unique identifier for the organization + name: Name of the organization + members: Optional list of SocietyPerson members + """ id: str name: str members: Optional[list["SocietyPerson"]] class SocietyPerson(DataPoint): + """Represents a person in the society structure. + + Attributes: + id: Unique identifier for the person + name: Name of the person + memberships: Optional list of Organization memberships + """ id: str name: str memberships: Optional[list[Organization]]
54-66
: Add type hints to constants.The constants are well-organized but would benefit from explicit type hints.
-ORGANIZATION_NAMES = [ +ORGANIZATION_NAMES: list[str] = [ "ChessClub", # ... ] -PERSON_NAMES = ["Sarah", "Anna", "John", "Sam"] +PERSON_NAMES: list[str] = ["Sarah", "Anna", "John", "Sam"]
69-106
: Add return type hints to recursive functions.The recursive functions would benefit from explicit return type hints for better type safety.
-def create_society_person_recursive(id, name, organization_names, max_depth, depth=0): +def create_society_person_recursive(id: str, name: str, organization_names: list[str], max_depth: int, depth: int = 0) -> SocietyPerson: # ... -def create_organization_recursive(id, name, member_names, max_depth, depth=0): +def create_organization_recursive(id: str, name: str, member_names: list[str], max_depth: int, depth: int = 0) -> Organization: # ...
109-131
: Add type hints and docstring to count_society function.The function would benefit from explicit type hints and documentation.
-def count_society(obj): +def count_society(obj: Union[Organization, SocietyPerson]) -> tuple[int, int]: + """Count the number of organizations and society persons in a society structure. + + Args: + obj: An Organization or SocietyPerson object to count + + Returns: + A tuple of (organization_count, society_person_count) + + Raises: + TypeError: If obj is neither Organization nor SocietyPerson + """ # ...Don't forget to add the Union import:
-from typing import Any, Dict, Optional +from typing import Any, Dict, Optional, Union
134-149
: Add type hints and error handling to show_first_difference function.The function would benefit from type hints and proper error handling for empty strings.
-def show_first_difference(str1, str2, str1_name, str2_name, context=30): +def show_first_difference(str1: str, str2: str, str1_name: str, str2_name: str, context: int = 30) -> str: + """Show where two strings first diverge, with surrounding context. + + Args: + str1: First string to compare + str2: Second string to compare + str1_name: Name/identifier for the first string + str2_name: Name/identifier for the second string + context: Number of characters to show before and after the difference + + Returns: + A string describing where the first difference occurs + """ + if not str1 or not str2: + return f"One or both strings are empty: {str1_name}({len(str1)}) vs {str2_name}({len(str2)})" # ...
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
cognee/modules/graph/utils/get_graph_from_model.py
(2 hunks)cognee/modules/graph/utils/get_model_instance_from_graph.py
(1 hunks)cognee/tests/unit/interfaces/graph/util.py
(2 hunks)
🔇 Additional comments (7)
cognee/modules/graph/utils/get_model_instance_from_graph.py (2)
1-6
: LGTM! Imports and type hints are well structured.
The imports are correctly organized and the type hints are properly implemented.
23-27
: LGTM! Improved type safety and edge handling.
The function signature now has proper type hints for the edges parameter, making the expected structure explicit. The default value for edge_type provides better error handling.
cognee/modules/graph/utils/get_graph_from_model.py (4)
7-12
: LGTM! Improved parameter defaults and initialization
The changes to use None
as default instead of mutable empty dictionaries is a good practice that prevents the shared state issues common with mutable defaults. The initialization checks are clear and safe.
23-33
: LGTM! Improved code modularity
The extraction of node and edge handling logic into add_nodes_and_edges
improves code organization and readability while maintaining the same functionality.
43-50
: LGTM! Efficient edge metadata handling
The edge metadata updates are now handled efficiently by tracking the number of edges before the update and only modifying the newly added edges.
71-73
:
Issue with dict copies persists
The previous review comment about potential duplicate nodes and edges due to dict copies remains valid.
cognee/tests/unit/interfaces/graph/util.py (1)
1-2
: LGTM! Good error handling improvement.
The addition of necessary imports and the explicit NotImplementedError for list ground truth cases improves the code's robustness and maintainability.
Also applies to: 4-4, 28-29
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
cognee/modules/storage/utils/__init__.py (2)
32-34
: Add error handling for model rebuildingWhile adding
model_rebuild()
is good for ensuring model integrity, consider adding error handling to gracefully handle potential rebuild failures.- model = create_model(model.__name__, **final_fields) - model.model_rebuild() - return model + try: + model = create_model(model.__name__, **final_fields) + model.model_rebuild() + return model + except Exception as e: + raise ValueError(f"Failed to rebuild model {model.__name__}: {str(e)}")
32-34
: Consider caching rebuilt models for performanceSince model rebuilding can be computationally expensive, consider caching rebuilt models using a class-level cache for frequently used model configurations.
Example implementation:
from functools import lru_cache @lru_cache(maxsize=128) def _create_and_rebuild_model(model_name: str, fields_hash: str): fields = json.loads(fields_hash) # Convert back from JSON string model = create_model(model_name, **fields) model.model_rebuild() return model def copy_model(model: DataPoint, include_fields: dict = {}, exclude_fields: list = []): fields = { name: (field.annotation, field.default if field.default is not None else PydanticUndefined) for name, field in model.model_fields.items() if name not in exclude_fields } final_fields = {**fields, **include_fields} # Convert fields to a hashable string for caching fields_hash = json.dumps(final_fields, sort_keys=True) return _create_and_rebuild_model(model.__name__, fields_hash)
|
GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
---|---|---|---|---|---|
9573981 | Triggered | Generic Password | fde56f0 | notebooks/cognee_llama_index.ipynb | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely. Learn here the best practices.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider
- following these best practices for managing and storing secrets including API keys and other credentials
- install secret detection on pre-commit to catch secret before it leaves your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (1)
cognee/tests/unit/interfaces/graph/util.py (1)
70-87
: Add depth validation and consider UUID for IDsWhile the implementation is functional, consider these improvements:
- Add depth validation to prevent stack overflow (as suggested in past review)
- Consider using UUID for IDs instead of random strings to prevent potential collisions
def create_society_person_recursive(id, name, organization_names, max_depth, depth=0): + if max_depth < 0: + raise ValueError("max_depth must be non-negative") + if depth > 100: # reasonable limit to prevent stack overflow + raise ValueError("Maximum recursion depth exceeded") - id_suffix = "".join(random.choice(string.ascii_lowercase) for _ in range(10)) + id_suffix = str(uuid.uuid4())
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py
(1 hunks)cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py
(1 hunks)cognee/tests/unit/interfaces/graph/util.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- cognee/tests/unit/interfaces/graph/get_graph_from_model_generative_test.py
- cognee/tests/unit/interfaces/graph/get_model_instance_from_graph_generative_test.py
🔇 Additional comments (5)
cognee/tests/unit/interfaces/graph/util.py (5)
1-2
: LGTM! Good practice using NotImplementedError
The addition of required imports and explicit handling of unsupported list type in ground truth is a good practice.
Also applies to: 4-4, 28-29
39-52
: Well-structured classes with proper type hints
Good implementation of:
- Forward references for circular dependencies using string literals
- Optional type hints for nullable fields
- Proper model rebuilding for circular references
55-67
: LGTM! Well-defined test data constants
Good variety of test data with clear naming conventions.
110-132
: Add type hints and improve error handling
The error handling improvement was noted in a past review. Additionally:
- Add type hints to improve code clarity and enable better IDE support
-def count_society(obj):
+def count_society(obj: Union[Organization, SocietyPerson]) -> Tuple[int, int]:
135-150
: Add docstring and fix index handling
The index handling issues were noted in a past review. Additionally:
- Add a docstring explaining the return format and the context parameter
def show_first_difference(str1, str2, str1_name, str2_name, context=30):
+ """Compare two strings and show their first difference with context.
+
+ Args:
+ str1: First string to compare
+ str2: Second string to compare
+ str1_name: Name/identifier for the first string
+ str2_name: Name/identifier for the second string
+ context: Number of characters to show before and after the difference
+
+ Returns:
+ A formatted string showing the first difference with context, or a message
+ if the strings are identical or one is a prefix of the other.
+ """
Hey @borisarzentar, I created two pydantic Classes, Weirdly, the Thanks! |
type(source_node), | ||
{edge_label: (list[type(target_node)], PydanticUndefined)}, | ||
) | ||
new_model_dict = merge_dicts( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge_dicts
seems like an overkill here, we want to merge just one property here. It will iterate over properties of both objects for one property, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point, I changed it to merge the key values directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments! I removed the merge_dicts call.
type(source_node), | ||
{edge_label: (list[type(target_node)], PydanticUndefined)}, | ||
) | ||
new_model_dict = merge_dicts( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point, I changed it to merge the key values directly.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Refactor
get_graph_from_model
andget_model_instance_from_graph
functions for better readability and maintainability.Documentation