-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Structured code summarization #375
Changes from all commits
6a0c1da
f46e8aa
15cf708
d5efe03
313395a
17a116a
52a0dac
b78a011
4023b4d
c55227c
ad1b435
9e6406a
51271b1
c299995
6096485
3529a70
d9374a0
cf1d53a
7523b0e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
You are an expert Python programmer and technical writer. Your task is to summarize the given Python code snippet or file. | ||
The code may contain multiple imports, classes, functions, constants and logic. Provide a clear, structured explanation of its components | ||
and their relationships. | ||
|
||
Instructions: | ||
Provide an overview: Start with a high-level summary of what the code does as a whole. | ||
Break it down: Summarize each class and function individually, explaining their purpose and how they interact. | ||
Describe the workflow: Outline how the classes and functions work together. Mention any control flow (e.g., main functions, entry points, loops). | ||
Key features: Highlight important elements like arguments, return values, or unique logic. | ||
Maintain clarity: Write in plain English for someone familiar with Python but unfamiliar with this code. |
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -1,39 +1,36 @@ | ||||||||||||||||||||||||
import asyncio | ||||||||||||||||||||||||
from typing import AsyncGenerator, Union | ||||||||||||||||||||||||
from uuid import uuid5 | ||||||||||||||||||||||||
from typing import Type | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
from pydantic import BaseModel | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
from cognee.infrastructure.engine import DataPoint | ||||||||||||||||||||||||
from cognee.modules.data.extraction.extract_summary import extract_summary | ||||||||||||||||||||||||
from cognee.shared.CodeGraphEntities import CodeFile | ||||||||||||||||||||||||
from cognee.modules.data.extraction.extract_summary import extract_code_summary | ||||||||||||||||||||||||
from .models import CodeSummary | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
|
||||||||||||||||||||||||
async def summarize_code( | ||||||||||||||||||||||||
code_graph_nodes: list[DataPoint], | ||||||||||||||||||||||||
summarization_model: Type[BaseModel], | ||||||||||||||||||||||||
) -> list[DataPoint]: | ||||||||||||||||||||||||
) -> AsyncGenerator[Union[DataPoint, CodeSummary], None]: | ||||||||||||||||||||||||
if len(code_graph_nodes) == 0: | ||||||||||||||||||||||||
return | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
code_files_data_points = [file for file in code_graph_nodes if isinstance(file, CodeFile)] | ||||||||||||||||||||||||
code_data_points = [file for file in code_graph_nodes if hasattr(file, "source_code")] | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
file_summaries = await asyncio.gather( | ||||||||||||||||||||||||
*[extract_summary(file.source_code, summarization_model) for file in code_files_data_points] | ||||||||||||||||||||||||
*[extract_code_summary(file.source_code) for file in code_data_points] | ||||||||||||||||||||||||
) | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
file_summaries_map = { | ||||||||||||||||||||||||
code_file_data_point.extracted_id: file_summary.summary | ||||||||||||||||||||||||
for code_file_data_point, file_summary in zip(code_files_data_points, file_summaries) | ||||||||||||||||||||||||
code_data_point.extracted_id: str(file_summary) | ||||||||||||||||||||||||
for code_data_point, file_summary in zip(code_data_points, file_summaries) | ||||||||||||||||||||||||
} | ||||||||||||||||||||||||
Comment on lines
23
to
26
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add null check for extracted_id in file_summaries_map The code assumes that extracted_id is always present and non-null, which might not be true. file_summaries_map = {
- code_data_point.extracted_id: str(file_summary)
+ code_data_point.extracted_id: str(file_summary)
for code_data_point, file_summary in zip(code_data_points, file_summaries)
+ if code_data_point.extracted_id is not None
} 📝 Committable suggestion
Suggested change
|
||||||||||||||||||||||||
|
||||||||||||||||||||||||
for node in code_graph_nodes: | ||||||||||||||||||||||||
if not isinstance(node, DataPoint): | ||||||||||||||||||||||||
continue | ||||||||||||||||||||||||
yield node | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
if not isinstance(node, CodeFile): | ||||||||||||||||||||||||
if not hasattr(node, "source_code"): | ||||||||||||||||||||||||
continue | ||||||||||||||||||||||||
|
||||||||||||||||||||||||
yield CodeSummary( | ||||||||||||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use proper async generator syntax for early return
The early return for empty code_graph_nodes should use async generator syntax.