-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize remote db #41
Conversation
* make db.get and db.update batch calls * add document to the graph for local use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, the pull request aims to reduce direct database operations within annotators and store more data in memory, which aligns with the approach of writing and reading less often from the database to improve performance. The proposed changes appear to align with these goals by modifying method signatures and internal logic to eliminate unnecessary database interactions. It's important though to ensure all related changes are fully implemented across the entire codebase to maintain consistency and functionality.
Butler is in closed beta. Reply with feedback or to ask Butler to review other parts of the PR. Please give feedback with emoji reacts.
# Save to db and graph | ||
metadatas = record["metadatas"][0] | ||
metadatas[self.call_field_id] = json.dumps(calls) | ||
db.update(data["checksum"], metadatas=metadatas) | ||
data[self.call_field_id] = calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing removal of db
parameter as suggested in docstring change at line 155. This should be consistent across the class if database interactions are removed as per the changes in other parts of the code.
data[self.call_field_id] = calls | |
async def annotate( | |
self, graph: KnowledgeGraph, refresh: bool = False, | |
) -> KnowledgeGraph: |
@@ -212,7 +211,6 @@ def is_complete(self, graph: KnowledgeGraph, db: Database) -> bool: | |||
document, context = get_document_and_context( | |||
node, | |||
graph, | |||
db, | |||
summary_field_id=self.summary_field_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The db
parameter is still mentioned in the annotate
method whereas it appears to be unused considering recent changes that focus on reducing direct database operations within annotators.
summary_field_id=self.summary_field_id, | |
async def annotate( | |
self, graph: KnowledgeGraph, refresh: bool = False | |
) -> KnowledgeGraph: |
The basic approach is:
document
in the graph, not just the database. This means the .json files saved will be quite a bit bigger, but never more than 2x the size of the codebase, which isn't crazy, each time it updates it will overwrite the previous version instead of saving a new one.Seems to be all working alright on smaller repos, and I'm getting the expected speedup when working with the remote database.