Optimize remote db #41

granawkins · 2024-05-16T09:27:31Z

The basic approach is:

Store the document in the graph, not just the database. This means the saved .json files will be quite a bit bigger, but never more than 2x the size of the codebase, which isn't crazy. Each time the graph updates, it overwrites the previous version instead of saving a new one.
Do all database operations as batches.
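
A minimal sketch of these two points, under stated assumptions: `FakeRemoteDB`, `attach_documents` and `update_metadata_batch` are hypothetical names invented for illustration, and a plain dict stands in for the graph. None of this is ragdaemon's actual API; it only shows why one batched call per operation beats one call per node when each call is a network round trip.

```python
import json
from typing import Any


class FakeRemoteDB:
    """Hypothetical stand-in for a remote database; counts round trips."""

    def __init__(self) -> None:
        self.records: dict[str, dict[str, Any]] = {}
        self.round_trips = 0

    def get(self, ids: list[str]) -> dict[str, list]:
        # One call fetches every requested record.
        self.round_trips += 1
        found = [i for i in ids if i in self.records]
        return {
            "ids": found,
            "documents": [self.records[i]["document"] for i in found],
            "metadatas": [self.records[i]["metadata"] for i in found],
        }

    def update(self, ids: list[str], metadatas: list[dict[str, Any]]) -> None:
        # One call writes every changed record.
        self.round_trips += 1
        for record_id, metadata in zip(ids, metadatas):
            self.records[record_id]["metadata"].update(metadata)


def attach_documents(graph: dict[str, dict[str, Any]], db: FakeRemoteDB) -> None:
    """Fetch all records in one call and keep each document on its node."""
    checksums = [data["checksum"] for data in graph.values()]
    response = db.get(ids=checksums)
    documents = dict(zip(response["ids"], response["documents"]))
    for data in graph.values():
        data["document"] = documents.get(data["checksum"])


def update_metadata_batch(
    graph: dict[str, dict[str, Any]], db: FakeRemoteDB, field_id: str
) -> None:
    """Collect every node that has `field_id` and write them in one call."""
    ids, metadatas = [], []
    for data in graph.values():
        if field_id in data:
            ids.append(data["checksum"])
            metadatas.append({field_id: json.dumps(data[field_id])})
    if ids:
        db.update(ids=ids, metadatas=metadatas)


db = FakeRemoteDB()
db.records = {
    "abc": {"document": "def f(): ...", "metadata": {}},
    "def": {"document": "def g(): f()", "metadata": {}},
}
graph = {
    "a.py:f": {"checksum": "abc"},
    "a.py:g": {"checksum": "def", "calls": {"a.py:f": [1]}},
}

attach_documents(graph, db)
update_metadata_batch(graph, db, "calls")
print(db.round_trips)
```

With the document stored on the node, later steps can read `data["document"]` locally instead of asking the database again, which is the trade-off against the larger .json files mentioned above.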

Seems to be all working alright on smaller repos, and I'm getting the expected speedup when working with the remote database.

* make db.get and db.update batch calls
* add document to the graph for local use

mentatai

Overall, the pull request aims to reduce direct database operations within annotators and store more data in memory, which aligns with the approach of writing and reading less often from the database to improve performance. The proposed changes appear to align with these goals by modifying method signatures and internal logic to eliminate unnecessary database interactions. It's important though to ensure all related changes are fully implemented across the entire codebase to maintain consistency and functionality.


mentatai · 2024-05-16T09:28:05Z

ragdaemon/annotators/call_graph.py

-        # Save to db and graph
-        metadatas = record["metadatas"][0]
-        metadatas[self.call_field_id] = json.dumps(calls)
-        db.update(data["checksum"], metadatas=metadatas)
        data[self.call_field_id] = calls


Missing removal of db parameter as suggested in docstring change at line 155. This should be consistent across the class if database interactions are removed as per the changes in other parts of the code.

Suggested change

-        data[self.call_field_id] = calls
+async def annotate(
+    self, graph: KnowledgeGraph, refresh: bool = False,
+) -> KnowledgeGraph:

mentatai · 2024-05-16T09:28:05Z

ragdaemon/annotators/summarizer.py

@@ -212,7 +211,6 @@ def is_complete(self, graph: KnowledgeGraph, db: Database) -> bool:
            document, context = get_document_and_context(
                node,
                graph,
-                db,
                summary_field_id=self.summary_field_id,


The db parameter is still mentioned in the annotate method whereas it appears to be unused considering recent changes that focus on reducing direct database operations within annotators.

Suggested change

-                summary_field_id=self.summary_field_id,
+async def annotate(
+    self, graph: KnowledgeGraph, refresh: bool = False
+) -> KnowledgeGraph:

granawkins added 10 commits May 16, 2024 08:49

optimize HierarchyAnnotator for remote db

d49fbc1

* make db.get and db.update batch calls
* add document to the graph for local use

optimize ChunkerAnnotator for remote db

31541f5

optimize DiffAnnotator for remote db

774ff69

optimize CallGraphAnnotator for remote db

d83913c

remove db from ContextBuilder entirely

1862aa4

optimize SummarizerAnnotator for remote db

2cb4563

optimize database.query for remote db

417d198

minor version bump

aa32a4c

fixes from testing

c840780

fix chunk parent resolution

9a844b2

mentatai bot reviewed May 16, 2024


granawkins added 4 commits May 17, 2024 08:48

fix chunker issues

d74eecc

switch DEFAULT_COMPLETIONS_MODEL to gpt-4o

0d475d7

don't use chroma.upsert, it duplicates embeddings

0e42260

format fixes

2dcc6c8

granawkins merged commit 781e8fb into main May 17, 2024
3 checks passed
