[Issue]: Does the cache still work when building a one-file GraphRAG index after a multi-file one? #819
Comments
Here is my settings.yml file
If your settings have not been changed, and the original file is still in the folder, then it should use the cache in several places. For example, the text units (chunks) should be identical, so graph extraction should use the cache for those. However, any new entities and relationships extracted from the second file will trigger a recomputation of the communities, and therefore of all the community summarization, which can account for much of your overall expense. We're tracking more efficient incremental indexing with #741.
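To make this concrete, here is a minimal Python sketch of why the second run can still be expensive. It assumes a hypothetical in-memory cache keyed by a hash of each prompt, deterministic per-document chunking, and a single community-summary call whose prompt depends on every text unit. `PromptCache`, `chunk`, and `run_index` are illustrative names, not GraphRAG's actual API or cache implementation.

```python
import hashlib


class PromptCache:
    """Hypothetical in-memory LLM cache keyed by a SHA-256 hash of the prompt."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def call(self, prompt: str) -> str:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            # Stand-in for a real (paid) LLM call.
            self.store[key] = f"response-{key[:8]}"
        return self.store[key]


def chunk(text: str, size: int = 20) -> list[str]:
    # Deterministic, per-document chunking: the same file always yields the same text units.
    return [text[i : i + size] for i in range(0, len(text), size)]


def run_index(docs: list[str], cache: PromptCache) -> None:
    units = [unit for doc in docs for unit in chunk(doc)]
    # Graph extraction: one LLM call per text unit, so unchanged units can hit the cache.
    for unit in units:
        cache.call("Extract entities and relationships from: " + unit)
    # Community summarization: depends on the whole graph, so its prompt changes
    # whenever the set of input documents changes.
    cache.call("Summarize the community built from: " + " | ".join(units))


doc_a = "Alice founded Acme Corp in 1999 in Paris."
doc_b = "Bob joined Acme Corp and later moved to Berlin."

cache = PromptCache()
run_index([doc_a, doc_b], cache)  # first run: both files, every call is a miss
first_run = (cache.hits, cache.misses)

cache.hits = cache.misses = 0
run_index([doc_a], cache)  # second run: only doc_a
second_run = (cache.hits, cache.misses)

print(first_run, second_run)  # (0, 7) (3, 1)
```

In this toy model, the three text units from the retained file are cache hits on the second run, while the community summary is recomputed because its input changed, matching the point above that community summarization can account for much of the cost.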
Hi @natoverse, below is how I changed it:
I don't think it should matter: the key to getting an accurate cache is that we hash all of the LLM params and the prompt, so that identical API calls are avoided. This is done per step, so changing an individual parameter should only affect the steps that rely on it.
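A minimal sketch of the per-step hashing described above, assuming each cache key is a SHA-256 hash of the step name, the prompt, and that step's LLM parameters serialized as sorted JSON. The function `cache_key`, the step names, and the parameter values are hypothetical and only illustrate the idea; GraphRAG's own key format may differ.

```python
import hashlib
import json


def cache_key(step: str, prompt: str, llm_params: dict) -> str:
    """Hypothetical cache key: hash of the step name, full prompt, and LLM parameters.

    Two calls share a key only if all three are identical, so a repeated call
    can be answered from the cache instead of the API.
    """
    payload = json.dumps(
        {"step": step, "prompt": prompt, "params": llm_params},
        sort_keys=True,  # the key must not depend on dict insertion order
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative per-step parameters; names and values are assumptions.
extraction_params = {"model": "example-model", "temperature": 0.0, "max_tokens": 4000}
summary_params = {"model": "example-model", "temperature": 0.0, "max_tokens": 500}

extract_prompt = "Extract entities and relationships from: <chunk text>"
summary_prompt = "Summarize these descriptions: <descriptions>"

before = {
    "extract": cache_key("extract_graph", extract_prompt, extraction_params),
    "summarize": cache_key("summarize_descriptions", summary_prompt, summary_params),
}

# Change a parameter that only the summarization step uses.
summary_params_v2 = {**summary_params, "max_tokens": 800}
after = {
    "extract": cache_key("extract_graph", extract_prompt, extraction_params),
    "summarize": cache_key("summarize_descriptions", summary_prompt, summary_params_v2),
}

print(before["extract"] == after["extract"])      # True: extraction calls still hit the cache
print(before["summarize"] == after["summarize"])  # False: only summarization is re-run
```

Hashing per step is what keeps a parameter change local: the extraction key never sees the summarization parameters, so it is unaffected when they change.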
Thank you @natoverse for GraphRAG and for your answer. I have one more question related to this topic. I first generated a GraphRAG index from two files, and later decided to build one from only one of them. Does the system need to regenerate the entity summaries, since the description lists may change when there are fewer input documents? The same question applies to the summaries of relationships and claims.
The entity/relationship extraction step is separate from the summarization. When extracting, each entity and relationship is given a description by the LLM; this step will get the benefit of the cache. Before creating the community reports, the descriptions for each entity are combined into a single "canonical" description. This is also done by the LLM, and if the instances of an entity have changed (and therefore its list of descriptions), this step should not use the cache.
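The following sketch illustrates why the description-summarization step can miss the cache when an input file is removed. It assumes, hypothetically, that the summarization prompt for an entity embeds all of that entity's collected descriptions; `summary_cache_key` and the sample descriptions are illustrative, not GraphRAG's actual prompt or cache format.

```python
import hashlib


def summary_cache_key(entity: str, descriptions: list[str]) -> str:
    # Hypothetical: the "canonical description" prompt embeds every description
    # collected for the entity, so the key depends on the full description list.
    # Sorting makes the prompt independent of the order descriptions arrive in.
    prompt = f"Combine these descriptions of {entity} into one:\n" + "\n".join(sorted(descriptions))
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


# Illustrative descriptions of the same entity, extracted from two different files.
acme_from_file_1 = ["Acme Corp is a company founded by Alice in Paris."]
acme_from_file_2 = ["Acme Corp hired Bob."]

two_file_run = summary_cache_key("ACME CORP", acme_from_file_1 + acme_from_file_2)
one_file_run = summary_cache_key("ACME CORP", acme_from_file_1)

# The entity appears in both files, so dropping one file changes its description
# list, the prompt, and therefore the key: this call cannot be served from the cache.
print(two_file_run == one_file_run)  # False

# An entity mentioned only in the retained file keeps the same description list,
# so its summary prompt (and key) is identical across the two runs.
alice = ["Alice is the founder of Acme Corp."]
print(summary_cache_key("ALICE", alice) == summary_cache_key("ALICE", list(alice)))  # True
```

Under these assumptions, only entities (and, analogously, relationships and claims) whose description lists actually change would need a fresh summarization call.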
Many thanks!
Hi, here is the scenario I am currently facing:
I built a GraphRAG index from two distinct .txt files. Later, I wanted to see whether I could build one from just one of them.
After modifying the settings file so that only one file gets ingested, I ran the following command:
python -m graphrag.index --root .
I expected this run not to cost much if the indexing stage could leverage the cache; however, it still made the full set of calls to OpenAI to build the graph.
So, can someone tell me whether I did something wrong, or whether this scenario is not supported yet?
Many thanks.