Commit 6e95e2e (1 parent: e5b6101)
updated README.MD and added some artifacts in random folder
Showing 4 changed files with 92 additions and 4 deletions.
@@ -1,7 +1,79 @@
# Knowledge Graph Evaluation

This module provides methods to evaluate the performance of GraphRag. The following integrations are available for evaluation:

- **Llama-Index Evaluation Pack**
- **Ragas Evaluation Pack**

Additionally, this module includes scripts for creating custom test datasets to benchmark and evaluate GraphRag.
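
For instance, a custom test dataset can be as simple as a pickled list of Llama-Index `Document` objects. The snippet below is a minimal sketch of that idea; the file path, metadata fields, and sample data are illustrative, not part of the module's API:

```python
import pickle

from llama_index.core import Document  # assumes the llama-index-core package layout

# Build a tiny custom test set from question/ground-truth pairs (illustrative data)
samples = [
    {
        "question": "What is Keras 3?",
        "ground_truth": "A deep learning framework that works with TensorFlow, JAX, and PyTorch interchangeably.",
    },
]
documents = [
    Document(text=s["question"], metadata={"ground_truth": s["ground_truth"]})
    for s in samples
]

# Persist the dataset so it can later be loaded for evaluation
with open("random/custom_dataset.pkl", "wb") as f:
    pickle.dump(documents, f)
```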

## Getting Started

### Evaluating Your Knowledge Graph

You can easily evaluate the performance of your query engine using this module.

#### 1. Load and Evaluate Your Dataset

Use the `load_test_dataset` function to load your dataset and directly evaluate it using the `evaluate` function. This method handles all necessary steps, including batching the data.
```python
from your_module import load_test_dataset, evaluate

# Step 1: Load the dataset from a pickle file
dataset_path = "random/dataset_200_llama3.pkl"
test_dataset = load_test_dataset(dataset_path)
```

> **Note:** `test_dataset` is a list of Llama-Index `Document` objects.
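
If you need to implement the loader yourself, here is a minimal sketch, assuming the dataset was pickled as a plain list of `Document` objects (illustrative, not the module's actual code):

```python
import pickle
from typing import List

from llama_index.core import Document  # assumes the llama-index-core package layout


def load_test_dataset(dataset_path: str) -> List[Document]:
    """Load a pickled list of Llama-Index Document objects from disk."""
    with open(dataset_path, "rb") as f:
        return pickle.load(f)
```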
```python
# These imports assume the modular llama-index and ragas packages; adjust to your installed versions
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from ragas.metrics import faithfulness, answer_relevancy, context_precision, context_recall

# Step 2: Define the language model and embedding
llm = Ollama(base_url="http://localhost:11434", model="codellama")
embedding = HuggingFaceEmbedding(model_name="microsoft/codebert-base")

# Step 3: Specify the metrics for evaluation
metrics = [faithfulness, answer_relevancy, context_precision, context_recall]

# Step 4: Load the query engine (Llama-Index)
from graph_rag.graph_retrieval.graph_retrieval import get_index_from_pickle, get_query_engine

index = get_index_from_pickle("/content/Results/graphIndex.pkl")
query_engine = get_query_engine(index)

# Step 5: Evaluate the dataset
evaluation_results = evaluate(
    query_engine=query_engine,
    dataset=test_dataset,
    llm=llm,
    embeddings=embedding,
    metrics=metrics,
    # Default batch size is 4
)
```

**Output:**

```python
{'faithfulness': 0.0333, 'answer_relevancy': 0.9834, 'context_precision': 0.2000, 'context_recall': 0.8048}
```

You can also export the per-question results to a CSV file:

```python
rdf = evaluation_results.to_pandas()
rdf.to_csv("results.csv", index=False)
```
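
Because `to_pandas()` returns an ordinary pandas DataFrame, the results are easy to slice for error analysis. A small sketch, assuming the column names shown in the detailed result table below:

```python
# Surface the questions with the weakest faithfulness scores for manual review
low_faithfulness = rdf[rdf["faithfulness"] < 0.5]
print(low_faithfulness[["question", "answer", "faithfulness"]].to_string(index=False))
```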
---

**Detailed Result:**

| question | contexts | answer | ground_truth | faithfulness | answer_relevancy | context_precision | context_recall |
|---|---|---|---|---|---|---|---|
| What is mixed precision in computing? | [Examples GPT-2 text generation Parameter…] | Mixed precision is a technique used to improve… | A combination of different numerical precision… | 0.166667 | 0.981859 | 0.0 | 0.666667 |
| What is the title of the guide discussed in th... | [Available guides… Hyperparameter T…] | The title of the guide discussed in the given… | How to distribute training | 0.000000 | 1.000000 | 0.0 | 1.000000 |
| What is Keras 3? | [No relationships found.] | Keras 3 is a new version of the popular deep l… | A deep learning framework that works with Tensor… | 0.000000 | 0.974711 | 0.0 | 0.500000 |
| What was the percentage boost in StableDiffusion... | [A first example: A MNIST convnet…] | The percentage boost in StableDiffusion traini… | Over 150% | 0.000000 | 0.970565 | 1.0 | 1.000000 |
| What are some examples of pretrained models av... | [No relationships found.] | Some examples of pre-trained models available… | BERT, OPT, Whisper, T5, StableDiffusion, YOLOv8… | 0.000000 | 0.989769 | 0.0 | 0.857143 |
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,16 @@
question,contexts,answer,ground_truth,faithfulness,answer_relevancy,context_precision,context_recall
"What is mixed precision in computing?
","['Examples\n\n* GPT-2 text generation\n* Parameter-efficient fine-tuning of GPT-2 with LoRA\n* Semantic Similarity\n* Sentence embeddings using Siamese RoBERTa-networks\n* Data Parallel Training with tf.distribute\n* English-to-Spanish translation\n* GPT text generation from scratch\n* Text Classification using FNet\n\n---'
""Supported hardware\n\nWhile mixed precision will run on most hardware, it will only speed up models on recent NVIDIA GPUs and Google TPUs.\nNVIDIA GPUs support using a mix of float16 and float32, while TPUs support a mix of bfloat16 and float32.\n\nAmong NVIDIA GPUs, those with compute capability 7.0 or higher will see the greatest performance benefit\nfrom mixed precision because they have special hardware units, called Tensor Cores,\nto accelerate float16 matrix multiplications and convolutions. Older GPUs offer no math\nperformance benefit for using mixed precision, however memory and bandwidth savings can enable some speedups.\nYou can look up the compute capability for your GPU at NVIDIA's CUDA GPU web page.\nExamples of GPUs that will benefit most from mixed precision include RTX GPUs, the V100, and the A100.\n\nEven on CPUs and older GPUs, where no speedup is expected, mixed precision APIs can still be used for unit testing,\ndebugging, or just to try out the API. On CPUs, mixed precision will run significantly slower, however.\n\n\nYou can check your GPU type with the following command:\n\n```\nnvidia-smi -L\n```""
""The following are knowledge sequence in max depth 2 in the form of directed graph like:\n`subject -[predicate]->, object, <-[predicate_next_hop]-, object_next_hop ...`\n['Mixed precision', 'Will run on most hardware', 'But will only speed up models on recent nvidia gpus and google tpus']""]","Mixed precision is a technique used to improve the efficiency of computer computations by using lower-precision data types (e.g., float16 or bfloat16) for some parts of the computation while still maintaining accuracy. It is commonly used in deep learning and other numerical compute-intensive applications, where memory bandwidth and computational resources are limited.

While mixed precision will run on most hardware, it will only speed up models on recent NVIDIA GPUs and Google TPUs. Older GPUs offer no math performance benefit for using mixed precision, however memory and bandwidth savings can enable some speedups. You can look up the compute capability for your GPU at NVIDIA's CUDA GPU web page to determine whether it will benefit from mixed precision. Examples of GPUs that will benefit most from mixed precision include RTX GPUs, the V100, and the A100.",A combination of different numerical precisions used to optimize memory usage and computation speed.,0.16666666666666666,0.9818589787934606,0.0,0.6666666666666666
"What is the title of the guide discussed in the given context?
","['Available guides\n\n{{toc}}'
'Hyperparameter Tuning\n\nThese guides cover KerasTuner best practices.']","The title of the guide discussed in the given context is ""Hyperparameter Tuning""",How to distribute training,0.0,0.9999999999999996,0.0,1.0
"What is Keras 3?
",['No relationships found.'],"Keras 3 is a new version of the popular deep learning framework Keras that was released in October 2020. It introduces several new features and improvements over the previous versions, including support for more complex architectures, improved performance on certain tasks, and easier integration with other libraries and frameworks.","A deep learning framework that works with TensorFlow, JAX, and PyTorch interchangeably.",0.0,0.9747114512069875,0.0,0.5
"What was the percentage boost in StableDiffusion training throughput with Keras 3?
","['A first example: A MNIST convnet\n\nLet\'s start with the Hello World of ML: training a convnet\nto classify MNIST digits.\n\nHere\'s the data:\n\n\n```python\n# Load the data and split it between train and test sets\n(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()\n\n# Scale images to the [0, 1] range\nx_train = x_train.astype(""float32"") / 255\nx_test = x_test.astype(""float32"") / 255\n# Make sure images have shape (28, 28, 1)\nx_train = np.expand_dims(x_train, -1)\nx_test = np.expand_dims(x_test, -1)\nprint(""x_train shape:"", x_train.shape)\nprint(""y_train shape:"", y_train.shape)\nprint(x_train.shape[0], ""train samples"")\nprint(x_test.shape[0], ""test samples"")\n```\n\n\n```\nx_train shape: (60000, 28, 28, 1)\ny_train shape: (60000,)\n60000 train samples\n10000 test samples\n\n```\n\nHere\'s our model.\n\nDifferent model-building options that Keras offers include:\n\n- The Sequential API (what we use below)\n- The Functional API (most typical)\n- Writing your own models yourself via subclassing (for advanced use cases)\n\n\n```python\n# Model parameters\nnum_classes = 10\ninput_shape = (28, 28, 1)\n\nmodel = keras.Sequential(\n    [\n        keras.layers.Input(shape=input_shape),\n        keras.layers.Conv2D(64, kernel_size=(3, 3), activation=""relu""),\n        keras.layers.Conv2D(64, kernel_size=(3, 3), activation=""relu""),\n        keras.layers.MaxPooling2D(pool_size=(2, 2)),\n        keras.layers.Conv2D(128, kernel_size=(3, 3), activation=""relu""),\n        keras.layers.Conv2D(128, kernel_size=(3, 3), activation=""relu""),\n        keras.layers.GlobalAveragePooling2D(),\n        keras.layers.Dropout(0.5),\n        keras.layers.Dense(num_classes, activation=""softmax""),\n    ]\n)\n```\n\nHere\'s our model summary:\n\n\n```python\nmodel.summary()\n```\n\n\nModel: ""sequential""']","The percentage boost in StableDiffusion training throughput with Keras 3 is not mentioned in the provided context information. Therefore, I cannot answer this question.",Over 150%,0.0,0.9705647972395562,0.9999999999,1.0
"What are some examples of pretrained models available in KerasCV and KerasNLP?
",['No relationships found.'],"Some examples of pre-trained models available in KerasCV include the InceptionV3 model, the ResNet50 model, and the VGG16 model. Some examples of pre-trained models available in KerasNLP include the BERT model, the RoBERTa model, and the DistilBERT model. These models can be used for a variety of natural language processing tasks such as text classification, sentiment analysis, named entity recognition, and question answering.","BERT, OPT, Whisper, T5, StableDiffusion, YOLOv8, SegmentAnything, etc.",0.0,0.9897694771234743,0.0,0.8571428571428571