Enable cmd runner #30

Vasilije1990 · 2023-10-28T22:19:01Z

Summary by CodeRabbit

New Feature: Added a new endpoint /rag-test/rag_test_run to handle RAG tests.
New Feature: Introduced a new DocsModel to manage document-related data in the database.
New Feature: Implemented a dynamic_test_manager function to manage dynamic tests.
Refactor: Updated command-line arguments in the README file, replacing --url with --file and adding --retriever_type.
Refactor: Enhanced rag_test_manager with additional functions and improved existing ones for better error handling and functionality.
Style: Adjusted the condition in vectordb/loaders/loaders.py to handle text document formats in uppercase.
Documentation: Updated README.md and code comments for better clarity and understanding.

coderabbitai · 2023-10-28T22:19:08Z

Walkthrough

The codebase has undergone significant changes, including updates to command-line arguments, addition of new endpoints, model and function definitions, and modifications to existing functions. The changes aim to enhance the functionality, improve error handling, and facilitate dynamic testing.

Changes

| File | Summary |
|---|---|
| `README.md` | Updated command-line arguments and their default values. |
| `level_3/api.py` | Added a new endpoint and defined its handling function. |
| `level_3/models/docs.py` | Introduced a new `DocsModel` to represent a 'docs' table in the database. |
| `level_3/rag_test_manager.py` | Added new functions, updated existing ones, and included logging for debugging. |
| `level_3/vectordb/.../loaders.py` | Altered a condition in an if statement to handle text document format. |

🐇💻

"In the land of code, where the shadows lie,

Changes come, under the cloudless sky.

New functions born, old ones refined,

In the quest for bugs, that we're destined to find.

With every keystroke, and every line,

We weave magic, oh so divine.

So here's to the changes, big and small,

In the world of code, we stand tall!" 🎉🥕

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

If you reply to a review comment from CodeRabbit, the bot will automatically respond.
To engage with the CodeRabbit bot directly about specific lines of code in the PR, mention @coderabbitai in your review comment.
Note: Review comments are made on code diffs or files, not on the PR overview.
Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai help to get help.
@coderabbitai resolve to resolve all the CodeRabbit review comments.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
The JSON schema for the configuration file is available here.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.json

coderabbitai

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 259198d and 12796a2.

Files ignored due to filter (2)

level_3/docker-compose.yml
level_3/poetry.lock

Files selected for processing (5)

README.md (1 hunks)
level_3/api.py (2 hunks)
level_3/models/docs.py (1 hunks)
level_3/rag_test_manager.py (11 hunks)
level_3/vectordb/loaders/loaders.py (1 hunks)

Files skipped from review due to trivial changes (1)

README.md

Additional comments: 14

level_3/models/docs.py (1)

1-18: The DocsModel class seems to be well defined with appropriate relationships and columns. However, it's important to ensure that the operation_id foreign key is correctly set up in the Operation model and that the back_populates attribute is correctly used. Also, ensure that the id column is being populated with unique values, as it is the primary key.

level_3/vectordb/loaders/loaders.py (1)

48-50: The condition in the if statement has been changed from document_format == "text" to document_format == "TEXT". This change will affect the control flow of the code. Ensure that all instances where this function is called have been updated to use the correct case for the document format. Alternatively, consider making the comparison case-insensitive to avoid potential issues.

- elif document_format == "TEXT":
+ elif document_format.upper() == "TEXT":
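
The case-insensitive comparison suggested above can be sketched in isolation. This is a minimal illustration, not the loader's actual code: `is_text_format` is a hypothetical helper name, and stripping surrounding whitespace is an extra assumption beyond the suggested `.upper()` call.

```python
def is_text_format(document_format: str) -> bool:
    """Return True when the format names plain text, ignoring case.

    Hypothetical helper: the whitespace strip is an assumption added here,
    the suggested change only upper-cases the value.
    """
    return document_format.strip().upper() == "TEXT"


if __name__ == "__main__":
    for candidate in ("text", "TEXT", " Text ", "pdf"):
        print(repr(candidate), is_text_format(candidate))
```

Normalizing once at the comparison site means callers do not have to agree on a casing convention.
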
level_3/api.py (2)

7-14: The import statements have been updated to reflect the new module structure. Ensure that these changes do not break any existing functionality.

205-220: A new endpoint /rag-test/rag_test_run has been added. This endpoint is handled by the rag_test_run function, which takes a payload parameter of type Payload and returns a dictionary. The function calls the start_test function from the rag_test_manager module with the decoded payload data. The response from the start_test function is returned as a JSON response.

Ensure that the start_test function is correctly implemented and that it handles all possible edge cases. Also, make sure that the Payload model correctly validates the incoming data.
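
As a rough illustration of the validation this comment asks for, the sketch below checks a decoded payload before it would be handed to start_test. Everything in it is an assumption: the PR does not show the Payload model's fields, so the `file` and `retriever_type` keys simply mirror the command-line flags named in the summary, and `validate_payload` is a hypothetical helper written with the standard library only.

```python
from typing import Any, Dict


def validate_payload(payload: Any) -> Dict[str, Any]:
    """Check a decoded request payload and return it unchanged if it looks usable.

    The key names are assumptions that mirror the --file and --retriever_type
    flags; the real Payload model may use different fields.
    """
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")

    file_value = payload.get("file")
    if not isinstance(file_value, str) or not file_value:
        raise ValueError("payload['file'] must be a non-empty string")

    retriever_type = payload.get("retriever_type")
    if retriever_type is not None and not isinstance(retriever_type, str):
        raise ValueError("payload['retriever_type'] must be a string when present")

    return payload


if __name__ == "__main__":
    print(validate_payload({"file": ".data", "retriever_type": "llm_context"}))
```

Rejecting malformed input at the endpoint keeps the error close to the caller instead of surfacing later inside the test run.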

level_3/rag_test_manager.py (10)

314-318: The renaming of the parameters in the LLMTestCase instantiation is fine as long as it is consistent throughout the codebase. Ensure that all references to these parameters have been updated accordingly.

385-404: The data_format_route function has been updated to handle case-insensitive matching and to return a default category if no match is found. This is a good improvement for robustness and flexibility.

406-426: The data_location_route function has been updated to handle case-insensitive matching and to return a default category if no match is found. This is a good improvement for robustness and flexibility.
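
The pattern described in these two comments, case-insensitive matching with a fallback category, might look roughly like the sketch below. The category names are taken from the FormatRoute enum quoted later in this review; the substring matching stands in for whatever classifier the real function uses, and the choice of `TEXT` as the default is an assumption.

```python
from enum import Enum


class FormatRoute(Enum):
    """Data format categories, as listed in the commented-out enum in this PR."""

    PDF = "PDF"
    UNSTRUCTURED_WEB = "UNSTRUCTURED_WEB"
    GITHUB = "GITHUB"
    TEXT = "TEXT"
    CSV = "CSV"
    WIKIPEDIA = "WIKIPEDIA"


def route_format(data_string: str, default: FormatRoute = FormatRoute.TEXT) -> FormatRoute:
    """Return the first category whose name occurs in data_string, ignoring case.

    Naive substring matching is a stand-in for the real classifier, and the
    default category is an assumption.
    """
    normalized = data_string.upper()
    for category in FormatRoute:
        if category.value in normalized:
            return category
    return default


if __name__ == "__main__":
    print(route_format("report.pdf"))
    print(route_format("something unrecognized"))
```

Because members are checked in definition order, a string that names several categories resolves to the earliest one.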

451-454: The start_test function has been updated to accept a new parameter retriever_type. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

515-525: The start_test function has been updated to include document names in the Docs entity. This is a good improvement for data tracking.

613-617: The start_test function has been updated to handle the llm_context retriever type. Ensure that the run_eval function is implemented correctly and that the test_qa and context parameters are passed correctly.

742-748: The command-line argument parsing in the main function has been updated. Ensure that the help messages for each argument are clear and accurate.
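
For reference, a parser with explicit help messages could be sketched as follows. Only the flag names `--file` and `--retriever_type` come from this PR; the help wording, the required/optional split, and the `llm_context` default are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags named in the PR summary.

    Help text and defaults are illustrative assumptions, not the PR's values.
    """
    parser = argparse.ArgumentParser(description="Run a RAG test from the command line.")
    parser.add_argument(
        "--file",
        required=True,
        help="Path to the document file or folder to load.",
    )
    parser.add_argument(
        "--retriever_type",
        default="llm_context",
        help="Name of the retriever to evaluate (default: %(default)s).",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--file", ".data"])
    print(args.file, args.retriever_type)
```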

752-758: The main function has been updated to load the test set from a JSON file. It correctly handles exceptions and checks that the loaded JSON is a list.

761-767: The main function has been updated to load metadata from a JSON file. It correctly handles exceptions and checks that the loaded JSON is a dictionary.

769-778: The main function has been updated to parse additional parameters from a JSON string. It correctly handles exceptions and checks that the parsed JSON is a dictionary.

780: The main function has been updated to call the start_test function with the parsed command-line arguments. Ensure that the start_test function is implemented correctly and that the arguments are passed correctly.
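
The load-and-check pattern noted in the comments on lines 752-778 (read JSON, handle exceptions, verify the top-level type) can be sketched with one helper. `load_json_file` is a hypothetical name, and converting failures to `ValueError` is an assumption; the PR's main function may report errors differently.

```python
import json
from typing import Any, Type


def load_json_file(path: str, expected_type: Type) -> Any:
    """Load JSON from path and verify that the top-level value has the expected type.

    Hypothetical helper: raising ValueError for every failure mode is a design
    choice made for this sketch.
    """
    try:
        with open(path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
    except (OSError, json.JSONDecodeError) as error:
        raise ValueError(f"Could not load JSON from {path}: {error}") from error

    if not isinstance(data, expected_type):
        raise ValueError(
            f"Expected {expected_type.__name__} in {path}, got {type(data).__name__}"
        )
    return data
```

A test set would be loaded with `load_json_file(path, list)` and metadata with `load_json_file(path, dict)`, so both checks share one code path.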

coderabbitai · 2023-10-28T22:21:15Z

level_3/rag_test_manager.py

+def get_document_names(doc_input):
+    """
+    Get a list of document names.
+
+    This function takes doc_input, which can be a folder path, a single document file path, or a document name as a string.
+    It returns a list of document names based on the doc_input.
+
+    Args:
+        doc_input (str): The doc_input can be a folder path, a single document file path, or a document name as a string.
+
+    Returns:
+        list: A list of document names.
+
+    Example usage:
+        - Folder path: get_document_names(".data")
+        - Single document file path: get_document_names(".data/example.pdf")
+        - Document name provided as a string: get_document_names("example.docx")
+    """
+    if os.path.isdir(doc_input):
+        # doc_input is a folder
+        folder_path = doc_input
+        document_names = []
+        for filename in os.listdir(folder_path):
+            if os.path.isfile(os.path.join(folder_path, filename)):
+                document_names.append(filename)
+        return document_names
+    elif os.path.isfile(doc_input):
+        # doc_input is a single document file
+        return [os.path.basename(doc_input)]
+    elif isinstance(doc_input, str):
+        # doc_input is a document name provided as a string
+        return [doc_input]
+    else:
+        # doc_input is not valid
+        return []


The get_document_names function is well implemented. It handles different types of inputs and returns a list of document names accordingly. However, it would be better to raise an exception when the input is not valid instead of returning an empty list. This would make it easier to debug issues related to invalid inputs.

-        # doc_input is not valid
-        return []
+        raise ValueError(f"Invalid doc_input: {doc_input}")

Committable suggestion (Beta)

Suggested change

def get_document_names(doc_input):
    """
    Get a list of document names.

    This function takes doc_input, which can be a folder path, a single document file path, or a document name as a string.
    It returns a list of document names based on the doc_input.

    Args:
        doc_input (str): The doc_input can be a folder path, a single document file path, or a document name as a string.

    Returns:
        list: A list of document names.

    Example usage:
        - Folder path: get_document_names(".data")
        - Single document file path: get_document_names(".data/example.pdf")
        - Document name provided as a string: get_document_names("example.docx")
    """
    if os.path.isdir(doc_input):
        # doc_input is a folder
        folder_path = doc_input
        document_names = []
        for filename in os.listdir(folder_path):
            if os.path.isfile(os.path.join(folder_path, filename)):
                document_names.append(filename)
        return document_names
    elif os.path.isfile(doc_input):
        # doc_input is a single document file
        return [os.path.basename(doc_input)]
    elif isinstance(doc_input, str):
        # doc_input is a document name provided as a string
        return [doc_input]
    else:
        # doc_input is not valid
        raise ValueError(f"Invalid doc_input: {doc_input}")
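
One thing worth noting about the suggestion: every string input is handled by one of the first three branches, so the final branch only matters for non-string input, and such input is passed to os.path.isdir before that branch can run. The variant below is a sketch of one possible arrangement that checks the type first. It is not part of the PR; sorting the folder listing is an added assumption to make the output order predictable.

```python
import os
from typing import List


def get_document_names(doc_input: str) -> List[str]:
    """Return document names for a folder path, a file path, or a bare name.

    Sketch only: the type check is moved to the top so invalid input always
    raises ValueError, and the folder listing is sorted (an assumption).
    """
    if not isinstance(doc_input, str):
        raise ValueError(f"Invalid doc_input: {doc_input!r}")

    if os.path.isdir(doc_input):
        # Keep regular files only; subfolders are skipped.
        return sorted(
            name
            for name in os.listdir(doc_input)
            if os.path.isfile(os.path.join(doc_input, name))
        )
    if os.path.isfile(doc_input):
        return [os.path.basename(doc_input)]
    # Any other string is treated as a document name.
    return [doc_input]
```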

coderabbitai · 2023-10-28T22:21:15Z

level_3/rag_test_manager.py

+# def data_format_route(data_string: str):
+#     @ai_classifier
+#     class FormatRoute(Enum):
+#         """Represents classifier for the data format"""
+#
+#         PDF = "PDF"
+#         UNSTRUCTURED_WEB = "UNSTRUCTURED_WEB"
+#         GITHUB = "GITHUB"
+#         TEXT = "TEXT"
+#         CSV = "CSV"
+#         WIKIPEDIA = "WIKIPEDIA"
+#


The commented-out code should be removed to maintain code cleanliness.

Vasilije1990 added 2 commits October 28, 2023 23:12

fixes to chunking logic and optimizations

7a07be7

Added docs functionality

12796a2

Vasilije1990 merged commit 6a0e567 into main Oct 28, 2023

coderabbitai bot reviewed Oct 28, 2023


borisarzentar pushed a commit that referenced this pull request Dec 4, 2024

feat/add correctness score calculation with LLM as a judge (#30)

7061011
