Enable cmd runner #30

Merged
merged 2 commits into from
Oct 28, 2023

Conversation

Vasilije1990

@Vasilije1990 Vasilije1990 commented Oct 28, 2023

Summary by CodeRabbit

  • New Feature: Added a new endpoint /rag-test/rag_test_run to handle RAG tests.
  • New Feature: Introduced a new DocsModel to manage document-related data in the database.
  • New Feature: Implemented a dynamic_test_manager function to manage dynamic tests.
  • Refactor: Updated command-line arguments in the README file, replacing --url with --file and adding --retriever_type.
  • Refactor: Enhanced rag_test_manager with additional functions and improved existing ones for better error handling and functionality.
  • Style: Adjusted the condition in vectordb/loaders/loaders.py to handle text document formats in uppercase.
  • Documentation: Updated README.md and code comments for better clarity and understanding.

@coderabbitai

coderabbitai bot commented Oct 28, 2023

Walkthrough

The codebase has undergone significant changes, including updates to command-line arguments, addition of new endpoints, model and function definitions, and modifications to existing functions. The changes aim to enhance the functionality, improve error handling, and facilitate dynamic testing.

Changes

| File | Summary |
| --- | --- |
| README.md | Updated command-line arguments and their default values. |
| level_3/api.py | Added a new endpoint and defined its handling function. |
| level_3/models/docs.py | Introduced a new DocsModel to represent a 'docs' table in the database. |
| level_3/rag_test_manager.py | Added new functions, updated existing ones, and included logging for debugging. |
| level_3/vectordb/.../loaders.py | Altered a condition in an if statement to handle text document format. |

🐇💻

"In the land of code, where the shadows lie,

Changes come, under the cloudless sky.

New functions born, old ones refined,

In the quest for bugs, that we're destined to find.

With every keystroke, and every line,

We weave magic, oh so divine.

So here's to the changes, big and small,

In the world of code, we stand tall!" 🎉🥕


Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with CodeRabbit bot directly around the specific lines of code in the PR, mention @coderabbitai in your review comment
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai help to get help.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.json

@Vasilije1990 Vasilije1990 merged commit 6a0e567 into main Oct 28, 2023

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: Files that changed from the base of the PR and between 259198d and 12796a2.
Files ignored due to filter (2)
  • level_3/docker-compose.yml
  • level_3/poetry.lock
Files selected for processing (5)
  • README.md (1 hunks)
  • level_3/api.py (2 hunks)
  • level_3/models/docs.py (1 hunks)
  • level_3/rag_test_manager.py (11 hunks)
  • level_3/vectordb/loaders/loaders.py (1 hunks)
Files skipped from review due to trivial changes (1)
  • README.md
Additional comments: 14
level_3/models/docs.py (1)
  • 1-18: The DocsModel class seems to be well defined with appropriate relationships and columns. However, it's important to ensure that the operation_id foreign key is correctly set up in the Operation model and that the back_populates attribute is correctly used. Also, ensure that the id column is being populated with unique values, as it is the primary key.
level_3/vectordb/loaders/loaders.py (1)
  • 48-50: The condition in the if statement has been changed from document_format == "text" to document_format == "TEXT". This change will affect the control flow of the code. Ensure that all instances where this function is called have been updated to use the correct case for the document format. Alternatively, consider making the comparison case-insensitive to avoid potential issues.
- elif document_format == "TEXT":
+ elif document_format.upper() == "TEXT":
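A minimal sketch of the case-insensitive comparison suggested above. The helper name `is_text_format` is hypothetical and not part of the PR; it also assumes the format always arrives as a string.

```python
def is_text_format(document_format: str) -> bool:
    """Return True when the format names plain text, ignoring case and padding."""
    # Hypothetical helper for illustration; the loader in the PR compares inline.
    return document_format.strip().upper() == "TEXT"


print(is_text_format("text"))   # True
print(is_text_format("Text "))  # True
print(is_text_format("PDF"))    # False
```

Normalizing once at the comparison site means callers do not all have to agree on upper or lower case.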
level_3/api.py (2)
  • 7-14: The import statements have been updated to reflect the new module structure. Ensure that these changes do not break any existing functionality.

  • 205-220: A new endpoint /rag-test/rag_test_run has been added. This endpoint is handled by the rag_test_run function, which takes a payload parameter of type Payload and returns a dictionary. The function calls the start_test function from the rag_test_manager module with the decoded payload data. The response from the start_test function is returned as a JSON response.

Ensure that the start_test function is correctly implemented and that it handles all possible edge cases. Also, make sure that the Payload model correctly validates the incoming data.
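The flow described above (decode the payload, call start_test, wrap the result) can be sketched with the standard library only. Everything here is an assumption made for illustration: the `Payload` stand-in, the field names `data` and `test_set`, the synchronous `start_test` stub and its signature, and the shape of the response dictionary. The real handler lives in level_3/api.py and may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Payload:
    """Hypothetical stand-in for the request model used by the endpoint."""

    payload: Dict[str, Any]

    def __post_init__(self) -> None:
        # Reject anything that is not a JSON object before it reaches the handler.
        if not isinstance(self.payload, dict):
            raise TypeError("payload must be a JSON object")


def start_test(data: str, test_set: List[Any], retriever_type: str) -> Dict[str, Any]:
    """Stub standing in for rag_test_manager.start_test (signature assumed)."""
    return {"data": data, "cases": len(test_set), "retriever_type": retriever_type}


def rag_test_run(request: Payload) -> Dict[str, Any]:
    """Decode the payload, run the test, and wrap the outcome in a response dict."""
    decoded = request.payload
    try:
        result = start_test(
            data=decoded["data"],
            test_set=decoded["test_set"],
            retriever_type=decoded.get("retriever_type", "llm_context"),
        )
    except KeyError as missing:
        # Report the missing field instead of letting the handler crash.
        return {"status": 400, "error": f"missing field: {missing.args[0]}"}
    return {"status": 200, "response": result}
```

Returning an explicit error for a missing field is one way to cover the edge cases the comment asks about; the actual endpoint may rely on model validation instead.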

level_3/rag_test_manager.py (10)
  • 314-318: The renaming of the parameters in the LLMTestCase instantiation is fine as long as it is consistent throughout the codebase. Ensure that all references to these parameters have been updated accordingly.

  • 385-404: The data_format_route function has been updated to handle case-insensitive matching and to return a default category if no match is found. This is a good improvement for robustness and flexibility.

  • 406-426: The data_location_route function has been updated to handle case-insensitive matching and to return a default category if no match is found. This is a good improvement for robustness and flexibility.

  • 451-454: The start_test function has been updated to accept a new parameter retriever_type. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.

  • 515-525: The start_test function has been updated to include document names in the Docs entity. This is a good improvement for data tracking.

  • 613-617: The start_test function has been updated to handle the llm_context retriever type. Ensure that the run_eval function is implemented correctly and that the test_qa and context parameters are passed correctly.

  • 742-748: The command-line argument parsing in the main function has been updated. Ensure that the help messages for each argument are clear and accurate.

  • 752-758: The main function has been updated to load the test set from a JSON file. It correctly handles exceptions and checks that the loaded JSON is a list.

  • 761-767: The main function has been updated to load metadata from a JSON file. It correctly handles exceptions and checks that the loaded JSON is a dictionary.

  • 769-778: The main function has been updated to parse additional parameters from a JSON string. It correctly handles exceptions and checks that the parsed JSON is a dictionary.

  • 780: The main function has been updated to call the start_test function with the parsed command-line arguments. Ensure that the start_test function is implemented correctly and that the arguments are passed correctly.
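The JSON handling described for the main function (a test set that must be a list, metadata and extra parameters that must be dictionaries) could look roughly like the sketch below. Only `--file` and `--retriever_type` are named in this PR; the other flags (`--test_set`, `--metadata`, `--params`), the helper names, and the default retriever type are assumptions for illustration.

```python
import argparse
import json
from typing import Any, Optional


def load_json_file(path: str, expected_type: type) -> Optional[Any]:
    """Load JSON from a file and return it only if it has the expected top-level type."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            loaded = json.load(handle)
    except (OSError, json.JSONDecodeError) as error:
        print(f"Could not load {path}: {error}")
        return None
    if not isinstance(loaded, expected_type):
        print(f"{path} must contain a JSON {expected_type.__name__}")
        return None
    return loaded


def parse_params(raw: Optional[str]) -> Optional[dict]:
    """Parse an optional JSON string of extra parameters; it must decode to a dict."""
    if raw is None:
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as error:
        print(f"Could not parse params: {error}")
        return None
    return parsed if isinstance(parsed, dict) else None


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI; only --file and --retriever_type come from the PR text."""
    parser = argparse.ArgumentParser(description="Run RAG tests from the command line.")
    parser.add_argument("--file", required=True, help="Document or folder to test against.")
    parser.add_argument("--test_set", required=True, help="JSON file holding a list of test cases.")
    parser.add_argument("--metadata", required=True, help="JSON file holding a metadata object.")
    parser.add_argument("--params", default=None, help="Optional JSON object with extra parameters.")
    parser.add_argument("--retriever_type", default="llm_context", help="Retriever to evaluate.")
    return parser
```

A caller would then stop early when any of the loaders returns None, before invoking start_test.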

Comment on lines +76 to +110
def get_document_names(doc_input):
    """
    Get a list of document names.

    This function takes doc_input, which can be a folder path, a single document file path, or a document name as a string.
    It returns a list of document names based on the doc_input.

    Args:
        doc_input (str): The doc_input can be a folder path, a single document file path, or a document name as a string.

    Returns:
        list: A list of document names.

    Example usage:
        - Folder path: get_document_names(".data")
        - Single document file path: get_document_names(".data/example.pdf")
        - Document name provided as a string: get_document_names("example.docx")
    """
    if os.path.isdir(doc_input):
        # doc_input is a folder
        folder_path = doc_input
        document_names = []
        for filename in os.listdir(folder_path):
            if os.path.isfile(os.path.join(folder_path, filename)):
                document_names.append(filename)
        return document_names
    elif os.path.isfile(doc_input):
        # doc_input is a single document file
        return [os.path.basename(doc_input)]
    elif isinstance(doc_input, str):
        # doc_input is a document name provided as a string
        return [doc_input]
    else:
        # doc_input is not valid
        return []

The get_document_names function is well implemented. It handles different types of inputs and returns a list of document names accordingly. However, it would be better to raise an exception when the input is not valid instead of returning an empty list. This would make it easier to debug issues related to invalid inputs.

-        # doc_input is not valid
-        return []
+        raise ValueError(f"Invalid doc_input: {doc_input}")

Committable suggestion (Beta)
Suggested change
def get_document_names(doc_input):
    """
    Get a list of document names.
    This function takes doc_input, which can be a folder path, a single document file path, or a document name as a string.
    It returns a list of document names based on the doc_input.
    Args:
        doc_input (str): The doc_input can be a folder path, a single document file path, or a document name as a string.
    Returns:
        list: A list of document names.
    Example usage:
        - Folder path: get_document_names(".data")
        - Single document file path: get_document_names(".data/example.pdf")
        - Document name provided as a string: get_document_names("example.docx")
    """
    if os.path.isdir(doc_input):
        # doc_input is a folder
        folder_path = doc_input
        document_names = []
        for filename in os.listdir(folder_path):
            if os.path.isfile(os.path.join(folder_path, filename)):
                document_names.append(filename)
        return document_names
    elif os.path.isfile(doc_input):
        # doc_input is a single document file
        return [os.path.basename(doc_input)]
    elif isinstance(doc_input, str):
        # doc_input is a document name provided as a string
        return [doc_input]
    else:
        # doc_input is not valid
        raise ValueError(f"Invalid doc_input: {doc_input}")
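One possible variant, offered as a sketch rather than as part of the suggestion: in the version above, doc_input is handed to os.path.isdir before its type is checked, so validating the type up front keeps the ValueError path independent of the filesystem calls. Rejecting empty strings and sorting the folder listing are assumptions added here, not behavior from the PR.

```python
import os
from typing import List


def get_document_names(doc_input: str) -> List[str]:
    """Return document names for a folder path, a file path, or a bare name."""
    # Validate the type first, so the error is raised before any filesystem call.
    if not isinstance(doc_input, str) or not doc_input:
        raise ValueError(f"Invalid doc_input: {doc_input!r}")
    if os.path.isdir(doc_input):
        # Folder: list regular files only; sorted for a stable order (an assumption).
        return sorted(
            name
            for name in os.listdir(doc_input)
            if os.path.isfile(os.path.join(doc_input, name))
        )
    if os.path.isfile(doc_input):
        # Single file: keep only the file name.
        return [os.path.basename(doc_input)]
    # Anything else is treated as a document name given directly.
    return [doc_input]
```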

Comment on lines +359 to +370
# def data_format_route(data_string: str):
# @ai_classifier
# class FormatRoute(Enum):
# """Represents classifier for the data format"""
#
# PDF = "PDF"
# UNSTRUCTURED_WEB = "UNSTRUCTURED_WEB"
# GITHUB = "GITHUB"
# TEXT = "TEXT"
# CSV = "CSV"
# WIKIPEDIA = "WIKIPEDIA"
#

The commented-out code should be removed to maintain code cleanliness.
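If the enum from the commented-out block is still wanted once the classifier is gone, it could back a plain route function with the case-insensitive matching and default category mentioned earlier in this review. This is only a sketch: the substring matching, the space-to-underscore normalization, and the choice of TEXT as the default are assumptions, not the code merged in this PR.

```python
from enum import Enum


class FormatRoute(Enum):
    """Data formats, taken from the commented-out block."""

    PDF = "PDF"
    UNSTRUCTURED_WEB = "UNSTRUCTURED_WEB"
    GITHUB = "GITHUB"
    TEXT = "TEXT"
    CSV = "CSV"
    WIKIPEDIA = "WIKIPEDIA"


def data_format_route(data_string: str, default: FormatRoute = FormatRoute.TEXT) -> FormatRoute:
    """Map a free-form string to a FormatRoute, falling back to a default."""
    # Normalize case and spacing so "unstructured web" matches UNSTRUCTURED_WEB.
    normalized = data_string.strip().upper().replace(" ", "_")
    for route in FormatRoute:
        if route.value in normalized:
            return route
    return default
```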


1 participant