-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable cmd runner #30
Conversation
WalkthroughThe codebase has undergone significant changes, including updates to command-line arguments, addition of new endpoints, model and function definitions, and modifications to existing functions. The changes aim to enhance the functionality, improve error handling, and facilitate dynamic testing. Changes
TipsChat with CodeRabbit Bot (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files ignored due to filter (2)
- level_3/docker-compose.yml
- level_3/poetry.lock
Files selected for processing (5)
- README.md (1 hunks)
- level_3/api.py (2 hunks)
- level_3/models/docs.py (1 hunks)
- level_3/rag_test_manager.py (11 hunks)
- level_3/vectordb/loaders/loaders.py (1 hunks)
Files skipped from review due to trivial changes (1)
- README.md
Additional comments: 14
level_3/models/docs.py (1)
- 1-18: The
DocsModel
class seems to be well defined with appropriate relationships and columns. However, it's important to ensure that theoperation_id
foreign key is correctly set up in theOperation
model and that theback_populates
attribute is correctly used. Also, ensure that theid
column is being populated with unique values, as it is the primary key.level_3/vectordb/loaders/loaders.py (1)
- 48-50: The condition in the if statement has been changed from
document_format == "text"
todocument_format == "TEXT"
. This change will affect the control flow of the code. Ensure that all instances where this function is called have been updated to use the correct case for the document format. Alternatively, consider making the comparison case-insensitive to avoid potential issues.- elif document_format == "TEXT": + elif document_format.upper() == "TEXT":level_3/api.py (2)
7-14: The import statements have been updated to reflect the new module structure. Ensure that these changes do not break any existing functionality.
205-220: A new endpoint
/rag-test/rag_test_run
has been added. This endpoint is handled by therag_test_run
function, which takes apayload
parameter of typePayload
and returns a dictionary. The function calls thestart_test
function from therag_test_manager
module with the decoded payload data. The response from thestart_test
function is returned as a JSON response.Ensure that the
start_test
function is correctly implemented and that it handles all possible edge cases. Also, make sure that thePayload
model correctly validates the incoming data.level_3/rag_test_manager.py (10)
314-318: The renaming of the parameters in the
LLMTestCase
instantiation is fine as long as it is consistent throughout the codebase. Ensure that all references to these parameters have been updated accordingly.385-404: The
data_format_route
function has been updated to handle case-insensitive matching and to return a default category if no match is found. This is a good improvement for robustness and flexibility.406-426: The
data_location_route
function has been updated to handle case-insensitive matching and to return a default category if no match is found. This is a good improvement for robustness and flexibility.451-454: The
start_test
function has been updated to accept a new parameterretriever_type
. Ensure that all calls to this function throughout the codebase have been updated to match the new signature.515-525: The
start_test
function has been updated to include document names in theDocs
entity. This is a good improvement for data tracking.613-617: The
start_test
function has been updated to handle thellm_context
retriever type. Ensure that therun_eval
function is implemented correctly and that thetest_qa
andcontext
parameters are passed correctly.742-748: The command-line argument parsing in the
main
function has been updated. Ensure that the help messages for each argument are clear and accurate.752-758: The
main
function has been updated to load the test set from a JSON file. It correctly handles exceptions and checks that the loaded JSON is a list.761-767: The
main
function has been updated to load metadata from a JSON file. It correctly handles exceptions and checks that the loaded JSON is a dictionary.769-778: The
main
function has been updated to parse additional parameters from a JSON string. It correctly handles exceptions and checks that the parsed JSON is a dictionary.780:
Themain
function has been updated to call thestart_test
function with the parsed command-line arguments. Ensure that thestart_test
function is implemented correctly and that the arguments are passed correctly.
def get_document_names(doc_input): | ||
""" | ||
Get a list of document names. | ||
|
||
This function takes doc_input, which can be a folder path, a single document file path, or a document name as a string. | ||
It returns a list of document names based on the doc_input. | ||
|
||
Args: | ||
doc_input (str): The doc_input can be a folder path, a single document file path, or a document name as a string. | ||
|
||
Returns: | ||
list: A list of document names. | ||
|
||
Example usage: | ||
- Folder path: get_document_names(".data") | ||
- Single document file path: get_document_names(".data/example.pdf") | ||
- Document name provided as a string: get_document_names("example.docx") | ||
""" | ||
if os.path.isdir(doc_input): | ||
# doc_input is a folder | ||
folder_path = doc_input | ||
document_names = [] | ||
for filename in os.listdir(folder_path): | ||
if os.path.isfile(os.path.join(folder_path, filename)): | ||
document_names.append(filename) | ||
return document_names | ||
elif os.path.isfile(doc_input): | ||
# doc_input is a single document file | ||
return [os.path.basename(doc_input)] | ||
elif isinstance(doc_input, str): | ||
# doc_input is a document name provided as a string | ||
return [doc_input] | ||
else: | ||
# doc_input is not valid | ||
return [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The get_document_names
function is well implemented. It handles different types of inputs and returns a list of document names accordingly. However, it would be better to raise an exception when the input is not valid instead of returning an empty list. This would make it easier to debug issues related to invalid inputs.
- # doc_input is not valid
- return []
+ raise ValueError(f"Invalid doc_input: {doc_input}")
Commitable suggestion (Beta)
def get_document_names(doc_input): | |
""" | |
Get a list of document names. | |
This function takes doc_input, which can be a folder path, a single document file path, or a document name as a string. | |
It returns a list of document names based on the doc_input. | |
Args: | |
doc_input (str): The doc_input can be a folder path, a single document file path, or a document name as a string. | |
Returns: | |
list: A list of document names. | |
Example usage: | |
- Folder path: get_document_names(".data") | |
- Single document file path: get_document_names(".data/example.pdf") | |
- Document name provided as a string: get_document_names("example.docx") | |
""" | |
if os.path.isdir(doc_input): | |
# doc_input is a folder | |
folder_path = doc_input | |
document_names = [] | |
for filename in os.listdir(folder_path): | |
if os.path.isfile(os.path.join(folder_path, filename)): | |
document_names.append(filename) | |
return document_names | |
elif os.path.isfile(doc_input): | |
# doc_input is a single document file | |
return [os.path.basename(doc_input)] | |
elif isinstance(doc_input, str): | |
# doc_input is a document name provided as a string | |
return [doc_input] | |
else: | |
# doc_input is not valid | |
return [] | |
def get_document_names(doc_input): | |
""" | |
Get a list of document names. | |
This function takes doc_input, which can be a folder path, a single document file path, or a document name as a string. | |
It returns a list of document names based on the doc_input. | |
Args: | |
doc_input (str): The doc_input can be a folder path, a single document file path, or a document name as a string. | |
Returns: | |
list: A list of document names. | |
Example usage: | |
- Folder path: get_document_names(".data") | |
- Single document file path: get_document_names(".data/example.pdf") | |
- Document name provided as a string: get_document_names("example.docx") | |
""" | |
if os.path.isdir(doc_input): | |
# doc_input is a folder | |
folder_path = doc_input | |
document_names = [] | |
for filename in os.listdir(folder_path): | |
if os.path.isfile(os.path.join(folder_path, filename)): | |
document_names.append(filename) | |
return document_names | |
elif os.path.isfile(doc_input): | |
# doc_input is a single document file | |
return [os.path.basename(doc_input)] | |
elif isinstance(doc_input, str): | |
# doc_input is a document name provided as a string | |
return [doc_input] | |
else: | |
# doc_input is not valid | |
raise ValueError(f"Invalid doc_input: {doc_input}") |
# def data_format_route(data_string: str): | ||
# @ai_classifier | ||
# class FormatRoute(Enum): | ||
# """Represents classifier for the data format""" | ||
# | ||
# PDF = "PDF" | ||
# UNSTRUCTURED_WEB = "UNSTRUCTURED_WEB" | ||
# GITHUB = "GITHUB" | ||
# TEXT = "TEXT" | ||
# CSV = "CSV" | ||
# WIKIPEDIA = "WIKIPEDIA" | ||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commented-out code should be removed to maintain code cleanliness.
Summary by CodeRabbit
/rag-test/rag_test_run
to handle RAG tests.DocsModel
to manage document-related data in the database.dynamic_test_manager
function to manage dynamic tests.--url
with--file
and adding--retriever_type
.rag_test_manager
with additional functions and improved existing ones for better error handling and functionality.vectordb/loaders/loaders.py
to handle text document formats in uppercase.