-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tests for chunking module #4
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests look already good, but it would be nice to have something more explicitly. For example some text snippets and the expected boundaries.
For the modifications to the chunking. Probably we need to clean this up a bit further, I want to get rid of all the default values
6d7e7d5
to
b1d7b54
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found two small things other than that it looks good
chunked_pooling/chunking.py
Outdated
text, | ||
return_offsets_mapping=True, | ||
add_special_tokens=False, | ||
max_length=512, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this correct to set max-length=512?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can probably remove this.
run_chunked_eval.py
Outdated
@@ -104,7 +99,6 @@ def main(model_name, strategy, task_name, eval_split): | |||
output_folder='results-normal-pooling', | |||
eval_splits=[eval_split], | |||
overwrite_results=True, | |||
batch_size=BATCH_SIZE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should not be remove as here is another batch size argument:
batch_size=1, |
I know it is a bit messy ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's annoying, but OK.
This PR adds some more tests to the chunking methods, and adds back the semantic chunking logic.