ENG 9390: Files MIN_SAMPLES=1 #185

azahed98
Contributor

@azahed98 azahed98 commented Sep 9, 2024

To support smaller validation datasets, this PR reduces the minimum number of samples required in a dataset file from 100 to 1. The check itself stays in place, so an empty dataset is still rejected. The change is a single integer constant (`MIN_SAMPLES`).

$ cat ../test.jsonl 
{"text": "hi"}
$ together files check ../test.jsonl
{
    "is_check_passed": true,
    "message": "Checks passed",
    "found": true,
    "file_size": 15,
    "utf8": true,
    "line_type": true,
    "text_field": true,
    "key_value": true,
    "min_samples": true,
    "num_samples": 1,
    "load_json": true,
    "filetype": "jsonl"
}
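
For reviewers who want to see the rule in isolation, here is a minimal sketch of a minimum-sample check for a JSONL file. It is not the library's actual implementation: the helper name `check_min_samples`, the decision to skip blank lines, and the shape of the returned dict are assumptions made for illustration. Only the constant name `MIN_SAMPLES` and the `min_samples` / `num_samples` report keys come from this PR.

```python
import json

# Assumed value after this change; the name is taken from the PR title.
MIN_SAMPLES = 1


def check_min_samples(path, min_samples=MIN_SAMPLES):
    """Hypothetical helper: count JSONL records and compare against a minimum.

    Blank lines are skipped here, which is an assumption of this sketch.
    """
    num_samples = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            json.loads(line)  # raises json.JSONDecodeError on a malformed line
            num_samples += 1
    # Mirror the two relevant keys from the `together files check` report.
    return {"num_samples": num_samples, "min_samples": num_samples >= min_samples}
```

With `MIN_SAMPLES = 1`, a one-line file like `../test.jsonl` above passes, while a file with no records still fails the `min_samples` check.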

@azahed98 azahed98 requested a review from orangetin September 9, 2024 22:26
@orangetin orangetin merged commit 8925c6e into main Sep 12, 2024
13 checks passed
@orangetin orangetin deleted the arsh/eng-9390-remove-minimum-100-sample-requirements-in-file-upload-for-ft branch September 12, 2024 21:10