Hugging Face has a dataset definition library that I think we could leverage both for the immediate task of splitting data and for some longer-term purposes.

HF not only defines a scheme for doing dataset splits, but also supports features like progressive downloading of a dataset and making the dataset available to both TensorFlow-based and PyTorch-based code.

We can use HF's off-the-shelf notion of images-in-a-folder, or define our own scheme that reads our existing metadata files:
https://huggingface.co/docs/datasets/en/image_dataset#imagefolder
https://huggingface.co/docs/datasets/en/image_dataset#legacy-loading-script
Splits in this library appear to be implemented by splitting the metadata rather than by creating separate folders. We already have metadata and keep all images in one folder, so we have the bones of this system already.
Here's where they keep the code: https://github.com/huggingface/datasets/tree/v2.21.0-release
It's also my hope that exploring this library will pave the way for folks using fibad to easily upload their datasets to HF and share them with other researchers.