Cannot load Kaggle datasets #759

Closed
marcenacp opened this issue Oct 29, 2024 · 1 comment
Labels
bug Something isn't working

Comments


marcenacp commented Oct 29, 2024

Initially reported by @goeffthomas.

import tensorflow_datasets as tfds
builder = tfds.dataset_builders.CroissantBuilder(
    jsonld="https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud/croissant/download",
    file_format='array_record',
)
builder.download_and_prepare()
ds = builder.as_data_source()
print(ds['default'][0])

FWIW, even the demo code here doesn't seem to work: https://www.tensorflow.org/datasets/format_specific_dataset_builders#croissantbuilder_2

Addition by @marcenacp:

For me on the latest version of tfds-nightly, it even fails with another error:

**************************** WARNING *********************************
Warning: The dataset you're trying to generate is using Apache Beam,
yet no `beam_runner` nor `beam_options` was explicitly provided.

Some Beam datasets take weeks to generate, so are usually not suited
for single machine generation. Please have a look at the instructions
to setup distributed generation:

https://www.tensorflow.org/datasets/beam_datasets#generating_a_beam_dataset
**********************************************************************
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-3-445fac78df9a> in <cell line: 6>()
      4     file_format='array_record',
      5 )
----> 6 builder.download_and_prepare()
      7 ds = builder.as_data_source()
      8 print(ds['default'][0])

12 frames
/usr/lib/python3.10/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'apache_beam'
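
To confirm that the environment really lacks the module named in the traceback, a minimal standard-library sketch like the one below can help. The helper name `missing_modules` is hypothetical and not part of TFDS or Croissant; it only reports whether a top-level module can be found, and says nothing about whether installing Beam is the right fix.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# `apache_beam` is the module named in the ModuleNotFoundError above.
missing = missing_modules(["apache_beam"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules are importable.")
```

Running this in the same notebook, before `builder.download_and_prepare()`, shows whether the failure comes from the environment rather than from the Kaggle Croissant file itself.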

marcenacp added the bug label Oct 29, 2024

marcenacp commented

  • It was a TFDS issue; I shouldn't have filed this bug here.
  • Using the latest version should fix it (colab).
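
Since the fix depends on which version is installed, it may be worth checking that before retrying. The sketch below assumes TFDS was installed under one of the PyPI distribution names `tfds-nightly` or `tensorflow-datasets`; the helper `installed_version` is hypothetical and uses only the standard library.

```python
from importlib import metadata


def installed_version(dist_names):
    """Return (name, version) for the first installed distribution, or None."""
    for name in dist_names:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


# Assumed distribution names; adjust if TFDS was installed some other way.
print(installed_version(["tfds-nightly", "tensorflow-datasets"]))
```

If this prints `None`, neither distribution is visible to the current interpreter, which would point to an environment problem rather than a version mismatch.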

Projects
Status: Done