Missing folder exception with Google Cloud Storage checkpointing #18044
Labels
- 3rd party: Related to a 3rd-party
- bug: Something isn't working
- help wanted: Open to be worked on
- logger: csv
- ver: 2.0.x
Bug description
The training fails when a GCP bucket is chosen as the Trainer's default_root_dir. I am properly logged in using gcloud auth application-default login, and the correct GCP project is set using gcloud config set project <my-project>. I have no problem reading from or writing to the same bucket using the console or other applications.
What version are you seeing the problem on?
v2.0
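The setup in the bug description can be sketched roughly as follows. This is not the reporter's script: the bucket path, the helper names trainer_kwargs and build_trainer, and the max_epochs value are all assumptions made for illustration. Only the use of a gs:// URL as default_root_dir and the CPU-only environment come from the report.

```python
# Sketch only: the bucket name and Trainer options below are assumptions,
# not values taken from the report.
HYPOTHETICAL_ROOT_DIR = "gs://my-bucket/lightning-runs"


def trainer_kwargs(root_dir: str = HYPOTHETICAL_ROOT_DIR) -> dict:
    """Keyword arguments for a Trainer that writes logs and checkpoints to GCS."""
    return {
        "default_root_dir": root_dir,  # remote URL, resolved through fsspec/gcsfs
        "accelerator": "cpu",          # the reported environment has no GPU
        "max_epochs": 1,               # assumed; keeps the run short
    }


def build_trainer(root_dir: str = HYPOTHETICAL_ROOT_DIR):
    """Create the Trainer; imports lazily so the module loads without Lightning."""
    import pytorch_lightning as pl  # pytorch-lightning 2.0.x

    return pl.Trainer(**trainer_kwargs(root_dir))
```

Calling build_trainer().fit(model) with any LightningModule would then exercise the logger and checkpoint writes to the bucket, which is where the report says the failure occurs.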
How to reproduce the bug
Error messages and logs
Environment
Current environment
- GPU: None
- available: False
- version: None
- lightning-utilities: 0.9.0
- pytorch-lightning: 2.0.4
- torch: 2.0.1
- torchmetrics: 0.11.4
- accelerate: 0.20.3
- aiohttp: 3.8.4
- aiosignal: 1.3.1
- appdirs: 1.4.4
- async-timeout: 4.0.2
- attrs: 23.1.0
- bitsandbytes: 0.39.1
- black: 23.3.0
- cachetools: 5.3.1
- certifi: 2023.5.7
- chardet: 5.1.0
- charset-normalizer: 3.1.0
- click: 8.1.3
- configue: 4.2.0
- coolname: 2.2.0
- coverage: 7.2.7
- datasets: 2.13.1
- decorator: 5.1.1
- deepspeed: 0.9.5
- deptry: 0.12.0
- dill: 0.3.6
- docker: 6.1.3
- docker-pycreds: 0.4.0
- einops: 0.6.1
- exceptiongroup: 1.1.2
- filelock: 3.12.2
- fire: 0.5.0
- frozenlist: 1.3.3
- fsspec: 2023.6.0
- gcsfs: 2023.6.0
- gitdb: 4.0.10
- gitpython: 3.1.31
- google-api-core: 2.11.1
- google-auth: 2.21.0
- google-auth-oauthlib: 1.0.0
- google-cloud-core: 2.3.2
- google-cloud-storage: 2.10.0
- google-crc32c: 1.5.0
- google-resumable-media: 2.5.0
- googleapis-common-protos: 1.59.1
- greenlet: 2.0.2
- hjson: 3.1.0
- huggingface-hub: 0.15.1
- idna: 3.4
- iniconfig: 2.0.0
- instruction-finetuning: 0.1.dev89+g6a38a51.d20230710
- instructions-finetuning: 0.1.dev75+gee76c64.d20230630
- jinja2: 3.1.2
- jiwer: 3.0.2
- joblib: 1.3.1
- lightning-utilities: 0.9.0
- markupsafe: 2.1.3
- mpmath: 1.3.0
- multidict: 6.0.4
- multiprocess: 0.70.14
- mypy: 1.4.1
- mypy-extensions: 1.0.0
- networkx: 3.1
- ninja: 1.11.1
- numpy: 1.25.0
- oauthlib: 3.2.2
- packaging: 23.1
- pandas: 2.0.3
- pandasql: 0.7.3
- pathspec: 0.11.1
- pathtools: 0.1.2
- peft: 0.4.0.dev0
- platformdirs: 3.8.1
- pluggy: 1.2.0
- protobuf: 3.20.3
- psutil: 5.9.5
- py-cpuinfo: 9.0.0
- pyarrow: 12.0.1
- pyasn1: 0.5.0
- pyasn1-modules: 0.3.0
- pydantic: 1.10.11
- pytest: 7.4.0
- pytest-mock: 3.11.1
- python-dateutil: 2.8.2
- pytorch-lightning: 2.0.4
- pytz: 2023.3
- pyyaml: 5.4.1
- rapidfuzz: 2.13.7
- regex: 2023.6.3
- requests: 2.31.0
- requests-oauthlib: 1.3.1
- rsa: 4.9
- ruff: 0.0.277
- safetensors: 0.3.1
- scikit-learn: 1.3.0
- scipy: 1.11.1
- sentencepiece: 0.1.99
- sentry-sdk: 1.26.0
- setproctitle: 1.3.2
- setuptools: 68.0.0
- six: 1.16.0
- smmap: 5.0.0
- sqlalchemy: 2.0.17
- sympy: 1.12
- termcolor: 2.3.0
- threadpoolctl: 3.1.0
- tiktoken: 0.4.0
- tokenizers: 0.13.3
- tomli: 2.0.1
- torch: 2.0.1
- torchmetrics: 0.11.4
- tqdm: 4.65.0
- transformers: 4.30.2
- types-google-cloud-ndb: 2.1.0.7
- types-tqdm: 4.65.0.1
- typing-extensions: 4.6.3
- tzdata: 2023.3
- urllib3: 1.26.16
- wandb: 0.15.4
- websocket-client: 1.6.1
- xxhash: 3.2.0
- yarl: 1.9.2
- OS: Darwin
- architecture: 64bit
- processor: i386
- python: 3.9.17
- release: 22.3.0
- version: Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64
More info
No response
cc @Borda