Simplify the example script and rename examples (#91)
* Simplify the example script a bit

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix broken tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Add new regression files (init is on GPU now)

Signed-off-by: Fabrice Normandin <[email protected]>

* Simplify the imports of `project/main.py`

Signed-off-by: Fabrice Normandin <[email protected]>

* Add xfail for the example on macos

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix error in main.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix ULTRA weird bug w/ pickling and singledispatch

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix raised exception type in example_test.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Also move the example input array to the GPU

Signed-off-by: Fabrice Normandin <[email protected]>

* Add a bit of a hack to fix self._device

Signed-off-by: Fabrice Normandin <[email protected]>

* Require 16gb vram for finetuning tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Add mark on flaky test :(

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove duplicated code in text_classification_example_test.py

Signed-off-by: Fabrice Normandin <[email protected]>

* text_classification_example-->text_classification

Signed-off-by: Fabrice Normandin <[email protected]>

* llm_finetuning_example-->llm_finetuning

Signed-off-by: Fabrice Normandin <[email protected]>

* Rename regression files as well

Signed-off-by: Fabrice Normandin <[email protected]>

* `LearningAlgorithmTests`-->`LightningModuleTests`

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove duplicate module (?)

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix minuscule typing error

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove outdated todo

Signed-off-by: Fabrice Normandin <[email protected]>

* Add missing regression files

Signed-off-by: Fabrice Normandin <[email protected]>

* [HUGE] Rename examples (drop "Example" suffix)

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix JaxImageClassifier test issues

Signed-off-by: Fabrice Normandin <[email protected]>

* Add fix 4 non-deterministic jax_image_classifier

Signed-off-by: Fabrice Normandin <[email protected]>

* Standardize ImageClassifier/JaxImageClassifier

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix issue in `main_test.py`

Signed-off-by: Fabrice Normandin <[email protected]>

* Add test for the `demo` of jax_image_classifier.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix / rename examples in docs

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove NETWORK_DIR from devcontainer.json

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix test for `demo` of jax_image_classifier.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Add back all regression files

Signed-off-by: Fabrice Normandin <[email protected]>

* Use temp dir for logs in test of demo

Signed-off-by: Fabrice Normandin <[email protected]>

* Cleanup jax_ppo.yaml values

Signed-off-by: Fabrice Normandin <[email protected]>

* Rename jax trainer config to jax_trainer.yaml

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove outdated comments in main.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix test for autoref mkdocs plugin

Signed-off-by: Fabrice Normandin <[email protected]>

* Mark 'algorithm' as required

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove unused test in text_classification_test.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Move `import_object` to where it is used

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix test in `remote_launcher_plugin_test.py`

Signed-off-by: Fabrice Normandin <[email protected]>

* Add missing regression files for RL test

Signed-off-by: Fabrice Normandin <[email protected]>

* Use dependency-groups instead of dev-dependencies

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove empty test file

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove `seeding.py`

Signed-off-by: Fabrice Normandin <[email protected]>

* Cleanup, remove unused code

Signed-off-by: Fabrice Normandin <[email protected]>

* Minor doc improvements

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove more unused code

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove unused `get_constant`

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove unnecessary use of Datamodule

Signed-off-by: Fabrice Normandin <[email protected]>

* Revert "Remove unused `get_constant`"

This reverts commit 074cfd6.

* Fix error in profiling_test.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Add note in profiling_test.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix `test_demo`

Signed-off-by: Fabrice Normandin <[email protected]>

* Skip some tests on MAC in CI (instead of xfail)

Signed-off-by: Fabrice Normandin <[email protected]>

* Don't remove normalization if normalize=False

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix issue in cifar10, add note about protocol

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix bug in VisionDataModule.__init__

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix bug in remote_launcher_test.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix pre-commit issues

Signed-off-by: Fabrice Normandin <[email protected]>

* "fix" weird pre-commit issue?

Signed-off-by: Fabrice Normandin <[email protected]>

* Try to make tests faster

Signed-off-by: Fabrice Normandin <[email protected]>

* Silence some typing errors

Signed-off-by: Fabrice Normandin <[email protected]>

* Add missing regression files

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix device of example_input_array (and network!)

Signed-off-by: Fabrice Normandin <[email protected]>

* Make the timeout longer for integration tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Save correct device type in regression test

Signed-off-by: Fabrice Normandin <[email protected]>

* Add some more `type: ignore` comments

Signed-off-by: Fabrice Normandin <[email protected]>

* Update regression files (missing llm_finetuning)

Signed-off-by: Fabrice Normandin <[email protected]>

* Add skip mark for macOS tests in CI

Signed-off-by: Fabrice Normandin <[email protected]>

* Add a mark on strangely-failing test in main_test

Signed-off-by: Fabrice Normandin <[email protected]>

* Use a skip on macos instead of xfail (again)

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix bug with tuples and lists in regression tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Adjust regression files, add missing files

Signed-off-by: Fabrice Normandin <[email protected]>

* Reset the simpler content for regression files

Signed-off-by: Fabrice Normandin <[email protected]>

* Add missing regression files

Signed-off-by: Fabrice Normandin <[email protected]>

* Update regression files

Signed-off-by: Fabrice Normandin <[email protected]>

* Add built docs directory to norecursedirs

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove ImageNet32 Datamodule

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix issue with display of seed in jax_ppo_test.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Make tests faster to run by skipping visualization

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix an incorrect reason for xfail mark in test

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix broken link in FashionMNIST datamodule

Signed-off-by: Fabrice Normandin <[email protected]>

* Reduce logging verbosity in hydra_config_utils.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Remove hydra_config_utils.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Adjust the name of regression files for ppo tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Add an xfail mark on test failing for MacOS

Signed-off-by: Fabrice Normandin <[email protected]>

* Adjust xfail mark: xfail if no GPU (on CI)

Signed-off-by: Fabrice Normandin <[email protected]>

* Add missing `yield` in fixture

Signed-off-by: Fabrice Normandin <[email protected]>

* Also set XLA_PYTHON_CLIENT_ALLOCATOR="platform"

Signed-off-by: Fabrice Normandin <[email protected]>

* Add xfail on lightning test

Signed-off-by: Fabrice Normandin <[email protected]>

* Add missing regression files for ImageNet

Signed-off-by: Fabrice Normandin <[email protected]>

* Add other (?) missing ImageNet regression files

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix regression files (different gpu type?)

Signed-off-by: Fabrice Normandin <[email protected]>

* Update regression files (agAIN!)

Signed-off-by: Fabrice Normandin <[email protected]>

* Adjust regression tests (again)

Signed-off-by: Fabrice Normandin <[email protected]>

* Increase timeout for slurm integration tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Add xfail on failing repro test

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix try-except block in testutils.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Increase the number of CPUS and RAM for tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Add xfail on flaky tests on SLURM

Signed-off-by: Fabrice Normandin <[email protected]>

* Don't include GPU name in the regression file

Signed-off-by: Fabrice Normandin <[email protected]>

* Make sure the train_dataloader is 100% seeded

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix bug with default device and configure_model

Signed-off-by: Fabrice Normandin <[email protected]>

* Fix bug in llm_finetuning_test.py

Signed-off-by: Fabrice Normandin <[email protected]>

* Update regression files

Signed-off-by: Fabrice Normandin <[email protected]>

* Update regression files for jax tests

Signed-off-by: Fabrice Normandin <[email protected]>

* Revert "Update regression files for jax tests"

This reverts commit 24f0d3c.

* Add another xfail on llm reproducibility test :(

Signed-off-by: Fabrice Normandin <[email protected]>

* Add yet another xfail mark on llm test (!)

Signed-off-by: Fabrice Normandin <[email protected]>

---------

Signed-off-by: Fabrice Normandin <[email protected]>
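
Two of the commits above ("Add missing `yield` in fixture" and "Also set XLA_PYTHON_CLIENT_ALLOCATOR="platform"") touch the same pattern: set something up, hand control to the test, then tear it down. The snippet below is only a minimal sketch of that pattern using the standard library; `temporary_env` is a hypothetical helper, not code from this commit, and the actual fixture in the repository may look quite different. In a pytest fixture the body would be the same, with the `yield` separating setup from teardown.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_env(name: str, value: str) -> Iterator[None]:
    """Set an environment variable for the duration of a block, then restore it.

    Hypothetical helper for illustration; not taken from the repository.
    """
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        # The `yield` hands control to the caller. Everything after it is
        # teardown and runs only once the caller's block has finished.
        yield
    finally:
        if previous is None:
            # The variable did not exist before, so remove it again.
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


if __name__ == "__main__":
    # Assumption: the variable must be set before JAX initializes its backend
    # for it to have any effect.
    with temporary_env("XLA_PYTHON_CLIENT_ALLOCATOR", "platform"):
        print(os.environ["XLA_PYTHON_CLIENT_ALLOCATOR"])
```

If the `yield` is left out of a pytest fixture, the function stops being a generator, so setup and teardown both run before the test body starts, which is presumably the kind of bug the "missing `yield`" commit addresses.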
lebrice authored Nov 29, 2024
1 parent f8e3a22 commit f922edf
Showing 147 changed files with 15,910 additions and 3,933 deletions.
4 changes: 1 addition & 3 deletions .devcontainer/devcontainer.json
@@ -51,8 +51,7 @@
".venv": true,
".pytest_cache": true,
".benchmarks": true,
-".ruff_cache": true,
-".regression_files": true
+".ruff_cache": true
},
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true,
@@ -85,7 +84,6 @@
"containerEnv": {
"SCRATCH": "/home/vscode/scratch",
"SLURM_TMPDIR": "/tmp",
-"NETWORK_DIR": "/network",
"UV_LINK_MODE": "symlink",
"UV_CACHE_DIR": "/home/vscode/.uv_cache"
},
4 changes: 2 additions & 2 deletions .github/actions-runner-job.sh
@@ -1,8 +1,8 @@
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
-#SBATCH --cpus-per-task=1
-#SBATCH --mem=16G
+#SBATCH --cpus-per-task=4
+#SBATCH --mem=32G
#SBATCH --gpus=rtx8000:1
#SBATCH --time=00:30:00
#SBATCH --dependency=singleton
4 changes: 2 additions & 2 deletions .github/workflows/build.yml
@@ -87,7 +87,7 @@ jobs:
local_integration_tests:
needs: [unit_tests, check_docs]
runs-on: self-hosted
-timeout-minutes: 20
+timeout-minutes: 30
strategy:
max-parallel: 1
matrix:
@@ -150,7 +150,7 @@ jobs:
name: Run integration tests on the ${{ matrix.cluster }} cluster in job ${{ needs.launch-slurm-actions-runner.outputs.job_id}}
needs: [launch-slurm-actions-runner]
runs-on: ${{ matrix.cluster }}
-timeout-minutes: 20
+timeout-minutes: 30
strategy:
max-parallel: 5
matrix:
This file was deleted.
